The Android keyboard dictionaries are (at the time of writing, August '10) embedded in LatinIME.apk. The binary files live in the /res/ directory; the LatinIME app path on your phone is /system/app/LatinIME.apk, and the source lives at /packages/inputmethods/LatinIME/ in the Android source tree. AOSP doesn't seem to contain any dict files; CyanogenMod has them at /vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res/
More about this in this blog post: Where are the Keyboard Dictionaries in #Android?
Creating new dictionaries for LatinIME is described in this blog post: Creating Dutch dictionary for android
Check out [HOWTO] Create Gingerbread Keyboard Dictionary for your Language (Croatian example) by Drazen Navratil on XDA for a Croatian dictionary, a different source and a Windows approach to it all!
This is the script that analyses wads of text and pours the word counts into a weighted XML file. The code comes from Softkeyboard, where it was contributed by Jacob Nordfalk.
bzcat archive.bz2 | grep -v '<[a-z]*\s' | grep -v '&[a-z0-9]*;' | tr '[:punct:][:blank:][:digit:]' '\n' | tr 'A-Z' 'a-z' | tr 'ÆØÅŜĴĤĜŬ' 'æøåŝĵĥĝŭ' | uniq | sort -f | uniq -c | sort -nr | head -50000 | tail -n +2 | awk '{print "<w f=\""$1"\">"$2"</w>"}' > dict.xml
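To see what the pipeline above produces, here is a minimal sketch of the same idea on a tiny inline sample. It is a simplified, ASCII-only variant and not the original Softkeyboard code: the function name wordlist_from_text is made up for illustration, the markup filtering and the non-ASCII lowercasing are left out, and the sorts are pinned to LC_ALL=C so that ties come out in a predictable order.

```shell
# Hypothetical helper (not from the original post): reads plain text on stdin
# and prints one <w f="COUNT">word</w> line per distinct word, most frequent first.
wordlist_from_text() {
    LC_ALL=C tr -cs '[:alpha:]' '\n' |     # every run of non-letters becomes one newline
    LC_ALL=C tr 'A-Z' 'a-z' |              # lowercase (ASCII only in this sketch)
    grep -v '^$' |                         # drop an empty first line, if any
    LC_ALL=C sort |                        # group identical words together
    uniq -c |                              # prefix each word with its count
    LC_ALL=C sort -k1,1nr -k2,2 |          # highest count first, ties alphabetical
    awk '{print "<w f=\"" $1 "\">" $2 "</w>"}'
}

# Example run on a one-line sample
printf 'The cat and the dog. The end.\n' | wordlist_from_text
```

For this sample the first line printed is <w f="3">the</w>, followed by the four words that occur once. The f values here are still raw counts; they only become 0 to 254 weights after the reweighing step.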
The post uses the following Perl script.
reweigh.pl
# Reweigh the words
# Sample: <w f="8671269">de</w>
# Source: from.txt
# Output to term
# Biggest value = # of lines.
# Divide this by 255 and round up
# Divide all values (lines left in the list) by that number and round down.
# All values should now be between 0 and 254.
# Open the original file
open(my $in, '<', 'from.txt') or die "Cannot open from.txt: $!";
my $count = 0;
# Count the number of lines
while (<$in>) {
    $count++;
}
close $in;
# Calculate the divider to ensure results between 0 and 254
my $divider = int( $count / 255) +1 ;
# Re-open the source file and update the weights
open(my $src, '<', 'from.txt') or die "Cannot open from.txt: $!";
while (my $line = <$src>) {
    $count--;
    # Replace the weight if it's a word line,
    # otherwise print the line unchanged
    if ($line =~ /<w f=/) {
        my $weighed = int($count / $divider);
        $line =~ s/".*"/"$weighed"/g;
    }
    # Print the line
    print $line;
}
close $src;
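
To check the arithmetic of the reweighing step without touching a real dictionary, the same two-pass logic can be sketched in awk. This is an illustrative equivalent, not the script from the post: the function name reweigh_sketch is hypothetical, and it takes the file name as an argument instead of hard-coding from.txt.

```shell
# Hypothetical helper (not from the original post): reweighs a word list the
# same way as the Perl script, reading the file twice.
# Usage: reweigh_sketch from.txt > reweighed.xml
reweigh_sketch() {
    awk '
        NR == FNR { count++; next }                   # first pass: count the lines
        FNR == 1  { divider = int(count / 255) + 1 }  # start of second pass
        {
            count--                                   # lines left after this one
            if ($0 ~ /<w f=/) {
                # swap the first quoted value for the new weight
                sub(/"[^"]*"/, "\"" int(count / divider) "\"")
            }
            print
        }
    ' "$1" "$1"
}
```

With a 600-line list the divider is int(600 / 255) + 1 = 3, so the first word gets int(599 / 3) = 199 and the last word gets 0. The weights always stay below 255, but they do not necessarily use the whole range.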

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
Use this code at your own peril; I am not responsible for anything that happens to you or your devices. You're a big boy or girl; please do some research before using it if you have any concerns!