Android Dictionary Files

  • Language: Perl
  • Written: 2010

The Android keyboard dictionaries are (at the time of writing, August 2010) embedded in LatinIME.apk. The binary files live in the /res/ directory; the LatinIME app path on your phone is /system/app/LatinIME.apk, and the source lives at /packages/inputmethods/LatinIME/ in the Android source tree. AOSP doesn't seem to contain any dict files; CyanogenMod has them at /vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res/
More about this in this blog post: Where are the Keyboard Dictionaries in #Android?

Creating new dictionaries for LatinIME is described in this blog post: Creating Dutch dictionary for android

Check out [HOWTO] Create Gingerbread Keyboard Dictionary for your Language (Croatian example) by Drazen Navratil on XDA for a Croatian dictionary, a different text source and a Windows approach to it all!

This is the command pipeline that analyses wads of text and pours the word counts into a weighted XML file, most frequent word first. The code comes from Softkeyboard, contributed there by Jacob Nordfalk:

bzcat archive.bz2 | grep -v '<[a-z]*\s' | grep -v '&[a-z0-9]*;' | tr '[:punct:][:blank:][:digit:]' '\n' | tr 'A-Z' 'a-z' | tr 'ÆØÅŜĴĤĜŬ' 'æøåŝĵĥĝŭ' | uniq | sort -f | uniq -c | sort -nr | head -50000 | tail -n +2 | awk '{print "<w f=\""$1"\">"$2"</w>"}'  > dict.xml
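
For readers who would rather stay in Perl, the counting step of the pipeline above can be sketched roughly as follows. This is only an approximation, not the Softkeyboard code: the helper name frequency_lines is made up for illustration, it splits on anything that is not a letter, and it does not skip markup lines or drop the first result the way the grep and tail stages do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original pipeline): count how often
# each word occurs in $text and return up to $limit '<w f="N">word</w>'
# lines, most frequent word first.
sub frequency_lines {
    my ($text, $limit) = @_;
    my %count;

    # Lowercase the text and split on every run of non-letter characters.
    for my $word (split /[^\p{L}]+/, lc $text) {
        next if $word eq '';    # a leading separator yields an empty field
        $count{$word}++;
    }

    # Highest count first; ties are ordered alphabetically so the output is stable.
    my @words = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    splice @words, $limit if @words > $limit;

    return map { qq{<w f="$count{$_}">$_</w>} } @words;
}

print "$_\n" for frequency_lines("de kat en de hond en de vis", 50000);
# First line printed: <w f="3">de</w>
```

For real corpora the input would need to be decoded (for example as UTF-8) before lc and \p{L} behave correctly on non-ASCII letters.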

The post uses the following Perl script.
reweigh.pl

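# Reweigh the words
# Sample line: <w f="8671269">de</w>
# Source: from.txt (sorted, most frequent word first)
# Output: printed to the terminal (STDOUT)

# The new weight depends on the line position, not on the original count.
# Biggest value = number of lines.
# Divide this by 255 and round down, then add 1: this is the divider.
# Divide the number of lines left in the list by the divider and round down.
# All values should now be between 0 and 254.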


use strict;
use warnings;

my $source = "from.txt";

# Open the original file
open my $fh, '<', $source or die "Cannot open $source: $!";
my $count = 0;

# Count the number of lines
while (<$fh>) {
        $count++;
}
close $fh;

# Calculate the divider to ensure results between 0 and 254
my $divider = int($count / 255) + 1;

# Re-open the source file and update the weights
open $fh, '<', $source or die "Cannot open $source: $!";

while (my $line = <$fh>) {
        $count--;

        # Replace the weight if it is a word line,
        # otherwise print the line unchanged
        if ($line =~ /<w f=/) {
                my $weighed = int($count / $divider);
                $line =~ s/f="[^"]*"/f="$weighed"/;
        }

        # Print the line
        print $line;
}

close $fh;
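
To see what the arithmetic does, here is a small sketch of the same calculation as a function that works on a list of lines instead of from.txt. The name reweigh_lines and the generated sample words are assumptions for illustration only. With 1000 lines the divider is int(1000 / 255) + 1 = 4, so the first line gets int(999 / 4) = 249 and the last line gets 0.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the reweigh.pl arithmetic to a list of lines
# (most frequent word first) and return the rewritten lines.
sub reweigh_lines {
    my @lines   = @_;                       # work on a copy of the input
    my $count   = @lines;                   # total number of lines
    my $divider = int($count / 255) + 1;    # keeps every weight below 255

    my @out;
    for my $line (@lines) {
        $count--;                           # lines left after this one
        if ($line =~ /<w f=/) {
            my $weighed = int($count / $divider);
            $line =~ s/f="[^"]*"/f="$weighed"/;
        }
        push @out, $line;
    }
    return @out;
}

# 1000 made-up word lines with counts 1000 down to 1.
my @sample = map { qq{<w f="$_">word$_</w>} } reverse 1 .. 1000;
my @result = reweigh_lines(@sample);

print "$result[0]\n";     # <w f="249">word1000</w>
print "$result[-1]\n";    # <w f="0">word1</w>
```

Note that non-word lines (an XML header, for instance) still count towards the line total, just as they do in the script.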

More on this later.


Use this code at your own peril; I am not responsible for anything that happens to you or your devices. You're a big boy or girl, please do some research before using it if you have any concerns!