Using Babylon-based dictionaries on your Kindle

Never get lost in translation

UPDATE! A Follow-Up Post on this Project
Since this post got wide attention, I've decided to follow up on this project.
See my new Babylon-based dictionaries on Kindle - Round 2 post.
Now the project is shared as open-source and pre-built dictionaries are organized and shared.

Lost in translation

The problem

It all started when I tried to purchase an Italian-English dictionary for my 2nd generation Kindle, running Kindle software v2.5.3.
One dictionary was offered for sale (as an ebook) on Amazon's website. The problem was that the dictionary would not actually be available for the device for another whole year.

Good translations

Babylon, on the other hand, offers high-quality dictionaries covering pretty much every language. Babylon Translator is paid software for Windows, but its dictionary files (.BGL) are offered as free downloads.

In a perfect universe

If only I had a way to import Babylon's free dictionary content into my Kindle and use it as the built-in dictionary, it would be perfect.

The solution presented here was tested on my Kindle 2. I'm pretty sure it should work on newer versions of Kindle as well.

The same Babylon dictionary, used on my PC (Left) and on my Kindle (Right)

Article level: Reasonably moderate

Cracking the Unicode codepage code

Spoilt Kindle 2

There are a few things to know about multilingual support on the Kindle (if you wish to view non-Latin international texts):
Kindle 2 does not natively support non-Latin Unicode characters. This means that if you try to view an ebook which contains non-Latin text (e.g. Cyrillic), you will see blank squares instead of letters.
This is a huge miss on Amazon's side for 2 reasons:

  1. Unicode characters are already supported on all platforms: computers, tablets, phones, websites, etc. All modern devices can natively display any character set. All except the Kindle 2, that is.
  2. Kindle is not a laptop, nor a tablet, nor a smartphone.
    Its one and only purpose is to be an electronic book reader. The only thing it should do well is display text. Why not have it natively support any text in any language? Especially since the resources for that are already so common and so obvious. It isn't 1994 anymore...

There is a workaround (a hack) which enables Kindle 2 to display all Unicode characters. It's described in detail in this great blog post, which includes links to all the necessary files to make it work, elaborate instructions, and links to alternative fonts which may be installed for improved readability.
I am not sure how right-to-left books (e.g. ebooks in Hebrew) are displayed, in terms of text alignment and order of characters, because I have not tested such books yet. For left-to-right languages (e.g. Bulgarian) everything seems to be OK.

And there's more...

Three more points to take into account:
  • Kindle models of generation 3 and above do support Unicode natively.
    This means that they properly display ebooks in any non-Latin language.
  • Even after hacking my Kindle 2 to display non-Latin characters, I didn't manage to use the integrated dictionary to look up words in non-Latin languages.
    For example, if I'm reading a Bulgarian book and I wish to use a Bulgarian-to-English dictionary as the default integrated dictionary (i.e. point the cursor at a word to look it up), the solution described in this post doesn't seem to work (the lookup functionality does not look anything up).
    It seems that the integrated dictionary lookup supports Latin characters only. Perhaps newer generations of Kindle don't suffer from this problem.
    I'd love to hear from anyone who has managed to achieve this with a Kindle of any generation.
  • Setting a new default dictionary worked nicely on my Kindle device itself.
    However, I found it difficult to use my custom dictionary on my computer running Kindle for PC or on my phone running the Kindle for Android app.
My Kindle 2

Ingredients

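A quick list of what you will need:
  • A Babylon dictionary file (.BGL), see Step 1
  • My BabylonToHtml tool, see Step 2
  • Mobipocket Creator, see Step 3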

Step 1: Get the dictionary file

In order to create your custom Babylon dictionary file for Kindle you will need a Babylon dictionary file.
Go to Babylon's free dictionaries page, choose one (or more) and download it. All done, right? Not quite.

The dictionary file you've downloaded from Babylon's site is actually an .EXE installer, which contains the dictionary file archived in it.
There are some suggestions that it may be possible to extract the .BGL file from the installer with 7-Zip, but I did not manage to do so. The easiest way to get the dictionary file out is to run the installer, which will install Babylon (at least in trial mode).
Once Babylon is installed the .BGL file resides in %LOCALAPPDATA%\Babylon (Windows Vista/7). You may repeat the process for as many dictionaries as you require. Copy out the precious .BGL file(s) and keep or uninstall Babylon as you wish.
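
If you have installed several dictionaries, copying the files out by hand gets tedious. Here is a minimal Python sketch of that step. It assumes the .BGL files sit somewhere under %LOCALAPPDATA%\Babylon, as described above; the destination folder name (bgl_dictionaries) and the function name are made up for this illustration.

```python
import os
import shutil
from pathlib import Path


def collect_bgl_files(babylon_dir, dest_dir):
    """Copy every .BGL file found under babylon_dir into dest_dir."""
    babylon_dir = Path(babylon_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(babylon_dir.rglob("*")):
        # Match the extension case-insensitively (.BGL or .bgl).
        if path.is_file() and path.suffix.lower() == ".bgl":
            target = dest_dir / path.name
            shutil.copy2(path, target)
            copied.append(target)
    return copied


if __name__ == "__main__":
    # LOCALAPPDATA is only set on Windows; do nothing elsewhere.
    local_app_data = os.environ.get("LOCALAPPDATA")
    if local_app_data and (Path(local_app_data) / "Babylon").is_dir():
        source = Path(local_app_data) / "Babylon"
        # "bgl_dictionaries" is a hypothetical destination folder.
        for bgl in collect_bgl_files(source, Path.home() / "bgl_dictionaries"):
            print("Copied", bgl)
```

Note that two dictionaries with the same file name in different subfolders would overwrite each other in this sketch.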

Step 2: Use my magic tool: BabylonToHtml

The next step is to convert the binary .BGL dictionary to a textual HTML file (of a very specific structure, of course) which will be used as the source of the eBook.

About my magic tool

The binary structure of .BGL files has already been cracked (not by me). This knowledge is commonly out in the open and shared across various open-source projects. I have combined a few of those resources into one easy-to-use command-line utility.

  • One source was dictconv, a dictionary conversion tool for Linux which comes with its full C++ source. I used parts of this code (ported by me into C#) in order to analyse the meta-data of the dictionary file (text encoding, author etc).
  • Another resource is an open-source project named ThaiLanguageTools. It's written in C#, but the code looks suspiciously similar to that of dictconv mentioned above (similar variable names, comments etc.), which suggests it's a port as well.
  • The content of Babylon's .BGL files is compressed in GZip format. In order to decompress the data, I have incorporated the free open-source SharpZipLib into the project as well (as source code, so in the end only one executable is needed to run my app, with no additional DLLs).
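
To give a feeling for the GZip part, here is a small Python sketch (the tool itself is C# and uses SharpZipLib, so this is not its actual code). It assumes a file made of some header bytes followed by a GZip stream, and simply scans for the GZip signature instead of parsing the header. That scan is a heuristic chosen for this illustration, and the function name is made up.

```python
import gzip
import zlib

# GZip signature (0x1f 0x8b) followed by the "deflate" method byte (0x08).
GZIP_MAGIC = b"\x1f\x8b\x08"


def extract_gzip_payload(raw):
    """Return the decompressed bytes of the first complete GZip stream in raw."""
    start = raw.find(GZIP_MAGIC)
    while start != -1:
        # wbits=31 tells zlib to expect a GZip header and trailer.
        decompressor = zlib.decompressobj(wbits=31)
        try:
            data = decompressor.decompress(raw[start:])
            if decompressor.eof:
                # The stream ended cleanly; bytes after it are ignored.
                return data
        except zlib.error:
            pass  # The signature appeared by chance; keep looking.
        start = raw.find(GZIP_MAGIC, start + 1)
    raise ValueError("no complete GZip stream found")


if __name__ == "__main__":
    # Synthetic stand-in for a dictionary file: made-up header plus GZip data.
    fake_file = b"HEADER" + gzip.compress(b"entry data")
    print(extract_gzip_payload(fake_file))
```

The decompressed bytes are still a binary stream of dictionary records. Decoding those records is what the code ported from dictconv takes care of.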
On top of these sources I added my very own simple HTML generator. It structures the entries from the dictionary file in a markup compatible with the next step (converting it into an eBook).
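
The exact markup BabylonToHtml emits is not reproduced in this post. As a rough idea of what dictionary-ready HTML can look like, the Python sketch below uses the idx:entry and idx:orth tags of the Mobipocket dictionary format. Treat the tag layout, the function names, and the escaping of definitions as assumptions of this illustration, not as the tool's real output.

```python
from html import escape


def entry_to_html(headword, definition):
    """Render one dictionary entry using Mobipocket-style idx tags."""
    return (
        "<idx:entry>\n"
        f"  <idx:orth>{escape(headword)}</idx:orth>\n"
        f"  <p>{escape(definition)}</p>\n"
        "</idx:entry>\n"
    )


def build_dictionary_html(entries):
    """Wrap (headword, definition) pairs in a minimal UTF-8 HTML document."""
    body = "".join(entry_to_html(word, text) for word, text in entries)
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head>\n<body>\n"
        f"{body}"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    sample = [("casa", "house, home"), ("cane", "dog")]
    print(build_dictionary_html(sample))
```

The idx:orth element marks the headword that the lookup matches against. Everything else inside idx:entry is what gets displayed as the definition.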

Get the tool (with or without the source code)

If you wish to browse through the sources (and improve them!), you can download the full Visual Studio solution from this link.
If you just want the executable itself, you can get it from this link.

Use it

You'll need to run my BabylonToHtml tool in a command prompt window.
If you run it without any additional parameters, you'll receive some basic help:

A handy message for the perplexed user...

Command line parameters:

  • In most cases all you have to provide is the name (and potentially the path) of your .BGL file.
  • The output .HTML is encoded in UTF-8 (Unicode).
    However, the entries read from the .BGL dictionary are encoded with specific character sets (and sometimes with more than one).
    For example: in a Chinese - Bulgarian dictionary the source language entries are encoded with Chinese characters and the target language entries are encoded in Cyrillic.
    BabylonToHtml will try, by default, to get the right encoding (this info is available in the meta-data of the .BGL file in most cases), but it may make mistakes.
    These encodings can be enforced:
    It is possible to set the codepage of the source language by specifying the -se command line argument.
    It is possible to set the codepage of the target language by specifying the -te command line argument.

So something like the following should be sufficient in most cases:


BabylonToHtml.exe English_Bulgarian.BGL

If your .BGL file does not reside in the same folder as the .EXE, a full path should be specified (it may be wrapped in double quotes if needed).
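
To see why the source and target codepages (the -se and -te arguments above) matter, here is a tiny Python illustration. It is not part of BabylonToHtml. The codec names cp1251 and cp1252 are Python's names for the Windows Cyrillic and Western European codepages, and say nothing about the values the tool's arguments accept.

```python
def to_utf8(raw_bytes, codepage):
    """Decode bytes stored in a legacy codepage and re-encode them as UTF-8."""
    return raw_bytes.decode(codepage).encode("utf-8")


if __name__ == "__main__":
    # The Bulgarian word for "house" as it would be stored in Windows-1251.
    stored = "дом".encode("cp1251")

    # Right codepage: the Cyrillic text comes back intact.
    print(stored.decode("cp1251"))

    # Wrong codepage: the same bytes turn into unrelated Latin letters.
    print(stored.decode("cp1252"))

    # What ends up in the UTF-8 output when the codepage is right.
    print(to_utf8(stored, "cp1251"))
```

The bytes are identical in both cases; only the codepage used to interpret them differs. That is why a wrong guess produces garbage in the dictionary, and why it helps to be able to force the encoding.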

The encoding (and other information about your dictionary) is parsed, and the progress of the process is presented...

Running...

Once the process is done, a new HTML file resides next to the original .BGL file.
The new file's name matches that of the original .BGL file (just with an .HTML extension):

All done. A new HTML file is generated. Magic!!
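
If you have a whole folder of .BGL files, the conversion can be scripted. The Python sketch below relies only on what is described above: the tool takes the .BGL path as its argument and writes an .HTML file with the same name next to it. The helper names are made up, and treating a non-zero exit code as a failure (check=True) is an assumption about how the tool behaves.

```python
import subprocess
from pathlib import Path


def expected_html_path(bgl_path):
    """The output sits next to the .BGL file, same name with an .HTML extension."""
    return Path(bgl_path).with_suffix(".HTML")


def build_command(exe_path, bgl_path):
    """Argument list for one conversion; the .BGL path is the only argument."""
    return [str(exe_path), str(bgl_path)]


def convert_all(exe_path, folder):
    """Run the converter on each .BGL file in folder that has no HTML output yet."""
    converted = []
    for bgl in sorted(Path(folder).iterdir()):
        if bgl.suffix.lower() != ".bgl":
            continue
        if expected_html_path(bgl).exists():
            continue  # Already converted on an earlier run.
        # Assumption: the tool signals failure with a non-zero exit code.
        subprocess.run(build_command(exe_path, bgl), check=True)
        converted.append(bgl)
    return converted
```

Passing the arguments as a list means paths containing spaces need no manual double-quoting. A call would look like convert_all("BabylonToHtml.exe", "C:/dictionaries"), where both values are placeholders.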

Step 3: Convert the dictionary to a Kindle compatible eBook

For this you will need to download, install and run the free Mobipocket Creator. The process itself is fairly simple. Here is the illustrated version:

On the main window, under "Import From Existing File" click the "HTML document" link.

Import from: HTML (duh!)

On the next screen:
Click "Browse..." on the "Choose a file" field and select the HTML file generated by BabylonToHtml.
In the "Encoding" drop-down select "International (UTF8)".
Click the "Import" button.

Import the HTML file

Click "Book settings" on the left-hand-side list and set the fields:
Set the "Encoding" drop-down to "International (UTF8)".
Check the "This eBook is a dictionary" box.
Set the input language and the output language of your dictionary appropriately.
Click the "Update" button.

Dictionary settings

Click "Metadata" on the left-hand-side list and set the mandatory fields:
Give a title for your eBook, set the author, language and main subject.
Now scroll all the way down...

Metadata(1/2): Fill in a title, author, language and main subject

At the bottom of the "Metadata" screen, fill in the "Suggested Retail Price" field (it cannot be left empty, but "0" is fine).
Click the "Update" button.

Metadata(2/2): Set the retail price :-)

On the top bar click the "Build" icon...

Build(1/4): Click Build

In the "Build Publication" screen click the "Build" button...

Build(2/4): Click Build

Wait for the build.
Depending on the size of your dictionary (and the size of the generated HTML file) this may take some time.

Build(3/4): Wait...

Once the process is finished, select the "Open folder containing eBook" radio button and click "OK" to get your dictionary eBook.

Build(4/4): All is done!

Your dictionary-eBook is a file with .prc extension:

Your eBook is produced with a .prc extension

Step 4: Transfer the dictionary to your Kindle and start using it

Transfer

Plug the Kindle into the computer (duh!). Transfer the new eBook to the usual Documents folder, alongside your other books, and unplug.

Note: In some newer versions of Kindle, the dictionaries have been moved from the Documents folder to the Documents/Dictionaries subfolder. If the dictionary is not recognized by your Kindle device, move it there.
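
For those who rebuild dictionaries often, the copy step can be scripted too. This Python sketch is an illustration under a few assumptions: kindle_root is wherever your Kindle shows up as a USB drive, the folder names follow the ones mentioned above (adjust the spelling if your system is case-sensitive), and the function name is made up.

```python
import shutil
from pathlib import Path

# Folder names as described in the post; adjust the case if needed.
DOCUMENTS_DIR = "documents"
DICTIONARIES_DIR = "dictionaries"


def copy_dictionary_to_kindle(prc_path, kindle_root, use_dictionaries_subfolder=False):
    """Copy a .prc dictionary into the Kindle's documents folder.

    Set use_dictionaries_subfolder=True for devices that keep their
    dictionaries in the dictionaries subfolder instead.
    """
    documents = Path(kindle_root) / DOCUMENTS_DIR
    if not documents.is_dir():
        # Probably the wrong drive; refuse to create folders on it.
        raise FileNotFoundError(f"no {DOCUMENTS_DIR} folder under {kindle_root}")
    target_dir = documents / DICTIONARIES_DIR if use_dictionaries_subfolder else documents
    target_dir.mkdir(exist_ok=True)
    return Path(shutil.copy2(prc_path, target_dir))
```

A call might look like copy_dictionary_to_kindle("English_Bulgarian.prc", "E:/"), where the drive letter is just an example. If the Kindle does not pick the dictionary up, try again with use_dictionaries_subfolder=True.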

Set as default

Click the "Home" button, then click "Menu" and go to "Settings" and Enter:
Home screen > Menu > Settings

In the Settings screen click "Menu" again and go to "Change Primary Dictionary":

Settings screen > Menu > Change Primary Dictionary

Your newly created dictionary should appear next to the default Oxford one.
Select it and Enter:

Choose your custom dictionary

Then click Home to leave the Settings page.
Your dictionary is now the default translator whenever you select a word in a book:

Babylon dictionary on Kindle!

You may also manually look up words in your custom dictionary as you do with the default English one.

Bonus tip: Take screenshots from the Kindle

To take a screenshot from the Kindle device, press the Shift key + ALT key + G simultaneously. The screen will flicker.

Plug the Kindle into the computer. Your screenshot files are in the Documents folder, named screen_shot*.gif. Note: This process sometimes needs to be repeated, as you may not find your screenshots every time. I'm not sure why.
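
Picking the screenshots out from among your books can be automated with a few lines of Python. This sketch only relies on the screen_shot*.gif naming mentioned above; the function name and the folder paths are placeholders.

```python
import shutil
from pathlib import Path


def collect_screenshots(documents_dir, dest_dir):
    """Copy every screen_shot*.gif from the Kindle's documents folder to dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for shot in sorted(Path(documents_dir).glob("screen_shot*.gif")):
        copied.append(Path(shutil.copy2(shot, dest)))
    return copied
```

The originals stay on the device, so you can delete them by hand once you have checked the copies.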
