Babylon-based dictionaries on Kindle - Round 2

Never get lost in translation - again!

My post "Using Babylon-based dictionaries on your Kindle" seems to have drawn relatively wide attention. The statistics currently show that 2595 users have seen it; 31 of them commented on the post, 2 more commented via Facebook, and a number of readers emailed me. People's interest in pre-built dictionaries that run on their Kindle devices (specifically English-Hebrew dictionaries, but not only those) is understandable. However, there are some known issues with the BabylonToHtml conversion tool I've put together.

Due to lack of time for resolving the known issues with the project, or for production of pre-built dictionaries, I've decided to share BabylonToHtml publicly as an open-source project, for anyone who wishes to update/improve it.

Pre-Built Dictionaries in this post!

I'm sharing some pre-built dictionaries in this post. Some were produced by me and some by others. They are shared here so that they can be freely downloaded and used by anyone. This section will be updated with dictionaries which the readers share with me over time.

Jump to the pre-built dictionaries section!

BabylonToHtml is Now Open to the Public

Why?

Since I lack the time to address the currently known issues with BabylonToHtml, I decided to make it an open-source project for the benefit (and intervention) of the public.

Known issues

At this time there are a few known issues which need to be addressed for better production of dictionaries. Here they are AFAIK:
  1. Unresolved characters #1:
    There are unresolved portions of the produced dictionary, which are wrapped in
    <charset c=T>****;</charset> blocks. Those are probably Unicode characters which need to be resolved. If left unresolved, they appear as gibberish in the output dictionary (and distort the HTML structure).
  2. Unresolved characters #2:
    There are unresolved portions of the produced dictionary, which are delimited with dollar signs, e.g. $506274$ or cos$531761$. I'm not 100% sure how those should be treated/resolved.
    This would require some research (see "Additional resources for code-contributors" below).
  3. Bug in the project: Encoding is never really applied as requested:
    The encoding is actually force-resolved in the code, no matter what the user says (it gets overridden by the dictionary analysis code). This needs to be fixed.
  4. Potentially unresolvable bug with Hebrew and right-to-left dictionaries:
    Specifically for Hebrew dictionaries: Kindle (at least Kindle 2) always aligns the text from left-to-right.
    This applies not just to the letters (fixing those would be relatively easy to implement), but also to the order of the words themselves. This means that no matter what you do, the words will be placed in their original order, only starting from the left and going right (as if they were written in English).
    Simply reversing the word order will not work, because once the text wraps onto several lines, the result reads from end to start (it may work only if the definition's text fits on a single line).
    It is impossible to know in advance which word will appear where, since this depends on the screen's size and the selected Unicode font (in the case of Kindle 2, a hacked font needs to be installed, as explained here).
    Some problems with Hebrew dictionaries
  5. Image handling
    Some .bgl files seem to contain embedded images. This thread may have a hint about the right way to extract resources, written in C++.
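
For anyone who wants to start on items 1 and 2 above, here is a minimal sketch in Python (not the project's own code). It rests on an assumption that the post does not confirm: that each **** inside a <charset c=T> block is a hexadecimal Unicode code point terminated by a semicolon. The function names are hypothetical. The dollar-delimited tokens are only located, not interpreted, since their meaning is still unknown.

```python
import re

# Assumption: a block looks like <charset c=T>05D0;05D1;</charset>,
# where every item is a hexadecimal Unicode code point.
CHARSET_BLOCK = re.compile(r"<charset\s+c=T>(.*?)</charset>",
                           re.IGNORECASE | re.DOTALL)

# Tokens such as $506274$ (issue 2). Their meaning is unknown,
# so this pattern is only used to find them.
DOLLAR_TOKEN = re.compile(r"\$\d+\$")


def resolve_charset_blocks(text: str) -> str:
    """Replace each <charset c=T> block with the characters it encodes."""

    def replace(match: re.Match) -> str:
        codes = [c.strip() for c in match.group(1).split(";") if c.strip()]
        try:
            return "".join(chr(int(code, 16)) for code in codes)
        except (ValueError, OverflowError):
            # Not valid hex code points: leave the block untouched.
            return match.group(0)

    return CHARSET_BLOCK.sub(replace, text)


def find_dollar_tokens(text: str) -> list[str]:
    """List the dollar-delimited tokens so they can be inspected manually."""
    return DOLLAR_TOKEN.findall(text)
```

Blocks that do not parse as hexadecimal are returned unchanged, so a wrong guess about the format does not make the output any worse than it already is.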

Where?

The project's sources are now shared as open-source on GitHub, which supports unlimited open-source repositories. To learn a bit more about source control, see my post "Setting Up Subversion Source-Control with Assembla and TortoiseSVN". Although that post refers to Subversion and not Git, you can still get a general idea. Besides, using GitHub's client on Windows or Mac is quite intuitive.
Everyone is now welcome to get a copy of the repository, apply changes and sync in improvements.

Here's a link to the repository: BabylonToHtml (or click the screenshot below).

BabylonToHtml on GitHub

Additional resources for code-contributors

As I stated in the original post, Babylon's .bgl format is well known and there are other online projects which parse it. Some of them were suggested by commenters on the post directly and some by email.
Personally, I suspect they all derive from one single reverse-engineered implementation, because of the suspicious similarity in variable names and embedded comments.
Here are a few of those projects, which may be useful for anyone who'd like to contribute to this project's code and its known issues listed above:
  • dictconv: This is the code I've used in my BabylonToHtml, with slight modifications and changes.
  • Forum thread: BGL (babylon glossary) to GLS (babylon glossary source) (C++ implementation). This one may also handle resources, such as images, embedded into the .bgl file.
  • BGL-Reverse - another open-source reverse engineered BGL parser (Python).
  • PyGlossary - yet another open-source reverse engineered BGL parser, this one is also in Python.

Pre-Built Dictionaries!

I'll share here links to pre-built .prc files which were converted from Babylon .bgl files using my tool.
To deploy/use them, simply transfer them to your Kindle device, as explained in step 4 of the previous post (see this link).

File: Babylon English Hebrew - Reversed Words.prc
Source language: English
Destination language: Bulgarian
Notes: By Alon Rotem

File: Babylon English-Hebrew Dictionary.prc
Source language: English
Destination language: Hebrew
Notes: 22/03/2013 Contributed by a reader of the blog. Made with pyglossary.
  Text that has both Hebrew and non-Hebrew (like numbers, Latin characters, parentheses, etc.) in the same paragraph is handled better, and word aliasing (e.g. write/wrote/written --> write) is added. The Hebrew text is right-aligned (although the words still need to be read from bottom to top).
  Tested on: Kindle 3 with v3.4, Kindle Keyboard with v3.4

File: Babylon English-Hebrew Dictionary (Kindle Voyage version).prc
Source language: English
Destination language: Hebrew
Notes: 25/05/2015 Contributed by a reader of the blog. Made with pyglossary.
  This dictionary had layout issues (lines were overflowing the width of the screen) when using the default Kindle Voyage font (Caecilia size 4), so its lines were forced to break at 40 characters. It may have layout issues if a non-default font is used.
  Tested on: Kindle Voyage, Kindle Paperwhite (Firmware 5.6.1.1)

File: Babylon English-Hebrew Dictionary - MG Reversed Words.prc
Source language: English
Destination language: Hebrew
Notes: 25/03/2013 Contributed by a reader of the blog. Made with pyglossary.
  Similar to "Babylon English-Hebrew Dictionary.prc" from 22/03/2013, but the letters are reversed in each word (although the words still need to be read from bottom to top).
  Should be working on: Kindle Paperwhite.

File: Babylon Russian English.prc
Source language: Russian
Destination language: English
Notes: 09/07/2013 Contributed by Simon Brenncke.
  Tested on: Kindle 4

File: A Spanish English Dictionary Granada University Spain.prc
Source language: Spanish
Destination language: English
Notes: 15/05/2013 Contributed by Claudio Acevedo.
  Tested on: Kindle Paperwhite

File: An English Spanish Dictionary Granada University Spain.prc
Source language: English
Destination language: Spanish
Notes: 15/05/2013 Contributed by Claudio Acevedo.
  Tested on: Kindle Paperwhite
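
Two of the workarounds mentioned in the notes above, reversing the letters inside each word and forcing line breaks at 40 characters, can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not necessarily how the contributors produced their files, and the function names are hypothetical. Note that a plain character reversal would misplace combining marks (such as Hebrew vowel points), so it suits unpointed text only.

```python
import textwrap


def reverse_letters_in_words(text: str) -> str:
    """Reverse the letters of each space-separated word, keeping word order."""
    return " ".join(word[::-1] for word in text.split(" "))


def break_lines(text: str, width: int = 40) -> list[str]:
    """Split text into lines of at most `width` characters.

    Breaks happen at whitespace where possible; a single word longer
    than `width` is split (the textwrap default).
    """
    return textwrap.wrap(text, width=width)


def prepare_definition(text: str, width: int = 40) -> list[str]:
    """Reverse letters per word, then break the result into fixed-width lines."""
    return break_lines(reverse_letters_in_words(text), width)
```

A call such as prepare_definition(definition_text) returns the list of lines to write into the dictionary entry. The width of 40 matches the Kindle Voyage note above and may need adjusting for other devices or fonts.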