Chinese translation has garbage for index entries (also Mif2Go issue)

Jeremy H. Griffith jeremy at omsys.com
Tue May 17 20:58:35 PDT 2011


On Tue, 17 May 2011 16:18:24 -0400, Celine Deguire <celdeguire at gmail.com> wrote:

>I'm reviewing Chinese Simplified files returned from my Chinese translator
>and everything is fine except for the index entries displayed as garbage in
>the marker dialog. 

That should not happen.  In Frame 7.2, Simplified Chinese is encoded
as the double-byte code page 936, GBK.  That's the same as the rest of
the text encoding, so it should look like Chinese, not garbage.

>In the generated index, the text appears to be Chinese
>after updating the book. Upon conversion to HTML Help 1.x output, the *.hhk
>file contains garbage characters for the index entries. 

You need the most recent Mif2Go DLLs and EXEs, from our Dropbox site,
for this to work.  Since you are a registered user of Mif2Go, you
can access those.  (Sorry, no demo versions of them yet.)  You also
must have the ICU DLLs from our site.

In addition, you need to make a few .ini settings not in the previous
User's Guide, like:

[MSHtmlHelpOptions]
; HelpFileLanguage = LCID to put in project file, default is for
; US English.
HelpFileLanguage = 0x804 Chinese (Simplified)

[HtmlOptions]
; IndexSortType = Numeric (default, code-point order),
;  Lexical (using MS strcoll functions), or
;  Alpha (sort accented letters as though they are unaccented).
IndexSortType=Lexical
; IndexSortLocale = language to use for sorting index.
;  When IndexSortType is Lexical, default is current
;  OS country setting. Uses MS language names.
;IndexSortLocale=Chinese (Simplified)

>As a test, we saved the files in FM 8.0  p277 format and the index entries
>were garbage (mostly question marks) in the marker dialog and generated
>index. As expected, after Mif2Go conversion *.hhk file shows entries as
>garbage.

You may have experienced a Frame bug there.  When Frame converts
a pre-8.0 file to 8 or later, it converts the content to Unicode
in UTF-8 encoding.  However, we recently found that the index
markers are not converted correctly, at least for Japanese and
probably for all DBCS encodings (Chinese, Korean). Instead of 
converting character by character, Frame converts byte by byte,
encoding each byte of each double-byte character in UTF-8 
individually.  This is not valid in any sense, and is not a
recoverable error.  You can either replace all index entries by 
hand with new ones *created* in Frame 8, or stay with 7 forever.
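
To make the difference concrete, here is a minimal Python sketch of
character-by-character versus byte-by-byte conversion.  It is an
illustration only, not Frame's actual code: the function name
encode_bytewise is made up, and it assumes each legacy byte is read
as the code point with the same numeric value, which may not be the
mapping Frame really applies.

```python
def encode_bytewise(text, codec="gbk"):
    """Mimic the faulty conversion: UTF-8-encode each legacy byte separately."""
    legacy = text.encode(codec)  # the double-byte form used before Frame 8
    # Assumption: every byte is treated as the code point of the same value.
    return b"".join(chr(b).encode("utf-8") for b in legacy)


entry = "中文"                      # sample index entry text
correct = entry.encode("utf-8")    # character by character: what should happen
broken = encode_bytewise(entry)    # byte by byte: what the marker ends up with

print(correct.hex(" "))
print(broken.hex(" "))
print(broken.decode("utf-8"))      # displays as accented Latin letters, not Chinese
```

The broken result is still well-formed UTF-8, so nothing complains
when the file is opened; it just no longer says anything in Chinese.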

We have not investigated what Frame 8+ does with index entries
when saving back to FM7, but it may yield unexpected results,
like the garbage you observed in the FM7 files you got back
from the translator...

If you could send us a page from one of the files you got back,
with an index entry, in FM 7 MIF, we can look at the encoding 
more closely.  It's easy to spot the double-encoding when you
know exactly what you are looking for.
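
As a rough idea of what to look for, the hypothetical check below
flags marker text whose non-ASCII characters all fall in the range
U+0080 to U+00FF.  That is the pattern you would see if each byte of
a double-byte character had been converted on its own, under the
assumption that bytes map to code points of the same value.  It is
only a heuristic sketch: genuine Western European text with accented
letters would trigger it too, so it is useful only for text that is
supposed to be Chinese, Japanese, or Korean.

```python
def looks_bytewise_encoded(text):
    """Heuristic: all non-ASCII characters lie in U+0080..U+00FF."""
    non_ascii = [ch for ch in text if ord(ch) > 0x7F]
    if not non_ascii:
        return False  # plain ASCII, nothing to judge
    return all(ord(ch) <= 0xFF for ch in non_ascii)


good = "中文"
# Simulate the damage under the same-value assumption stated above.
damaged = good.encode("gbk").decode("latin-1")

print(looks_bytewise_encoded(good))
print(looks_bytewise_encoded(damaged))
```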

HTH!

-- Jeremy H. Griffith, at Omni Systems Inc.
  <jeremy at omsys.com>  http://www.omsys.com/


