2013-08-24

Japanese character problem

=====
2013-12-19: I confirmed that almost all the problems related to Japanese characters have been solved in FME 2014 Beta Build 14220.
> Japanese Character Problems in FME 2014 Beta
=====

Japanese Windows usually uses code page 932 to process character strings. In this environment, a standard Japanese character is represented by a two-byte code.
There are some characters whose second byte is equal to 0x5C. Since 0x5C is the backslash (\) in the ASCII character set, those characters can cause problems in string processing. For that reason, people often call them だめ文字 (damemoji: dumb characters).
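The byte layout is easy to check. Here is a minimal Python sketch, assuming that Python's built-in "cp932" codec matches the Windows code page for these characters. The helper name trail_byte_is_backslash is made up for illustration and is not part of FME.

```python
def trail_byte_is_backslash(ch: str, codec: str = "cp932") -> bool:
    """Return True if ch encodes to two bytes whose second byte is 0x5C."""
    raw = ch.encode(codec)
    return len(raw) == 2 and raw[1] == 0x5C


# Print each character, its encoded bytes in hex, and the check result.
for ch in "表ソ日":
    print(ch, ch.encode("cp932").hex(), trail_byte_is_backslash(ch))
```

For example, 表 is encoded as the two bytes 0x95 0x5C, so any code that scans the string byte by byte sees a trailing backslash.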

Most damemoji are seldom used, but one of them, 表 (hyou), occurs very often in FME use cases because it means "table".
What is worse, FME currently doesn't work if a Reader / Writer Feature Type Name ends with a damemoji. In practice, many database tables, Excel worksheets, and CSV files are named "****表" (foobar_table). In such a case we have to replace "表" with a non-damemoji character or remove it. This is the only workaround for the time being.
It's a serious problem for Japanese FME users.
I strongly hope that the problem will be solved as soon as possible.
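The renaming workaround described above could be scripted before the names reach FME. The following is only a sketch under my own assumptions: the function names ends_with_damemoji and make_safe_name are hypothetical, it runs outside FME, and it relies on Python's "cp932" codec.

```python
def ends_with_damemoji(name: str, codec: str = "cp932") -> bool:
    """Return True if the last character of name has 0x5C as its second byte."""
    if not name:
        return False
    # Characters that cannot be encoded become b"?" and are not damemoji.
    raw = name[-1].encode(codec, errors="replace")
    return len(raw) == 2 and raw[1] == 0x5C


def make_safe_name(name: str, replacement: str = "") -> str:
    """Strip trailing damemoji; append replacement only if something was stripped."""
    trimmed = name
    while ends_with_damemoji(trimmed):
        trimmed = trimmed[:-1]
    if trimmed == name:
        return name
    return trimmed + replacement


print(make_safe_name("foobar表"))            # trailing damemoji removed
print(make_safe_name("foobar表", "_table"))  # trailing damemoji replaced
print(make_safe_name("roads"))               # unchanged
```

The replacement text should itself not end with a damemoji; an ASCII suffix such as "_table" is the simplest choice.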

=====
2013-11-29: For Writers, it seems that this problem has been solved in FME 2013 SP2+.
Many thanks, Safe!

=====
2013-11-29: A friend asked me: "There should be many countries other than Japan that use a 2-byte character code set. Are there damemoji in those countries?"
Probably, yes. According to the "Shift_JIS" article (Japanese Wikipedia), GBK (Simplified Chinese, cp936) and Big5 (Traditional Chinese, cp950) also have some damemoji, i.e. characters whose second byte is 0x5C.
But I have not heard of a problem similar to the Shift JIS (Japanese, cp932) one occurring in FME use cases. If there is an opportunity, I would like to discuss this problem with Chinese FME users.
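The claim about the other code pages can be checked roughly by trying every lead byte with 0x5C as the second byte. This is a sketch, assuming Python's "cp932", "cp936" and "cp950" codecs are close enough to the Windows code pages; the function name chars_with_backslash_trail is my own.

```python
from typing import List


def chars_with_backslash_trail(codec: str) -> List[str]:
    """List characters encoded in codec as two bytes with second byte 0x5C."""
    found = []
    for lead in range(0x81, 0xFF):
        raw = bytes([lead, 0x5C])
        try:
            text = raw.decode(codec)
            # Keep only a single character that encodes back to the same
            # two bytes; this drops "one-byte character + backslash" pairs.
            if len(text) == 1 and text.encode(codec) == raw:
                found.append(text)
        except UnicodeError:
            continue
    return found


for codec in ("cp932", "cp936", "cp950"):
    hits = chars_with_backslash_trail(codec)
    print(codec, len(hits), "".join(hits[:10]))
```

The counts depend on the codec tables shipped with Python, so they should be read as an indication rather than an exact figure for each Windows code page.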

