2013-09-30

FME stores all attributes as character strings: Part 2

I quoted this description from the FME Workbench documentation in the previous post.
> FME Workbench > FME Architecture (Reference) > Attributes
"Feature attributes are usually a primitive type: integers, floats, characters. Internally, FME stores all attributes as character strings and automatically converts between a string representation and a numeric representation as needed."
=====
2015-05-07: Oops, the link is invalid now. Maybe it has been removed...
=====

However, some attributes created by certain transformers seem to be non-string objects. For example, _part_number created by the Deaggregator is an integer. So I think the description should be read as a general "principle" rather than a strict rule.

As long as we process attributes with existing transformers, this rarely causes a problem. But we should be aware that attribute values can have different data types when we process them in a Python script.
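As a rough sketch of what being aware of the type can look like in a script, the helper below turns a value of unknown type into a string before any text processing. The name `attribute_to_text` is hypothetical, the sample values only stand in for whatever `feature.getAttribute` returns, and treating a missing attribute as `None` is an assumption here.

```python
def attribute_to_text(value, missing=""):
    """Return a string for an attribute value of unknown type (hypothetical helper)."""
    if value is None:
        # Assumption: a missing attribute comes back as None.
        return missing
    if isinstance(value, (int, float)):
        # Numeric attributes such as _part_number are converted explicitly.
        return str(value)
    # Anything else is assumed to be a string already and is returned unchanged.
    return value


part_number = attribute_to_text(3)
name = attribute_to_text("abc")
```

Converting once at the point where the value is read keeps the rest of the script free of type checks.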

I believe the most important point is that Python (version 2) uses two types to represent character strings, i.e. "str" and "unicode".
I think FME users, especially those with a non-English (non-ASCII) locale, often encounter <UnicodeEncodeError> when processing character strings in a Python script. The error seems to occur when a "unicode" instance cannot be converted to a "str" instance, typically because the implicit conversion uses the ASCII codec and the string contains non-ASCII characters.
To avoid the error, we should explicitly encode the string if it is a unicode instance:
-----
    if isinstance(s, unicode):
        s = s.encode("<encoding name>")
-----

Replace <encoding name> with the actual encoding name, e.g. "cp932" for the standard Japanese Windows locale.
More generally, we can get the appropriate encoding name with the locale.getdefaultlocale function, which returns a (language code, encoding) tuple.
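Here is a minimal sketch of that lookup, runnable outside FME. Either element of the returned tuple can be None, so the hypothetical helper `default_encoding` falls back to UTF-8; that fallback is my assumption, not something FME prescribes. Recent Python 3 releases deprecate `locale.getdefaultlocale`, so the sketch silences the warning and tolerates the function being absent.

```python
import locale
import warnings


def default_encoding(fallback="utf-8"):
    """Return the default locale's encoding name, or `fallback` if it is unknown."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # deprecated in recent Python 3
            encoding = locale.getdefaultlocale()[1]
    except (AttributeError, ValueError):
        # The function is missing, or the locale setting could not be parsed.
        encoding = None
    return encoding or fallback


enc = default_encoding()
# errors="replace" substitutes "?" for unencodable characters instead of raising.
encoded = u"\u6771\u4eac".encode(enc, "replace")
```

Passing "replace" is a design choice for logging, where a lossy message is usually better than an exception; for data that is written back to a dataset, the default strict mode may be the safer option.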

> PythonCaller: Use logMessageString Problems with Encoding
Revised just a little:
-----
import fmeobjects
import locale

class FeatureProcessor(object):
    def __init__(self):
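        # getdefaultlocale returns a (language code, encoding) tuple
        loc = locale.getdefaultlocale()
        # save the default encoding name; 'utf-8' is an assumed fallback
        # for the case where the encoding cannot be determined (None)
        self.enc = loc[1] or 'utf-8'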
        self.logger = fmeobjects.FMELogFile()

    def input(self, feature):
        s = feature.getAttribute('attr')
        if isinstance(s, unicode):
            s = s.encode(self.enc)
        self.logger.logMessageString(str(s))

    def close(self):
        pass
-----

Although there is a workaround, it's a little troublesome. I hope the functions of the FME Objects Python API will always interpret "unicode" instances automatically.
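Until then, the trouble can at least be confined to one place. The sketch below wraps the encoding step in a single function; `log_text` and `StubLogger` are hypothetical names, and the stub only imitates the `logMessageString` method so the snippet can be tried outside FME. It targets the Python 2 interpreter this post is about; the `text_type` alias is there only so that it also runs under Python 3.

```python
import sys

# `unicode` exists only in Python 2; in Python 3 the text type is `str`.
text_type = unicode if sys.version_info[0] == 2 else str


def log_text(logger, value, encoding="utf-8"):
    """Encode text values, then pass the result to logger.logMessageString."""
    if not isinstance(value, (text_type, bytes)):
        value = text_type(value)  # numbers and other objects become text first
    if isinstance(value, text_type):
        value = value.encode(encoding, "replace")
    logger.logMessageString(value)


class StubLogger(object):
    """Hypothetical stand-in for fmeobjects.FMELogFile."""

    def __init__(self):
        self.messages = []

    def logMessageString(self, message):
        self.messages.append(message)


logger = StubLogger()
log_text(logger, u"\u6771\u4eac")  # non-ASCII text
log_text(logger, 7)                # a non-string attribute value
```

Inside a PythonCaller, the stub would be replaced by the real logger object and the encoding by the one saved in __init__.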

> Unicode HOWTO
