2014-01-26

Efficiency of List Attribute Manipulation with Python

Important
2014-01-29: I heard that Safe is planning to change the implementation of the Python API method fmeobjects.FMEFeature.getAttribute() so that it returns an empty string when the specified attribute stores <null>. Currently (FME 2014 build 14234) it returns None in that case.
After confirming the change, I will revise the related descriptions (underlined) in this article.
-----
2014-02-14: I noticed that the method in FME 2014 SP1 Beta (build 14255) returns an empty string for <null>. The implementation change seems to have been made for SP1.
-----
2014-02-25: The change to the FME Objects Python API has been announced. I have revised the related descriptions in this article (underlined).

(FME 2014 build 14234)
=====
2015-12-23: I re-ran the scripts shown in this article with FME 2015.1.3.1 build 15573. Contrary to my expectations, "copyList1" was always faster than the others in both Result 1 and Result 2. The order of the others was "copyList2" < "copyList3" < "copyList4" in almost all runs, but not always.
=====

There are several ways to get or set the values of list attribute elements in a Python script. I tested how they differ in processing time.
Note: This article describes just the result of an experiment in my environment (FME 2014 32-bit build 14234, Windows 7 64-bit). The result may differ in other environments and/or conditions.

First, create a feature which has a list attribute with this script in a PythonCreator. The list has 100000 elements and every element value is a character string.
-----
import fmeobjects
class FeatureCreator(object):
    def __init__(self):
        pass
     
    def close(self):
        feature = fmeobjects.FMEFeature()
        for i in range(100000):
            feature.setAttribute('_src{%d}' % i, 'string value')
        self.pyoutput(feature)
-----

Then I measured the processing time of these 4 scripts in a PythonCaller. All of them copy every element of "_src{}" to "_dest{}". Assume the source list has no <missing> elements.
-----
import fmeobjects, time
def copyList1(feature):
    s = time.clock()
    feature.setAttribute('_dest{}', feature.getAttribute('_src{}'))
    feature.setAttribute('_time_diff', '%.3f' % (time.clock() - s))
-----
import fmeobjects, time
def copyList2(feature):
    s = time.clock()
    for i, v in enumerate(feature.getAttribute('_src{}')):
        feature.setAttribute('_dest{%d}' % i, v)
    feature.setAttribute('_time_diff', '%.3f' % (time.clock() - s))
-----
import fmeobjects, time
def copyList3(feature):
    s = time.clock()
    i = 0
    # Read elements one by one until getAttribute returns None.
    while True:
        value = feature.getAttribute('_src{%d}' % i)
        if value is None:
            break
        feature.setAttribute('_dest{%d}' % i, value)
        i += 1
    feature.setAttribute('_time_diff', '%.3f' % (time.clock() - s))
-----
import fmeobjects, time
def copyList4(feature):
    s = time.clock()
    i = 0
    while True:
        # "attrType" avoids shadowing the built-in name "type".
        isNull, isMissing, attrType = feature.getAttributeNullMissingAndType('_src{%d}' % i)
        if isMissing:
            # A <missing> element marks the end of the list.
            break
        elif isNull:
            # Copy <null> as <null>, keeping the attribute type.
            feature.setAttributeNullWithType('_dest{%d}' % i, attrType)
        else:
            feature.setAttribute('_dest{%d}' % i, feature.getAttribute('_src{%d}' % i))
        i += 1
    feature.setAttribute('_time_diff', '%.3f' % (time.clock() - s))
-----
FME 2014 SP1+ (build 14252 or later):
Note: If there are <null> elements in the source list, copyList1, copyList2 and copyList3 will copy them as empty strings.
FME 2014 without SP*:
Note: If there are <null> elements in the source list, copyList1 and copyList2 will copy them as empty strings; copyList3 stops copying when it reaches the first <null> element.
See also "Null in FME 2014: Handling Null with Python / Tcl".
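
The four scripts above repeat the same timing lines. As a rough sketch, that boilerplate could be factored into a decorator. The names "timed", "_timer" and "DictFeature" are my own assumptions and are not part of fmeobjects; "DictFeature" is only a dictionary-backed stand-in so that the sketch can be tried outside FME. Note that time.clock() was removed in Python 3.8, so the sketch falls back to it only when time.perf_counter() is not available.

```python
import functools
import time

# Prefer perf_counter(); fall back to clock() on old interpreters (e.g. Python 2.7).
_timer = getattr(time, 'perf_counter', None) or time.clock


def timed(func):
    """Run func(feature), then store the elapsed seconds in '_time_diff'."""
    @functools.wraps(func)
    def wrapper(feature):
        start = _timer()
        func(feature)
        feature.setAttribute('_time_diff', '%.3f' % (_timer() - start))
    return wrapper


@timed
def copyList2(feature):
    # Same body as copyList2 in the article, without the timing lines.
    for i, v in enumerate(feature.getAttribute('_src{}')):
        feature.setAttribute('_dest{%d}' % i, v)


class DictFeature(object):
    """Hypothetical stand-in for FMEFeature, only for trying this outside FME."""

    def __init__(self):
        self.attrs = {}

    def setAttribute(self, name, value):
        self.attrs[name] = value

    def getAttribute(self, name):
        return self.attrs.get(name)


feature = DictFeature()
feature.setAttribute('_src{}', ['string value'] * 1000)
copyList2(feature)
```

In a PythonCaller, the decorated function would be used in place of the original one; the stand-in class and the last three lines would not be needed there.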

Result 1:  faster <== copyList3 < copyList2 < copyList4 < copyList1 ==> slower

Next, I tested the case where every element is a numeric value (a true numeric value, not a string that represents a number). The source list was created with this script in the PythonCreator.
-----
import fmeobjects
class FeatureCreator(object):
    def __init__(self):
        pass
     
    def close(self):
        feature = fmeobjects.FMEFeature()
        for i in range(100000):
            feature.setAttribute('_src{%d}' % i, 100)
        self.pyoutput(feature)
-----

Result 2: faster <== copyList3 < copyList4 < copyList2 < copyList1 ==> slower

Judging from these results, "copyList3" appears to be consistently faster than the others.
In FME 2014 (without SP*), be aware that "copyList3" cannot be used if the source list can contain <null> elements.
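
To illustrate why, here is a small simulation that can be run outside FME. "Fme2014Feature" and "NULL" are hypothetical names of my own; the class only mimics the behavior described in this article for FME 2014 without SP*, where getAttribute() returns None for both <null> and <missing>. It is not the real FMEFeature.

```python
NULL = object()  # hypothetical marker for <null> inside the stand-in


class Fme2014Feature(object):
    """Hypothetical stand-in: getAttribute() returns None for <null> and <missing>."""

    def __init__(self):
        self.attrs = {}

    def setAttribute(self, name, value):
        self.attrs[name] = value

    def getAttribute(self, name):
        value = self.attrs.get(name)
        return None if value is NULL else value


def copyList3(feature):
    """Same loop as copyList3 in the article; returns the number of elements copied."""
    i = 0
    while True:
        value = feature.getAttribute('_src{%d}' % i)
        if value is None:
            break
        feature.setAttribute('_dest{%d}' % i, value)
        i += 1
    return i


feature = Fme2014Feature()
for i, v in enumerate(['a', NULL, 'c']):
    feature.setAttribute('_src{%d}' % i, v)

# The loop cannot tell <null> from the end of the list, so it stops at index 1.
copied = copyList3(feature)
```

In this simulation only the first element is copied, and the element after the <null> is silently dropped.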

-----
When writing Python scripts in FME 2014 (without SP*), we always have to be conscious of <null>, which can be a little troublesome. If possible, it might be better to convert <null> to a certain value (e.g. an empty string) before processing with Python. I think the NullAttributeMapper is effective for that.
In FME 2014 SP1+ (build 14252 or later), be aware that the FMEFeature.getAttribute method returns an empty string for <null>. When it is necessary to distinguish <null> from an empty string in a Python script, consider using the FMEFeature.getAttributeNullMissingAndType method.
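
As a minimal sketch of that approach, the helper below reads a list into a Python list and uses None for <null>, so that <null> and an empty string stay distinguishable. "read_list" is a hypothetical helper of my own. "Sp1Feature" is a hypothetical stand-in that imitates only the behavior described above (an empty string for <null>, and the isNull / isMissing / type tuple used in copyList4), and the type value 0 in it is a placeholder, not a real FME constant.

```python
def read_list(feature, list_name):
    """Return the list elements as Python values, with None standing for <null>."""
    values = []
    i = 0
    while True:
        name = '%s{%d}' % (list_name, i)
        is_null, is_missing, _ = feature.getAttributeNullMissingAndType(name)
        if is_missing:
            break  # the first <missing> element marks the end of the list
        values.append(None if is_null else feature.getAttribute(name))
        i += 1
    return values


_NULL = object()  # hypothetical marker for <null> inside the stand-in


class Sp1Feature(object):
    """Hypothetical stand-in imitating FME 2014 SP1+ behavior, for use outside FME."""

    def __init__(self):
        self.attrs = {}

    def setAttribute(self, name, value):
        self.attrs[name] = value

    def setAttributeNullWithType(self, name, attr_type):
        self.attrs[name] = _NULL

    def getAttribute(self, name):
        value = self.attrs.get(name)
        return '' if value is _NULL else value

    def getAttributeNullMissingAndType(self, name):
        if name not in self.attrs:
            return (False, True, 0)
        return (self.attrs[name] is _NULL, False, 0)


feature = Sp1Feature()
feature.setAttribute('_src{0}', 'a')
feature.setAttribute('_src{1}', '')
feature.setAttributeNullWithType('_src{2}', 0)

values = read_list(feature, '_src')
```

With the stand-in, the empty string at index 1 and the <null> at index 2 come back as '' and None respectively. The extra method call per element is likely to cost some time, as the copyList4 timings in this article suggest.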
