2013-12-21

Default Attribute Names of XLSXR in FME 2014

The Excel reader (XLSXR) creates default attribute names when the user doesn't specify "Field Names Row".
In FME 2013, those are formatted in "col_**" (** is 1-based sequential number), but the manner has been changed in FME 2014. The default attribute names will be equal to Excel column names, i.e. A, B, C, ...
Since many Excel users are familiar with the column names, this change would be welcomed. I also basically welcome it.

Now, I have several workspaces which have Excel readers. Since source Excel spread sheets don't have any available field names row, the workspaces rename old style default names (col_**) to appropriate names using the SchemaMapper. And I have created common schema definition tables for the SchemaMappers and Dynamic Schema writers.
The same way could be effective to new similar workspace which has to read Excel spread sheets without field names row.
In such a case, I want to use old style default attribute names. Because A, B, C, ... have to be typed manually when creating schema definition tables; it's not only troublesome but also easy to make typos. "col_**" can be input easily by drag copying on an Excel spread sheet.

As a workaround, I'm thinking of a processing with a PythonCaller which renames the Excel column names (A, B, C) to the old style default attribute names (col_**).
This script is a prototype, not tested enough.
-----
import fmeobjects, re

def replaceXlsColumnNames(feature):
    for name in feature.getAllAttributeNames():
        if re.match('^[A-Z]+$', name):
            num, m = 0, 1
            for i in range(len(name) - 1, -1, -1):
                num += (ord(name[i]) - 64) * m
                m *= 26
            value = feature.getAttribute(name)
            feature.setAttribute('col_%d' % num, value)
            feature.removeAttribute(name)
-----

Important, 2014-02-05: I heard that Safe is planning to change the implementation of Python API fmeobjects.FMEFeature.getAttribute() method, so that it returns an empty string when specified attribute stores <null>. Currently - FME 2014 build 14234 - it returns "None" in that case.
After confirming the change, I will revise related descriptions (underlined) in this article.
-----
2014-02-14: I noticed that the method in FME 2014 SP1 Beta (build 14255) returns an empty string for <null>. The change of implementation seems to be done for SP1.
-----
2014-02-25: The change about FME Objects Python API has been announced. I revised related descriptions in this article (underlined).
=====
2013-12-22: There are two issues in the script example above.

1) It cannot distinguish null and missing attributes from others.
FME 2014 SP1+ (build 14252 or later):
If specified attribute is missing, fmeobljects.FMEFeature.getAttribute method returns None, and setAttribute method throws an error when receiving it. getAttribute method returns an empty string for <null>.
FME 2014 without SP*:
If specified attribute is null or missing, fmeobjects.FMEFeature.getAttribute method returns None, and setAttribute method throws an error when receiving it.

Note: XLSXR has "Read blank cells as" parameter, user can select "Null" or "Missing".

2) When the schema of input features is the same (not vary), it's inefficient to create new attribute names for every input feature. But any feature may have missing attributes, so it's necessary to check all attribute names for every feature.

This is an improved version. Also a little more Pythonic? (FME 2014 Beta build 14223)
-----
import fmeobjects, re

class XlsColumnNamesReplacer(object):
    def __init__(self):
        self.mapper = []
        self.oldNames = set([])

    def input(self, feature):
        allNames = [a for a in feature.getAllAttributeNames() if re.match('^[A-Z]+$', a)]
        for name in set(allNames) - self.oldNames:
            num, m = 0, 1
            for i in [ord(c) - 64 for c in name[::-1]]:
                num += i * m
                m *= 26
            self.mapper.append((name, 'col_%d' % num))
            self.oldNames.add(name)
        for oldName, newName in self.mapper:
            isNull, isMissing, type = feature.getAttributeNullMissingAndType(oldName)
            if isNull:
                feature.setAttributeNullWithType(newName, type)
            elif not isMissing:
                feature.setAttribute(newName, feature.getAttribute(oldName))
            feature.removeAttribute(oldName)
        self.pyoutput(feature)
     
    def close(self):
        pass
-----
fmeobjects.FMEFeature.getAttributeNullMissingAndType and setAttributeNullWithType are new methods added in FME 2014. > Null in FME 2014: Handling Null with Python / Tcl

=====
2103-12-23: The TclCaller can be also used. Since FME pre-defines FME_RenameAttribute procedure, it's not necessary to care about <null> and <missing> when renaming attributes.
-- Why doesn't Python API provide a method to rename attributes?
Anyway, this is a Tcl script example.
-----
set oldNames {}
set newNames {}

proc replaceXlsColumnNames {} {
  global oldNames newNames
  foreach name [FME_AttributeNames] {
    if {[lsearch -exact $oldNames $name] < 0 && [regexp {^[A-Z]+$} $name]} {
      set num 0
      set m 1
      foreach ch [lreverse [split $name {}]] {
        set num [expr $num + ([scan $ch %c] - 64) * $m]
        set m [expr $m * 26]
      }
      lappend oldNames $name
      lappend newNames "col_$num"
    }
  }
  for {set i 0} {$i < [llength $oldNames]} {incr i} {
    FME_RenameAttribute [lindex $newNames $i] [lindex $oldNames $i]
  }
}
-----
For processing <null> attributes in Tcl script, FME_IsAttributeNull and FME_SetAttributeNull procedures have been added in FME 2014. > Null in FME 2014: Handling Null with Python / Tcl

No comments:

Post a Comment