2014-01-11

xfMap: Generate Common Feature Type for Different Schemas

Many fundamental geographic datasets are provided by the Japanese government in XML format.
> 基盤地図情報 (Fundamental Geospatial Data)
> 国土数値情報 (National Land Numerical Information)
Naturally, the XML schema for every data type is strictly defined, but in fact some datasets have been created with a different (incorrect) schema.

This is a simplified example describing administrative areas in a certain year. The actual data contains more attributes and geometry elements (and, of course, Japanese characters), but those are omitted for simplicity. Assume this is the correct schema.
-----
<?xml version="1.0"?>
<Dataset>
    <AdministrativeBoundary>
        <prefectureName>Chiba</prefectureName>
        <cityName>Matsudo</cityName>
        <administrativeAreaCode>12207</administrativeAreaCode>
    </AdministrativeBoundary>
    <AdministrativeBoundary>
        <prefectureName>Chiba</prefectureName>
        <cityName>Kashiwa</cityName>
        <administrativeAreaCode>12217</administrativeAreaCode>
    </AdministrativeBoundary>
</Dataset>
-----
# I live in Kashiwa City :-)

This is the same area data created in a different year. The contents are the same as above, but the schema is incorrect.
-----
<?xml version="1.0"?>
<Dataset>
    <AdministrativeArea>
        <prn>Chiba</prn>
        <cn2>Matsudo</cn2>
        <acc>12207</acc>
    </AdministrativeArea>
    <AdministrativeArea>
        <prn>Chiba</prn>
        <cn2>Kashiwa</cn2>
        <acc>12217</acc>
    </AdministrativeArea>
</Dataset>
-----

I need to read both datasets as the same feature type with an XML Reader. Although these schemas are simple, the XML Reader with the "Feature Paths" option cannot generate a common feature type from different schemas. How should I do that?

I solved the issue by defining an xfMap like this.
-----
<?xml version="1.0"?>
<xfMap>
    <feature-map>
        <!-- For correct schema -->
        <mapping match="AdministrativeBoundary">
            <feature-type>
                <literal expr="AdministrativeBoundary" />
            </feature-type>
            <structure matched-prefix="no" cardinality="*/+" />
        </mapping>

        <!-- For incorrect schema -->
        <mapping match="AdministrativeArea">
            <feature-type>
                <literal expr="AdministrativeBoundary" />
            </feature-type>
            <attributes>
                <attribute>
                    <name><literal expr="prefectureName" /></name>
                    <value><extract expr="./prn" /></value>
                </attribute>
                <attribute>
                    <name><literal expr="cityName" /></name>
                    <value><extract expr="./cn2" /></value>
                </attribute>
                <attribute>
                    <name><literal expr="administrativeAreaCode" /></name>
                    <value><extract expr="./acc" /></value>
                </attribute>
            </attributes>
        </mapping>
    </feature-map>
</xfMap>
-----

The first <mapping> element is for the correct schema. It works much like flattening the <AdministrativeBoundary> element with the "Feature Paths" option.
The second <mapping> element maps the values of the incorrectly named child elements (<prn>, <cn2> and <acc>) to the attribute names of the correct schema, and gives the resulting features the same feature type name, "AdministrativeBoundary".
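To make the idea of the two mappings concrete outside FME, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It is only an analogy, not how FME implements xfMap: the MAPPINGS table, FEATURE_TYPE constant and read_features function are hypothetical names, and only the simplified sample data above is handled (no geometry, no namespaces).

```python
import xml.etree.ElementTree as ET

# Hypothetical analogue of the two <mapping> rules:
# matched element name -> {source child element: canonical attribute name}.
# None means "use the child element names as they are", which is how the
# first mapping treats the correct schema.
MAPPINGS = {
    "AdministrativeBoundary": None,
    "AdministrativeArea": {
        "prn": "prefectureName",
        "cn2": "cityName",
        "acc": "administrativeAreaCode",
    },
}
# Both rules emit the same feature type, like the <literal expr="..."/> values.
FEATURE_TYPE = "AdministrativeBoundary"


def read_features(xml_text):
    """Yield (feature_type, attributes) for each mapped child of the root."""
    root = ET.fromstring(xml_text)
    for elem in root:
        if elem.tag not in MAPPINGS:
            continue  # elements without a mapping rule are skipped
        rename = MAPPINGS[elem.tag]
        attrs = {}
        for child in elem:
            name = child.tag if rename is None else rename.get(child.tag)
            if name is not None:
                attrs[name] = child.text
        yield FEATURE_TYPE, attrs


CORRECT = """<Dataset>
  <AdministrativeBoundary>
    <prefectureName>Chiba</prefectureName>
    <cityName>Kashiwa</cityName>
    <administrativeAreaCode>12217</administrativeAreaCode>
  </AdministrativeBoundary>
</Dataset>"""

INCORRECT = """<Dataset>
  <AdministrativeArea>
    <prn>Chiba</prn>
    <cn2>Kashiwa</cn2>
    <acc>12217</acc>
  </AdministrativeArea>
</Dataset>"""

if __name__ == "__main__":
    # Both schemas produce identical features.
    print(list(read_features(CORRECT)) == list(read_features(INCORRECT)))  # True
```

The design mirrors the xfMap: one rule passes child names through unchanged, the other renames them explicitly, and both emit one shared feature type.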

The XML Reader with this xfMap generates a single feature type named "AdministrativeBoundary" (tested with FME 2014 Beta build 14227), and it can read datasets created in either the correct or the incorrect schema.

I'm not sure why the government published the data with an incorrect schema, and I wonder why nobody seems to have noticed. In fact, the government also provides the same datasets in Esri Shapefile format, so I suspect hardly anyone uses the XML data. However, some parts of the original XML data cannot be stored in Shapefiles because of limitations in the format specification. I think it would be a loss to the nation if that information could not be used effectively.
FME can read these datasets flexibly and without loss, even when the schema is incorrect. That's great.

In addition, a <schema-type> element can be added to the xfMap to define the data types and the order of the attributes explicitly. For example:
-----
<?xml version="1.0"?>
<xfMap>
    <schema-type>
        <inline>
            <schema-feature type="AdministrativeBoundary">
                <schema-attribute name="fme_geometry{0}" type="xml_no_geom" />
                <schema-attribute name="prefectureName" type="xml_buffer" />
                <schema-attribute name="cityName" type="xml_buffer" />
                <schema-attribute name="administrativeAreaCode" type="xml_buffer" />
            </schema-feature>
        </inline>
    </schema-type>
    <feature-map>
        ...
    </feature-map>
</xfMap>
-----
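As a rough illustration of what an explicit schema buys you, the Python sketch below applies an ordered list of attribute names and types to a feature's attributes, so the output order no longer depends on which mapping produced the feature. SCHEMA and apply_schema are hypothetical names, and treating xml_buffer as a plain string is an assumption of this sketch; it does not reproduce FME's own schema handling.

```python
# Hypothetical analogue of <schema-feature type="AdministrativeBoundary">:
# an explicit, ordered list of (attribute name, type). Every attribute in the
# xfMap example is xml_buffer, which this sketch treats as a plain string.
SCHEMA = [
    ("prefectureName", str),
    ("cityName", str),
    ("administrativeAreaCode", str),
]


def apply_schema(attrs, schema=SCHEMA):
    """Return a new dict whose keys follow the schema order.

    Values are converted with the declared type; attributes missing from the
    feature become None, and attributes not listed in the schema are dropped.
    """
    out = {}
    for name, typ in schema:
        value = attrs.get(name)
        out[name] = None if value is None else typ(value)
    return out


if __name__ == "__main__":
    raw = {"administrativeAreaCode": "12217", "cityName": "Kashiwa",
           "prefectureName": "Chiba"}
    print(list(apply_schema(raw)))
    # ['prefectureName', 'cityName', 'administrativeAreaCode']
```

Keeping administrativeAreaCode as a string rather than an integer also preserves any leading zeros in area codes.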
