Mapping flat-file data seems to be a piece of cake: read data, read maps, validate data, map data, write data.
But what if you have to map data from various and very (very!) heterogeneous international sources, such as exports from different SAP systems and assorted web and Excel tools, into a single structure?
Create a mapping table for each source, as the customer proposed? That would be quite simple. The problem, however, is that the maps themselves are dynamic data, so you would need an entire battalion of people maintaining them. Commercial ETL tools were ruled out too, because the mapping was considered too complex (which would actually be a point in favor of an ETL tool, not against it ;)), and because the data is profoundly country-specific and subject to change while the customer's business processes are being redefined.
The solution is a two-phased mapping process:
The first-phase mapping uses XML files to validate the data and map it into a single, unified structure, which is then mapped into the final data structure in phase two.
The XML files contain the description, validation and formatting information for each data source and look like this:
…
<field source="somePart" target="some_part" type="str"/>
<field source="someDate" target="some_date" type="date" regexp="/^[0-9]{8}$/" format="sapGerDate"/>
<field source="somePrice" target="some_price" type="float" format="sapGerFloat"/>
…
After the data has been validated, it is formatted and buffered in an array with the XML file's target fields as keys.
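To make phase one a bit more tangible, here is a minimal sketch in Python (not the language of the original project). It reads a descriptor like the one above, validates each value against the optional regexp, formats it and buffers the result under the target field names. The wrapping <fields> element, the function names load_fields and map_record, and above all the meaning of sapGerDate (assumed here to be DDMMYYYY) and sapGerFloat (assumed to be German notation such as 1.234,56) are my assumptions for illustration, not the real implementation.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical descriptor for one data source; the <fields> wrapper is an assumption.
DESCRIPTOR = """
<fields>
  <field source="somePart" target="some_part" type="str"/>
  <field source="someDate" target="some_date" type="date" regexp="/^[0-9]{8}$/" format="sapGerDate"/>
  <field source="somePrice" target="some_price" type="float" format="sapGerFloat"/>
</fields>
"""

# Assumed formatters: the names come from the descriptor, their behaviour is a guess.
FORMATTERS = {
    # Assumes DDMMYYYY input and emits an ISO date string.
    "sapGerDate": lambda v: datetime.strptime(v, "%d%m%Y").date().isoformat(),
    # Assumes "." as thousands separator and "," as decimal separator.
    "sapGerFloat": lambda v: float(v.replace(".", "").replace(",", ".")),
}


def load_fields(xml_text):
    """Return one attribute dict per <field> element of the descriptor."""
    return [dict(field.attrib) for field in ET.fromstring(xml_text).iter("field")]


def map_record(record, fields):
    """Validate and format one source record, keyed by the target field names."""
    unified = {}
    for field in fields:
        raw = record[field["source"]].strip()
        pattern = field.get("regexp")
        # The descriptor uses /.../ delimiters; Python's re module does not need them.
        if pattern and not re.search(pattern.strip("/"), raw):
            raise ValueError(f"{field['source']}: {raw!r} does not match {pattern}")
        formatter = FORMATTERS.get(field.get("format"))
        unified[field["target"]] = formatter(raw) if formatter else raw
    return unified


if __name__ == "__main__":
    fields = load_fields(DESCRIPTOR)
    source_row = {"somePart": "A-100", "someDate": "24122009", "somePrice": "1.234,56"}
    print(map_record(source_row, fields))
```

Each source gets its own descriptor, but map_record stays the same for all of them, which is the whole point of the first phase.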
Now that we have the desired unified data structure, the real mapping can start. Hooray!! From here on, things get really complicated, but that is another story for 10 more postings.
However, this way only a single mapping table has to be maintained for all data sources. All that has to be maintained in addition is an XML file for each data source, describing the source data and target fields. For future structural changes to the source files, only the XML files have to be adjusted, not the maps themselves.
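A tiny Python sketch of why this pays off. The two descriptors are reduced to plain source-to-target name pairs, and phase_two is only a placeholder for the real (and much more complicated) mapping. All names here, including the second source with its French column headers, are made up for illustration.

```python
# Hypothetical per-source descriptors, reduced to source -> target field names.
SAP_EXPORT = {"someDate": "some_date", "somePrice": "some_price"}
EXCEL_EXPORT = {"Date": "some_date", "Prix": "some_price"}


def to_unified(record, descriptor):
    """Phase one, names only: rename source fields to the unified target fields."""
    return {target: record[source] for source, target in descriptor.items()}


def phase_two(unified):
    """Placeholder for the single mapping shared by all sources."""
    return {"DATE": unified["some_date"], "PRICE": unified["some_price"]}


if __name__ == "__main__":
    from_sap = to_unified({"someDate": "2009-12-24", "somePrice": 12.5}, SAP_EXPORT)
    from_excel = to_unified({"Date": "2009-12-24", "Prix": 12.5}, EXCEL_EXPORT)
    # Both sources arrive at phase two in the same shape.
    print(phase_two(from_sap) == phase_two(from_excel))
```

If a source renames a column, only its descriptor changes; phase_two never sees the difference.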