Brutzman, Don authored
Data Format Description Language (DFDL) "Attribution" Project
This project is working to show that additional DFDL support for XML attributes is feasible by using a "pipeline" approach to processing.
Good initial progress has been made that allows use of an attribute-aware XML schema. Pre- and post-processing XSLT stylesheets can convert XML documents and schemas into an equivalent element-only form that DFDL can use to parse and unparse data documents.
DFDL Problem
DFDL Specification Section 1.3 "What DFDL is not" has long stated that DFDL design goals are constrained so that DFDL "cannot use XML attributes in the data model." This constraint helped multiple codebases implement the DFDL specification and reach an exceptionally high level of expressive power and capability.
Nevertheless, lack of XML attribute support remains a significant shortfall in Data Format Description Language (DFDL) design that unnecessarily inhibits broader adoption and usage. This functional gap means that many existing XML schemas (both standardized and informal) cannot be used directly with DFDL. Further side effects include inconsistent applicability of quality assurance (QA) and data-processing tools that are expressed in XPath, XSLT and XML Schematron.
Goal, Motivation, Rationale
Goal. Gain the ability to apply DFDL markup to common XML schemas that contain both elements and attributes.
Motivation. Adding such a capability can enable archivability of unified, useful XML schemas that integrate DFDL mappings for data encodings while retaining regular XML validation of information-model requirements. This approach is repeatable across a wide range of data formats and information models.
Rationale. Strong validation prevents Garbage In Garbage Out (GIGO) syndromes for data streams. Having a single authoritative XML schema for data validation avoids versionitis challenges that prevent coherent interoperability and archival reusability for a given information model.
Approach
The potential workaround being explored in these experiments is to precisely modify original XML documents along with their corresponding schemas for DFDL use, temporarily handling XML attributes as XML elements within an automatable DFDL preprocessing and postprocessing pipeline.
The potential end result is support for any original XML element-attribute document design with all DFDL data-format capabilities, along with complete validation rigor throughout.
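The document-conversion idea can be illustrated with a minimal sketch. The project itself uses XSLT stylesheets; the Python below is only a stand-in to show the round trip, and it rests on several assumptions that are not taken from the project: the `attr_` naming prefix and both function names are hypothetical, XML namespaces are not handled, whitespace between elements is ignored, and elements that carry both attributes and text content are rejected rather than converted.

```python
import xml.etree.ElementTree as ET

# Hypothetical naming convention for attribute-derived elements (an assumption,
# not the convention used by the project's stylesheets).
ATTR_PREFIX = "attr_"


def attributes_to_elements(elem):
    """Preprocessing step: return an element-only copy of elem.

    Each attribute becomes a leading child element named ATTR_PREFIX + name.
    """
    if elem.attrib and (elem.text or "").strip():
        # Simple content plus attributes would yield mixed content, which this
        # sketch does not attempt to handle.
        raise ValueError(f"<{elem.tag}> has both attributes and text content")
    out = ET.Element(elem.tag)
    for name, value in elem.attrib.items():
        ET.SubElement(out, ATTR_PREFIX + name).text = value
    if not elem.attrib:
        out.text = elem.text
    for child in elem:
        out.append(attributes_to_elements(child))
    return out


def elements_to_attributes(elem):
    """Postprocessing step: fold ATTR_PREFIX children back into attributes."""
    out = ET.Element(elem.tag)
    out.text = elem.text
    for child in elem:
        if child.tag.startswith(ATTR_PREFIX):
            out.set(child.tag[len(ATTR_PREFIX):], child.text or "")
        else:
            out.append(elements_to_attributes(child))
    return out


# Round trip on a small made-up document.
doc = ET.fromstring('<track id="7"><point x="1" y="2"/><name>A</name></track>')
element_only = attributes_to_elements(doc)
restored = elements_to_attributes(element_only)

print(ET.tostring(element_only, encoding="unicode"))
print(ET.tostring(restored, encoding="unicode"))
```

In a full pipeline the element-only form would be handed to a DFDL processor between the two steps, and the corresponding schema would need the same transformation applied to its attribute declarations. A real implementation would also need a way to guarantee that the chosen prefix cannot collide with genuine element names.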
Preliminary test results are positive for this conversion pipeline. Diagrams for XML test examples compare element-attribute to element-only forms using XML Schema view, XML tree view and XML document view.
TODO
- Further document this approach with a slideset. Improve DFDL markup definitions and handling in examples.
- Continue with round-tripping of example XML documents to/from DFDL data.
- Add further structural-validity checks in conversion stylesheets and schemas.
- Create Ant macros for simple pipeline repeatability across multiple data formats.
- Document DFDL usage patterns with Apache NetBeans and XMLSpy authoring tools.
- Add Efficient XML Interchange (EXI) compression and compare file sizes.
- Demonstrate ability to similarly round-trip JSON via this XML approach.