by Darcy Thomas
The Open XML Developer site has an article by Bryce Telford at http://openxmldeveloper.org/articles/4283.aspx on transforming an HL7-formatted XML file into an Open XML document. The technique works for transforming any XML file into an Open XML document. He used C# in his example, so I decided to try replicating what he had done in Java.
Below is an example document whose content was created programmatically from an XML data source:
HL7 is a standard (based on XML) for transmitting medical data between hospital information systems. Refer to http://en.wikipedia.org/wiki/HL7 for further details.
An Open XML document is a collection of zipped XML files. In theory, you can navigate around the document and make changes using just the standard Java zip and XML tools. Ted Neward discusses this in more detail: http://www.infoq.com/articles/cracking-office-2007-with-java . This seemed complicated, which led me to the OpenXML4J library (you can download it here). This library smooths out all the fiddly details. Originally created by Julien Chable, it is now part of the Apache POI project: http://poi.apache.org.
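To make the "collection of zipped XML files" point concrete, here is a minimal sketch that uses only the JDK's java.util.zip classes to list the parts inside a package. The class name PackageLister and the three placeholder part names are assumptions for illustration; a real .docx contains more parts, and the sketch builds its sample package in memory so it runs without any input file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class PackageLister {

    // Builds a tiny stand-in package in memory (hypothetical content, not a valid .docx).
    static byte[] buildSamplePackage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            String[] names = {"[Content_Types].xml", "_rels/.rels", "word/document.xml"};
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the names of all parts (zip entries) in the package, in stored order.
    static List<String> listParts(byte[] packageBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(packageBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : listParts(buildSamplePackage())) {
            System.out.println(name);
        }
    }
}
```

Pointing listParts at the bytes of a real document should show the main body under word/document.xml, which is the part the rest of this article modifies.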
OpenXML4J has a couple of dependencies, Dom4J (you can download it here) and Log4J (you can download it here), which you will need to add to the class path. Log4J also needs a log4j.properties file in the root source folder; this is just for Log4J’s configuration. The attached zip file contains the log4j.properties file I used (which only logs fatal errors). You can run the program without this file, but you will get an error message printed to the console when you do.
My implementation takes four command line arguments: a template document, an HL7 XML data file, an XSLT transform file, and the file name for the new document.
In my Eclipse project, I set the arguments in the run configuration as follows: click Run => Run Configurations, click the ‘Arguments’ tab, type the arguments into the ‘Program arguments’ text box, then click ‘Run’ to finish.
The C# example mentioned earlier goes into greater detail, whereas I’m focusing on how to do this in Java. I simply copied his template document, XSLT transform file and HL7 XML data file. Bryce discusses in detail how he created these, and I recommend reading his article to see how he did it.
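As a rough illustration of handling those arguments, the sketch below checks the argument count and gives each position a name. The class TransformArgs and its field names are hypothetical, and the positions assume the arguments are passed in the order listed above (template, HL7 data, XSLT, output); the checking in the attached zip file may differ.

```java
final class TransformArgs {
    static final String USAGE =
            "Usage: <template.docx> <hl7-data.xml> <transform.xslt> <output.docx>";

    final String templatePath; // Open XML template document
    final String hl7DataPath;  // HL7 XML data file
    final String xsltPath;     // XSLT transform file
    final String outputPath;   // file name for the new document

    private TransformArgs(String templatePath, String hl7DataPath,
                          String xsltPath, String outputPath) {
        this.templatePath = templatePath;
        this.hl7DataPath = hl7DataPath;
        this.xsltPath = xsltPath;
        this.outputPath = outputPath;
    }

    // Validates the raw command line and maps each position to a named field.
    static TransformArgs parse(String[] args) {
        if (args == null || args.length != 4) {
            throw new IllegalArgumentException(USAGE);
        }
        for (String arg : args) {
            if (arg == null || arg.trim().isEmpty()) {
                throw new IllegalArgumentException(USAGE);
            }
        }
        return new TransformArgs(args[0], args[1], args[2], args[3]);
    }
}
```

Calling TransformArgs.parse(args) at the top of main lets the rest of the program refer to parsed.templatePath and friends instead of bare array indexes.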
There is some simple data checking and error handling which I won’t go into. Review how I achieved this in the attached zip file if you need to.
Open and retrieve the core of the template document
The first step is to extract the core of the template document, which I want to modify, using the OpenXML4J methods:
// Open the package (args[0] is the template document, the first argument listed above)
Package pkgToModify = Package.open(args[0], PackageAccess.READ_WRITE);
// Get the document's core document part relationship
PackageRelationship coreDocumentRelationship = pkgToModify
        .getRelationshipsByType(PackageRelationshipTypes.CORE_DOCUMENT)
        .getRelationship(0);
// Get the core document part from the relationship
PackagePart coreDocumentPart = pkgToModify.getPart(coreDocumentRelationship);
// Parse the part's XML into a Dom4J document
InputStream inStream = coreDocumentPart.getInputStream();
SAXReader docReader = new SAXReader();
Document documentToModify = docReader.read(inStream);
Next, load the HL7 XML data and the XSLT file, and transform the HL7 data. This converts the data into runs for the body of the document.
// --Transform HL7 to Open XML document body--
// Get the XSLT and load it into a transformer (args[2], the third argument)
StreamSource xsltStream = new StreamSource(new File(args[2]));
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(xsltStream);
// Get the HL7 (XML) file (args[1], the second argument)
File hl7xmlFile = new File(args[1]);
Document docSource = docReader.read(hl7xmlFile);
// Transform the HL7 data into the new document body
DocumentSource source = new DocumentSource(docSource);
DocumentResult result = new DocumentResult();
transformer.transform(source, result);
Document transformedDoc = result.getDocument(); // the transformed HL7 data
The XSLT used here was created by taking the XML from the template document and making a few changes to convert it into an XSLT. See step 3, detailed here.
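To show the shape of such a stylesheet without the full template, here is a minimal sketch using only the JDK's javax.xml.transform classes. The sample data, the stylesheet and the class name BodyTransformSketch are all hypothetical and far simpler than real HL7 data or Bryce's XSLT; the point is only that the stylesheet is WordprocessingML with xsl instructions dropped in where the data should go.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BodyTransformSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical stand-in for the HL7 data file.
    static final String SAMPLE_DATA = "<patient><name>Jane Doe</name></patient>";

    // Hypothetical stylesheet: template markup with one value-of instruction.
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
          + " xmlns:w='" + W_NS + "'>"
          + "<xsl:template match='/patient'>"
          + "<w:document><w:body><w:p><w:r><w:t>"
          + "<xsl:value-of select='name'/>"
          + "</w:t></w:r></w:p></w:body></w:document>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_DATA, SAMPLE_XSLT));
    }
}
```

The output is a w:document element whose w:body holds the transformed data, which is the same structure the article's code expects when it looks up the body element of the transformed document.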
Insert data into the template and save to file.
I now have the HL7 data in a form that I can insert into the template document. Using the OpenXML4J methods, we can easily insert it into the correct place:
// Find the body of the template document
Namespace namespaceWordProcessingML = new Namespace("w",
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
Element bodyElement = documentToModify.getRootElement().element(
        new QName("body", namespaceWordProcessingML));
// Retrieve the child nodes of the body element (a live list backed by the element)
List run = bodyElement.content();
Element newBodyRun = transformedDoc.getRootElement().element(
        new QName("body", namespaceWordProcessingML));
run.clear(); // remove old info from template body
run.addAll(newBodyRun.content()); // add the new content
// Save the content back into the part.
// Note: changes made to 'run' (which was derived from documentToModify)
// are incorporated into the saved file when the next two methods are run.
// The output-stream argument below is an assumption (the part's own output stream).
StreamHelper.saveXmlInStream(documentToModify, coreDocumentPart.getOutputStream());
// args[3] is the file name for the new document (the fourth argument)
pkgToModify.save(new File(args[3]));
A gotcha which threw me: when you get the body content and assign it to a variable (in this example ‘run’), the list is still backed by the document, so changes made through that variable are automatically incorporated when you call StreamHelper.saveXmlInStream(documentToModify, ...) and then save the package.
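The same edit-in-place behaviour can be tried without any extra libraries. The sketch below swaps in the JDK's W3C DOM for Dom4J (so it is not the code from the zip file) and replaces the children of a template's w:body with those from a transformed document. The class BodySwapSketch, the wrap helper and the sample text are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BodySwapSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Wraps some text in a minimal, hypothetical WordprocessingML document.
    static String wrap(String text) {
        return "<w:document xmlns:w='" + W_NS + "'><w:body><w:p><w:r><w:t>"
                + text + "</w:t></w:r></w:p></w:body></w:document>";
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookups
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the first w:body element of the document.
    static Element body(Document doc) {
        return (Element) doc.getElementsByTagNameNS(W_NS, "body").item(0);
    }

    // Replaces the template body's children with copies of the transformed body's children.
    static void replaceBody(Document template, Document transformed) {
        Element target = body(template);
        while (target.hasChildNodes()) {
            target.removeChild(target.getFirstChild()); // remove old info
        }
        for (Node child = body(transformed).getFirstChild();
                child != null; child = child.getNextSibling()) {
            // importNode copies the node into the template document
            target.appendChild(template.importNode(child, true));
        }
    }

    public static void main(String[] args) {
        Document template = parse(wrap("old text"));
        replaceBody(template, parse(wrap("new text")));
        System.out.println(body(template).getTextContent());
    }
}
```

Note that replaceBody never assigns anything back to the template document: the element it edits is part of that document, so serialising the document afterwards picks up the change, just as with the ‘run’ list.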
So there you have it: XSLT transforming XML to Open XML, in Java. The OpenXML4J library makes it uncomplicated. There is a lot of potential here. As http://openxml4j.org/Documentation/Scenarios.html shows, there are a number of other scenarios where OpenXML4J can be put to use.
Check out the example (zip file) below. In the QuickDemonstration folder there is a packaged Jar file, test data files and a batch file so you can see the transformation in action. I have also included an Eclipse project ready for import.