XSLT transforming XML to Open XML using Java - OpenXML Developer - Blog - OpenXML Developer
by Darcy Thomas

 

The Open XML Developer site has an article by Bryce Telford at http://openxmldeveloper.org/articles/4283.aspx on transforming an HL7-formatted XML file into an Open XML document. The technique works for transforming any XML file into an Open XML document. He used C# in his example, so I decided to try replicating what he had done in Java.

Below is an example document whose content was created programmatically from an XML data source:

 

Below is a screenshot of an example HL7 (XML) file:

 

HL7 is a standard (based on XML) for transmitting medical data between hospital information systems.  Refer to http://en.wikipedia.org/wiki/HL7 for further details.

An Open XML document is a collection of zipped XML files. In theory, you can navigate around the document and make changes using just the standard Java zip and XML tools. Ted Neward discusses this in more detail: http://www.infoq.com/articles/cracking-office-2007-with-java . This seemed complicated, which led me to the Open XML4J library (you can download it here). This library smooths out all the fiddly details. Originally created by Julien Chable, it is now part of the Apache POI project: http://poi.apache.org.
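To make the "collection of zipped XML files" point concrete, here is a minimal sketch that lists the parts of a package using only java.util.zip. It does not use Open XML4J. The class name PackageLister and the tiny in-memory sample package are made up for illustration; a real .docx contains more parts than the three shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class PackageLister {

    // Returns the names of all entries (parts) found in a zipped package.
    static List<String> listParts(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Builds a tiny stand-in package in memory (illustration only).
    static byte[] buildSamplePackage() throws IOException {
        String[] partNames = {"[Content_Types].xml", "_rels/.rels", "word/document.xml"};
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String name : partNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (String name : listParts(buildSamplePackage())) {
            System.out.println(name);
        }
    }
}
```

To inspect a real document, the same loop could read from a FileInputStream opened on the .docx file instead of the in-memory bytes.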

Open XML4J has a couple of dependencies, Dom4J (you can download it here) and Log4J (you can download it here), which you will need to add to the class path. Log4J also needs a log4j.properties file in the root source folder; this is just for Log4J’s configuration. The attached zip file contains the log4j.properties file I used (which logs only ‘Fatal errors’). You can run the program without this file, but you will get an error message printed to the console when you run it.

My implementation takes four command-line arguments, in this order: a template document, an HL7 XML data file, an XSLT transform file, and the file name for the new document.

In my Eclipse project, I set the arguments in the run configuration as follows: click Run => Run Configurations, click the ‘Arguments’ tab, type the arguments you want in the ‘Program arguments’ text box, then click ‘Run’ to finish.

 

The C# example mentioned earlier goes into greater detail, as I’m focusing on how to do this in Java. I simply copied its template document, XSLT transform file and HL7 XML data file. Bryce discusses in detail how he created these, and I recommend reading his article to see how he did it.

There is some simple data checking and error handling which I won’t go into here. Review how I did this in the attached zip file if you need to.
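For readers who do not want to open the zip file, the kind of check meant here might look roughly like the sketch below. It is not the code from the attachment: the class name ArgumentCheck, the method validate and the messages are all hypothetical. It only assumes the argument order described above (template, HL7 data, XSLT, output file).

```java
import java.io.File;

class ArgumentCheck {

    // Returns null when the arguments look usable, otherwise a message
    // describing the first problem found.
    static String validate(String[] args) {
        if (args == null || args.length != 4) {
            return "Usage: <template.docx> <hl7.xml> <transform.xslt> <output.docx>";
        }
        // The first three arguments are input files and must already exist.
        String[] labels = {"Template document", "HL7 data file", "XSLT file"};
        for (int i = 0; i < labels.length; i++) {
            if (!new File(args[i]).isFile()) {
                return labels[i] + " not found: " + args[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(1);
        }
        System.out.println("Arguments look fine; output will be written to " + args[3]);
    }
}
```

Failing early with a usage message avoids a less helpful exception later, when the package or the XSLT file is opened.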

 

Open and retrieve the core of the template document

 

The first step was to extract the core of the template document, which I want to modify using the Open XML4J methods:

// Open the package
Package pkgToModify = Package.open(args[0], PackageAccess.READ_WRITE);

// Get the document's core document part relationship
PackageRelationship coreDocumentRelationship = pkgToModify
        .getRelationshipsByType(PackageRelationshipTypes.CORE_DOCUMENT)
        .getRelationship(0);

// Get the core document part from the relationship
PackagePart coreDocumentPart = pkgToModify.getPart(coreDocumentRelationship);

// Parse the core document part into a Dom4J Document
InputStream inStream = coreDocumentPart.getInputStream();
SAXReader docReader = new SAXReader();
Document documentToModify = docReader.read(inStream);
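For comparison, here is a rough sketch of the same step done with only the JDK (java.util.zip and the built-in DOM parser) instead of Open XML4J and Dom4J. It takes a shortcut that Open XML4J does not: it assumes the main part sits at its conventional location, word/document.xml, rather than following the package relationships. The class name MainPartReader and the in-memory sample package are made up for illustration. InputStream.readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class MainPartReader {
    // Assumption: the main part is at its conventional location.
    static final String MAIN_PART = "word/document.xml";
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds the main part in the zipped package and parses it into a DOM tree.
    static Document readMainPart(byte[] zipBytes) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (MAIN_PART.equals(entry.getName())) {
                    // Copy the entry first so the parser never touches the zip stream.
                    byte[] xml = zip.readAllBytes();
                    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                    factory.setNamespaceAware(true);
                    return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                }
            }
        }
        throw new IOException("No " + MAIN_PART + " part in package");
    }

    // Builds a one-part stand-in package in memory (illustration only).
    static byte[] buildSamplePackage() throws IOException {
        String xml = "<w:document xmlns:w='" + W_NS + "'>"
                + "<w:body><w:p/></w:body></w:document>";
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(MAIN_PART));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = readMainPart(buildSamplePackage());
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

This is the kind of fiddly detail the library hides: with Open XML4J the part is located through the CORE_DOCUMENT relationship, so it does not matter where in the package it is stored.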

 

Then load the HL7 XML data and the XSLT file, and transform the HL7 data. This converts the data into Open XML markup for the body of the document.

 

// --Transform HL7 to Open XML document body--

// Get the XSLT and load it into a transformer
StreamSource xsltStream = new StreamSource(new File(args[2]));
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(xsltStream);

// Get the HL7 (XML) file
File hl7xmlFile = new File(args[1]);
Document docSource = docReader.read(hl7xmlFile);

// Transform the HL7 data into the custom body of our document
DocumentSource source = new DocumentSource(docSource);
DocumentResult result = new DocumentResult();
transformer.transform(source, result);

Document transformedDoc = result.getDocument(); // the transformed HL7 data

 

The XSLT used here was created by taking the XML from the template document and making a few changes to convert it into an XSLT. See step 3, detailed here.
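To show the shape of such a transform without the real files, here is a minimal sketch using only the JDK's javax.xml.transform classes. The input XML (a patient element with a name child) and the stylesheet are invented stand-ins, far simpler than real HL7 data or the XSLT from Bryce's article, and the class name XsltSketch is hypothetical. The idea is the same: the stylesheet is a WordprocessingML skeleton with xsl:value-of placeholders where the data should go.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Invented stand-in stylesheet: document markup with one data placeholder.
    static final String XSLT =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:w='" + W_NS + "'>"
            + "<xsl:template match='/patient'>"
            + "<w:document><w:body><w:p><w:r><w:t>"
            + "<xsl:value-of select='name'/>"
            + "</w:t></w:r></w:p></w:body></w:document>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Invented input; real HL7 XML is considerably more involved.
        String xml = "<patient><name>Jane Doe</name></patient>";
        System.out.println(transform(xml, XSLT));
    }
}
```

The article's code differs only in where the data comes from and goes to: it reads the stylesheet from a file and uses Dom4J's DocumentSource and DocumentResult so the result arrives as a Dom4J Document rather than a string.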

 

 

 

Insert data into the template and save to file.

 

 

I now have the HL7 data formatted in a way which I can insert into the appropriate place in the template document. Using the Open XML4J methods we can easily insert that into the correct place:

 

 

// Find the body of the template document
Namespace namespaceWordProcessingML = new Namespace("w",
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

Element bodyElement = documentToModify.getRootElement().element(
        new QName("body", namespaceWordProcessingML));

// Retrieve the child nodes of the body element (a live list backed by the element)
List run = bodyElement.content();

// Find the body of the transformed HL7 document
Element newBodyRun = transformedDoc.getRootElement().element(
        new QName("body", namespaceWordProcessingML));

run.clear(); // remove old info from template body
run.addAll(newBodyRun.content()); // add the new content

// Save the content back into the part.
// Note: changes made to 'run' (which was derived from documentToModify)
// are incorporated into the saved file when the next two methods are run.
StreamHelper.saveXmlInStream(documentToModify,
        coreDocumentPart.getOutputStream());

// Write the new document to disk under the file name given in args[3]
pkgToModify.save(new File(args[3]));
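The same body swap can be sketched with the JDK's built-in W3C DOM instead of Dom4J, which may help if you want to see the logic without the extra library. This is a different API from the one used above: nodes have to be imported into the target document before they can be appended. The class name BodySwap, the helper wrap and the sample text are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BodySwap {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the namespace lookup below
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Finds the first w:body element in the document.
    static Element body(Document doc) {
        return (Element) doc.getElementsByTagNameNS(W_NS, "body").item(0);
    }

    // Replaces the children of the template body with copies of the
    // children of the transformed body.
    static void replaceBody(Document template, Document transformed) {
        Element target = body(template);
        Element source = body(transformed);
        while (target.hasChildNodes()) {
            target.removeChild(target.getFirstChild());
        }
        for (Node child = source.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            // importNode makes a deep copy owned by the template document.
            target.appendChild(template.importNode(child, true));
        }
    }

    // Wraps text in a minimal document skeleton (illustration only).
    static String wrap(String text) {
        return "<w:document xmlns:w='" + W_NS + "'><w:body><w:p><w:r><w:t>"
                + text + "</w:t></w:r></w:p></w:body></w:document>";
    }

    public static void main(String[] args) throws Exception {
        Document template = parse(wrap("old template text"));
        Document transformed = parse(wrap("new HL7 content"));
        replaceBody(template, transformed);
        System.out.println(body(template).getTextContent()); // prints: new HL7 content
    }
}
```

Matching on the namespace URI rather than the w prefix is what makes the lookup reliable, which is also why the URI passed to Namespace in the Dom4J code has to be exactly right.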

 

A gotcha which threw me: when you get the body content and assign it to a variable (in this example ‘run’), you are not working on a copy. The list is backed by the element itself, so changes made through that variable are automatically incorporated into the document when you call StreamHelper.saveXmlInStream( documentToModify, coreDocumentPart.getOutputStream() );
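The difference between a live list and a copy can be shown with plain Java collections. The FakeElement class below is a hypothetical stand-in, not the Dom4J Element; it only mimics the behaviour described here, where content() hands out the backing list.

```java
import java.util.ArrayList;
import java.util.List;

class LiveListDemo {

    // Hypothetical stand-in for an XML element; not the Dom4J class.
    static final class FakeElement {
        private final List<String> children = new ArrayList<>();

        // Live view: the caller receives the backing list itself.
        List<String> content() {
            return children;
        }

        // Detached snapshot: changes to it never reach the element.
        List<String> contentCopy() {
            return new ArrayList<>(children);
        }

        int childCount() {
            return children.size();
        }
    }

    public static void main(String[] args) {
        FakeElement body = new FakeElement();

        List<String> run = body.content();
        run.add("paragraph added through the live list");
        System.out.println(body.childCount()); // prints: 1

        List<String> snapshot = body.contentCopy();
        snapshot.add("paragraph added to the copy only");
        System.out.println(body.childCount()); // prints: 1
    }
}
```

With a live list there is no separate "write back" step, which is why clearing and refilling ‘run’ is enough before saving.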

 

 

So there you have it: XSLT transforming XML to Open XML, in Java. The Open XML4J library makes it uncomplicated. There is a lot of potential here. As http://openxml4j.org/Documentation/Scenarios.html shows, there are a number of other scenarios where Open XML4J can be put to use.

 

Check out the example (zip file) below. In the QuickDemonstration folder there is a packaged Jar file, test data files and a batch file so you can see the transformation in action. I have also included an Eclipse project ready for import.

 

 

Attachment: HL7-to-OpenXML.zip