Note: this code has been updated for the final Ecma schemas and the RTM version of Office. If you notice any mistakes or run into issues, please report them to julien@chable.net
This post is a summary translation of articles by Julien Chable that are available (in French) on MSDN France:
public final static String NS_CORE_DOCUMENT = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
...
final String APP_ROOT = System.getProperty("user.dir") + File.separator;
ZipFile zipFile = null;
try {
    zipFile = new ZipFile(APP_ROOT + "sample.docx");
} catch (IOException e) {
    e.printStackTrace();
}
Package p = Package.open(zipFile, PackageAccess.Read);

// Retrieve the core part relationship from its type
PackageRelationship coreDocRelationship = p.getRelationshipsByType(PackageRelationshipConstants.NS_CORE_DOCUMENT).getRelationship(0);

// Get the content part from the relationship
PackagePart coreDocument = p.getPart(coreDocRelationship);
System.out.println(coreDocument.getUri() + " -> " + coreDocument.getContentType());
Listing 1
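To make the relationship lookup less abstract, here is a minimal sketch that does the same kind of search with nothing but the JDK: it opens the package as a ZIP stream, reads the root relationships part `_rels/.rels`, and returns the target of the first relationship with the requested type. The class name `RootRelationshipSketch`, its methods, and the in-memory sample package are assumptions made for illustration; they are not part of the framework used in this article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, JDK only (Java 9+ for readAllBytes).
class RootRelationshipSketch {
    static final String NS_RELS = "http://schemas.openxmlformats.org/package/2006/relationships";
    static final String NS_CORE_DOCUMENT =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    /** Returns the Target of the first root relationship of the given type, or null. */
    static String findTarget(InputStream packageStream, String relationshipType) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("_rels/.rels")) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Buffer the entry first so the parser cannot close the ZIP stream.
                Document rels = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList list = rels.getElementsByTagNameNS(NS_RELS, "Relationship");
                for (int i = 0; i < list.getLength(); i++) {
                    Element rel = (Element) list.item(i);
                    if (relationshipType.equals(rel.getAttribute("Type"))) {
                        return rel.getAttribute("Target");
                    }
                }
                return null;
            }
        }
        return null;
    }

    /** Builds a tiny in-memory package that only contains a root relationships part. */
    static byte[] samplePackage() throws IOException {
        String rels = "<Relationships xmlns='" + NS_RELS + "'>"
                + "<Relationship Id='rId1' Type='" + NS_CORE_DOCUMENT + "' Target='word/document.xml'/>"
                + "</Relationships>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("_rels/.rels"));
            zip.write(rels.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findTarget(new ByteArrayInputStream(samplePackage()), NS_CORE_DOCUMENT));
    }
}
```

The sketch only looks at the package-level relationships; relationships that belong to an individual part live in a separate `.rels` file next to that part and would need the same treatment.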
Listing 1 output for several types of documents :
Here are the extensions and the URI of the main part for several types of documents :
The following sample demonstrates how to get the core property part of a document :
public final static String NS_CORE_PROPERTIES = "http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties";
...
// Get the core properties part relationship
PackageRelationship corePropertiesRelationship = p.getRelationshipsByType(PackageRelationshipConstants.NS_CORE_PROPERTIES).getRelationship(0);

// Get the core properties part from the previous relationship
PackagePart coreDocument = p.getPart(corePropertiesRelationship);
System.out.println(coreDocument.getUri() + " -> " + coreDocument.getContentType());
Listing 2
The output displays :
docProps/core.xml -> application/vnd.openxmlformats-package.core-properties+xml
Only a few lines are needed to read a document's core properties :
...
OpenXMLDocument docx = new OpenXMLDocument(Package.open(zipFile, PackageAccess.Read));
System.out.println(docx.getCoreProperties().getCreator());
System.out.println(docx.getCoreProperties().getTitle());
System.out.println(docx.getCoreProperties().getSubject());
Listing 3
Julien CHABLE
Lorem Ipsum
Sample document
Setting a property is as simple as reading one :
// Destination file
File destFile = new File(APP_ROOT + "sample_out.docx");

// Open the document
Package pack = Package.open(zipFile, PackageAccess.ReadWrite);
OpenXMLDocument docx = new OpenXMLDocument(pack);

CoreProperties coreProps = docx.getCoreProperties();
coreProps.setCreator("OpenXMLDeveloer.org powa");
coreProps.setDescription("A new description");
coreProps.setTitle("SampleListing4");

// Save the document
docx.save(destFile);
Listing 4
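How the framework writes the modified package is not shown in this article. As a rough illustration of what saving a changed part involves at the ZIP level, the sketch below copies every entry of a package into a new archive and substitutes new bytes for one named part. `ReplacePartSketch`, `replacePart()`, `readPart()` and the sample content are hypothetical names and data chosen for the example, not the framework's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, JDK only (Java 9+ for transferTo and readAllBytes).
class ReplacePartSketch {

    /** Copies all entries of a package, writing newContent in place of the entry named partName. */
    static byte[] replacePart(byte[] pkg, String partName, byte[] newContent) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(pkg));
             ZipOutputStream zipOut = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                zipOut.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(partName)) {
                    zipOut.write(newContent);   // substitute the modified part
                } else {
                    in.transferTo(zipOut);      // copy the current entry unchanged
                }
                zipOut.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Reads one entry as UTF-8 text, or returns null when the entry does not exist. */
    static String readPart(byte[] pkg, String partName) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(partName)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    /** Two-entry stand-in for a real package; the XML content is placeholder data. */
    static byte[] samplePackage() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("docProps/core.xml"));
            zip.write("<old/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<doc/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] updated = replacePart(samplePackage(), "docProps/core.xml",
                "<new/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readPart(updated, "docProps/core.xml"));
    }
}
```

Writing to a new archive rather than patching the original in place matches the pattern of Listing 4, where the source file and `destFile` are different files.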
The small framework that accompanies this article does not provide any class or method for accessing extended properties. In this sample we therefore use the DOM API to extract information from the extended properties part :
// Open the package
Package p = Package.open(..., PackageAccess.Read);

// Get the extended properties relationship
PackageRelationship extendedPropertiesRelationship = p.getRelationshipsByType(PackageRelationshipConstants.NS_EXTENDED_PROPERTIES).getRelationship(0);

// Get the extended properties part from the previous relationship
PackagePart extPropsPart = p.getPart(extendedPropertiesRelationship);
System.out.println(extPropsPart.getUri() + " -> " + extPropsPart.getContentType());

// Extract the content
try {
    InputStream inStream = extPropsPart.getInputStream();

    // Create the DOM parser
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    documentBuilderFactory.setNamespaceAware(true);
    documentBuilderFactory.setIgnoringElementContentWhitespace(true);

    DocumentBuilder documentBuilder;
    documentBuilder = documentBuilderFactory.newDocumentBuilder();

    // Parse the XML content
    Document extPropsDoc = documentBuilder.parse(inStream);

    // Extract the name and the version of the Open XML file generator
    System.out.println("Document generated with "
        + extPropsDoc.getElementsByTagName("Application").item(0).getTextContent()
        + " vers. "
        + extPropsDoc.getElementsByTagName("AppVersion").item(0).getTextContent());

    // Extract statistics about this document
    System.out.println("This document contains "
        + extPropsDoc.getElementsByTagName("Words").item(0).getTextContent()
        + " words and is composed of "
        + extPropsDoc.getElementsByTagName("Characters").item(0).getTextContent()
        + " characters and "
        + extPropsDoc.getElementsByTagName("Lines").item(0).getTextContent() + " lines");

    inStream.close();
} catch (Exception ioe) {
    System.err.println("Failed to extract extended properties ! :(");
}
Listing 5
Output of Listing 5 :
docProps/app.xml -> application/vnd.openxmlformats-officedocument.extended-properties+xml
Document generated with Microsoft Office Word vers. 12.0000
This document contains 262 words and is composed of 1444 characters and 12 lines
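Not every generator writes every extended property, and `item(0)` returns null when an element is absent. The sketch below is one possible way to read such values defensively: it looks elements up by namespace and local name and falls back to a default when nothing is found. The helper name `text()` and the sample XML are assumptions for illustration; the sample values are taken from the output above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, JDK only.
class ExtendedPropertiesSketch {
    static final String NS_EXT =
            "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    /** Returns the text of the first matching element, or fallback when it is missing. */
    static String text(Document doc, String localName, String fallback) {
        NodeList nodes = doc.getElementsByTagNameNS(NS_EXT, localName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : fallback;
    }

    /** Parses an XML string with a namespace-aware DOM parser. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample part content; the Lines element is left out on purpose.
        String appXml = "<Properties xmlns='" + NS_EXT + "'>"
                + "<Application>Microsoft Office Word</Application>"
                + "<AppVersion>12.0000</AppVersion>"
                + "<Words>262</Words>"
                + "</Properties>";
        Document doc = parse(appXml);
        System.out.println(text(doc, "Application", "unknown") + " / "
                + text(doc, "Words", "0") + " words / "
                + text(doc, "Lines", "n/a") + " lines");
    }
}
```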
Many OpenXML documents, for example those created by PowerPoint 2007, contain a thumbnail of the document. This part is identified by the following relationship type : http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail.
The following listing uses two methods, getThumbnails() and extractParts(), to extract the thumbnail of the document and put it into the ‘export’ directory :
final String APP_ROOT = System.getProperty("user.dir") + File.separator;
ZipFile zipFile = null; // The source file
try {
    zipFile = new ZipFile(APP_ROOT + "sample.pptx");
} catch (IOException e) {...}

// Destination folder
File destFile = new File(APP_ROOT + "export");

// Open the package
OpenXMLDocument docx = OpenXMLDocument.open(zipFile, PackageAccess.Read);

// Extract thumbnails
docx.extractParts(docx.getThumbnails(), destFile);
Listing 6
Here are the details of the getThumbnails() and extractParts() methods :
public final static String NS_THUMBNAIL_PART = "http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail";
...
// Retrieve all thumbnails contained in the document.
public ArrayList<PackagePart> getThumbnails() {
    return container.getPartByRelationshipType(PackageRelationshipConstants.NS_THUMBNAIL_PART);
}
Listing 6-1 (class OpenXMLDocument)
/**
 * Extract part content into the specified folder.
 *
 * @param parts
 *            Parts to extract.
 * @param destFolder
 *            Destination folder.
 */
public void extractParts(ArrayList<PackagePart> parts, File destFolder) {
    for (PackagePart part : parts) {
        String filename = PackageURIHelper.getFilename(part.getUri());
        try {
            InputStream ins = part.getInputStream();
            FileOutputStream fw = new FileOutputStream(destFolder.getAbsolutePath() + File.separator + filename);
            byte[] buff = new byte[512];
            int read;
            // Write only the bytes actually read on each pass
            while ((read = ins.read(buff)) != -1) {
                fw.write(buff, 0, read);
            }
            fw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Listing 6-2 (class OpenXMLDocument)
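As an alternative to a hand-written buffer loop, the JDK can do the transfer itself. The sketch below is a minimal example, assuming the part content is available as an `InputStream`: it creates the destination folder if needed and copies the stream with `Files.copy`. The class `ExtractSketch` and the method `extract()` are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, JDK only.
class ExtractSketch {

    /** Copies the stream into destFolder/filename and returns the path of the written file. */
    static Path extract(InputStream content, Path destFolder, String filename) throws IOException {
        Files.createDirectories(destFolder);          // no error if the folder already exists
        Path target = destFolder.resolve(filename);
        Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("export");
        byte[] fakeThumbnail = {1, 2, 3, 4};          // stand-in for real image bytes
        Path written = extract(new ByteArrayInputStream(fakeThumbnail), folder, "thumbnail.jpeg");
        System.out.println(written + " (" + Files.size(written) + " bytes)");
    }
}
```

`Files.copy` reads until the end of the stream and writes exactly the bytes it received, so the output file has the same length as the part.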
Listing 6 result :
To keep this example simple, we create a document by modifying the content of a blank one; for the purposes of this article, this is easier to follow and to implement than creating a document from scratch. To add paragraphs to a document, the classes ParagraphBuilder, Paragraph and Run are very useful :
// Creation of a paragraph builder
ParagraphBuilder paraBuilder = new ParagraphBuilder();
paraBuilder.setAlignment(ParagraphAlignment.CENTER);
Listing 7-1
Once the ParagraphBuilder is ready, you can create a new paragraph with the newParagraph() method :
// We create the first paragraph
Paragraph par1 = paraBuilder.newParagraph();
Listing 7-2
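For orientation, a paragraph and its runs end up as `w:p`, `w:r` and `w:t` elements in the main document part, with formatting stored in `w:pPr` and `w:rPr`. The sketch below builds such a fragment with the JDK's DOM API so the markup can be inspected; it is an illustration of the target XML, not the framework's implementation, and the class `RunMarkupSketch` and its methods are hypothetical. It assumes the WordprocessingML namespace of the final Ecma schema.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, JDK only.
class RunMarkupSketch {
    static final String NS_W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Creates a run with one formatting property; value may be null for toggles such as w:b. */
    static Element run(Document doc, String text, String property, String value) {
        Element r = doc.createElementNS(NS_W, "w:r");
        Element rPr = doc.createElementNS(NS_W, "w:rPr");
        Element prop = doc.createElementNS(NS_W, "w:" + property);
        if (value != null) {
            prop.setAttributeNS(NS_W, "w:val", value);
        }
        rPr.appendChild(prop);
        r.appendChild(rPr);
        Element t = doc.createElementNS(NS_W, "w:t");
        t.setTextContent(text);
        r.appendChild(t);
        return r;
    }

    /** Builds a centered paragraph with four differently formatted runs. */
    static Document centeredParagraph() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();
        Element p = doc.createElementNS(NS_W, "w:p");
        doc.appendChild(p);
        Element pPr = doc.createElementNS(NS_W, "w:pPr");
        Element jc = doc.createElementNS(NS_W, "w:jc");
        jc.setAttributeNS(NS_W, "w:val", "center");
        pPr.appendChild(jc);
        p.appendChild(pPr);
        p.appendChild(run(doc, "Hello", "b", null));
        p.appendChild(run(doc, " Office", "i", null));
        p.appendChild(run(doc, " Open", "u", "single"));
        p.appendChild(run(doc, " XML", "vertAlign", "superscript"));
        return doc;
    }

    /** Serializes the DOM tree to a string without the XML declaration. */
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(centeredParagraph()));
    }
}
```

Real documents usually also set `xml:space="preserve"` on `w:t` elements whose text starts or ends with a space; the sketch leaves that out for brevity.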
The following example creates two paragraphs with the content ‘Hello Office Open XML’ and ‘www.openxmldeveloper.org’, the second one in a large font size :
Package pack = Package.open(zipFile, PackageAccess.ReadWrite);
WordDocument docx = new WordDocument(pack);

// Add runs to modify the style
Run r1 = new Run("Hello");
r1.setBold(true);

Run r2 = new Run(" Office");
r2.setItalic(true);

Run r3 = new Run(" Open");
r3.setUnderline(UnderlineStyle.SINGLE);

Run r4 = new Run(" XML");
r4.setVerticalAlignement(VerticalAlignment.SUPERSCRIPT);

// Add previous runs to the first paragraph
par1.addRun(r1);
par1.addRun(r2);
par1.addRun(r3);
par1.addRun(r4);

// Add the first paragraph in the document’s content
docx.appendParagraph(par1);

// Creation of a second paragraph
paraBuilder.setBold(true);

Paragraph par2 = paraBuilder.newParagraph();

Run r21 = new Run("www.openxmldeveloper.org");
r21.setFontSize(55);
par2.addRun(r21);

// Append the second paragraph to content
docx.appendParagraph(par2);

// Save the document
docx.save(destFile);
Listing 8
The result :
The OpenXML format is based in part on XML, so converting a document to HTML with XSLT is quite simple, at least for basic documents.
In this example, we’ll use the straightforward document generated by the previous listing with the following XSLT stylesheet :
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<xsl:output method="html" />
<!-- Document root -->
<xsl:template match="/w:document">
  <xsl:apply-templates select="w:body" />
</xsl:template>

<!-- Body and paragraphs -->
<xsl:template match="w:body">
  <html>
    <body>
      <xsl:for-each select="w:p">
        <p>
          <xsl:apply-templates select="w:pPr" />
          <xsl:apply-templates select="w:r" />
        </p>
      </xsl:for-each>
    </body>
  </html>
</xsl:template>

<!-- Paragraph properties -->
<xsl:template match="w:pPr">
  <xsl:attribute name="style"><xsl:apply-templates /></xsl:attribute>
</xsl:template>

<!-- Text alignment -->
<xsl:template match="w:jc">text-align:<xsl:value-of select="@w:val" /></xsl:template>

<!-- Run -->
<xsl:template match="w:r">
  <span>
    <xsl:apply-templates select="w:rPr" />
    <xsl:value-of select="w:t" />
  </span>
</xsl:template>

<!-- Run properties -->
<xsl:template match="w:rPr">
  <xsl:attribute name="style"><xsl:apply-templates /></xsl:attribute>
</xsl:template>

<!-- Font size -->
<xsl:template match="w:sz">font-size:<xsl:value-of select="@w:val" />px;</xsl:template>

<!-- Vertical alignment -->
<xsl:template match="w:vertAlign">
  <xsl:variable name="jcVal" select="@w:val" />
  <xsl:if test="$jcVal = 'superscript'">font-size:33%;position:relative;bottom:0.5em;</xsl:if>
  <xsl:if test="$jcVal = 'subscript'">font-size:33%;position:relative;bottom:-0.5em;</xsl:if>
</xsl:template>

<!-- Bold -->
<xsl:template match="w:b">font-weight:bold;</xsl:template>

<!-- Italic -->
<xsl:template match="w:i">font-style:italic;</xsl:template>

<!-- Underline -->
<xsl:template match="w:u">text-decoration:underline;</xsl:template>
</xsl:stylesheet>
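A stylesheet like this can be tried out with the XSLT processor that ships with the JDK, without any Open XML library. The sketch below applies a reduced version of the stylesheet (bold runs only) to a one-paragraph WordprocessingML fragment held in a string. The class `XsltSketch`, the method `toHtml()` and the sample input are assumptions made for illustration, and the namespace used is the one from the final Ecma schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, JDK only.
class XsltSketch {
    static final String NS_W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reduced stylesheet: paragraphs, runs and the bold property.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='" + NS_W + "' exclude-result-prefixes='w'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/w:document'>"
        + "<html><body><xsl:apply-templates select='w:body/w:p'/></body></html>"
        + "</xsl:template>"
        + "<xsl:template match='w:p'><p><xsl:apply-templates select='w:r'/></p></xsl:template>"
        + "<xsl:template match='w:r'>"
        + "<span><xsl:apply-templates select='w:rPr'/><xsl:value-of select='w:t'/></span>"
        + "</xsl:template>"
        + "<xsl:template match='w:rPr'>"
        + "<xsl:attribute name='style'><xsl:apply-templates/></xsl:attribute>"
        + "</xsl:template>"
        + "<xsl:template match='w:b'>font-weight:bold;</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the given main document XML and returns the HTML as a string. */
    static String toHtml(String documentXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter html = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(documentXml)),
                new StreamResult(html));
        return html.toString();
    }

    public static void main(String[] args) throws Exception {
        String documentXml =
              "<w:document xmlns:w='" + NS_W + "'><w:body><w:p>"
            + "<w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>"
            + "</w:p></w:body></w:document>";
        System.out.println(toHtml(documentXml));
    }
}
```

In a real conversion the input would be the content of the main document part rather than a string literal.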
We’ll use the class WordToHTMLTransformer and the associated method transform() to convert our OpenXML document to HTML :
WordDocument docx = new WordDocument(...);
...
WordToHTMLTransformer wt = new WordToHTMLTransformer();
InputStream transformStream = wt.transform(docx);
Listing 9-1
The complete example :
final String APP_ROOT = System.getProperty("user.dir") + File.separator;
ZipFile zipFile = null; // The source file

try {
    zipFile = new ZipFile(APP_ROOT + "sample_out.docx");
} catch (IOException e) {
    e.printStackTrace();
}

// Destination of the output file
File destFile = new File(APP_ROOT + "output.html");

// Open the document to convert
WordDocument docx = new WordDocument(Package.open(zipFile, PackageAccess.Read));

WordToHTMLTransformer wt = new WordToHTMLTransformer();
try {
    InputStream transformStream = wt.transform(docx);
    BufferedWriter outStream = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(destFile)));

    BufferedReader br = new BufferedReader(new InputStreamReader(transformStream));

    String buff;
    while ((buff = br.readLine()) != null)
        outStream.write(buff);
    outStream.close();

    br.close();
} catch (Exception e) {
    e.printStackTrace();
}
Listing 10
The HTML file generated by Listing 10 in Internet Explorer 7 :
Julien Chable, a student at EFREI in France and a Microsoft Student Partner, writes articles about Java and .NET for several magazines and websites. He can be contacted via his website http://julien.chable.net or his blog http://blogs.developpeur.org/neodante/
Hi
Can any one help me to insert a word in a docx file.
Quite confusing and complicated, maybe I need to study more, thank you