Open XML Document Image Conversion - OpenXML Developer - Blog - OpenXML Developer
Open XML Document Image Conversion

Article author: Sanjay Kumar Madhva, Sonata Software Limited

In the previous articles “WordProcessingML document creation in pure Java” and “Document management utility written in Java”, we demonstrated that it is possible to create a Word document by creating a few XML files and zipping them together, and showed how to manipulate the packaging of any Open XML document (be it Word, Excel, or PowerPoint) using Java code.

In this article we use the OpenXMLDocumentFile class described in “Document management utility written in Java” to extract document parts, edit their content, and replace the old document part with the new one.

We envision that some documents will contain embedded bitmap (BMP) images. BMP files are comparatively larger than JPEG or PNG files. If we replace the BMP file with a JPEG or PNG file, we expect to reduce the size of the document.

To achieve this, we walk through the steps required to extract the bitmap from the document, convert it to JPEG or PNG format, replace the original part with the converted one, and update the reference to the old document part so that it points to the new part.

This article assumes that you know how to parse and edit an XML file. The same steps can also be used to replace one image with another.

Graphical Representation

Click on the image to the right to view a diagram that shows the steps involved in replacing the Bitmaps present in the document by a converted image file of type PNG.

Pseudo Code

It should be fairly easy to replace the BMP image file present in the document with a PNG image file or any other image file type. All you need to do is to follow a few simple steps as mentioned below.

Step 1. Extracting the [Content_Types].xml.

Use the ExtractFile method of the provided packaging class (OpenXMLDocumentFile) to extract the file into a temporary work area.

// Create an instance of OpenXMLDocumentFile
OpenXMLDocumentFile myDoc = new OpenXMLDocumentFile();

// Extract the file [Content_Types].xml into a temp area.
myDoc.ExtractFile("test.docx", "[Content_Types].xml", "temp");

Where test.docx is the OpenXML document.
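The internals of OpenXMLDocumentFile are not shown in this article, so the following is only a minimal sketch of what an extraction step could look like using java.util.zip from the JDK. The class name PartExtractor and the method extractPart are hypothetical, and matching a part by its bare file name is an assumption based on how ExtractFile is called in this article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not the OpenXMLDocumentFile class from the article.
final class PartExtractor {
    /**
     * Copies one part of an Open XML package into workDir and returns the
     * path of the extracted file. The part is matched by its file name, so
     * "document.xml.rels" also finds "word/_rels/document.xml.rels".
     */
    static Path extractPart(Path docx, String partName, Path workDir) throws IOException {
        Files.createDirectories(workDir);
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                if (!entry.isDirectory() && fileName.equals(partName)) {
                    Path target = workDir.resolve(fileName);
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    return target;
                }
            }
        }
        throw new IOException("Part not found: " + partName);
    }
}
```

A call such as PartExtractor.extractPart(Paths.get("test.docx"), "[Content_Types].xml", Paths.get("temp")) would then mirror the ExtractFile call above.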

Step 2. Check if the [Content_Types].xml contains Bitmap.

Open the extracted XML file “[Content_Types].xml” and check if the “Default” node’s “Extension” attribute’s value contains “bmp”. The node is displayed below.

<Default Extension="bmp" ContentType="image/bitmap"/>

(This indicates that a bitmap is present in the document. If no “bmp” extension is present, display a message that the document does not contain a bitmap and exit; otherwise go to step 3.)
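The check in step 2 can be sketched with the DOM parser that ships with the JDK. The class name ContentTypesCheck and the method declaresBmp are hypothetical names chosen for this illustration, and the XML is passed in as a string to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for step 2.
final class ContentTypesCheck {
    /** Returns true if any Default node has Extension="bmp" (case-insensitive). */
    static boolean declaresBmp(String contentTypesXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(contentTypesXml.getBytes(StandardCharsets.UTF_8)));
        NodeList defaults = doc.getElementsByTagName("Default");
        for (int i = 0; i < defaults.getLength(); i++) {
            Element node = (Element) defaults.item(i);
            if ("bmp".equalsIgnoreCase(node.getAttribute("Extension"))) {
                return true;
            }
        }
        return false;
    }
}
```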

Step 3. Modify the [Content_Types].xml

Replace the bmp extension with png, and change the content type to match.

From:

<Default Extension="bmp" ContentType="image/bitmap" />

To:

<Default Extension="png" ContentType="image/png" />
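As a rough sketch, the same edit can be made with the JDK's DOM and Transformer classes. The class and method names are hypothetical. One detail goes beyond the steps in this article and is our own assumption about a sensible behaviour: if the file already has a Default node for png, the sketch removes the bmp node instead of renaming it, so that the extension is not declared twice.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for step 3.
final class ContentTypesRewriter {
    /** Returns the content-types XML with the bmp Default node turned into a png one. */
    static String bmpToPng(String contentTypesXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(contentTypesXml.getBytes(StandardCharsets.UTF_8)));
        NodeList defaults = doc.getElementsByTagName("Default");
        Element bmp = null;
        boolean pngDeclared = false;
        for (int i = 0; i < defaults.getLength(); i++) {
            Element node = (Element) defaults.item(i);
            String extension = node.getAttribute("Extension");
            if ("bmp".equalsIgnoreCase(extension)) bmp = node;
            if ("png".equalsIgnoreCase(extension)) pngDeclared = true;
        }
        if (bmp != null) {
            if (pngDeclared) {
                // png is already declared: drop the bmp node to avoid a duplicate.
                bmp.getParentNode().removeChild(bmp);
            } else {
                bmp.setAttribute("Extension", "png");
                bmp.setAttribute("ContentType", "image/png");
            }
        }
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The returned string should be written back to the temp copy of [Content_Types].xml as UTF-8 before the file is added to the package again.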

Step 4. Delete the [Content_Types].xml from test.docx

// Delete the file [Content_Types].xml from test.docx.
myDoc.DeleteFiles("test.docx", "[Content_Types].xml");

Step 5. Add the modified [Content_Types].xml back into test.docx

// Add the modified file [Content_Types].xml into test.docx.
myDoc.Add("test.docx", "[Content_Types].xml");
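Steps 4 and 5 together amount to replacing one entry of the ZIP package. For readers who do not have the OpenXMLDocumentFile class at hand, here is a minimal sketch of that delete-and-add pair using the JDK's built-in ZIP file system provider instead. PartReplacer and replacePart are hypothetical names, and unlike the article's Add method this sketch expects the full entry name inside the package.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;

// Hypothetical helper: swaps in the JDK zip file system for DeleteFiles + Add.
final class PartReplacer {
    /**
     * Overwrites (or creates) the entry entryName inside the package docx
     * with the bytes of newContent. The package is updated when the
     * file system is closed.
     */
    static void replacePart(Path docx, String entryName, Path newContent) throws IOException {
        URI uri = URI.create("jar:" + docx.toUri());
        try (FileSystem zip = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap())) {
            Files.copy(newContent, zip.getPath("/" + entryName), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For parts that are not at the root of the package, pass the full entry name, which for the relationships file used later in this article is typically word/_rels/document.xml.rels.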

Step 6. Extract “document.xml.rels”

// Extract the file document.xml.rels into a temp area.
myDoc.ExtractFile("c:\\MyWorkArea\\test.docx", "document.xml.rels", "temp");

Step 7. Process every “Relationship” node that targets a “.bmp” file.

Find every “Relationship” node whose “Target” attribute ends with “.bmp”, and repeat the sub-steps below for each one.

Step 7a. Get the name of the BMP from “document.xml.rels”.
Read the file name stored in the “Target” attribute of the “Relationship” node. This is the bitmap file that needs conversion.

<Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/image" Target="image1.bmp"/>
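Finding the bitmap relationships could look roughly like this with the JDK's DOM parser. BmpRelationships and bmpTargets are hypothetical names, and the relationships XML is passed in as a string for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for steps 7 and 7a.
final class BmpRelationships {
    /** Returns the Target value of every Relationship node that points at a .bmp file. */
    static List<String> bmpTargets(String relsXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
        NodeList relationships = doc.getElementsByTagName("Relationship");
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < relationships.getLength(); i++) {
            String target = ((Element) relationships.item(i)).getAttribute("Target");
            if (target.toLowerCase(Locale.ROOT).endsWith(".bmp")) {
                targets.add(target);
            }
        }
        return targets;
    }
}
```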

Step 7b. Extract the BMP from test.docx
Extract the bitmap found in step 7a (say, image1.bmp).

// Extract the file image1.bmp into a temp area.
myDoc.ExtractFile("test.docx", "image1.bmp", "temp");

Step 7c. Convert BMP to PNG.
Convert the BMP (image1.bmp) into PNG (image1.png) format using the Java Advanced Imaging API.
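The article names the Java Advanced Imaging API for this step but does not show the call. As an alternative that needs no extra library, the sketch below uses javax.imageio.ImageIO from the JDK, which can read BMP and write PNG. BmpToPng and convert are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Path;
import javax.imageio.ImageIO;

// Hypothetical helper for step 7c, using javax.imageio rather than JAI.
final class BmpToPng {
    /** Reads the BMP file and writes the same pixels out as a PNG file. */
    static void convert(Path bmp, Path png) throws IOException {
        BufferedImage image = ImageIO.read(bmp.toFile());
        if (image == null) {
            // ImageIO.read returns null when no registered reader understands the file.
            throw new IOException("Not a readable image: " + bmp);
        }
        if (!ImageIO.write(image, "png", png.toFile())) {
            throw new IOException("No PNG writer available for: " + png);
        }
    }
}
```

For example, BmpToPng.convert(Paths.get("temp/image1.bmp"), Paths.get("temp/image1.png")) would produce the file that is added back to the document in the next steps.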

Step 7d. Delete the BMP
Delete the BMP (image1.bmp) part of the document test.docx

// Delete the file image1.bmp from test.docx
myDoc.DeleteFiles("c:\\MyWorkArea\\test.docx", "image1.bmp");

Step 7e. Add the new converted PNG image file
Add the new converted PNG image file (image1.png) back into “test.docx”

// Add the converted image1.png into test.docx
myDoc.Add("c:\\MyWorkArea\\test.docx", "image1.png");

Step 7f. Modify the “Relationship” node of “document.xml.rels”

From:

<Relationship Id="rId4" Type="http://schemas.microsoft.com/office/2006/relationships/image" Target="media/image1.bmp" />

To:

<Relationship Id="rId4" Type="http://schemas.microsoft.com/office/2006/relationships/image" Target="media/image1.png" />
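A sketch of this edit, applied in place to the extracted copy of document.xml.rels, might look as follows. RelsRetargeter and retargetBmpToPng are hypothetical names. The sketch assumes that every .bmp target has already been converted to a .png file with the same base name.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for the Relationship edit.
final class RelsRetargeter {
    /** Rewrites every Target ending in .bmp to end in .png; returns how many were changed. */
    static int retargetBmpToPng(Path relsFile) throws Exception {
        Document doc;
        try (InputStream in = Files.newInputStream(relsFile)) {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
        NodeList relationships = doc.getElementsByTagName("Relationship");
        int changed = 0;
        for (int i = 0; i < relationships.getLength(); i++) {
            Element relationship = (Element) relationships.item(i);
            String target = relationship.getAttribute("Target");
            if (target.toLowerCase(Locale.ROOT).endsWith(".bmp")) {
                // Keep the path and base name, swap only the four-character extension.
                relationship.setAttribute("Target", target.substring(0, target.length() - 4) + ".png");
                changed++;
            }
        }
        try (OutputStream out = Files.newOutputStream(relsFile)) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
        return changed;
    }
}
```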

Step 8. Delete the “document.xml.rels” from test.docx

// Delete the file document.xml.rels.
myDoc.DeleteFiles("c:\\MyWorkArea\\test.docx", "document.xml.rels");

Step 9. Add new modified “document.xml.rels” into test.docx

// Add the modified file document.xml.rels into test.docx.
myDoc.Add("c:\\MyWorkArea\\test.docx", "document.xml.rels");

Before/After Comparisons

test.docx before the change:

test.docx after the change:

[Content_Types].xml before the change:

[Content_Types].xml after the change:

document.xml.rels before the change:

document.xml.rels after the change:

Concluding Notes

I started this article assuming that the document containing the bitmap would be the largest, but to my surprise the bitmap compressed very well when zipped, making that document the smallest.

An Image1.bmp of size 724 KB became 45 KB when converted to Image.jpg and 20 KB when converted to Image.PNG. But when the documents were zipped, testBMP.doc was the smallest file. Obviously, this is not what we expected!
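Because an Open XML package is a ZIP archive, what matters for the document size is how large each image is after DEFLATE compression, not its size on disk. The following small sketch, with the hypothetical names CompressedSize and deflatedSize, estimates that figure with java.util.zip.Deflater so that the candidates can be compared before replacing anything. It is only an approximation of what a particular ZIP tool would produce.

```java
import java.util.zip.Deflater;

// Hypothetical helper for comparing images by their compressed size.
final class CompressedSize {
    /** Returns the number of bytes the data occupies after raw DEFLATE compression. */
    static int deflatedSize(byte[] data) {
        // nowrap = true produces raw DEFLATE data, as stored in ZIP entries.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }
}
```

Calling deflatedSize(Files.readAllBytes(path)) on the BMP, JPEG, and PNG versions of an image shows which one will add the least to the zipped document.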

This article was to demonstrate the mechanism of replacing one part with another part.
