Article author: Sanjay Kumar Madhva, Sonata Software Limited
In the previous articles “WordProcessingML document creation in pure Java” and “Document management utility written in Java”, we demonstrated that it is possible to create a Word document by creating a few XML files and zipping them together, and showed how to work with the packaging of any Open XML document (be it Word, Excel or PowerPoint) using Java code.
In this article we use the OpenXMLDocumentFile class described in “Document management utility written in Java” to extract a document part, edit its content, and replace the old document part with the new one.
We expect that some documents contain embedded bitmap (BMP) images. Bitmap image files are usually considerably larger than the equivalent JPEG or PNG files, so replacing a BMP file with a JPEG or PNG file should reduce the size of the document.
To achieve this, we take you through the steps required to extract the bitmap from the document, convert it to JPEG or PNG format, replace the original part with the converted one, and change the references to the old document part so that they point to the newly introduced part.
This article assumes that you know how to parse and edit XML files. The same technique can also be used to replace one image with another.
The accompanying diagram shows the steps involved in replacing the bitmaps present in the document with converted image files of type PNG.
It should be fairly easy to replace the BMP image file present in the document with a PNG image file or any other image file type. All you need to do is to follow a few simple steps as mentioned below.
Use the ExtractFile method of the provided packaging class (OpenXMLDocumentFile) to extract “[Content_Types].xml” from the Open XML document (test.docx in this example) into a temporary work area.
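The OpenXMLDocumentFile class itself is described in the earlier article. As a rough, self-contained stand-in, the sketch below pulls a single part out of the package with the standard java.util.zip API instead. The names PartExtractor and extractPart are hypothetical, chosen only for this illustration; they are not part of OpenXMLDocumentFile.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: a stand-in for the article's ExtractFile method.
final class PartExtractor {

    /** Copies one part of an Open XML package into workDir and returns the extracted file. */
    static Path extractPart(Path docx, String partName, Path workDir) throws IOException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                throw new IOException("Part not found in package: " + partName);
            }
            // Keep only the last path segment, so "word/media/image1.bmp"
            // is written directly into workDir as "image1.bmp".
            String fileName = partName.substring(partName.lastIndexOf('/') + 1);
            Path target = workDir.resolve(fileName);
            try (InputStream in = zip.getInputStream(entry)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        }
    }
}
```

For example, extractPart(Paths.get("test.docx"), "[Content_Types].xml", workDir) would place a copy of the content types file in workDir.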
Open the extracted file “[Content_Types].xml” and check whether any “Default” node has an “Extension” attribute with the value “bmp”.
(Such a node indicates that a bitmap is present in the document. If no “bmp” extension is present, display a message that the document does not contain a bitmap and exit; otherwise go to step 3.)
Replace the “bmp” extension with “png”.
From:
To:
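The check and the replacement described in the last two steps could be sketched with the standard DOM API as follows. This is only an illustration under a few assumptions: the class ContentTypesEditor and its method are hypothetical names, the content type is switched to “image/png” together with the extension, and a “bmp” node is simply dropped when a “png” default is already declared, so that the file does not end up with two defaults for the same extension.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the article's OpenXMLDocumentFile class.
final class ContentTypesEditor {

    /**
     * Returns the rewritten [Content_Types].xml text, or an empty Optional
     * when no Default node for the bmp extension is declared.
     */
    static Optional<String> replaceBmpWithPng(String contentTypesXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(contentTypesXml.getBytes(StandardCharsets.UTF_8)));

        Element bmp = null;
        boolean pngDeclared = false;
        // "*" matches any namespace, so the lookup does not depend on the prefix used.
        NodeList defaults = doc.getElementsByTagNameNS("*", "Default");
        for (int i = 0; i < defaults.getLength(); i++) {
            Element node = (Element) defaults.item(i);
            String extension = node.getAttribute("Extension");
            if ("bmp".equalsIgnoreCase(extension)) {
                bmp = node;
            } else if ("png".equalsIgnoreCase(extension)) {
                pngDeclared = true;
            }
        }
        if (bmp == null) {
            return Optional.empty(); // the document does not declare any bitmap
        }
        if (pngDeclared) {
            // png is already declared: removing the bmp node avoids a duplicate.
            bmp.getParentNode().removeChild(bmp);
        } else {
            bmp.setAttribute("Extension", "png");
            bmp.setAttribute("ContentType", "image/png");
        }

        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return Optional.of(out.toString());
    }
}
```

An empty result corresponds to the case in which the document contains no bitmap and processing stops.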
Find every “Relationship” node whose “Target” attribute ends with “.bmp”.
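One possible way to collect these nodes is sketched below with the standard DOM API. The class BmpRelationshipFinder and its method are hypothetical names used only for this illustration, and the text of “document.xml.rels” is assumed to have been read into a string already.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the article's OpenXMLDocumentFile class.
final class BmpRelationshipFinder {

    /** Returns the Target value of every Relationship node that points at a .bmp file. */
    static List<String> findBmpTargets(String relsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));

        List<String> targets = new ArrayList<>();
        NodeList relationships = doc.getElementsByTagNameNS("*", "Relationship");
        for (int i = 0; i < relationships.getLength(); i++) {
            String target = ((Element) relationships.item(i)).getAttribute("Target");
            // Compare in lower case so that "IMAGE1.BMP" is found as well.
            if (target.toLowerCase(Locale.ROOT).endsWith(".bmp")) {
                targets.add(target);
            }
        }
        return targets;
    }
}
```

Note that a target such as media/image1.bmp is relative to the folder of the part that owns the relationships, so for “document.xml.rels” the part name inside the package is word/media/image1.bmp.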
Step 7a. Get the name of the BMP from “document.xml.rels”. Extract the file name stored in the “Target” attribute of the “Relationship” node; this identifies the bitmap file that needs conversion.
Step 7b. Extract the BMP (say, image1.bmp) from test.docx.
Step 7c. Convert the BMP to PNG. Convert the BMP (image1.bmp) into PNG format (image1.png) using the Java Advanced Imaging API.
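The conversion itself is short. The sketch below deliberately uses javax.imageio.ImageIO from the standard JDK rather than the Java Advanced Imaging API named above, since ImageIO reads BMP and writes PNG without any extra library. The class BmpToPng and its method are hypothetical names for this illustration.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Path;
import javax.imageio.ImageIO;

// Hypothetical helper: uses ImageIO as a swapped-in alternative to JAI.
final class BmpToPng {

    /** Writes a PNG copy of the given BMP next to it and returns the path of the PNG. */
    static Path convert(Path bmp) throws IOException {
        BufferedImage image = ImageIO.read(bmp.toFile());
        if (image == null) {
            // ImageIO.read returns null when no registered reader understands the file.
            throw new IOException("Not a readable image: " + bmp);
        }
        String name = bmp.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String baseName = dot < 0 ? name : name.substring(0, dot);
        Path png = bmp.resolveSibling(baseName + ".png");
        if (!ImageIO.write(image, "png", png.toFile())) {
            throw new IOException("No PNG writer available");
        }
        return png;
    }
}
```

Calling convert on the extracted image1.bmp would leave image1.png in the same work area, ready to be added back to the package.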
Step 7d. Delete the BMP. Delete the BMP part (image1.bmp) from the document test.docx.
Step 7f. Add the new converted PNG image file. Add the converted PNG image file (image1.png) back into “test.docx”.
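Deleting the old part and adding the new one can be done in a single pass over the package. The sketch below does this with the zip file system provider that ships with the JDK, as an alternative to the OpenXMLDocumentFile methods from the earlier article; PartSwapper and swapPart are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper built on the JDK's zip file system provider.
final class PartSwapper {

    /** Adds newFile to the package as newPart and removes oldPart. */
    static void swapPart(Path docx, String oldPart, Path newFile, String newPart) throws IOException {
        // The cast selects the (Path, ClassLoader) overload, which keeps the call
        // unambiguous on newer JDKs that also offer a (Path, Map) overload.
        try (FileSystem zip = FileSystems.newFileSystem(docx, (ClassLoader) null)) {
            Path target = zip.getPath(newPart);
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.copy(newFile, target, StandardCopyOption.REPLACE_EXISTING);
            Files.deleteIfExists(zip.getPath(oldPart));
        } // closing the file system writes the changes back to the .docx file
    }
}
```

For the running example, swapPart(docx, "word/media/image1.bmp", png, "word/media/image1.png") would replace the bitmap part with the converted file.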
Step 7g. Modify the “Relationship” node of “document.xml.rels” so that its “Target” attribute refers to the new PNG part instead of the BMP.
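A minimal sketch of this last edit is shown below, again with the standard DOM API. It assumes that the PNG keeps the base name of the BMP (image1.bmp becomes image1.png), and the class RelationshipRewriter is a hypothetical name for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the article's OpenXMLDocumentFile class.
final class RelationshipRewriter {

    /** Returns the relationships XML with every ".bmp" Target renamed to ".png". */
    static String pointBmpTargetsToPng(String relsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));

        NodeList relationships = doc.getElementsByTagNameNS("*", "Relationship");
        for (int i = 0; i < relationships.getLength(); i++) {
            Element relationship = (Element) relationships.item(i);
            String target = relationship.getAttribute("Target");
            if (target.toLowerCase(Locale.ROOT).endsWith(".bmp")) {
                // Drop the four characters of ".bmp" and append the new extension.
                String renamed = target.substring(0, target.length() - 4) + ".png";
                relationship.setAttribute("Target", renamed);
            }
        }

        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Only the “Target” attribute changes; the “Id” of the relationship is left alone, so the references to the image inside document.xml remain valid.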
Figures: test.docx, [Content_Types].xml and document.xml.rels, each shown before and after the change.
I started the article with the assumption that the document containing the bitmap would be the largest, but to my surprise I found that the bitmap compressed extremely well when zipped, which made that document the smallest.
An Image1.bmp of size 724 KB became 45 KB when converted into Image.jpg and 20 KB when converted into Image.PNG. But when the documents were zipped, testBMP.doc became the smallest file. Obviously, this is not what we expected!
Nevertheless, the purpose of this article was to demonstrate the mechanism of replacing one document part with another.