Deleting Comments from Word document using Core Java

Article by Vineela Kavoori, Sonata Software Limited

 

This article explains how to delete comments from a WordprocessingML document using core Java.

 

To delete comments from a WordprocessingML document, follow these steps:

1.    Unzip the Word document.

2.    Delete the Override element that refers to “comments.xml” from [Content_Types].xml.

3.    Delete the “comments.xml” file.

4.    Delete the relationship to “comments.xml” from the corresponding relationship file (e.g. document.xml.rels).

5.    Delete the references to comments from the main part (e.g. document.xml).

6.    Zip all the files and give the archive the “.docx” extension.

 

Step 1: Unzip the existing WordprocessingML document

To unzip the existing Word document:

1. Get the document from the specified location.

2. Unzip the document using the classes in the java.util.zip package.

The code snippet for this is as follows:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public static void unZipFile(String zipFileName, String toExtractFile)
{
    ……
    ZipFile zipFile = new ZipFile(sourceZipFile, ZipFile.OPEN_READ);
    Enumeration<? extends ZipEntry> enumeration = zipFile.entries();
    while (enumeration.hasMoreElements())
    {
        ZipEntry zipEntry = enumeration.nextElement();
        String currName = zipEntry.getName();
        File destFile = new File(destDirectory, currName);
        // Create the parent folders of the entry before writing it
        File destinationParent = destFile.getParentFile();
        destinationParent.mkdirs();
        if (!zipEntry.isDirectory())
        {
            // try-with-resources flushes and closes both streams, so buffered
            // bytes are not lost when the entry has been copied
            try (BufferedInputStream is =
                     new BufferedInputStream(zipFile.getInputStream(zipEntry));
                 BufferedOutputStream dest =
                     new BufferedOutputStream(new FileOutputStream(destFile)))
            {
                int currentByte;
                while ((currentByte = is.read()) != -1)
                {
                    dest.write(currentByte);
                }
            }
        }
    }
    ……
}
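The unzip step can also be written as a self-contained method. The following is a minimal sketch, not the code from the attachment: the class name UnzipSketch and the method name unzip are hypothetical, and it assumes Java 7 or later so that java.nio.file is available. It additionally rejects entries whose names would resolve outside the target folder, which the snippet above does not check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of the article's attachment.
final class UnzipSketch {

    /** Extracts every entry of the package at zipPath into destDir. */
    static void unzip(Path zipPath, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse names such as "../x" that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes target folder: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                // Files.copy streams the entry to disk; the stream is closed afterwards.
                try (InputStream in = zipFile.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

A call such as UnzipSketch.unzip(Paths.get("input.docx"), Paths.get("removecomments")) would extract the package into the removecomments folder; both paths here are placeholders.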

Step 2: Modify [Content_Types].xml

To modify the [Content_Types].xml, the steps to be followed are:

1.    Get the document element of [Content_Types].xml

2.    Navigate to the “Override” element that is referring to “comments.xml” and delete it.

The code snippet for this is as follows: 

File xmlFile = new File(contenTypesXmlFile);
Document xmlDoc = docBuilder.parse(xmlFile);
Element rootElement = xmlDoc.getDocumentElement();
……….
……….
Node contentTypeNode = nodeList.item(lstCnt);
NamedNodeMap map = contentTypeNode.getAttributes();
Node docElement = map.getNamedItem("ContentType");
// Compare the attribute value itself; the output of Node.toString() depends on the parser
if (docElement != null && docElement.getNodeValue().contains(contentCommnetsType))
{
    …………………
    rootElement.removeChild(contentTypeNode);
    helpObj.outputXML(xmlDoc, xmlFile);
}
……..
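For reference, here is a minimal self-contained sketch of the same idea. It is not the code from the attachment: the class ContentTypesSketch and its method are hypothetical names, and it assumes the comments part is registered with the standard WordprocessingML comments content type. The method returns the PartName of the removed Override so that the file can be located afterwards.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the article's attachment.
final class ContentTypesSketch {

    static final String COMMENTS_CONTENT_TYPE =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml";

    /** Removes the comments Override and returns its PartName, or null if there is none. */
    static String removeCommentsOverride(Path contentTypesXml) throws Exception {
        Document doc;
        try (InputStream in = Files.newInputStream(contentTypesXml)) {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
        NodeList overrides = doc.getDocumentElement().getElementsByTagName("Override");
        String partName = null;
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = overrides.getLength() - 1; i >= 0; i--) {
            Element override = (Element) overrides.item(i);
            if (COMMENTS_CONTENT_TYPE.equals(override.getAttribute("ContentType"))) {
                partName = override.getAttribute("PartName");
                override.getParentNode().removeChild(override);
            }
        }
        if (partName != null) {
            // Write the modified document back over the original file.
            try (OutputStream out = Files.newOutputStream(contentTypesXml)) {
                TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            }
        }
        return partName;
    }
}
```

Reading and writing through streams rather than passing a File to the parser avoids any URI-encoding questions raised by the square brackets in the file name [Content_Types].xml.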

 

 

 

Step 3: Delete “comments.xml” file

To delete the comments.xml file, follow these steps:

1.    Read the PartName of the comments part from its Override element in [Content_Types].xml. This has to happen before that element is removed in Step 2.

2.    Delete the file at the path obtained in the previous step.

The code snippet for this is as follows:

File xmlFile = new File(contenTypesXmlFile);
Document xmlDoc = docBuilder.parse(xmlFile);
Element rootElement = xmlDoc.getDocumentElement();
……….
……….
Node contentTypeNode = nodeList.item(lstCnt);
NamedNodeMap map = contentTypeNode.getAttributes();
Node docElement = map.getNamedItem("ContentType");
if (docElement != null && docElement.getNodeValue().contains(contentCommnetsType))
{
    // PartName holds the package path of the comments part, e.g. /word/comments.xml
    nameOfDocFile = ((Element) contentTypeNode).getAttribute("PartName");
    commentsFileName = nameOfDocFile.substring(nameOfDocFile.lastIndexOf("/") + 1);
    commentsFile = helpObj.getFilePath(commentsFileName);
    ……..
}
………
File f = new File(commentsFile);
f.delete();
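As an alternative to looking the file up by name, the PartName can be resolved directly against the folder the package was unzipped into. The following is a minimal sketch under that assumption; DeletePartSketch and deletePart are hypothetical names, not part of the attachment. Unlike File.delete(), Files.deleteIfExists throws an exception when the file exists but cannot be removed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the article's attachment.
final class DeletePartSketch {

    /**
     * Deletes the part named by partName (e.g. "/word/comments.xml") below
     * extractedRoot. Returns true if a file was deleted, false if none existed.
     */
    static boolean deletePart(Path extractedRoot, String partName) throws IOException {
        // Part names start with "/", which must be dropped before resolving.
        String relative = partName.startsWith("/") ? partName.substring(1) : partName;
        Path root = extractedRoot.toAbsolutePath().normalize();
        Path target = root.resolve(relative).normalize();
        // Guard against part names that point outside the extracted folder.
        if (!target.startsWith(root)) {
            throw new IOException("Part name escapes package folder: " + partName);
        }
        return Files.deleteIfExists(target);
    }
}
```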

 

Step 4: Remove the relationship to “comments.xml” from the corresponding relationship file

To remove the relationship to “comments.xml”, follow these steps:

1.    Get the path of the relationship file (document.xml.rels).

2.    Navigate to the Relationship element that refers to “comments.xml”.

3.    Delete the node obtained in the previous step.

In document.xml.rels, this is the Relationship element whose Target attribute is “comments.xml” and whose Type attribute ends with “/relationships/comments”.
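A minimal sketch of the relationship removal described in this step is shown here. It is an illustration rather than the code from the attachment: the class RelsSketch and its method are hypothetical names, and matching on the suffix of the Type attribute is an assumption about how the comments relationship is identified.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the article's attachment.
final class RelsSketch {

    // Assumption: the comments relationship is recognised by the end of its Type URI.
    static final String COMMENTS_TYPE_SUFFIX = "/relationships/comments";

    /** Removes comments relationships from a .rels file and returns how many were removed. */
    static int removeCommentsRelationship(Path relsFile) throws Exception {
        Document doc;
        try (InputStream in = Files.newInputStream(relsFile)) {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
        NodeList relationships = doc.getDocumentElement().getElementsByTagName("Relationship");
        int removed = 0;
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = relationships.getLength() - 1; i >= 0; i--) {
            Element relationship = (Element) relationships.item(i);
            if (relationship.getAttribute("Type").endsWith(COMMENTS_TYPE_SUFFIX)) {
                relationship.getParentNode().removeChild(relationship);
                removed++;
            }
        }
        if (removed > 0) {
            // Write the modified document back over the original file.
            try (OutputStream out = Files.newOutputStream(relsFile)) {
                TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            }
        }
        return removed;
    }
}
```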

Step 5: Remove the references to comments from the main part (e.g. document.xml)

To remove the references to comments from the main part:

1.       Iterate through the “run” elements.

2.       Remove the elements that refer to comments.

The code snippet for this is:

// runList holds the w:r (run) elements of the current paragraph element
NodeList runList = paraElement.getElementsByTagName("w:r");
for (int runCnt = 0; runCnt < runList.getLength(); runCnt++) {
    Element runElement = (Element) runList.item(runCnt);
    NodeList runChildNodes = runElement.getElementsByTagName(commentReference);
    // runChildNodes is a live list, so iterate backwards: removing a node while
    // counting upwards would skip the node that follows it
    for (int commentRefCnt = runChildNodes.getLength() - 1; commentRefCnt >= 0; commentRefCnt--) {
        Node commentRef = runChildNodes.item(commentRefCnt);
        // The commentReference node that refers to the comment is deleted
        runElement.removeChild(commentRef);
    }
}
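The same cleanup can be sketched as a self-contained method that works on the whole of document.xml. This is an illustration, not the code from the attachment: CommentMarkupSketch is a hypothetical name, and the parser is made namespace-aware so that elements are matched by namespace rather than by the literal "w:" prefix. Beyond the w:commentReference elements handled above, the sketch also removes w:commentRangeStart and w:commentRangeEnd, which mark the commented text; whether that extra cleanup is needed for a given document is left as an assumption.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the article's attachment.
final class CommentMarkupSketch {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String[] COMMENT_ELEMENTS =
        {"commentReference", "commentRangeStart", "commentRangeEnd"};

    /** Removes comment markup from the main part and returns the number of elements removed. */
    static int removeCommentMarkup(Path documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc;
        try (InputStream in = Files.newInputStream(documentXml)) {
            doc = factory.newDocumentBuilder().parse(in);
        }
        int removed = 0;
        for (String localName : COMMENT_ELEMENTS) {
            NodeList live = doc.getElementsByTagNameNS(W_NS, localName);
            // Walk the live list backwards so that removals do not shift unvisited items.
            for (int i = live.getLength() - 1; i >= 0; i--) {
                Node node = live.item(i);
                node.getParentNode().removeChild(node);
                removed++;
            }
        }
        if (removed > 0) {
            // Write the modified document back over the original file.
            try (OutputStream out = Files.newOutputStream(documentXml)) {
                TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            }
        }
        return removed;
    }
}
```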

Step 6: Zip all the files and give the archive the “.docx” extension

To zip the files back into a document, follow these steps:

1.    Get the list of paths of all the files that have to be zipped.

2.    Get the list of relative paths of all the files to be zipped. These become the entry names inside the archive.

3.    Zip the files into a Word document using the classes in the java.util.zip package.

The code snippet for this is:

…….
FileOutputStream outStream = new FileOutputStream(zipFileName);
ZipOutputStream zipOutStream = new ZipOutputStream(outStream);
zipOutStream.setLevel(Deflater.BEST_COMPRESSION);
……
while (itr.hasNext())
{
    String path = (String) itr.next();
    String relPath = (String) relItr.next();
    inStream = new FileInputStream(path);
    zipOutStream.putNextEntry(new ZipEntry(relPath));
    int i = 0;
    while ((i = inStream.read()) != -1)
    {
        zipOutStream.write(i);
    }
    // Finish this entry and release the input file before moving on
    zipOutStream.closeEntry();
    inStream.close();
}
….
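The zipping step can also be sketched as one self-contained method that collects the files itself. This is an illustration under stated assumptions, not the code from the attachment: ZipSketch and zipDirectory are hypothetical names, Java 8 or later is assumed for Files.walk, and the output file is assumed to lie outside the folder being zipped. Entry names are built with forward slashes, which is what part names inside the package use, regardless of the platform's file separator.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the article's attachment.
final class ZipSketch {

    /** Zips every regular file below sourceDir into docxFile. */
    static void zipDirectory(Path sourceDir, Path docxFile) throws IOException {
        // Collect the files first so the walk is closed before writing starts.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zipOut = new ZipOutputStream(Files.newOutputStream(docxFile))) {
            zipOut.setLevel(Deflater.BEST_COMPRESSION);
            for (Path file : files) {
                // Relative path with "/" separators becomes the entry name.
                String entryName = sourceDir.relativize(file).toString()
                    .replace(File.separatorChar, '/');
                zipOut.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zipOut);
                zipOut.closeEntry();
            }
        }
    }
}
```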

This simple demo shows how to use Java to remove comments from a Word document. The demo is attached to this article as a zip file.

 

PS: Each run of the program creates an output file (commentsremoved.docx) and a folder (removecomments) at the specified output location. Both have to be deleted before the code is run on the next input document. You can change this logic to suit your requirements.

 

 

Attachment: DeleteCommentsUsingJava.zip