Introducing TextReplacer: A New Class for PowerTools for Open XML - OpenXML Developer - Blog - OpenXML Developer


Recently I wrote some code that implemented search-and-replace for Open XML WordprocessingML documents.  I wrote that code for an Open XML developer who needed to implement that functionality using XML DOM, although with a different language than C#.  Because XML DOM is standardized, translating the code to another language and another implementation of XML DOM is relatively straightforward.

I want to introduce search-and-replace functionality in a cmdlet in PowerTools for Open XML, but I have been moving PowerTools code away from XmlDocument, so I rewrote the search-and-replace code using LINQ to XML, using a functional transform.  It was an interesting and fun project.  The video below introduces the TextReplacer class, and compares it to the code that I presented that uses XmlDocument.  It is an interesting comparison of imperative code (using XmlDocument) and functional code (using LINQ to XML).

It took me about 8 hours to write the search-and-replace code using XmlDocument.  It took me about 4 hours to write the same code using LINQ to XML.  However, for what it’s worth, I spent far longer than that on both versions of the code making sure that whatever else happens, the code will not corrupt WordprocessingML documents.

Personally, I find the functional code to be much cleaner and easier to maintain.  It was also easier to debug.
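
To make the idea of a functional transform concrete, here is a minimal sketch in LINQ to XML.  It is not the TextReplacer code from the attachment: the class name FunctionalTransformSketch and the method Transform are hypothetical, and the sketch only replaces text that sits entirely inside a single w:t element, whereas real search-and-replace also has to handle text that is split across several runs.  It assumes a non-empty search string, because string.Replace rejects an empty one.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical illustration only; not part of PowerTools for Open XML.
static class FunctionalTransformSketch
{
    // WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds and returns a new tree. The input tree is never modified,
    // which is what distinguishes this from the imperative approach.
    // Assumption: 'search' is not empty.
    public static object Transform(XNode node, string search, string replace)
    {
        if (node is XElement element)
        {
            // Text elements: copy the attributes, replace within the text.
            if (element.Name == W + "t")
                return new XElement(element.Name,
                    element.Attributes(),
                    element.Value.Replace(search, replace));

            // Every other element: copy it, transforming its children.
            return new XElement(element.Name,
                element.Attributes(),
                element.Nodes().Select(n => Transform(n, search, replace)));
        }

        // Comments, processing instructions, and so on pass through.
        // LINQ to XML clones a node that already has a parent when it is
        // added to the new tree.
        return node;
    }
}
```

Because the transform returns a new tree instead of editing nodes in place, there is no need to track which nodes have already been visited or changed, which is a large part of why this style tends to be easier to debug.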

If you want to see a detailed description of the algorithm that I used for search-and-replace, see the previous post on search-and-replace that uses XmlDocument.  The approach that I took with the LINQ to XML code is identical in nature to the code that uses XmlDocument.  This will be apparent in the video.

I am currently in the process of doing some relatively major revamps to PowerTools for Open XML, so I haven’t posted this code on CodePlex yet.  You can retrieve the code as an attachment to this post.

-Eric

 

Attachment: TextReplacer.zip
  • Hi Eric.  First let me thank you for the great work.  PowerTools has saved us a lot of time when manipulating Word documents.  However, I encountered an issue where I was looping through a Dictionary of Codes and their replacement values where 5 out of the 6 Codes had empty strings ("") as the values.  The first Code was processed fine in the WmlSearchAndReplaceTransform() routine, where it determines the contents of textValue and checks whether it starts or ends with a space:

        XElement paragraphWithConsolidatedRuns = new XElement(W.p,
            groupedAdjacentRunsWithIdenticalFormatting.Select(g =>
                {
                    if (g.Key == "DontConsolidate")
                        return (object)g;
                    string textValue = g.Select(r => r.Element(W.t).Value).StringConcatenate();
                    XAttribute xs = null;
                    if (textValue[0] == ' ' || textValue[textValue.Length - 1] == ' ')
                        xs = new XAttribute(XNamespace.Xml + "space", "preserve");
                    return new XElement(W.r,
                        g.First().Elements(W.rPr),
                        new XElement(W.t, xs, textValue));
                }));

    The second time through, the code set textValue to an empty string, which caused an index-out-of-bounds error when it performed the second check in the if statement, since it computes (0 - 1 = -1) for the index value.

    I ended up changing the if statement to check for the length of textValue:  textValue.Length == 0

    What do you think about this?  I didn't really try to figure out what all the code above was doing, so this might not be the best approach.
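
    The guard described in this comment could be sketched as follows.  This is one possible shape for the check, not the fix that shipped in PowerTools: the class SpacePreserveSketch and the helper SpacePreserveFor are hypothetical names introduced here.  Note that for an empty string both index expressions are out of range, since textValue[0] already throws before the second comparison is reached, so the length has to be tested before either one.

```csharp
using System.Xml.Linq;

// Hypothetical helper; not part of PowerTools for Open XML.
static class SpacePreserveSketch
{
    // Returns an xml:space="preserve" attribute when the text begins or
    // ends with a space, and null otherwise. A null passed as content to
    // the XElement constructor is simply ignored.
    public static XAttribute SpacePreserveFor(string textValue)
    {
        // Guard first: indexing an empty string throws.
        if (string.IsNullOrEmpty(textValue))
            return null;

        if (textValue[0] == ' ' || textValue[textValue.Length - 1] == ' ')
            return new XAttribute(XNamespace.Xml + "space", "preserve");

        return null;
    }
}
```

    With a helper like this, the lambda in the quoted code could set xs = SpacePreserveSketch.SpacePreserveFor(textValue) instead of indexing textValue directly.  Whether a run with empty text should be emitted at all is a separate question that this sketch does not address.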
