Recently I wrote some code that implemented search-and-replace
for Open XML WordprocessingML documents.
I wrote that code for an Open XML developer who needed to implement the
functionality using the XML DOM, but in a language other than C#. Because the XML DOM is standardized,
translating the code to another language and another implementation of the XML DOM
is relatively straightforward.
I want to add search-and-replace functionality as a
cmdlet in PowerTools for Open XML, but I have been moving PowerTools code away
from XmlDocument, so I rewrote the search-and-replace code in LINQ to XML
as a functional transform. It was an
interesting and fun project. The video
below introduces the TextReplacer class and compares it to the code that I
presented earlier, which uses XmlDocument. It is
an interesting comparison of imperative code (using XmlDocument) and functional
code (using LINQ to XML).
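The difference between the two styles can be seen on a much smaller problem. The following C# sketch is not the TextReplacer class. It is a hypothetical, simplified example that replaces text only inside individual w:t elements, once with a recursive LINQ to XML functional transform that builds a new tree (ReplaceFunctional) and once by mutating an XmlDocument in place (ReplaceImperative). Both function names are made up for illustration, and the sketch assumes .NET 5 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical, simplified example: text is replaced only inside individual
// w:t elements. Text split across several w:r runs is NOT handled here.
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Functional style (LINQ to XML): recursively builds a new tree and leaves
// the source tree untouched.
object ReplaceFunctional(XNode node, string search, string replace)
{
    if (node is XElement e)
    {
        // A w:t element gets new text content; its attributes
        // (for example xml:space) are copied as-is.
        if (e.Name == w + "t")
            return new XElement(e.Name, e.Attributes(), e.Value.Replace(search, replace));

        // Any other element is rebuilt from its transformed child nodes.
        return new XElement(e.Name,
            e.Attributes(),
            e.Nodes().Select(n => ReplaceFunctional(n, search, replace)));
    }
    // Other nodes are returned unchanged; LINQ to XML clones a node that
    // already has a parent when it is added to the new tree.
    return node;
}

// Imperative style (XmlDocument / XML DOM): finds the w:t elements and
// mutates them in place.
void ReplaceImperative(XmlDocument doc, string search, string replace)
{
    var ns = new XmlNamespaceManager(doc.NameTable);
    ns.AddNamespace("w", w.NamespaceName);
    // Materialize the node list before changing anything.
    foreach (XmlElement t in doc.SelectNodes("//w:t", ns).Cast<XmlElement>().ToList())
        t.InnerText = t.InnerText.Replace(search, replace);
}

string xml =
    $"<w:document xmlns:w='{w.NamespaceName}'><w:body><w:p>" +
    "<w:r><w:t>Hello Eric</w:t></w:r>" +
    "</w:p></w:body></w:document>";

var functional = (XElement)ReplaceFunctional(XElement.Parse(xml), "Eric", "World");
Console.WriteLine(functional.Descendants(w + "t").First().Value); // Hello World

var dom = new XmlDocument();
dom.LoadXml(xml);
ReplaceImperative(dom, "Eric", "World");
Console.WriteLine(dom.DocumentElement.InnerText); // Hello World
```

Because the sample text sits in a single run, both versions print "Hello World". If "Hello" and "Eric" were in separate w:r elements, neither sketch would find a match; handling text that spans runs is what makes real WordprocessingML search-and-replace much harder than this sketch.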
It took me about 8 hours to write the search-and-replace
code using XmlDocument, and about
4 hours to write the same code using LINQ to XML. For what it's worth, though, I spent far
longer than that on both versions of the code making sure that, whatever else
happens, the code will not corrupt WordprocessingML documents.
Personally, I find the functional code to be much cleaner
and easier to maintain. It was also
easier to debug.
For a detailed description of the search-and-replace algorithm,
see the previous
post on search-and-replace, which uses XmlDocument. The approach that I took in the LINQ to XML
code is essentially the same as in the XmlDocument code, as the video makes apparent.
I am currently in the process of making some relatively major
revisions to PowerTools for Open XML, so I haven't posted this code on CodePlex
yet. You can retrieve the code as an
attachment to this post.
Hi Eric. First, let me thank you for the great work. PowerTools has saved us a lot of time when manipulating Word documents. However, I ran into an issue while looping through a Dictionary of codes and their replacement values, where 5 of the 6 codes had empty strings ("") as their values. The first code was processed fine in the WmlSearchAndReplaceTransform() routine, where it builds textValue and checks it for leading and trailing spaces:
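XElement paragraphWithConsolidatedRuns = new XElement(W.p,
    // ... (lines omitted)
        if (g.Key == "DontConsolidate")
            // ... (lines omitted)
        // StringConcatenate() is not a .NET library method; it is an
        // extension method defined elsewhere in this code.
        string textValue = g.Select(r => r.Element(W.t).Value).StringConcatenate();
        XAttribute xs = null;
        // Add xml:space="preserve" when the text starts or ends with a space.
        if (textValue[0] == ' ' || textValue[textValue.Length - 1] == ' ')
            xs = new XAttribute(XNamespace.Xml + "space", "preserve");
        return new XElement(W.r,
            new XElement(W.t, xs, textValue));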
The second time through, textValue was an empty string, which caused an index-out-of-range error in the if statement, because an empty string has no character at any index (textValue.Length - 1, for example, evaluates to 0 - 1 = -1).
I ended up changing the if statement to also check for an empty textValue (textValue.Length == 0).
What do you think about this? I didn't really try to figure out what all the code above was doing, so this might not be the best approach.
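The guard described in this comment can be sketched as a small, self-contained C# program. MakeTextElement is a hypothetical helper, not part of PowerTools; it mirrors the excerpt above but tests the length before indexing, so an empty string simply skips the xml:space check instead of throwing. The local namespace variable w stands in for the W class used in the excerpt, and the sketch assumes .NET 5 or later (top-level statements).

```csharp
using System;
using System.Xml.Linq;

// Stand-in for the W class in the excerpt (WordprocessingML main namespace).
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hypothetical helper: builds a w:t element and adds xml:space="preserve"
// only when the text has a leading or trailing space. The Length test runs
// first, so an empty string never reaches the index expressions.
XElement MakeTextElement(string textValue)
{
    XAttribute xs = null;
    if (textValue.Length > 0 &&
        (textValue[0] == ' ' || textValue[textValue.Length - 1] == ' '))
        xs = new XAttribute(XNamespace.Xml + "space", "preserve");
    // A null content argument is ignored by the XElement constructor.
    return new XElement(w + "t", xs, textValue);
}

Console.WriteLine(MakeTextElement(" padded "));
Console.WriteLine(MakeTextElement(""));
```

Putting textValue.Length > 0 first relies on the short-circuit evaluation of &&: when the string is empty, neither index expression is evaluated.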