BY DARCY THOMAS
PowerTools (http://www.codeplex.com/PowerTools) makes server-side assembly of Open XML documents very easy from a PowerShell script by providing a rich set of PowerShell cmdlets. Interestingly, the smarts developed in PowerTools are just as applicable to an integrated .NET business application, which would typically be written in C# or VB.NET rather than PowerShell. It is simple to tap into this functionality because the PowerTools download includes source code, and within that source the cmdlets are already implemented on top of a clean set of reusable classes.
This article demonstrates how to tap into this goodness. To prove the approach, an earlier sample written in PowerShell using PowerTools (http://openxmldeveloper.org/archive/2009/04/06/4418.aspx) is redeveloped in C# as a Windows application; it could equally have been a web application.
My approach is a wrapper around the PowerTools base classes that provides .NET methods matching the PowerShell cmdlets. The wrapper is not complete: it mostly has methods for the cmdlets I wanted to use. You will see, though, that very little code is required to extend it to reach any functionality in PowerTools. For example, if you wanted to use the logic provided by the cmdlet for setting the footer Set-OpenXmlContentFormat, you can add that yourself with just a few lines of glue code.
The sample mentioned above uses a few methods that grab parts from several documents and merge them into one document. It builds a report document dynamically from some current environment properties (processor and disk load).
First off, you will want to download PowerTools.
To understand PowerTools I recommend watching this video:
Then have a look at Generating a document using PowerTools for Open XML and PowerShell (Lawrence Hodson’s article). You may want to download his example to compare with my implementation.
You will also need to install the Open XML SDK.
PowerShell is very elegant in the way it deals with parameters, and the first half of most PowerTools cmdlets is code to handle those parameters. We will bypass most of that, since we define our own methods' input parameters.
Most of the time you just need to find the .cs file of the cmdlet you want to adapt and scroll down to its ProcessRecord() method. Grab that part of the code, work out what the variables are, put it into a method and hook up the parameters.
In Visual Studio, to go to the definition of a method, variable or object, highlight it and press F12.
To start with I want to get two documents and merge them together. Using PowerTools you would do something like this:
$docBeginningPath = # path to the first file
$docEndingPath = # path to the second file
$docOutputPath = # path of where to save the output file
$docBeginning = New-Object -TypeName OpenXml.PowerTools.DocumentSource -ArgumentList $docBeginningPath
$docEnding = New-Object -TypeName OpenXml.PowerTools.DocumentSource -ArgumentList $docEndingPath
$docs = @()
$docs += $docBeginning
$docs += $docEnding
Merge-OpenXmlDocument -OutputPath $docOutputPath -Sources $docs
That script creates two DocumentSource objects from the file paths specified. Since no other parameters (the start paragraph and paragraph count) were set, each one takes all the paragraphs from beginning to end. The script then puts them into an array. Invoking Merge-OpenXmlDocument merges the items in that array and saves the newly created file to disk.
To replicate the PowerShell example above I need a method that merges a collection of Source objects. A Source object contains a document and details of which parts of it will be used in the merge.
To find the method I want to adapt, I searched through the solution for OpenXmlDocument. I found the relevant code in the MergeOpenXmlDocumentCmdlet.cs file.
Then it was a case of adapting the code in the ProcessRecord() method.
This is the result:
/// <summary>Takes a list of Source objects, merges them into a document and saves it to disk</summary>
/// <param name="sources">List of Source objects to merge</param>
/// <param name="outputPath">Path to save the file to</param>
public void MergeAndSaveDocument(List<Source> sources, string outputPath)
{
    // This code replicates: Merge-OpenXmlDocument -OutputPath $docOutputPath -Sources $docs
    // Extracted from MergeOpenXmlDocumentCmdlet.cs
    WordprocessingDocument result = DocumentBuilder.BuildOpenDocument(sources, outputPath);
    // You need to close each source, since it has opened a file stream to the document it points to
    foreach (Source item in sources)
    {
        // ... close item here
    }
}
For simplicity I haven’t included the exception catching.
A gotcha: a Source object needs to be closed once it has been used. Otherwise it leaves a file stream open, which will cause problems if you try to access that file again later.
In the sample code I created a method named MergeDocument(), which merges a List<Source> and returns a WordprocessingDocument. This gives a bit more flexibility, so that you can perform more than one operation before saving to disk.
I then needed a way of creating Source objects (to put into a List<>) that I can pass to my MergeAndSaveDocument() method.
Creating a new Source object requires a WordprocessingDocument and a bool that says whether to keep section breaks when merging (in this example I have always used false, to ignore section breaks). It also has optional parameters for the starting paragraph and the number of paragraphs. The method SourceCreator() creates a WordprocessingDocument from a file path and works out which Source constructor overload to call. You can specify -1 for the start and count parameters: -1 for start grabs the whole document, and -1 for count grabs everything from the starting paragraph to the end of the document.
/// <summary>Takes a document (from disk), the starting position of the paragraph(s) and the number of paragraphs, and returns that as a Source object</summary>
public Source SourceCreator(string path, int start, int count, bool keepSection)
{
    DocumentSource documentSource = new DocumentSource(path, start, count);
    WordprocessingDocument doc = WordprocessingDocument.Open(documentSource.SourceFile, false);
    if (doc != null)
    {
        Source source;
        if (count != -1)
            source = new Source(doc, start - 1, count, keepSection);
        else if (start != -1)
            source = new Source(doc, start - 1, keepSection);
        else
            source = new Source(doc, keepSection);
        return source;
    }
    else return null;
}
In the sample code I have refactored this method into two: one takes a filename, opens it as a WordprocessingDocument object, and passes it to a second that takes a WordprocessingDocument object and returns a Source object. This gives greater flexibility.
I created a few other useful methods that are adaptations of the previously described methods:
RemoveParagraphs() removes a number of paragraphs from a document. It works by grabbing the two sections on either side of the paragraphs that you want to remove and then merging those two Sources.
saveDocument() saves a WordprocessingDocument to disk.
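The idea behind RemoveParagraphs() can be sketched without any PowerTools types: work out the two paragraph ranges to keep, using the same 1-based start and the same "-1 means to the end" count convention described for SourceCreator(). The class and method names below are hypothetical and are not part of PowerTools or the sample download; this is only an illustration of the range arithmetic, and it assumes the removed block does not run to the very end of the document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of PowerTools.
public static class RemovalRanges
{
    // Returns (start, count) pairs, 1-based, for the paragraphs kept on
    // either side of the removed block. A count of -1 means
    // "from start to the end of the document".
    public static List<Tuple<int, int>> KeepAround(int removeStart, int removeCount)
    {
        if (removeStart < 1) throw new ArgumentOutOfRangeException("removeStart");
        if (removeCount < 1) throw new ArgumentOutOfRangeException("removeCount");

        var ranges = new List<Tuple<int, int>>();

        // Paragraphs before the removed block, if there are any.
        if (removeStart > 1)
            ranges.Add(Tuple.Create(1, removeStart - 1));

        // Everything after the removed block, through to the end.
        ranges.Add(Tuple.Create(removeStart + removeCount, -1));

        return ranges;
    }
}
```

Each pair could then be passed as the start and count arguments of SourceCreator(), with the resulting Source objects merged as shown earlier.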
Then a couple of other useful methods:
EditCustomXml() This method finds a specified element in the customXml of a document and updates its content. I basically bundled up what would be performed by a PowerTools script like:
$customXml = Get-OpenXmlCustomXmlData -Path $docOutputPath -Part item.xml
$customXml.Root.Element("ElementName").Value = 'Element content to add'
Set-OpenXmlCustomXmlData -Path $docOutputPath -Part $customXml -PartName item2.xml -SuppressBackups
Here is my method:
/// <summary>Allows you to edit the customXml of a document</summary>
/// <param name="filePath">Filepath to the file you want to edit</param>
/// <param name="customXmlName">Name of the customXml part</param>
/// <param name="elementName">CustomXml element to edit</param>
/// <param name="value">Value to set as the specified element's content</param>
public void EditCustomXml(string filePath, string customXmlName, string elementName, string value)
{
    // The next 2 lines of code replicate: Get-OpenXmlCustomXmlData -Path $docOutputPath -Part item.xml
    // Extracted from GetOpenXmlCustomXmlDataCmdlet.cs
    OpenXmlDocument document = OpenXmlDocument.FromFile(filePath, FileAccess.ReadWrite);
    XDocument customData = document.FindCustomXml(customXmlName);
    customData.Root.Element(elementName).Value = value; // Change the value of the element
    // The next 3 lines of code replicate: Set-OpenXmlCustomXmlData -Path $docOutputPath -Part $customXml -PartName item.xml -SuppressBackups
    // Extracted from SetOpenXmlCustomXmlDataCmdlet.cs
    // ...
}
To create this method I just searched through the solution for “OpenXmlCustomXmlData” (as before, “Get” is a PowerShell verb that PowerTools has extended). This took me to GetOpenXmlCustomXmlDataCmdlet.cs and SetOpenXmlCustomXmlDataCmdlet.cs. Then it was just a simple matter of grabbing the lines of code I needed.
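The element update in the middle of EditCustomXml() is plain LINQ to XML, so it can be tried on its own. The sketch below uses only System.Xml.Linq; the class and method names are hypothetical, and the null checks are an addition of mine rather than something the sample code is known to do. It also assumes the element has no XML namespace, which may not hold for real customXml parts.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of PowerTools.
public static class CustomXmlSketch
{
    // Sets the text of a direct child of the root element.
    // Returns false when the document has no root or no such child,
    // instead of throwing a NullReferenceException.
    public static bool TrySetRootChild(XDocument doc, string elementName, string value)
    {
        if (doc == null || doc.Root == null) return false;

        XElement target = doc.Root.Element(elementName);
        if (target == null) return false;

        target.Value = value;
        return true;
    }
}
```

For example, calling TrySetRootChild() on a document parsed from `<root><ElementName>old</ElementName></root>` with "ElementName" replaces the element's text, while an unknown element name leaves the document untouched.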
getMatchingParts() and getSnippet().
These can be used together, to search through a document, and extract a number of paragraphs between a pair of headings.
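To make the "between a pair of headings" idea concrete, here is a minimal sketch that works on a plain list of paragraph strings. It is not the implementation of getMatchingParts() or getSnippet(), which operate on Open XML documents and are not shown in this article; the class and method names are hypothetical, and headings are assumed to be matched by exact text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of PowerTools.
public static class SnippetSketch
{
    // Returns the paragraphs strictly between the first occurrence of
    // startHeading and the next occurrence of endHeading.
    // Returns an empty list if either heading is not found.
    public static List<string> Between(IList<string> paragraphs, string startHeading, string endHeading)
    {
        var snippet = new List<string>();

        int start = paragraphs.IndexOf(startHeading);
        if (start < 0) return snippet;

        for (int i = start + 1; i < paragraphs.Count; i++)
        {
            if (paragraphs[i] == endHeading) return snippet;
            snippet.Add(paragraphs[i]);
        }

        // The end heading never appeared, so there is no bounded snippet.
        return new List<string>();
    }
}
```

With real documents, the position of the start heading and the number of paragraphs found this way would map naturally onto the start and count parameters used when creating a Source.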
So I hope that I have shown you that it’s not too hard to be able to write some methods for your business application that can be used instead of the PowerTools cmdlets. You can take advantage of the PowerTools functionality from C#.
Have a look through the sample zip file below, and have a go yourself. Enjoy!