In my life as a developer I’ve implemented many different flavours of Mail Merge style applications that can run on a server without any word processing client installed. These have ranged from simple email auto-responders using customizable email templates to complex reports based on customizable templates. In many cases I’ve had to reach for third-party products like Aspose.Words for .NET to get the job done within the budget and time frames allowed. With the release of the Open XML SDK 2.0, I wanted to see how easy it is to build my own mail merge application. So here we have Another Mail Merge Client (AMMC).
Overview
In Word, users can create mail merge templates that look like this.
Dated: «Date»
Hi «Recipient»,
This is a test to see if my «Adjectives» mail merge worked.
«Spiel»
Johannes
This same template file can also be used to generate documents in an application, using data from that application. Because this is done with Open XML, Word does not need to be installed on the server; installing Word on a server is a bad idea anyway.
The code so far does not support multi-valued data, but you should be able to generate even complex Word documents as long as there is just one row of data for each document you generate.
To generate complex reports take a look at the XSLT approach taken here.
The completed Visual Studio 2008 solution (inside /AMMC-src) is included in this code sample, along with a pre-compiled sample binary (inside /AMMC-bin).
The AMMC is built using two projects with the following files.
The DocumentFormat.OpenXml.dll assembly is also included with the binary version of the application. AMMC is a command line application which will print the help documentation whenever invalid input is given.
Understanding Mail Merge in Open XML
I started by creating a simple Mail Merge document in Microsoft Word 2007 to use as my template. As my data source I put together a simple comma-separated values (CSV) file using Microsoft Excel 2007. To get a basic understanding of how a Mail Merge document looks in Open XML, I opened my template document in the Document Reflector, one of several useful tools in the Open XML SDK 2.0, which can be found in the Open XML Format SDK\V2.0\tools\ directory.
Scanning through the document, I quickly found the three key parts I had to work with.
There are three main parts to the MailMerger, and they are all inside its only method, public void MailMerge. At this stage the MailMerger has just this one key method, but as it lives in its own class library it can be expanded on later.
The processing works by reading in the merge template document, replacing the mail merge field codes with the actual data, and then saving the result as a new file.
To start off we need to collect all the elements which describe the mail merge fields. These are identified by an attribute called "instr", which contains a string value prefixed with "MERGEFIELD". For this I used LINQ to XML.
// Get all Mail Merge Fields
// (newBody is assumed to hold the document body loaded as an XElement)
IList<XElement> mailMergeFields =
    (from el in newBody.Descendants()
     where el.Attribute(XMLNS + "instr") != null
     select el).ToList();
// Replace all merge fields with Data
foreach (XElement field in mailMergeFields)
{
    string fieldName = field.Attribute(XMLNS + "instr").Value.Replace("MERGEFIELD", string.Empty).Trim();
    // Only replace fields that have a matching column in the data row
    if (row.Table.Columns.Contains(fieldName))
    {
        XElement newElement = field.Descendants(XMLNS + "r").First();
        newElement.Descendants(XMLNS + "t").First().Value = row.Field<string>(fieldName);
        field.ReplaceWith(newElement);
    }
}
// Write the merged body back into the document and save it
wordDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
wordDocument.MainDocumentPart.Document.Save();
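The field replacement can be tried out without a Word document at all. The following is a minimal sketch, not the AMMC code: it assumes the merge fields are stored as w:fldSimple elements (which is what the "instr" attribute above implies), that XMLNS is the WordprocessingML main namespace, and it uses a plain dictionary in place of the DataRow. The class name MergeFieldSketch and both method names are made up for illustration. Unlike the code above, it takes the field name as the token following MERGEFIELD, so trailing switches such as \* MERGEFORMAT are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class MergeFieldSketch
{
    // WordprocessingML main namespace (assumed to be what XMLNS refers to in the article).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the token that follows MERGEFIELD, or null if the instruction is not a merge field.
    // Limitation: quoted field names that contain spaces are not handled.
    public static string GetFieldName(string instruction)
    {
        string[] tokens = instruction.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length < 2 || tokens[0] != "MERGEFIELD") return null;
        return tokens[1].Trim('"');
    }

    // Replaces each w:fldSimple merge field with its first run, after setting the run text to the data value.
    public static XElement Merge(XElement body, IDictionary<string, string> values)
    {
        // Materialize the list first so the tree can be modified while looping.
        List<XElement> fields = body.Descendants(W + "fldSimple")
            .Where(el => el.Attribute(W + "instr") != null)
            .ToList();

        foreach (XElement field in fields)
        {
            string name = GetFieldName(field.Attribute(W + "instr").Value);
            string value;
            if (name == null || !values.TryGetValue(name, out value)) continue;

            XElement run = field.Descendants(W + "r").FirstOrDefault();
            XElement text = run == null ? null : run.Descendants(W + "t").FirstOrDefault();
            if (text == null) continue; // leave unusual fields untouched

            text.Value = value;
            field.ReplaceWith(run);
        }
        return body;
    }
}
```

Fields with no matching key are left in place, which mirrors the Columns.Contains check in the code above.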
Next we need to remove the recipientData.xml part from the package.
// Delete MailMerge Data Source Part
DocumentSettingsPart settingsPart = wordDocument.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().First();
MailMergeRecipientDataPart mmrPart = settingsPart.GetPartsOfType<MailMergeRecipientDataPart>().First();
settingsPart.DeletePart(mmrPart);
Lastly we remove the element that describes the data connection to the external data file. If this is not removed, the document will prompt to connect to the original data source, which may no longer be there, especially if the document is opened on another machine or from another location. This element is called "mailMerge", and I use LINQ to XML to find it.
// Delete reference to Mail Merge Data sources
XElement settings = XElement.Parse(settingsPart.RootElement.OuterXml);
IList<XElement> mailMergeElements =
    (from el in settings.Descendants()
     where el.Name == (XMLNS + "mailMerge")
     select el).ToList();
foreach (XElement field in mailMergeElements)
{
    field.Remove();
}
settingsPart.RootElement.InnerXml = settings.ToString();
settingsPart.RootElement.Save();
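The LINQ to XML part of this step can also be exercised on its own. This is a minimal sketch under the same assumption as before, namely that XMLNS is the WordprocessingML main namespace; the class name SettingsCleanupSketch and the method RemoveMailMerge are hypothetical and not part of AMMC. It works on an XElement that holds the settings XML and leaves writing the result back to the package out of scope.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class SettingsCleanupSketch
{
    // WordprocessingML main namespace (assumed to be what XMLNS refers to in the article).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Removes every w:mailMerge element below the given settings element
    // and returns the number of elements removed.
    public static int RemoveMailMerge(XElement settings)
    {
        // Copy to a list first: removing nodes while enumerating Descendants() is not safe.
        List<XElement> mailMergeElements = settings.Descendants(W + "mailMerge").ToList();
        foreach (XElement element in mailMergeElements)
        {
            element.Remove();
        }
        return mailMergeElements.Count;
    }
}
```

Returning the count makes it easy to log whether a template actually carried a data source reference.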
Using the Mail Merger
To test and consume my Open XML Mail Merge utility I built the AMMC console application as a test harness. The Main method calls out to several methods to process and validate the arguments, load the data source and handle all the errors. Only two lines use the MailMerger class library: one to create a MailMerger object and one to do the mail merge for every row in the data source.
// Create the MailMerger
MailMerger merger = new MailMerger(Template.FullName);
// Do MailMerge for each record
foreach (DataRow row in dataSet.Tables[Data.Name].Rows)
{
    // Get unique target file name
    string uniqueFileName = CreateUniqueFileName(Target.FullName + "\\MMDoc.docx");
    // Merge Data and Save File
    merger.MailMerge(row, uniqueFileName);
}
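The CreateUniqueFileName helper is not shown in this write-up. One way it could work is sketched below; this is an assumption for illustration, not the AMMC implementation, and the class name FileNameSketch and the "(n)" suffix scheme are made up. It returns the requested path if it is free, and otherwise appends a counter before the extension until it finds a name that does not exist yet.

```csharp
using System.IO;

// Hypothetical helper class, for illustration only.
public static class FileNameSketch
{
    // Returns 'path' if no such file exists, otherwise the first free
    // variant of the form name(1).ext, name(2).ext, and so on.
    public static string CreateUniqueFileName(string path)
    {
        if (!File.Exists(path)) return path;

        // GetDirectoryName can return null (for a root path); fall back to an empty string.
        string directory = Path.GetDirectoryName(path) ?? string.Empty;
        string name = Path.GetFileNameWithoutExtension(path);
        string extension = Path.GetExtension(path);

        for (int i = 1; ; i++)
        {
            string candidate = Path.Combine(directory, name + "(" + i + ")" + extension);
            if (!File.Exists(candidate)) return candidate;
        }
    }
}
```

Because the existence check and the later save are separate steps, this is only safe when a single process writes to the target folder.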
Possible Improvements
There are many ways this application can be improved, but here are a couple that I thought of while building it.
In this code sample I set out to use the new Open XML SDK 2.0 to create an application for generating Word documents from a Mail Merge template using my own dataset. I made ready use of the Document Reflector to inspect the template. I made use of Open XML SDK 2.0’s support for LINQ to XML. Overall, the supplied libraries and tools of the new Open XML SDK make the matter of building applications for document content generation much, much easier.