wordpress hit counter
Mail Merging with a Custom Client Using the Open XML SDK 2.0 - OpenXML Developer - Blog - OpenXML Developer

Mail Merging with a Custom Client Using the Open XML SDK 2.0

Blog

Samples, Demos, and Reference Articles

Mail Merging with a Custom Client Using the Open XML SDK 2.0

Rate This
  • Comments 4

By Johannes Prinz

Introduction

In my life as a developer I’ve implemented many different flavours of Mail Merge style applications which can be run on a server without having to install any word processing clients. This has ranged from simple email auto responders, using customizable email templates, to complex reports based on customizable templates. In many cases I’ve had to reach for 3rd party applications like Aspose Words for.NET to get the job done within the budget and time frames allowed. With the release of the Open XML SDK 2.0, I wanted to see how easy it is to build my own mail merge application. So here we have Another Mail Merge Client (AMMC).


Overview

In Word users can create their mail merge templates that look like this.

Dated: «Date»

Hi «Recipient», 

This is a test to see if my «Adjectives» mail merge worked.

«Spiel»

Johannes

This same template file can also be used to generate documents in an application using data from the application. By using Open XML to do this you do not require Word to be installed, which is a bad idea anyway.

The code here so far does not support multi-valued data, but you should be able to generate even complex word documents as long as there is just one row of data for each document you generate.

To generate complex reports take a look at the XSLT approach taken here.

 

Code Sample Structure

The completed Visual Studio 2008 solution (inside /AMMC-src) is included in this code sample, along with a pre-compiled sample binary (inside /AMMC-bin).

The AMMC is built using two projects with the following files.

  • MailMerge (a reusable class library you call from your application).
    >
    MailMerger.cs (where all the Open XML action is)
  • AMMC (Another Mail Merge Client application, which is a simple test harness to demonstrate calling the MailMerger class.)
    >
    TestMaterial (Folder containing some test assets).  
          -
    MergeTemplate.docx (Mailmerge template created in MS Word 2007)
          -
    TestData.csv (Test data created in MS Excel 2007). 
    >
    Program.cs (YAMMC programm using the MailMerge class library)

The DocumentFormat.OpenXml.dll assembly is also included with the binary version of the application. AMMC is a command line application which will print the help documentation whenever invalid input is given.


Understanding Mail Merge in Open XML

I started creating a simple Mail Merge document in Microsoft Word 2007 to use as my template. As my data source I put together a simple comma-separated values (CSV) file using Microsoft Excel 2007. To get a basic understanding of how a Mail Merge document looks in Open XML, I opened up my template document using The Document Reflector, one of several usefull tools found in the Open XML SDK 2.0, which can be found in the Open XML Format SDK\V2.0\tools\ directory.

 

Scanning through the document I quicly find the 3 key parts I have to work with.

  1. A section in the settings.xml file with information on connecting to the datasource file. This will have to be removed in my generated documents.
  2. A simple field with a MERGEFIELD attribute indicating the field to insert for every mail merge field I used in the template document. This will be where I’ll insert the data.
  3. Hidden near the top is a recipientData.xml file. This field needs to be removed or the document will open with errors.


The Mail Merger Explained

There are three main parts to the MailMerger and they are all inside the only method public void MailMerge. At this stage the MailMerger only has the one key method but as it’s in its own class library this can be expanded on later.

The processing works by reading in the merge template document, and then replacing values the mail merge field codes with the actual data then saving as the new file.

To start off we need to collect all the elements which describe the mail merge fields. These are identified by attribute called "instr" which contains a string value prefixed with "MERGEFIELD". For this I used LINQ to XML.

// Get all Mail Merge Fields

IList<XElement> mailMergeFields =

    (from el in newBody.Descendants()

     where el.Attribute(XMLNS + "instr") != null

     select el).ToList();

Once we have a collection of these we can iterate through them and replace the field with the data. It is important to remember to save the changes at the end.

// Replace all merge fields with Data

foreach (XElement field in mailMergeFields)

{

    string fieldName = field.Attribute(XMLNS + "instr").Value.Replace("MERGEFIELD", string.Empty).Trim();

    if (row.Table.Columns.Contains(fieldName))

    {

        XElement newElement = field.Descendants(XMLNS + "r").First();

        newElement.Descendants(XMLNS + "t").First().Value = row.Field<string>(fieldName);

        field.ReplaceWith(newElement);

    }

}

wordDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());

wordDocument.MainDocumentPart.Document.Save();

Next we need to remove the recipientData.xml file out of the package.

// Delete MailMerge Data Source Part

DocumentSettingsPart settingsPart = wordDocument.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().First();

MailMergeRecipientDataPart mmrPart = settingsPart.GetPartsOfType<MailMergeRecipientDataPart>().First();

settingsPart.DeletePart(mmrPart);

Lastly we remove the element that describes the data connection to the external data file. If this is not removed the document will prompt to connect to the original data source which may no longer be there especially if opened on another machine or from another location. This element is called: "mailMerge" and I use LINQ to XML to find it.

// Delete refrence to Mail Merge Data sources

XElement settings = XElement.Parse(settingsPart.RootElement.OuterXml);

IList<XElement> mailMergeElements =

    (from el in settings.Descendants()

     where el.Name == (XMLNS + "mailMerge")

     select el).ToList();

foreach (XElement field in mailMergeElements)

{

    field.Remove();

}

settingsPart.RootElement.InnerXml = settings.ToString();

settingsPart.RootElement.Save();


Using the Mail Merger

To test and consume my Open XML Mail Merge utility I built the AMMC console application as a test harness. The Main method calls out to several methods to process and validate the arguments, load the data source and handle all the errors. There are only 2 lines which use the MailMerger class library. One to new up a new MailMerger object and one to do the mail merge for every row in the data source.

// Create the MailMerger

MailMerger merger = new MailMerger(Template.FullName);

// Do MailMerge for earch record

foreach (DataRow row in dataSet.Tables[Data.Name].Rows)

{

    // Get unique target file name

    string uniqueFileName = CreateUniqueFileName(Target.FullName + "\\MMDoc.docx");

    // Merge Data and Save File

    merger.MailMerge(row, uniqueFileName);

}


Possible Improvements

There are many ways this application can be improved but here are a couple that I have thought of while building this application.

  1. Manage the memory stream for the template a little better. There is no need to reload the template every time a row is merged.
  2. Provide a couple of overload methods for the merge method to return documents in different formats like byte arrays of WordDocument objects rather than just saving to file.


Conclusion

In this code sample I set out to use the new Open XML SDK 2.0 to create an application for generating Word documents from a Mail Merge template using my own dataset. I made ready use of the Document Reflector to inspect the template. I made use of Open XML SDK 2.0’s support for LINQ to XML. Overall, the supplied libraries and tools of the new Open XML SDK make the matter of building applications for document content generation much, much easier.

Attachment: AMMC.ZIP
  • Or to handle all new formats and parts of a word document, try

    using System;
    using System.IO;
    using System.Data;
    using System.Linq;
    using System.Text;
    using System.Xml.Linq;
    using System.Collections.Generic;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    namespace MailMerge
    {
       public class MailMerger
       {
           public FileInfo Template { get; private set; }
           public XNamespace XMLNS { get { return XNamespace.Get(WORDMLNS); } }

           public const string WORDMLNS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

           enum MergeStatus { None, HaveOld, HaveNew, HaveBegin, HaveName, HaveSeperator, HaveData };

           public MailMerger(string templatePath)
           {
               if (string.IsNullOrEmpty(templatePath)) throw new ArgumentNullException("The Template cannot be null.");

               FileInfo template = new FileInfo(templatePath);
               if (!template.Exists) throw new ArgumentNullException("The Template File does not exists.");
               if (template.Extension != ".docx") throw new ArgumentException("The Template must be of type docx.");
               Template = template;
           }

           private string GetFieldData(string text)
           {
               string newText = text;
               if (newText.Length > 0 && newText.Substring(0, 1) == "\"")
               {
                   newText = newText.Substring(1);
                   if (newText.Length > 1 && newText.Substring(newText.Length - 1, 1) == "\"")
                   {
                       newText = newText.Substring(0, newText.Length - 1);
                   }
               }

               return newText;
           }

           public XElement MergeReplace(XElement newBody, DataRow row)
           {
               // Get all Mail Merge Fields
               IList<XElement> mailMergeFields =
                   (from el in newBody.Descendants()
                    where (el.Name == (XMLNS + "r") || el.Attribute(XMLNS + "instr") != null)
                    select el).ToList();

               MergeStatus status = MergeStatus.None;
               XElement status0 = null, status1 = null, status2 = null, status3 = null, status4 = null;
               string fieldName = "";
               // Replace all merge fields with Data
               foreach (XElement field in mailMergeFields)
               {
                   if ((status == MergeStatus.None) &&
                       (field.Element(XMLNS + "fldChar") != null) &&
                       (field.Element(XMLNS + "fldChar").Attribute(XMLNS + "fldCharType") != null) &&
                       (field.Element(XMLNS + "fldChar").Attribute(XMLNS + "fldCharType").Value == "begin"))
                   {
                       status = MergeStatus.HaveBegin;
                       status0 = field;
                   }
                   else if ((status == MergeStatus.None) &&
                       (field.Attribute(XMLNS + "instr") != null))
                   {
                       status = MergeStatus.HaveOld;
                       status0 = field;
                       fieldName = GetFieldData(field.Attribute(XMLNS + "instr").Value.Replace("MERGEFIELD", string.Empty).Trim());
                   }
                   else if ((status == MergeStatus.HaveBegin) &&
                       (field.Element(XMLNS + "instrText") != null) &&
                       (field.Element(XMLNS + "instrText").Value != null) &&
                       (field.Element(XMLNS + "instrText").Value.Trim().Length > 10) &&
                       (field.Element(XMLNS + "instrText").Value.Trim().Substring(0, 10) == "MERGEFIELD"))
                   {
                       status = MergeStatus.HaveName;
                       status1 = field;
                       fieldName = GetFieldData(field.Element(XMLNS + "instrText").Value.Replace("MERGEFIELD", string.Empty).Trim());
                   }
                   else if ((status == MergeStatus.HaveName) &&
                       (field.Element(XMLNS + "fldChar") != null) &&
                       (field.Element(XMLNS + "fldChar").Value != null) &&
                       (field.Element(XMLNS + "fldChar").Attribute(XMLNS + "fldCharType").Value != "") &&
                       (field.Element(XMLNS + "fldChar").Attribute(XMLNS + "fldCharType").Value.Trim() == "separate"))
                   {
                       status = MergeStatus.HaveSeperator;
                       status2 = field;
                   }
                   else if ((status == MergeStatus.HaveSeperator) &&
                       (field.Element(XMLNS + "rPr") != null) &&
                       (field.Element(XMLNS + "rPr").NextNode != null) &&
                       (((field.Element(XMLNS + "rPr").NextNode as XElement).Value != null)))
                   {
                       status = MergeStatus.HaveData;
                       status3 = field;
                   }
                   else if ((status == MergeStatus.HaveData) &&
                       (field.Element(XMLNS + "fldChar") == null) &&
                       (field.Element(XMLNS + "rPr") != null) &&
                       (status3 != null))
                   {
                       status3.Element(XMLNS + "rPr").Add(field.Element(XMLNS + "rPr").Nodes());
                       field.Remove();
                   }
                   else if ((status == MergeStatus.HaveData) &&
                       (field.Element(XMLNS + "fldChar") != null) &&
                       (field.Element(XMLNS + "fldChar").Attribute(XMLNS + "fldCharType") != null) &&
                       (field.Element(XMLNS + "fldChar").Attribute(XMLNS + "fldCharType").Value == "end"))
                   {
                       status = MergeStatus.HaveNew;
                       status4 = field;
                   }
                   else
                   {
                       status = MergeStatus.None;
                       status0 = null;
                       status1 = null;
                       status2 = null;
                       status3 = null;
                       status4 = null;
                       fieldName = "";
                   }

                   if (status == MergeStatus.HaveOld || status == MergeStatus.HaveNew)
                   {
                       if (row.Table.Columns.Contains(fieldName))
                       {
                           XElement newElement = null, newElement2 = null;
                           if (status == MergeStatus.HaveOld)
                           {
                               newElement = field.Descendants(XMLNS + "r").First();
                               newElement.Descendants(XMLNS + "t").First().Value = row.Field<string>(fieldName);
                           }
                           else
                           {
                               newElement = new XElement(XMLNS + "fldSimple");
                               newElement.SetAttributeValue(XMLNS + "instr", "MERGEFIELD " + fieldName);
                               newElement2 = new XElement(XMLNS + "t", row.Field<string>(fieldName));
                               newElement.Add(newElement2);
                           }
                           if (status == MergeStatus.HaveOld)
                           {
                               status0.ReplaceWith(newElement);
                           }
                           else
                           {
                               status3.Element(XMLNS + "rPr").NextNode.ReplaceWith(newElement);
                               status0.Remove();
                               status1.Remove();
                               status2.Remove();
                               status4.Remove();
                           }
                       }
                       status = MergeStatus.None;
                   }
               }
               return newBody;
           }

           public void MailMerge(DataRow row, string targetFileName)
           {
               if (row == null) throw new ArgumentNullException("The Data Row cannot be null.");
               if (string.IsNullOrEmpty(targetFileName)) throw new ArgumentNullException("The Target File Name cannot be null.");
               // if (new FileInfo(targetFileName).Exists) throw new ArgumentOutOfRangeException("The Target File Name already exists.");

               // Open Template
               byte[] sourceBytes = File.ReadAllBytes(Template.FullName);
               using (MemoryStream _workingMemoryStream = new MemoryStream())
               {
                   // Load into memory
                   _workingMemoryStream.Write(sourceBytes, 0, sourceBytes.Length);

                   using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(_workingMemoryStream, true))
                   {
                       XElement newBody = XElement.Parse(wordDocument.MainDocumentPart.Document.Body.OuterXml);
                       newBody = MergeReplace(newBody, row);
                       wordDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
                       wordDocument.MainDocumentPart.Document.Save();

                       foreach (HeaderPart x in wordDocument.MainDocumentPart.HeaderParts)
                       {
                           newBody = XElement.Parse(x.Header.OuterXml);
                           newBody = MergeReplace(newBody, row);
                           x.Header = new Header(newBody.ToString());
                           x.Header.Save();
                       }

                       foreach (FooterPart x in wordDocument.MainDocumentPart.FooterParts)
                       {
                           newBody = XElement.Parse(x.Footer.OuterXml);
                           newBody = MergeReplace(newBody, row);
                           x.Footer = new Footer(newBody.ToString());
                           x.Footer.Save();
                       }

                       XElement settings = null;
                       DocumentSettingsPart settingsPart = null;
                       if (wordDocument.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().Count() > 0)
                       {
                           settingsPart = wordDocument.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().First();
                           if (settingsPart != null)
                           {
                               if (settingsPart.GetPartsOfType<MailMergeRecipientDataPart>().Count() > 0)
                               {
                                   MailMergeRecipientDataPart mmrPart = settingsPart.GetPartsOfType<MailMergeRecipientDataPart>().First();
                                   settingsPart.DeletePart(mmrPart);
                                   // Delete refrence to Mail Merge Data sources
                                   settings = XElement.Parse(settingsPart.RootElement.OuterXml);
                               }
                           }
                       }

                       if (settings != null)
                       {
                           IList<XElement> mailMergeElements =
                              (from el in settings.Descendants()
                               where el.Name == (XMLNS + "mailMerge")
                               select el).ToList();

                           foreach (XElement field in mailMergeElements)
                           {
                               field.Remove();
                           }
                       }

                       if (settingsPart != null && settings != null)
                       {
                           settingsPart.RootElement.InnerXml = settings.ToString();
                           settingsPart.RootElement.Save();
                       }

                       // Save in output directory
                       // Create a new document based on updated template
                       using (FileStream fileStream = new FileStream(targetFileName, FileMode.Create))
                       {
                           _workingMemoryStream.WriteTo(fileStream);
                       }
                   }
               }
           }
       }
    }
  • I have noticed a bug in the published code when removing the mail merge data sources. The original code parses the OuterXML of the document settings element and rewrites it into the InnerXML provoking issues with the document structure.

    Here is the original code:

    <!---- Original code - start ----->
    // Delete refrence to Mail Merge Data sources
    XElement settings = XElement.Parse(settingsPart.RootElement.OuterXml);
    IList<XElement> mailMergeElements =
       (from el in settings.Descendants()
        where el.Name == (XMLNS + "mailMerge")
        select el).ToList();

    foreach (XElement field in mailMergeElements)
    {
       field.Remove();
    }
    settingsPart.RootElement.InnerXml = settings.ToString();
    settingsPart.RootElement.Save();
    <!---- Original code - end ----->


    The XML ends up being:


    <w:settings xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
     <w:settings>
        ...
     </w:settings>
    </w:settings>

    Microsoft Word can handle this error however, when using the DocumentBuilder in PowerTools to combine multiple documents together, Word cannot read the output document and shows the following error:
    "Word cannot open this document because some parts are missing or invalid"

    A workaround for it is:

    <!---- Workaround - start ----->
    //// Delete refrence to Mail Merge Data sources
    OpenXmlElement mailMergeEl = settingsPart.Settings.First(l => l.LocalName == "mailMerge");
    if (mailMergeEl != null)
    {
       settingsPart.Settings.RemoveChild(mailMergeEl);
       settingsPart.Settings.Save();
    }
    <!---- Workaround - end ----->
  • This worked for us very nicely, and being a class we can now incorporate it into our systems using our own datasources. Thank you so much. You've stopped me wasting about 4 days on this one issue - how to mergefields using OpenXML effectively. Other solutions and experiences others have had were disappointing to say the least.

  • Just to let you know that having a datasource in the Word docx file causes issues when the sources vary by the client or the template needs further modification. We found that by removing some of your code and removing the datasources from the Word template, the program was still able to work and not rely on having any data sources.

    We deleted these lines of code ...

                       //            // Delete MailMerge Data Source Part

                       //            DocumentSettingsPart settingsPart = wordDocument.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().First();

                       //            MailMergeRecipientDataPart mmrPart = settingsPart.GetPartsOfType<MailMergeRecipientDataPart>().First();

                       //            settingsPart.DeletePart(mmrPart);

                       //            // Delete refrence to Mail Merge Data sources

                       //            XElement settings = XElement.Parse(settingsPart.RootElement.OuterXml);

                       //            IList<XElement> mailMergeElements =

                       //                (from el in settings.Descendants()

                       //                 where el.Name == (XMLNS + "mailMerge")

                       //                 select el).ToList();

                       //            foreach (XElement field in mailMergeElements)

                       //            {

                       //                field.Remove();

                       //            }

                       //            settingsPart.RootElement.InnerXml = settings.ToString();

                       //            settingsPart.RootElement.Save();

    And we edited the template in Word and answered NO to when it asked for the datasource location and stripped off any references to the datasource in Word. Re-saved the template and all works really well now. Clients can edit the template to add other mergefields and all I have to do is alter the datasource before calling the class. Works like a sharm.

    Thanks again. :)

Page 1 of 1 (4 items)