Goodbye and Hello

OpenXmlDeveloper.org is Shutting Down

There is a time for all good things to come to an end, and the time has come to shut down OpenXmlDeveloper.org.

Screen-casts and blog posts: Content on OpenXmlDeveloper.org will be moving to EricWhite.com.

Forums: We are moving the forums to EricWhite.com and StackOverflow.com. Please do not post in the forums on OpenXmlDeveloper.org. Instead, please post in the forums at EricWhite.com or at StackOverflow.com.

Please see this blog post for more information about my plans moving forward.  Cheers, Eric

Building a Formatted WordProcessingML Document

Author: Roch Baduel (Cross Système), email: rochb@essilor.fr

In this article, we look at some C# code that builds a simple WordProcessingML document. The code intentionally does not use the new System.IO.Packaging API that ships with WinFX. Instead it uses a zip library, SharpZipLib, described at http://www.icsharpcode.net/OpenSource/SharpZipLib. The choice does not matter much: any zip library can be used. For simplicity, the code manipulates the XML parts with the XML DOM only (XmlDocument, XmlElement, …).

A WordProcessingML document is a zip archive (a package). The zip must contain at least the following files:

  • [Content_Types].xml
  • _rels/.rels
  • a main document part, for example word/document.xml. Its name and location are not important, as long as _rels/.rels points to it and [Content_Types].xml declares its content type.
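To make the role of _rels/.rels concrete, here is a small sketch that locates the main document of an existing package by reading the relationship whose type is officeDocument. It swaps in System.IO.Compression (ZipArchive, available in .NET Framework 4.5 and later and in .NET Core) and LINQ to XML instead of SharpZipLib and the DOM; the PackageInspector name is hypothetical, and the namespace URIs are those of the final ECMA-376 standard.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class PackageInspector
{
    // Namespace of the package relationships part (_rels/.rels)
    static readonly XNamespace RelNs =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Relationship type that identifies the main document part
    const string OfficeDocumentType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Returns the Target of the officeDocument relationship (the main
    // document's location inside the zip), or null if it cannot be found.
    public static string FindMainDocumentPart(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Both parts are required in every package
            if (zip.GetEntry("[Content_Types].xml") == null) return null;
            ZipArchiveEntry rels = zip.GetEntry("_rels/.rels");
            if (rels == null) return null;

            using (Stream s = rels.Open())
            {
                XDocument doc = XDocument.Load(s);
                XElement rel = doc.Root
                    .Elements(RelNs + "Relationship")
                    .FirstOrDefault(r => (string)r.Attribute("Type") == OfficeDocumentType);
                return rel == null ? null : (string)rel.Attribute("Target");
            }
        }
    }
}
```

Because only the relationship is consulted, the main part could just as well live under another name; the tooling follows Target rather than assuming word/document.xml.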

The first step is to create a class that represents the WordProcessingML document. This class holds one XmlDocument per part and writes them to the zip archive.

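using System;
using System.IO;
using System.Xml;
using ICSharpCode.SharpZipLib.Zip;
using ICSharpCode.SharpZipLib.Checksums; // Crc32 (the namespace name differs between SharpZipLib versions)

class DocX
{
    private XmlDocument contentTypes;   // [Content_Types].xml
    private XmlDocument mainRels;       // _rels/.rels
    private XmlDocument wordDocument;   // word/document.xml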

To add a file to the zip archive, we need a bit of glue code, implemented here as a private method:

private void WriteXmlDocumentEntry(ZipOutputStream output, XmlDocument xml, string name)
{
    byte[] data;
    // Serialize the XML part into memory first
    using (MemoryStream stream = new MemoryStream())
    {
        using (XmlWriter writer = XmlWriter.Create(stream))
        {
            xml.WriteTo(writer);
        }
        data = stream.ToArray();
    }
    // Describe the zip entry: name, timestamp, size and CRC-32 checksum
    ZipEntry entry = new ZipEntry(name);
    entry.DateTime = DateTime.Now;
    entry.Size = data.Length;
    Crc32 crc = new Crc32();
    crc.Update(data);
    entry.Crc = crc.Value;
    // Write the entry header, then the XML bytes
    output.PutNextEntry(entry);
    output.Write(data, 0, data.Length);
}

This method serializes an XmlDocument and writes it to the zip archive as an entry with the given name.
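Since any zip library will do, here is the same glue sketched with System.IO.Compression.ZipArchive (.NET Framework 4.5 and later, .NET Core) in place of SharpZipLib. This is a swapped-in library, not the article's code, and the ZipXml name is hypothetical. ZipArchive computes the CRC-32 and the entry sizes itself, so the checksum bookkeeping disappears.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class ZipXml
{
    // Serializes an XmlDocument into a new entry of the archive.
    // ZipArchive fills in the CRC-32 and sizes when the entry stream is closed.
    public static void WriteXmlDocumentEntry(ZipArchive archive, XmlDocument xml, string name)
    {
        ZipArchiveEntry entry = archive.CreateEntry(name, CompressionLevel.Optimal);
        using (Stream stream = entry.Open())
        using (XmlWriter writer = XmlWriter.Create(stream))
        {
            xml.WriteTo(writer);
        }
    }
}
```

With this variant, Save would open new ZipArchive(File.Create(fileName), ZipArchiveMode.Create) instead of a ZipOutputStream and dispose it at the end.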

In the constructor of the DocX class, we initialize each XmlDocument with the minimum content its part needs:

public DocX()
{
    contentTypes = new XmlDocument();
    mainRels = new XmlDocument();
    wordDocument = new XmlDocument();

    // [Content_Types].xml: content types by extension, plus the main document part
    contentTypes.LoadXml(
        @"<Types xmlns=""http://schemas.openxmlformats.org/package/2006/content-types"">
            <Default Extension=""rels""
                ContentType=""application/vnd.openxmlformats-package.relationships+xml"" />
            <Default Extension=""xml"" ContentType=""application/xml"" />
            <Override PartName=""/word/document.xml""
                ContentType=""application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"" />
          </Types>");

    // _rels/.rels: points the package at its main document part
    mainRels.LoadXml(
        @"<Relationships xmlns=""http://schemas.openxmlformats.org/package/2006/relationships"">
            <Relationship Id=""rId1""
                Type=""http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument""
                Target=""word/document.xml"" />
          </Relationships>");

    // word/document.xml: an empty body, in the namespace declared by WordElement
    wordDocument.LoadXml(
        @"<w:document xmlns:w=""" + WordElement.WordNameSpace + @"""
            xmlns:r=""http://schemas.openxmlformats.org/officeDocument/2006/relationships"">
            <w:body>
            </w:body>
          </w:document>");
}

Once the constructor has run, we have a minimal Word document. Saving it is simple: we write each XmlDocument to the archive under its part name:

public void Save(string fileName)
{
	ZipOutputStream output = new ZipOutputStream(File.Create(fileName));
	try
	{
	   //Build the DocX doc
	   WriteXmlDocumentEntry(output, contentTypes, @"[Content_Types].xml");
	   WriteXmlDocumentEntry(output, mainRels, @"_rels/.rels");
	   WriteXmlDocumentEntry(output, wordDocument, @"word/document.xml");
	}
	finally
	{
	   output.Finish();
	   output.Close();
	}
}

At this point, we can test our blank document:

DocX testDoc = new DocX();
testDoc.Save("blank.docx");
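For comparison, here is a self-contained sketch that writes the same minimal three-part package with ZipArchive and plain strings, without the DocX class or SharpZipLib. It is an illustration under assumptions rather than the article's code: the BlankDocx name is hypothetical, and it uses the namespace URIs of the final ECMA-376 standard.

```csharp
using System.IO;
using System.IO.Compression;

public static class BlankDocx
{
    const string ContentTypes =
        "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
        "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
        "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
        "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
        "</Types>";

    const string PackageRels =
        "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
        "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>" +
        "</Relationships>";

    const string MainDocument =
        "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:body/></w:document>";

    // Writes the three required parts into a new zip on the given stream.
    public static void Create(Stream output)
    {
        using (var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddText(zip, "[Content_Types].xml", ContentTypes);
            AddText(zip, "_rels/.rels", PackageRels);
            AddText(zip, "word/document.xml", MainDocument);
        }
    }

    static void AddText(ZipArchive zip, string name, string text)
    {
        // StreamWriter writes UTF-8 and closes the entry stream when disposed
        using (var writer = new StreamWriter(zip.CreateEntry(name).Open()))
            writer.Write(text);
    }
}
```

Usage: using (var file = File.Create("blank.docx")) BlankDocx.Create(file);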

We’re now able to create a blank document. Let’s add some text with various properties (Bold, Italic, Underline, Font, Size).

We must begin by inserting into the body at least one paragraph (w:p) containing a run of text (w:r), whose text goes in a w:t element. With w as the prefix bound to the WordprocessingML namespace, this looks like:

<w:body>
  <w:p>
    <w:r>
      <w:t>some text</w:t>
    </w:r>
    <w:r>
      <w:t>some other text</w:t>
    </w:r>
  </w:p>
</w:body>
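The same structure can be built programmatically. The following sketch uses LINQ to XML (System.Xml.Linq) rather than the DOM used in the rest of the article, only because it keeps the nesting visible; the ParagraphBuilder name is hypothetical, and W is the final ECMA-376 WordprocessingML namespace. Each string becomes one run, and xml:space="preserve" keeps leading or trailing spaces in the text.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ParagraphBuilder
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds <w:p> containing one <w:r><w:t>...</w:t></w:r> per string.
    public static XElement Paragraph(params string[] texts)
    {
        return new XElement(W + "p",
            texts.Select(t =>
                new XElement(W + "r",
                    new XElement(W + "t",
                        new XAttribute(XNamespace.Xml + "space", "preserve"),
                        t))));
    }
}
```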

To give a run properties, we add a w:rPr (run properties) element as the first child of the run. Each property is a child element of rPr: bold, for example, is set by inserting a <w:b/> element. Here are the properties that we will implement:

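  • Bold → <w:b/> element
  • Italic → <w:i/> element
  • Underline → <w:u w:val="single"/> element
  • Font Size → <w:sz w:val="..."/> element (the value is in half-points)
  • Font → <w:rFonts w:ascii="..." w:hAnsi="..."/> element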
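As a compact illustration of the list above, this sketch builds an rPr element with LINQ to XML. The RunPropsBuilder class and its parameters are hypothetical, the namespace is the final ECMA-376 one, and the children are emitted in the order rFonts, b, i, sz, u, which is intended to follow their declaration order in the rPr schema.

```csharp
using System.Xml.Linq;

public static class RunPropsBuilder
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds <w:rPr> for the five properties of the article.
    // halfPoints is the w:sz value (half-points); null means "no size".
    public static XElement RunProperties(bool bold, bool italic, bool underline,
                                         int? halfPoints, string font)
    {
        var rPr = new XElement(W + "rPr");
        if (!string.IsNullOrEmpty(font))
            rPr.Add(new XElement(W + "rFonts",
                new XAttribute(W + "ascii", font),
                new XAttribute(W + "hAnsi", font)));
        if (bold) rPr.Add(new XElement(W + "b"));
        if (italic) rPr.Add(new XElement(W + "i"));
        if (halfPoints.HasValue)
            rPr.Add(new XElement(W + "sz", new XAttribute(W + "val", halfPoints.Value)));
        if (underline)
            rPr.Add(new XElement(W + "u", new XAttribute(W + "val", "single")));
        return rPr;
    }
}
```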

To manipulate elements of the Word document, we derive a class from XmlElement and give it a namespace manager, so that XPath queries can use the w prefix:

public class WordElement : XmlElement
{
  // WordprocessingML namespace of the final ECMA-376 standard
  public const string WordNameSpace =
    @"http://schemas.openxmlformats.org/wordprocessingml/2006/main";

  // Maps the "w" prefix used in XPath queries to the WordprocessingML namespace
  protected static XmlNamespaceManager namespcmgr;

  static WordElement()
  {
    namespcmgr = new XmlNamespaceManager(new NameTable());
    namespcmgr.AddNamespace("w", WordNameSpace);
  }

  public WordElement(XmlDocument wordDocument, string name)
    : base("w", name, WordNameSpace, wordDocument) { }
}
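The namespace manager matters because XPath prefixes are resolved through the manager, not through the prefixes written in the document. The sketch below (the WordXPath name is hypothetical; it builds the manager from the document's own NameTable, a common pattern) counts the runs under w:body, and finds them even in a document that binds the same namespace to a different prefix.

```csharp
using System.Xml;

public static class WordXPath
{
    public const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts w:r elements under w:body. The "w" in the XPath is looked up
    // in the namespace manager, whatever prefix the document itself uses.
    public static int CountRuns(XmlDocument doc)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", W);
        return doc.SelectNodes("//w:body//w:r", ns).Count;
    }
}
```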

We can now create a class to represent a run. The class keeps references to its w:rPr and w:t child elements. Note that we always create an rPr element along with the run. This is not required, but an empty rPr element is not an error, and always having one makes the properties easier to manipulate.

class Run : WordElement
	{
		private XmlElement _runP;
		private XmlElement _txt;
		
		public Run(XmlDocument wordDocument, string text)
			: this(wordDocument)
		{
			this.Text = text;
		}
		public Run(XmlDocument wordDocument) : base(wordDocument,"r")
		{
			_runP = wordDocument.CreateElement("w:rPr", WordNameSpace);
			_txt = wordDocument.CreateElement("w:t", WordNameSpace);
			_txt.SetAttribute("xml:space", "preserve");
			this.AppendChild(_runP);
			this.AppendChild(_txt);
		}

A few private helper methods manipulate the children of rPr (lookup, insertion, removal and replacement):

// Returns the rPr child with the given qualified name (e.g. "w:b"), or null
private XmlElement RprGetElement(string name)
{
	return _runP.SelectSingleNode(name, namespcmgr) as XmlElement;
}
private XmlElement RprAppendElement(string name)
{
	XmlElement chld = this.OwnerDocument.CreateElement(name, WordNameSpace);
	_runP.AppendChild(chld);
	return chld;
}
private void RprRemoveElement(string name)
{
	XmlElement chld = RprGetElement(name);
	if (chld != null) _runP.RemoveChild(chld);
}
private XmlElement RprReplaceElement(string name)
{
	RprRemoveElement(name);
	return RprAppendElement(name);
}

Now we can implement the simple properties like text, bold, italic and underline.

public string Text
{
	get { return _txt.InnerText; }
	set { _txt.InnerText = value; }
}

public bool Bold
{
	get { return RprGetElement("w:b") != null; }
	set { if (value) RprReplaceElement("w:b"); else RprRemoveElement("w:b"); }
}

public bool Italic
{
	get { return RprGetElement("w:i") != null; }
	set { if (value) RprReplaceElement("w:i"); else RprRemoveElement("w:i"); }
}

public bool Underline
{
	get { return RprGetElement("w:u") != null; }
	set
	{
		if (value)
		{
			// val selects the underline style; "single" is a plain single line
			RprReplaceElement("w:u").SetAttribute("val", WordNameSpace, "single");
		}
		else RprRemoveElement("w:u");
	}
}
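The getters above only test whether an element is present, which is enough for documents produced by this class. In documents written by other tools, a toggle such as w:b can also carry a val attribute that switches it off, for example w:val="false" or w:val="0". A more tolerant check could look like this sketch; OnOff.IsOn is a hypothetical helper, and it is simplified: it treats "false", "off" and "0" as off, everything else (including a missing val) as on, and ignores how toggle properties combine with styles.

```csharp
using System.Xml;

public static class OnOff
{
    public const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Interprets a toggle property element such as w:b or w:i.
    // Missing element: off. Element without w:val: on.
    public static bool IsOn(XmlElement toggle)
    {
        if (toggle == null) return false;
        string val = toggle.GetAttribute("val", W);   // "" when the attribute is absent
        return val != "false" && val != "off" && val != "0";
    }
}
```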

Controlling the font size is almost as simple: we insert an sz element whose val attribute holds the size in half-points (a value of 24 means 12 pt):

public int Size
{
	get 
	{
		XmlElement sz = RprGetElement("w:sz");
		if (sz == null) return -1;
		return int.Parse(sz.GetAttribute("val", WordNameSpace));
	}
	set
	{
		XmlElement sz = RprReplaceElement("w:sz");
		sz.SetAttribute("val", WordNameSpace, value.ToString());
	}
}
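Because w:sz is expressed in half-points, the values in this article are easy to misread: Size = 48 produces 24 pt text. A pair of conversion helpers (hypothetical names; points are rounded to the nearest half-point) avoids doing this arithmetic by hand:

```csharp
using System;

public static class FontSize
{
    // Points -> w:sz/@w:val (half-points), rounded to the nearest half-point
    public static int ToHalfPoints(double points)
    {
        return (int)Math.Round(points * 2, MidpointRounding.AwayFromZero);
    }

    // w:sz/@w:val (half-points) -> points
    public static double ToPoints(int halfPoints)
    {
        return halfPoints / 2.0;
    }
}
```

Usage: run.Size = FontSize.ToHalfPoints(24); writes <w:sz w:val="48"/>.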

To set the font, we insert an rFonts element. Its attributes specify which font to use for each range of characters: the ascii attribute sets the font for characters in the ASCII range (U+0000 to U+007F), and hAnsi sets it for the characters of the "high ANSI" range, such as accented Latin letters.

The supplied name should match the name of an installed font. It is also possible to provide font substitution information or to embed the font, but that is out of scope for this sample.

public string Font
{
	get
	{
		XmlElement ft = RprGetElement("w:rFonts");
		if (ft == null) return string.Empty;
		return ft.GetAttribute("ascii", WordNameSpace);
	}
	set
	{
		// A null or empty name removes the font setting
		if (string.IsNullOrEmpty(value))
		{
			RprRemoveElement("w:rFonts");
			return;
		}
		XmlElement ft = RprReplaceElement("w:rFonts");
		ft.SetAttribute("ascii", WordNameSpace, value);
		ft.SetAttribute("hAnsi", WordNameSpace, value);
	}
}

Let’s end this sample by creating a test document with formatting. All this code is implemented in the DocX class.

The code below creates paragraphs under the document's body element, each containing runs with different properties:

public XmlElement CreateParagraph(params XmlElement[] runs)
{
  XmlElement p = wordDocument.CreateElement("w:p", WordElement.WordNameSpace);
  foreach (XmlElement run in runs) p.AppendChild(run);
  return p;
}
		
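public void CreateSomething()
{
  // The "w" prefix of the XPath query must be declared in a namespace manager
  XmlNamespaceManager nsmgr = new XmlNamespaceManager(wordDocument.NameTable);
  nsmgr.AddNamespace("w", WordElement.WordNameSpace);
  XmlElement body = wordDocument.SelectSingleNode("//w:body", nsmgr) as XmlElement;
  for (int i = 1; i < 6; i++)
  {
    Run run1 = new Run(this.wordDocument, " first text");
    Run run2 = new Run(this.wordDocument, "second text");
    Run run3 = new Run(this.wordDocument, "third text");
    run1.Size = 48;          // 48 half-points = 24 pt
    run1.Italic = true;
    run1.Font = "Broadway";
    run2.Bold = true;
    run2.Size = i * 5;
    run3.Font = "Times New Roman";
    run3.Size = i * 20;
    body.AppendChild(CreateParagraph(run1));         // paragraph with one run
    body.AppendChild(CreateParagraph(run2, run3));   // paragraph with two runs
  }
}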
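To inspect the structure that CreateSomething produces without opening Word, the same loop can be mirrored with LINQ to XML. TestBody is a hypothetical test helper, not part of the article's DocX class; it builds only the body element, uses the final ECMA-376 namespace, and keeps the order in which CreateSomething sets the properties.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class TestBody
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // One run: an rPr holding the given properties, then the text
    static XElement Run(string text, params XElement[] props)
    {
        return new XElement(W + "r",
            new XElement(W + "rPr", props),
            new XElement(W + "t", new XAttribute(XNamespace.Xml + "space", "preserve"), text));
    }

    static XElement Size(int halfPoints) => new XElement(W + "sz", new XAttribute(W + "val", halfPoints));
    static XElement Font(string name) => new XElement(W + "rFonts",
        new XAttribute(W + "ascii", name), new XAttribute(W + "hAnsi", name));

    // Mirrors the loop of CreateSomething: two paragraphs per iteration
    public static XElement Build()
    {
        var body = new XElement(W + "body");
        for (int i = 1; i < 6; i++)
        {
            body.Add(new XElement(W + "p",
                Run(" first text", Size(48), new XElement(W + "i"), Font("Broadway"))));
            body.Add(new XElement(W + "p",
                Run("second text", new XElement(W + "b"), Size(i * 5)),
                Run("third text", Font("Times New Roman"), Size(i * 20))));
        }
        return body;
    }
}
```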

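The result is a document with ten paragraphs: five repetitions of an italic, 24 pt Broadway paragraph (" first text"), each followed by a paragraph holding a bold "second text" run and a Times New Roman "third text" run whose sizes grow with each repetition (from 2.5 pt to 12.5 pt and from 10 pt to 50 pt respectively).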

A better way to generate formatting would have been to use styles …
