Displaying Open XML documents in Silverlight with TextGlow
By James Newton-King
The Office Open XML file format has opened up a new range of possibilities for working with documents. The Microsoft Office 2007 suite of products replaces the old binary formats and, by default, produces documents in a parser-friendly XML format.
TextGlow (www.textglow.net) is a Silverlight 2.0 application that leverages the new Office Open XML file format to display Word documents directly in the browser. This article will look at how to use the built-in features of Silverlight 2.0 to read content from an Open XML package, parse XML using LINQ to XML and display the document contents using Silverlight.
Reading an Open XML package using Silverlight
An Open XML document is typically made up of many different files: XML files describing the document content, styles and other document information; resource files used in the document, such as images; and XML package metadata that lists the contents of the package and the relationships between files. All of this information is packaged together in a zip file that follows the Open Packaging Conventions. Although a Word document has a docx extension, the file Word writes is a zip archive.
An Open XML package can be opened by any program that supports the zip format.
Silverlight 2.0 has excellent built-in support for reading zip files.
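To make the "a docx is just a zip" point concrete, here is a minimal sketch that lists the entries of a package and reads one of them as text. It assumes the regular .NET Framework (4.5 or later) and its ZipArchive class rather than Silverlight's own zip support, and PackageZipReader is a hypothetical helper name, not part of TextGlow.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper for illustration; assumes desktop .NET's ZipArchive.
public static class PackageZipReader
{
    // Returns the full names of all entries (parts and metadata) in the archive.
    public static List<string> ListEntries(Stream packageStream)
    {
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }

    // Reads one entry as text, or returns null when the entry does not exist.
    public static string ReadEntryText(Stream packageStream, string entryName)
    {
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                return null;
            }
            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling ReadEntryText(stream, "word/document.xml") on a docx stream should return the main document XML; note that entry names inside the zip have no leading slash.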
Using the GetResourceStream method that Silverlight provides, together with the package metadata included in an Office Open XML file, we are able to build up a model of the files (called Parts in the specification) that are used by the document. Along with file names and paths, the package model also has information about part types and their relationships to each other. This information is important because resources in an Office Open XML document are often referenced by relationship ID.
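Relationship IDs are resolved through the package's relationship files (for example _rels/.rels or word/_rels/document.xml.rels), each of which maps an Id to a Target. The sketch below shows one way to read such a file with LINQ to XML; RelationshipReader is a hypothetical name used for illustration and is not TextGlow's implementation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class RelationshipReader
{
    // Namespace used by Open Packaging Conventions relationship files.
    private static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Maps each relationship Id (e.g. "rId7") to its Target path.
    public static Dictionary<string, string> ReadTargets(string relsXml)
    {
        XDocument doc = XDocument.Parse(relsXml);
        return doc.Root
            .Elements(Rel + "Relationship")
            .ToDictionary(
                r => (string)r.Attribute("Id"),
                r => (string)r.Attribute("Target"));
    }
}
```

With a map like this, a reference such as r:id="rId7" in the document XML can be turned into the path of the part it points to.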
The structure of a package is a simple tree of parts, each part having a collection of child parts.
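A tree like this can be modelled very simply. The following is a sketch under assumptions: PartNode and its members are hypothetical names chosen for illustration, and TextGlow's real classes in TextGlow.Control.Packaging may look quite different.

```csharp
using System.Collections.Generic;

// Hypothetical part model for illustration; not TextGlow's actual class.
public sealed class PartNode
{
    public string Path { get; private set; }
    public string RelationshipType { get; private set; }

    // Child parts keyed by the relationship ID this part uses to refer to them.
    public Dictionary<string, PartNode> Children { get; private set; }

    public PartNode(string path, string relationshipType)
    {
        Path = path;
        RelationshipType = relationshipType;
        Children = new Dictionary<string, PartNode>();
    }

    // Registers a child under a relationship ID and returns it for chaining.
    public PartNode AddChild(string relationshipId, PartNode child)
    {
        Children[relationshipId] = child;
        return child;
    }

    // Depth-first walk over this part and every part below it.
    public IEnumerable<PartNode> DescendantsAndSelf()
    {
        yield return this;
        foreach (PartNode child in Children.Values)
        {
            foreach (PartNode node in child.DescendantsAndSelf())
            {
                yield return node;
            }
        }
    }
}
```

Keying children by relationship ID means that a reference found in a part's XML can be resolved with a single dictionary lookup on that part.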
TextGlow has its own model for working with packages, which can be found in the TextGlow.Control.Packaging project. If you are writing an application for the regular .NET Framework rather than Silverlight, Microsoft has released a framework for working with Open XML packages called the Open XML Format SDK. The current SDK is the version 2.0 CTP (the final 2.0 release is expected to ship at about the same time as Office 14) and is available for download from Microsoft.
Parsing Office Open XML
Now that we have the XML and resource files that make up the document in a model, we need to parse the part contents. The primary content file of a docx file is /word/document.xml. This file is where we will find the text, paragraphs, tables and other content that make up a WordprocessingML document.
The XML below is the content of a simple Hello world! document written by Microsoft Word 2007.
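<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p w:rsidR="002079AC" w:rsidRPr="00BB43D8" w:rsidRDefault="00D11212" w:rsidP="002079AC">
      <w:r><w:t>Hello world!</w:t></w:r>
    </w:p>
    <w:sectPr w:rsidR="002079AC" w:rsidRPr="00BB43D8" w:rsidSect="007F0645">
      <w:footerReference w:type="default" r:id="rId7"/>
      <w:pgSz w:w="11907" w:h="16840" w:code="9"/>
      <w:pgMar w:top="1418" w:right="1134" w:bottom="1440" w:left="1134" w:header="567" w:footer="567" w:gutter="0"/>
    </w:sectPr>
  </w:body>
</w:document>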
LINQ to XML makes querying this XML easy. In the snippet below we get all the text content of our Hello world! document.
// Requires: using System; using System.Xml.Linq;
XDocument document = XDocument.Load("document.xml");
XNamespace main = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XElement body = document.Root.Element(main + "body");
// Walk paragraph -> run -> text and print the value of each text element.
foreach (XElement text in body.Elements(main + "p").Elements(main + "r").Elements(main + "t"))
{
    Console.WriteLine(text.Value);
}
Console.WriteLine("Press any key to continue...");
The WordprocessingML standard that defines the content of a Microsoft Word 2007 document is very detailed. All the information that a word processing application like Microsoft Word 2007 needs to display a document is stored in XML.
TextGlow requires a lot of that information to display the document accurately in Silverlight, so it builds a complete document model first. Many styles within a document are inherited; for example, to calculate the correct font size TextGlow needs complete information about the styles in the document. An application that only cares about content and doesn’t need complete information about a document could skip this step and work directly against the XML.
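To illustrate why inherited styles need the full picture, here is a minimal sketch that resolves a style's font size by following its w:basedOn chain in styles.xml. StyleResolver is a hypothetical name, the sketch ignores document defaults and direct run formatting, and it is not TextGlow's actual resolution logic. In WordprocessingML the w:sz value is expressed in half-points.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; ignores docDefaults and direct formatting.
public static class StyleResolver
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the font size in half-points for a style, walking w:basedOn
    // until a style with a w:sz is found. Returns null if none defines one.
    public static int? ResolveFontSize(XElement stylesRoot, string styleId)
    {
        Dictionary<string, XElement> byId = stylesRoot
            .Elements(W + "style")
            .Where(s => s.Attribute(W + "styleId") != null)
            .ToDictionary(s => (string)s.Attribute(W + "styleId"));

        // The visited set guards against a cyclic basedOn chain.
        var visited = new HashSet<string>();
        while (styleId != null && byId.ContainsKey(styleId) && visited.Add(styleId))
        {
            XElement style = byId[styleId];
            XElement size = style.Elements(W + "rPr").Elements(W + "sz").FirstOrDefault();
            if (size != null)
            {
                return (int?)size.Attribute(W + "val");
            }
            XElement basedOn = style.Element(W + "basedOn");
            styleId = basedOn == null ? null : (string)basedOn.Attribute(W + "val");
        }
        return null;
    }
}
```

A style that defines no size of its own inherits the size of the nearest ancestor that does, which is why a single paragraph cannot be sized correctly without the whole style table.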
The document model is built by parsing the document XML with LINQ to XML, much like the example above but on a larger scale.
TextGlow’s document model is in the TextGlow.Control.Model project.
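As a rough idea of what building such a model involves, the sketch below projects paragraphs and runs from document.xml into simple objects. RunModel, ParagraphModel and DocumentModelReader are hypothetical names for illustration; the real model in TextGlow.Control.Model covers far more of WordprocessingML.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model classes for illustration only.
public sealed class RunModel
{
    public string Text { get; set; }
    public bool Bold { get; set; }
}

public sealed class ParagraphModel
{
    public List<RunModel> Runs { get; set; }
}

public static class DocumentModelReader
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Projects each w:p in the body into a paragraph with its runs.
    public static List<ParagraphModel> ReadParagraphs(XDocument document)
    {
        XElement body = document.Root.Element(W + "body");
        return body.Elements(W + "p")
            .Select(p => new ParagraphModel
            {
                Runs = p.Elements(W + "r")
                    .Select(r => new RunModel
                    {
                        // A run may hold several w:t elements; join their text.
                        Text = string.Concat(r.Elements(W + "t").Select(t => t.Value)),
                        // Simplification: any w:b element counts as bold,
                        // even though w:val can switch it off.
                        Bold = r.Elements(W + "rPr").Elements(W + "b").Any()
                    })
                    .ToList()
            })
            .ToList();
    }
}
```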
Displaying Office Open XML content using Silverlight
Once we have the document content in an easy-to-use model, we can start to display the document using Silverlight.
Silverlight displays content using its own implementation of WPF. It supports the basic building blocks for UI such as vector graphics, text, animation and images, but it doesn’t support the complex controls found in the full version of WPF, such as flowing text documents. As a result, a lot of the text layout in TextGlow is done manually.
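One small piece of manual text layout is deciding where lines break. The sketch below shows a simple greedy approach; it is not TextGlow's algorithm. LineBreaker is a hypothetical name, and the measure delegate is an assumption standing in for whatever the UI framework uses to report the rendered width of a string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical greedy line breaker for illustration only.
public static class LineBreaker
{
    // Splits text into lines so that each line's measured width stays within
    // maxWidth where possible. A single word wider than maxWidth still gets
    // a line of its own.
    public static List<string> Wrap(string text, double maxWidth, Func<string, double> measure)
    {
        var lines = new List<string>();
        string current = "";
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string candidate = current.Length == 0 ? word : current + " " + word;
            if (current.Length > 0 && measure(candidate) > maxWidth)
            {
                // The word does not fit: close the current line and start a new one.
                lines.Add(current);
                current = word;
            }
            else
            {
                current = candidate;
            }
        }
        if (current.Length > 0)
        {
            lines.Add(current);
        }
        return lines;
    }
}
```

For a quick check without a UI, a character count can stand in for width: Wrap("the quick brown fox", 10, s => s.Length) produces the two lines "the quick" and "brown fox".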
Images embedded inside an Office Open XML document are stored in their original format. A gif, jpg or png that is inserted into a document is just included in the package. Images are displayed in TextGlow by simply extracting the file content and passing it to Silverlight’s Image control.
The controls that TextGlow uses to display a document are in the TextGlow.Control.Controls project.
Putting it all together
Now that we have our document model and controls to display the document, we put everything together using the DocumentBuilder.
IContentGetter contentGetter = new PackageContentGetter(packageStream);
Package package = new Package(contentGetter);
Document documentModel = Document.CreateDocument(package);
DocumentBuilder documentBuilder = new DocumentBuilder(documentModel, p => Container.AddPage(p));
From the zip file data we build the package model. The package model is used to create the document model which is then used by the document builder to create the WPF controls. The page controls created by the DocumentBuilder are added to the Silverlight application as they are created.
Once the pages are part of the Silverlight application any UI can be used to control how they are displayed. TextGlow uses a Cover Flow-like UI but the document pages could easily be used in an application with a more traditional Word-like UI.
Storing documents in the Office Open XML format opens a wide range of new possibilities, both from a reading and a writing perspective. TextGlow is a first step toward combining the reading of Open XML documents with the rendering power of Silverlight.
See TextGlow in action at www.textglow.net
The source code is in the attached zip file.