Introducing the Open XML SDK 2.0

To celebrate the RTM release of the Open XML SDK 2.0 we’re launching a bunch of new content here at Open XML Developer. This article provides our brief history of the Open XML SDK 2.0 and provides useful links to content here at Open XML Developer.

Introduction

 

The Open XML file formats are a set of internationally standardised document formats. For developers used to working with complex, loosely documented legacy binary formats, Open XML represented a fantastic change in the way they worked. Working with the Open XML formats is still fairly heavy lifting involving quite complex XML data manipulation; modern office file documents are a very rich representation of information.

 

The Open XML SDK 2.0 is a tool that aims to drive developer productivity for those working with the Open XML formats. It provides a strongly typed object API and removes the need for developers to manipulate the raw XML data that makes up an Open XML document. Importantly, the Open XML SDK 2.0 is designed to work without any dependence on Microsoft Office or any other tool; for example, you can comfortably run Open XML SDK 2.0 based code on high-volume web servers, a place where you would not typically want to run Microsoft Word.

 

In this article we will look briefly at the various approaches for working with Office documents and their pros and cons. We will then dive deep into the Open XML SDK 2.0 and see key features designed to help developers be more productive in working with these open, interoperable standards.

 

In the beginning

 

Love it or hate it, Microsoft Office has been a powerful force in business over the past 20 odd years. Applications like Word and Excel (and indeed other non-Microsoft alternatives) have touched almost every business in the world. A schoolchild these days is likely to be as proficient with a word processor as they are with a pen and pencil.

 

Office Automation:

For a long time developers have wanted to be able to manipulate Office documents from their code. The Microsoft Office suite has historically had good developer capability; there are more ‘mission critical’ Excel macro-based applications out there than we probably care to admit! Through Visual Basic for Applications and Microsoft Office Automation we can, as developers and power users, effectively remote control Office applications by using a typed object model.

 

Office automation was and still is a fantastic solution for many applications. It provides an extensive object model covering almost every feature of the key Microsoft Office applications. Because it is actually manipulating the productivity application, Office automation exposes not only file format information but also application functionality; for example, we can programmatically regenerate a table of contents or perform a spell check. If you are writing a smart client application for Windows and you know that your user will have Office installed, then Office automation should be at the top of your list of approaches to use.

 

This approach has drawbacks too though, particularly given the nature of applications today where we see high-volume workloads on servers. Microsoft Office is a people-focussed application and it is not really designed to be run on a server; it does not scale the way web developers need it to. Office automation is simply not designed for the sort of workloads that are generated in big web applications. In addition, a big web farm would carry significant additional licensing costs, since Microsoft Office would be required on each server.

Finally, Office automation was not an appropriate choice where your application needed to run on something other than a Windows PC. If you were writing code for a Windows SmartPhone you would not have access to any of the automation APIs.

 

Cutting out the Middle-man

 

To meet these needs we would have to go one level deeper and manipulate the actual document file rather than the application. We would sacrifice some functionality in doing this; for example, we would no longer be able to use the API for application functions such as spell checking. In return we would have the power to write extremely lightweight, high-performance code that could target a variety of platforms.

 

In the good old days this was pretty difficult: the Microsoft Office binary formats were not publicly documented and were a complex proposition to work with. Several large Independent Software Vendors wrote applications that could work with these formats and some third-party APIs emerged, but the prospects for small software firms or internal IT teams working with these documents were limited.

 

That is where a more transparent and understandable document file format comes to the rescue.

 

The early 2000s were the age of angle brackets: if it wasn’t in HTML or XML it was not worth having. Between 2000 and 2003 Microsoft began refining and documenting its Office formats as XML-based formats.

 

Well, almost...

  • Only Word and Excel got the new format treatment

  • Not all formatting features were supported

  • The format consisted of a single file of XML content that got pretty unwieldy quickly

These new formats really appealed to developers of the day who, already being experts in working with XML, were able to easily manipulate Office documents without a reliance on the Office client applications. They were less popular with end users, as all the pain of moving to a new default format just did not seem worth it.

 

TODAY: Open Standards and the Open XML SDK v2.0

 

Over the course of the 2000s Microsoft worked with international standards bodies such as Ecma and ISO to develop, refine, and finally standardise the Open XML file formats. The Open XML file format is now an ISO standard: ISO/IEC 29500, referred to in this article as IS29500.

 

Key features of this format are:

  • Uses the Open Packaging Conventions (OPC): a package is essentially a renamed ZIP file containing folders and XML file “parts” along with other arbitrary parts such as images and videos

  • Word, Excel & PowerPoint all get new document formats: WordprocessingML, SpreadsheetML and PresentationML

  • The formats are open and available to all from http://www.iso.org meaning that software companies on any platform are able to write Open XML based applications that will interoperate with all other Open XML based applications

  • Extensibility is permitted in a managed fashion through the Markup Compatibility and Extensibility feature of the standard

Because of their simplicity, the Open XML formats are very easy for developers to work with. All you need is an ability to unpack a *.zip file and manipulate a text document.
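To make the "ZIP file plus XML text" point concrete, here is a minimal sketch that uses only the Python standard library rather than the Open XML SDK. It writes a tiny WordprocessingML package into memory and reads the paragraph text back out. The helper names build_docx and read_paragraphs are made up for this illustration, and the three-part layout is assumed to be the smallest package a typical consumer will accept.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of the main WordprocessingML vocabulary.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Content type declarations for the parts in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package-level relationship that points at the main document part.
PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def build_docx(paragraphs):
    """Return the bytes of a minimal package (hypothetical helper)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()


def read_paragraphs(docx_bytes):
    """Unzip the package and return the text of each w:p element."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]
```

Even this small sketch has to manage namespaces, content types and relationships by hand, which is the bookkeeping a typed API is meant to take off the developer.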

However, for .NET developers the Open XML SDK 2.0 makes working with Open XML documents more accessible than ever before. The SDK 2.0 provides a strongly typed API for manipulating the OPC packages and the Open XML format markup contained within them. In the next section of the document we’ll look at what makes the Open XML SDK 2.0 a great tool.

Strongly Typed API

The Open XML SDK 2.0 includes a strongly typed API for manipulating documents. Rather than having to work with the Open XML markup parts as generic XML you can use these typed objects. This means that you will have full IntelliSense support inside various developer tools, and certain tools will also support retrieving member documentation.

 

The API is able to serialize back into an Open XML document at any time. This means that developers can manipulate a document using the object model and make a simple call to Save() to write out the Open XML format package.

 

Language Integrated Query (LINQ) is a .NET technology for querying data structures from code. The Open XML SDK 2.0 supports LINQ as a first-class concept, meaning that it is possible to easily query and manipulate collections of objects such as paragraphs or table rows.

 

The Open XML SDK 2.0 can be coupled with Office Services functionality of Microsoft SharePoint to support batch document conversion and other advanced operations. The SDK 2.0 requires only a medium level of trust and as such will be compatible with most server side deployment scenarios including ‘locked down’ hosting providers and SharePoint 2010 sandboxed solutions.

 

Developers can layer their own code on top of the API to build application specific libraries for reuse; because the Open XML SDK 2.0 supports free distribution rights these libraries can be distributed broadly.

1.     Developer Productivity Tools

The Open XML SDK 2.0 includes the Developer Productivity Tool, which helps developers write their applications more quickly.

 

Document Comparison:

The SDK 2.0 includes a mechanism to compare two documents. Developers can use this a little like they might use a normal ‘Diff’ tool with text files. The Open XML Diff capability is unique in that it understands the structure of the Open XML formats and is therefore able to show changes not only in the underlying Open XML markup but also in the OPC package structures.

 

The Open XML Diff tool is particularly useful for comparing documents pre and post manipulation by another Open XML implementation. Need to understand how Microsoft Word applies a particular format? Compare the before and after markup using this tool.
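The idea of a package-aware comparison can be sketched with the Python standard library. This is not the SDK's Open XML Diff tool, only an illustration of the concept. It reports which parts were added, removed or changed between two packages, which a plain text diff of the zipped files could not show. The names make_package and diff_packages are hypothetical, and parts are compared byte for byte, whereas the real tool compares markup.

```python
import io
import zipfile


def make_package(parts):
    """Build an in-memory ZIP package from a {part name: content} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, content in parts.items():
            package.writestr(name, content)
    return buffer.getvalue()


def diff_packages(before, after):
    """Compare two packages part by part (byte-level comparison only)."""
    with zipfile.ZipFile(io.BytesIO(before)) as old, \
            zipfile.ZipFile(io.BytesIO(after)) as new:
        old_parts = set(old.namelist())
        new_parts = set(new.namelist())
        # A part counts as changed when it exists in both packages
        # but its bytes differ.
        changed = sorted(
            name
            for name in old_parts & new_parts
            if old.read(name) != new.read(name)
        )
    return {
        "added": sorted(new_parts - old_parts),
        "removed": sorted(old_parts - new_parts),
        "changed": changed,
    }
```

Saving a document before and after applying a format in Word and passing both files through a function like this shows which parts to inspect; the SDK tool then goes further and shows the markup differences inside each part.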

 

Document Reflection:

Often developers will have a template document from which they wish to work. They may have built a Word document and now want to understand how to write code to create that document programmatically. For this need the SDK 2.0 provides a Document Reflector. The Document Reflector is able to parse an existing document and emit C# source code that will recreate that document via the strongly typed API.

 

If you are familiar with the excellent Red Gate .NET Reflector tool then you will understand just how useful this approach can be.

 

For more information on how to use the Developer Productivity Tool please see this OpenXMLDeveloper.org article: An Introduction to Open XML SDK 2.0.

2.     Interoperability and Standards Conformance

The Open XML SDK 2.0 represents the most complete API for working with the internationally standardised IS29500 Open XML file formats.

 

Validation of documents is important for many developers, particularly those generating content dynamically from other data sources. As well as the inherent validation provided by the strongly typed API, the SDK 2.0 includes specific validation logic. This logic checks not only schema conformance but also semantic conformance based on the requirements set out in the IS29500 specification text. Validation errors caused by syntactically or semantically incorrect markup return detailed XPath information to allow easy resolution.

 

The SDK 2.0 provides easy access to documentation directly from within the tools. As well as IntelliSense documentation, the SDK 2.0 validation tools will retrieve guidance from the IS29500 specification and the Microsoft implementation notes.

 

The IS29500 specification makes specific provision for implementers to add their own markup. This is set out in Part 3 of the specification: Markup Compatibility and Extensibility (MCE). The SDK 2.0 provides specific support for MCE constructs. Documents can be pre-processed to retrieve either Office 2007 or Office 2010 markup. The SDK 2.0 is also able to easily emit MCE constructs should developers wish to specify their own extended markup. For more details on working with MCE in the Open XML SDK 2.0 please see the hands-on lab: OpenXML Markup Compatibility and Extensibility with the OpenXML SDK 2.0
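As a rough illustration of what pre-processing MCE markup involves, the sketch below applies the selection rule for a single mc:AlternateContent element using the Python standard library. It takes the first Choice whose required namespaces are all understood and otherwise takes the Fallback. This is not the SDK's implementation. The function name select_branch and the urn:example namespaces are invented for the example, and the caller is assumed to supply the prefix-to-namespace mapping because ElementTree does not retain prefix declarations.

```python
import xml.etree.ElementTree as ET

# Namespace of the Markup Compatibility and Extensibility vocabulary.
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Invented sample markup: a newer "v2" vocabulary with a "v1" fallback.
SAMPLE = (
    f'<mc:AlternateContent xmlns:mc="{MC_NS}" '
    'xmlns:v1="urn:example:v1" xmlns:v2="urn:example:v2">'
    '<mc:Choice Requires="v2"><v2:fancy/></mc:Choice>'
    '<mc:Fallback><v1:plain/></mc:Fallback>'
    '</mc:AlternateContent>'
)


def select_branch(alternate_content, prefix_to_namespace, understood):
    """Return the child elements a consumer would keep (hypothetical helper).

    prefix_to_namespace maps the prefixes used in Requires attributes to
    namespace URIs; understood is the set of namespace URIs the consumer
    knows how to process.
    """
    for choice in alternate_content.findall(f"{{{MC_NS}}}Choice"):
        required = choice.get("Requires", "").split()
        # Every prefix listed in Requires must resolve to an understood namespace.
        if required and all(
            prefix_to_namespace.get(prefix) in understood for prefix in required
        ):
            return list(choice)
    fallback = alternate_content.find(f"{{{MC_NS}}}Fallback")
    return list(fallback) if fallback is not None else []
```

A consumer that understands only urn:example:v1 would keep the v1:plain element from this sample, while one that also understands urn:example:v2 would keep v2:fancy. Choosing between Office 2007 and Office 2010 markup follows the same pattern.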

3.     Application Independence

The Open XML SDK 2.0 does not have any dependency on Microsoft Office or any other productivity suite. The SDK 2.0 supports both server-side and client-side configurations, with Microsoft .NET being the only prerequisite.

The Open XML SDK 2.0 is freely distributable. This means that any solution developer who chooses to implement the Open XML formats in their application is able to distribute the Open XML SDK binaries (that is, the *.dll files) with their application.

 

Learning About the Open XML SDK 2.0

In this section we’ll set out and link to many of the learning resources for the Open XML SDK 2.0.

Articles:

We have a number of OpenXmlDeveloper.org articles that discuss the Open XML SDK 2.0. Some of these articles have been written for CTP releases of the SDK 2.0. For the most part these should work with the final release.

  • An Introduction to Open XML SDK 2.0
    This article by James Newton-King focuses on the SDK 2.0 tooling, particularly the XML Diff and XML Reflector tools.
  • A Photo Slideshow Presentation Using the Open XML SDK 2.0
    This article by Philip Wong demonstrates how to create PresentationML slide shows using the Open XML SDK 2.0. It demonstrates the use of the Document Reflector to create a template into which data is inserted.
  • Mail Merging with a Custom Client Using the Open XML SDK 2.0
    In this article Johannes Prinz demonstrates how to implement word processor mail-merge functionality using the Open XML SDK 2.0. This code is suitable for use in a server-side environment. This article also demonstrates the great LINQ support in the SDK 2.0.
  • Formatted Excel using SDK 2.0 and .NET
    Lawrence Hodson provides a SpreadsheetML based example here with details on how to use the Open XML SDK 2.0 to create richly formatted documents. This article demonstrates how simple it is to manipulate documents with the SDK 2.0 thanks to the auto-save capability.
  • Adding Headers and Footers with SDK 2.0
    A common question on the OpenXmlDeveloper.org forums is how to manipulate the headers and footers of documents. In this article Lawrence Hodson shows how to use the Open XML SDK 2.0 to perform this task. This article includes a useful code library to reuse in your own applications showing the layering capability of the SDK 2.0.
  • Organization Chart using the Open XML SDK 2.0
    Developers often forget that DrawingML makes up a key part of the Open XML file formats. In this article Jian Sun uses the Open XML SDK 2.0 to create an organizational chart by first querying an LDAP store (think Active Directory) and then creating the chart in DrawingML. Jian Sun discusses using the Document Reflector tool and approaches to converting the outputted C# code to VB.NET. This article is written in VB.NET.
  • SpreadsheetML Made Easy Using C#
    In this article Tim Coulter introduces his open source ExtremeML library. ExtremeML builds atop the Open XML SDK 2.0 and provides a useful layer of functionality focussed on working with SpreadsheetML markup. It provides a great example of how developers can bundle their own functionality with the SDK 2.0 and ship useful libraries for others to use.
  • ExtremeML Pivot Table
    Tim Coulter demonstrates the power of his open source ExtremeML library by building a SpreadsheetML pivot table in this article. Thanks to the power of the Open XML SDK 2.0 and the ExtremeML library it takes just 80 lines of code to transform a simple CSV telephone bill data table into a rich pivot table.

Hands on Labs:
We have refreshed the Hands on Lab content available here at Open XML Developer. While it still provides the detailed drill-down on Open XML markup, it now also discusses the Open XML SDK 2.0 and includes two new labs that demonstrate how to use the Open XML SDK 2.0:

  1. Introduction to Open XML SDK 2.0
    This lab provides you with everything you need to go from Zero to Hero with the Open XML SDK 2.0. We go from ‘Hello World’ to much more complex applications that create, manipulate and format data using the SDK 2.0.
  2. Open XML Markup Compatibility and Extensibility with the OpenXML SDK 2.0
    In this lab we demonstrate the MCE constructs in IS29500 Open XML and show how the Open XML SDK 2.0 makes it simple to work with MCE content.

Conclusion

 

The 2.0 release of the Open XML SDK represents a major step forward for developers working with the Open XML file formats. It provides the richest client-independent document format manipulation API on any platform.

Here at Open XML Developer we will be publishing a number of new articles on the Open XML SDK 2.0 over the coming months. You can subscribe to our RSS feed to be notified of new updates or follow us on Twitter @OpenXmlDev.

 



 
