There is a time for all good things to come to an end, and the time has come to shut down OpenXmlDeveloper.org.
Screen-casts and blog posts: Content on OpenXmlDeveloper.org will be moving to EricWhite.com.
Forums: We are moving the forums to EricWhite.com and StackOverflow.com. Please do not post in the forums on OpenXmlDeveloper.org. Instead, please post in the forums at EricWhite.com or at StackOverflow.com.
Please see this blog post for more information about my plans moving forward. Cheers, Eric
By Tim Coulter
When I first discovered the Open XML SDK 2.0, I was thrilled by its elegance and power. My goal was to create a simple application that would periodically generate a new version of a standard spreadsheet, using the latest data from a database. This seemed like a fairly typical challenge and I imagined that the correct methodology would already be well defined. I was to be surprised and disappointed.
Using the SDK, it doesn't take many lines of code to construct a new spreadsheet package, add a worksheet and inject some row data, derived from a dataset. The presentation is not especially pretty but it gets the job done.
Unfortunately, when you start to add features, the complexity mounts at a rapid rate. Just adding styling to the column headings and creating a column of clickable hyperlinks took me several hours of research and many lines of code. When the user asked me to wrap the data in an Excel 2007 Table (so that she could easily sort and filter it) I realized that I needed to tackle this challenge in a more intelligent way.
It occurred to me that a large part of my "dynamic" spreadsheet actually consisted of mostly static elements. Only the data from the database was dynamic. The table definitions, the boilerplate content and even the growing list of styling rules were essentially static - they never varied from one instance of the spreadsheet to the next. It seemed inappropriate to build these elements dynamically using the SDK when there was already a much better tool for the job - Excel.
So, starting with a blank Excel workbook, I defined an empty placeholder table that reflected the column structure of the data that would be injected. I then applied some styling and various other user-requested embellishments, before saving it for re-use. All that now remained was to write some code to inject the dynamic data.
Unfortunately, as I was to discover, working with a template spreadsheet using the SDK presents a significantly greater challenge than creating a new package from scratch. The reason is that every action must include safeguards to avoid invalidating existing elements. For example, inserting a row into a worksheet entails renumbering all subsequent rows, as wells as all the cells within them. If that isn't enough, consider that a spreadsheet package can contain a whole slew of elements that reference individual cells (the calculation chain, tables, pivot tables, hyperlinks etc). Each time a cell address changes, every element that references it must undergo a matching change. Just adding a new row to a worksheet can invalidate dozens or hundreds of other elements, and it only takes one such error for Excel to refuse to open the workbook.
It was at this point that I realized that this challenge needed a radically better solution than simply adding more code to an already-bloated project. So, I set to work creating that radical solution.
Fast forward three months and the result is ExtremeML; a new open source library that adds a powerful layer of additional functionality to the SDK, enabling developers to program against an intuitive object model and freeing them from the burden of implementing the low-level detail of OpenXML.
The remainder of this article presents a walkthrough of a typical use-case for ExtremeML; the programmatic creation of a fully styled and formatted Excel workbook, populated with live data from a database and enhanced by the addition of a pivot table and a chart.
If you arrived at this page expecting to see reams of C# sample code, you might be disappointed. One of the distinguishing characteristics of ExtremeML is that it achieves remarkable results with surprisingly small amounts of code. But please keep reading. You are about to discover a new level of simplicity in your OpenXML programming.
The Importance of Templates
In this contrived example, the goal is to use a database query to retrieve some statistical data about the countries of the world and to apply this data programmatically to create an Excel workbook that incorporates the following features:
Faced with this list of requirements, some developers would attempt to implement all of them in code, starting with an empty package. But this strategy would require thousands of lines of code and many of them would be used to create package elements that never vary. This is very wasteful and inevitably leads to massive code duplication between projects of a similar nature.
A better approach is start with a template, using it to store every static element that will be present in the final result. Unfortunately, many developers who have tried this approach have been dissuaded by the horrible complications that I alluded to in my introduction. As we will see, ExtremeML massively simplifies working with templates and is able to implement this impressive list of features in just a few lines of code.
Creating the Basic Template
The starting point for all ExtremeML projects is a workbook template. In this example, we will begin with a basic design and improve it progressively.
For this application, we need an Excel Table that will store statistical data about the countries of the world. Our template table must define a column for each of the fields of data to be injected. Each column must be styled to suit the data that it will accommodate, with styling being applied to the worksheet column behind the table, rather than to the table column itself. This technique ensures that when new rows are programmatically inserted into the worksheet, their cells will automatically inherit the style of the worksheet column in which they reside.
This example template table includes an empty data row (at least one data row is mandatory for all Excel tables), and an optional totals row. We will apply the label "Average" to one of the cells in the totals row and a column average formula to three of the columns. Most importantly, the table must be named, so that it can be referenced from within our code.
Note how the formulas in the table result in calculation errors. But don't worry - these will be resolved as soon as data is injected into the template.
This is all that we need for the basic version of our template. Now, let’s test it.
Building and Running the Code
In ExtremeML, a simple standardized code pattern is used for most forms of data injection. This pattern will be very familiar to anyone who has programmed using the SDK, except that it is significantly more compact.
An Excel Table is represented in ExtremeML by an instance of the TableWrapper class, which is referenced using its unique name (as per the table in the template). Data is injected into a TableWrapper by calling its Fill method, which has a number of overloads that support a variety of data scenarios. The one we’re using here accepts a DataSet and a string that specifies the name of one of its contained tables.
public static void PopulateCountriesFromDataSet()
using (var package = SpreadsheetDocumentWrapper.Open("MyWorkbook.xlsx"))
using (var connection = new SqlConnection(MyConnectionString))
var table = package.WorkbookPart.GetTablePart("Countries").Table;
var command = new SqlCommand("SELECT * FROM Countries", connection);
var adapter = new SqlDataAdapter(command);
var dataSet = new DataSet();
ADO.NET data columns are mapped to Excel table columns by name, but may optionally be aliased, by assigning a value to their caption property. This provides great flexibility, since table columns can be organized in any sequence and assigned any name and they will still map correctly. Moreover, there is no obligation to map every column in a data source to a table column, or to populate every table column.
That’s it! The code is complete, so now we can run it...
After code execution, the original empty template table has expanded to 179 data rows and has been populated with the data from the passed DataSet. Each column is correctly formatted according to its data type and both the totals row and the calculated column now contain results that correctly reflect their intended values.
In case you are in any doubt, this is a real live Excel table in every sense. As with tables created manually, it supports sorting and filtering using the drop-down menus on each column header. And when a filter is applied, the values in the totals row are automatically updated to reflect the visible rows.
All this was achieved using a simple template table and a few lines of code. Can you now begin to see what's possible when you combine Excel templates with the power of ExtremeML?
Adding a Pivot Table
Having completed a simple Excel template and the code necessary to inject data into it, we can now improve its usability by presenting this data in a pivot table. This is as simple as extending the template to include a definition for a skeleton pivot table, which will be automatically populated when its associated table is filled with data.
Since no data exists at the template stage, it is not possible to bind the pivot table to a specific cell range. Instead it is bound by name to the table that we created in the previous step.
In this example, the objective is to show the popularity of languages spoken throughout the world, by aggregating the populations of countries by their official language. Statisticians may find this technique somewhat dubious (given that many people and many countries speak multiple languages), but it's an ideal way to demonstrate the capabilities of a pivot table.
If you're not already familiar with Excel pivot tables, this tutorial will get you started.
To keep things simple in this demonstration, I will bind the pivot table to just two of the columns in the underlying table and configure how their data will be presented.
You might also wish to sort or filter your pivot data, but this is optional.
When defining a pivot table in a template, one essential final step is to ensure that it refreshes itself when the workbook is next opened. This achieved using the Options dialog, which is accessible via the pivot table's context menu. Simply uncheck Save source data with file and check Refresh data when opening the file.
Finally, this is how our pivot table appears in the template. Although it looks rather uninspiring at this stage, it will come to life when its associated table is filled with data.
No additional code is required to implement this example. All we need to do is save the template workbook and re-run the code from the earlier example. After execution, the pivot table correctly shows the spoken languages of the world in order of their popularity.
Adding a Pivot Chart
Many spreadsheets use graphics to bring their data to life. Fortunately, integrating any of Excel’s chart formats into an ExtremeML project entails only one additional step and, once again, it’s just a matter of extending the template – no program code is required.
Unlike conventional Excel charts, a pivot chart is ideal for this application, because it is bound directly to a pivot table (rather than a cell range) so it adapts dynamically to changes in the content of the pivot table. This means that, even though neither the pivot table nor the pivot chart have any populating data at the template stage, both will become fully hydrated once data is injected into the underlying table.
If you're not already familiar with Excel pivot charts, this tutorial will get you started.
To add a pivot chart definition to the template, simply select the pivot table that will act as a data source and insert a new chart of the desired type. For this example, we will use a Bar of Pie chart, since it allows us to aggregate all the minority languages into an “others” category, thus affording a less cluttered view of the major languages. Of course, in a real application, you will choose your chart type according to the characteristics of the data being represented.
Since the pivot table is not yet populated, its pivot chart is not very elegant at the template stage. However, all of its properties are accessible, and these can be used to apply any styling and behavior that is required by your application.
Styling and configuring a pivot chart without data is not intuitive, since it does not provide the same visual feedback that you would receive from a populated version. You may therefore find it helpful to add some dummy data to the underlying table whilst adjusting the pivot chart to suit your preferences. However, when this adjustment is complete, be sure to remove the dummy data before saving the template.
As with its partner pivot table, a pivot chart needs no additional program code. Simply re-run the code from the earlier example and, after execution, the pivot chart correctly illustrates the relative popularity of the languages of the world.
This has been a relatively simplistic example but it has nonetheless demonstrated some of the many SpreadsheetML challenges that are easily solved by combining Excel workbook templates with ExtremeML.
ExtremeML is available under the GPL v.2 open source license and may be downloaded from CodePlex, complete with full API documentation and a User/Developer Guide that includes 15 detailed tutorials with sample code.
I invite you to give ExtremeML a try and to submit your feedback, including suggestions and feature requests, via the project discussion forum.