Background
Here is a simple WordPress plugin that extrapolates a set of data from Piwik – which is an open source real time web analytics application and updates a pivot table in an Excel document.
The plugin captures the user's Piwik API key and displays a link to an excel document on a custom dashboard widget.
Figure 1 - Wordpress manage plugins page
Figure 2 – Plugin configuration page
Figure 3 - WordPress administration dashboard with custom widget
Introduction
This sample app highlights a number of possible ways of working with Open XML documents in PHP. Before continuing this article I would recommend reading Sanjay Kumar’s article on Pivot Tables in SpreadsheetML's.
Not having worked with pivot tables in Open XML I started by creating two excel documents one with an empty pivot table and another containing data, I then compared the differences using the Open XMLDiff tool (Downloaded with the Open XML SDK 2.0). It became apparent that when you change a pivot table a significant number of files need to be updated.
After some investigation I discovered the only pivot table part you have to update is the pivotCacheDefinition. If you amend the pivotCacheDefinition’s workSheetSource reference attribute it will update its necessary dependencies.
Note: The workSheetSource reference attribute specifies what cells in the worksheet make up the pivot table’s data source.
Parts required to be updated when updating a pivot table:
Getting the data
The actual retrieval and manipulating of the data doesn’t directly relate to working with Open XML in PHP so is not the focus of this article. If you want to go straight to working with Open XML in PHP skip to the next section Updating SpreadsheetML Part’s.
Piwik exposes the statistics it collects via a REST API, I utilized two functions; GetSearchEngines and GetKeywordsFromSearchEngineId.
GetSearchEngines – Returns Search Engines Referrers
Google
Bing
GetKeywordsFromSearchEngineId – Returns keywords for search engines
Bing:
Open XML
39
Open XML SDK 2.0
88
Wordpress
23
Google:
12
Wordpress Plugin
67
SpreadsheetML
32
I then needed to insert the data into a spreadsheet to look as below:
Figure 4 - Pivot table source data sample
Open XML SpreadsheetML’s store strings in a shared string part not in the worksheet. The worksheet references the strings inside the shared string part, as illustrated below:
Figure 5 - Shared string overview
Looking at the excel document we can see that the text “Bing” is inside cell B2. Cell B2’s value element () inside the worksheet is 4, index 4 inside the shared string part is “Bing”; as is the standard programming paradigm the first element in the shared string part (keyword) is indexed as 0.
Note: Integer values are stored as inline strings e.g. Cell C2’s value 39 is inside the worksheet.
Below is pseudo code of how I structured the data returned via the REST API calls to enable it to be inserted into the various SpreadsheetML parts:
1. For each search engine create an array
2. Loop through the keywords for that search engine (returned via the rest call)
3. Check to see if they exist in a global keyword array (stores all unique keywords and is used as the data for the shared string part)
4. If the keyword doesn’t exist, add it to the keyword array and update the search engine array. The search engine array key is a reference to the keyword array which stores the keyword.Note: This code populates the arrays used to move the data into the Open XML file in the next section.
Updating SpreadsheetML Part’s pivotCacheDefinition
Key PHP libraries used:
ZipArchive – Library for working with ZIP archives
SimpleXML – Library for converting XML into an object
Inside the if statement are the most interesting lines of code, I first locate the pivotCacheDefinition1.xml file (281-282), load the xml into an object (283) find the reference attribute and amend it (284) lastly I insert the updated pivotCacheDefinition back into the zip package, you need write permissions on the file to complete this.
Note: ZipArchive will overwrite files with the same name.
By loading the pivotCacheDefinition xml using the simplexml_load_string class it turned the xml into an object which you can then navigate through. By looking at the actual XML produced in the pivotCacheDefinition we are able to identify how to walkthrough the object to locate the ref attribute:
$xml is the root node therefore you don’t have to reference the pivotCacheDefinition element
Figure 6 - Wrong way to navigate XML objectUpdating sharedString and workSheet partI used the same functionality as above for editing the sharedString and workSheet part. The workSheet part does have a couple of differences.
Before the new worksheet data is added I remove the current data
I then insert the new data underneath the closing columns tag.
Generating Spreadsheet XML
Key PHP Library used:
XMLWriter – Generate XML files or streams.
To figure out the XML I needed to generate for the sheet data I opened up an Open XML package containing a pivot table with the data structure I wanted to mimic.
Figure 7 - Sheet data XML for sample data
Inspecting the XML we can see that it follows a consistent pattern, a row element containing three child cell elements.
Cells with text have an additional t attribute which specifies they reference a shared string and the nested value element references the element in the shared string part.
Figure 8 - Cell element referencing a shared string
If the cell contains an integer it stores the value inline.
Figure 9 - Cell element with inline integer value
Let’s take a look at the code:
Figure 10 - Code to write sheet data XML
We first create the sheetData element (190), by not calling $writer->endElement() to write an end tag for sheetData any element created beneath will be created as a child node.
For each search engine array element we create one row element and three cell elements:
1. Write opening tag for row element (204-206)
2. Search engine keyword (208-212)
3. Search engine name (214-218)
4. Hit count (220-223)
5. Close row element (225)
We then call $writer->endElement three times closing the cell, row and lastly the sheetData element.
Walk Through
1. The sample attached to this article contains an initial Spreadsheet with a pivot table for which the data will be updated. This is setup as follows:
Note that if you create your own spreadsheet pivot you must set the pivot table to refresh data when opening the file. This is already set in the included spreadsheet:
2. Wordpress admin dashboard with new custom widget
3. After clicking Export prompted with dialog box
4. Pivot table updated
5. Pivot table displaying visits
Outro
We have seen using PHP how to open an Open XML package, manipulate and/or add new valid XML and then save it back into the package; hopefully you will be able to use some of these concepts to achieve your own goals.
For the full source code please download it here.
Acknowledgements
I extended Jules’s existing Piwik Analytics WordPress plugin; Jules’s plugin adds the necessary Piwik JavaScript for tracking statistics into a WordPress site.
[PIP1]Formatting needs to be cleaned up as there is a mixture of Arial 9 & 10 point in the document.