wordpress hit counter
ExtremeML Pivot Table - OpenXML Developer - Blog - OpenXML Developer

ExtremeML Pivot Table

Blog

Samples, Demos, and Reference Articles

ExtremeML Pivot Table

  • Comments 1

Background

More and more companies are going green these days, for one reason or another. Telecommunication companies are no exception and phone bills are a great starting point for them. Instead of graciously inking carbon on the glossy sheets of paper produced from the fine Canadian pine trees, some of them have switched to electronic bills to reduce their carbon footprints. Vodafone New Zealand is a sterling example of such cases.

However, the format we receive the bills in is plain old .csv files (a comma separated text based file format) and it appears to come from a transactional billing database because all it contains are raw calling details that run several pages long. One of the problems is that most people don’t have the time to check through each phone call to identify any incorrect tolls. Some are also not visible as many calls can be part of a calling plan and therefore cost zero dollars.

Belo is an example of the raw call history in the bills. Please note, names and phone numbers are anonymous.

Caller Number

Caller Name

Call Date

Call Time

Duration

Function

Calling Number

Type

Feature

Cost

232422503

Abercrombie, Kim

15/04/2009

8:19:54

0:00

Off-Net TXT Message

717608688

TXT

FR

0.17

232422503

Abercrombie, Kim

15/04/2009

8:47:58

0:00

Off-Net TXT Message

717608688

TXT

FR

0.17

232422503

Abercrombie, Kim

15/04/2009

9:10:18

0:00

Off-Net TXT Message

717608688

TXT

FR

0.17

232422503

Abercrombie, Kim

18/04/2009

10:15:56

10:00

Toll Free

800033067

FR

0

Tim Coulter, the project owner of ExtremeML, came up with this brilliant idea of creating a call analysis spreadsheet using Excel and ExtremeML, making an intuitive and easy to use analysis interface for end users to make more sense out of their phone bills. In this article, we will take you through a sample application that does exactly this.

What is ExtremeML?

ExtremeML is an open source project founded by Tim Coulter and hosted on Codeplex.com. It is an extension library to Open XML SDK that allows document content to be created and maintained in a more intuitive manner, relieving developers of the burden of complying with the semantic rules of Open XML. In this sample application we will demonstrate how easy it is to use ExtremeML to write data to a spreadsheet.

For more information on ExtremeML, visit http://extrememl.codeplex.com/

What we want to achieve?

Moving away from the dull paper copy bills that only contain call histories, we ask ourselves; what would be a good approach to informatively present call information to users? We start by looking at what sorts of information are represented in the bills. In the sample bill table above, you will find the following information:

Caller number

The phone number for the billed customer

Caller name

Billed customer’s name

Call date

The date the call was made

Call time

The time in the day the call was made

Call duration

The length in time the phone call lasted

Place called

The geographical location the call was made to.

Possible values for this might be: Mobile Call, On-Net TXT Message, Off-Net TXT Message, VoiceMail Call, National Call etc…

Number called

The number dialed for the call entry

Feature Used

 

Call Type

Call types that determine the pricing of the call. Some possible values are PK (Peak), OP (Off Peak), TZ (Corporate Calls)

Charge

The cost amount in dollars

The key message the phone bill is to delivers to the customer is cost. This is a classic textbook scenario for using an OLAP cube. For those of you haven’t worked with an OLAP cube before, it is a very simple concept with a convoluted name. Some basic terms for a cube are:

Facts / measures

Numeric value that we are trying to make some sense out of.

In this case, the fact / measure is the cost.

Dimensions

Ways by which facts / measures are grouped / categorized

In this case, everything else can be a dimension for the cost.

A pivot table in Excel is a good way to present OLAP cubes. So the end result we are trying to achieve looks like the following:

For rows, we have a few hiararchical levels that we can drill down to. Start from Customer Name, then placed called and the phone number dialed under each place called. Further, we have the date and time each of the phone calls were made. Across the top, we have feature call type as well.

This way end users can very intuitively drill down to the appropriate level to find out what the aggregated cost is. For example, it is very easy to find out how much money a customer has spent calling one particular phone number over the billing period.

The Thinking Process – How is it done?

The thinking behind the implementation is very straightforward. We first create an empty table with the neccesary column names from the orginal .csv bill, then we create a pivot table based on the columns of this table. Save this away and use it as a template. In code, we read and parse the raw phone call history data from the source .csv files and populate the table with this data using ExtremeML. The pivot table will be automatially populated for us.

1.    Create a template using Excel

Starting with a template is standard practice for most Open XML development, as it saves a lot of time for developers from writing repetitive code for generating static document elements.

Tim Coulter has a very succinct article "The importance of Templates" in his ExtremeML Guide, which can be downloaded from http://extrememl.codeplex.com/. It is well worth the read.

There are two things we create in this template, a table and a pivot table. The table in the sample application looks like the following:

And the pivot table looks the the following:

For details on how to add a pivot table based on an empty table in Excel, refer to Tim Coulter’s "Adding a PivotTable" in the ExtremeML Guide from http://extrememl.codeplex.com/

2.    Use the template in code

// 1. Copy template
System.IO.File.Copy(@"Template.xlsx", DOCNAME, true);

First thing we do is make use of the template file we just prepared.

3.    We then open the .csv file with a TextReader to read in the data. Also open the document we just created with ExtremeML, getting ready to write data to it.

// 2. Read and write data
using (var tgtPkg = SpreadsheetDocumentWrapper.Open(DOCNAME))
using (System.IO.TextReader reader = System.IO.File.OpenText(args[0]))
{...}

4.    From the .csv file, we read one line at a time, parse them out on the comma delimiter, then write them into an in-memory storage.

// Open the data table
var tgtPart = (tgtPkg.WorkbookPart.WorksheetParts.Count > 0) ? tgtPkg.WorkbookPart.WorksheetParts["Data"] : null;
var table = tgtPart.TableDefinitionParts["Data"].Table;

// Create a in memory data store to write data to from the .csv file
var data = new List<Object[]>();

string currentRowStr;
while(!string.IsNullOrEmpty(currentRowStr = reader.ReadLine()))
{
   // Parse data row
   string[] rowArr = currentRowStr.Split(new char[] { ',' });

   DateTime callDate;
   if (string.IsNullOrEmpty(rowArr[SRCDATECOL]) ||
   !DateTime.TryParse(rowArr[SRCDATECOL], CultureInfo.CreateSpecificCulture("en-NZ"), DateTimeStyles.AssumeLocal, out callDate))
   {
      continue;
   }

   try
   {
      // Read and write data row
      data.Add(
         new Object[]
         {
            DateTime.Parse(rowArr[SRCBILLINGDATECOL], CultureInfo.CreateSpecificCulture("en-NZ"), DateTimeStyles.AssumeLocal).ToShortDateString() // Bill date
      , rowArr[SRCCALLERNUMBERCOL] // callerNumber
      , rowArr[SRCNAMECOL] // callerName
      , callDate.ToShortDateString()
      , rowArr[SRCTIMECOL] // Call Time
      , rowArr[SRCDURATIONCOL] // duration
      , rowArr[SRCPLACECALLEDCOL] // place Called
      , rowArr[SRCNUMBERCALLEDCOL] //numberCalled
      , null // Feature used
      , rowArr[SRCCALLTYPECOL] //call Type
      , double.Parse(rowArr[SRCCHARGECOL]) // Charge
   });
}
catch (Exception ex)
   {
      continue;
   }
}

5.    Finally, in one line of code with ExtremeML we are able to popluate the data table in the target .xlsx file with the data in the in memory object arrays.

// Fill the target table
table.Fill(data.ToArray());

6.    Build the project, and run the application via a command prompt.

7.    In the same directory that we ran the executable. Open the ExtremeMLPivotTable.xlsx file and we’ll see the table is now populated with data.

Click on Pivot sheet and you will see the pivot table is populated as well.

Conclusion

With 80 lines of code we are able to turn a phone bill with plain call histories into a very informative pivot table that allows users to slice and dice the data to provide some useful insights into their phone call usages. This was made easily utilising ExtremeML as it takes care of the packaging side of the Open XML document. Its API is pretty well designed and very intuitive to use.

As shown in some of the previous articles, using a template definitely reduces development effort significantly because it abstracts away all the static document elements. By doing so, it lets developers focus on the real business problems at hand.

  • With my tests of only 1100 rows it took almost ten minutes to load the array into the table.

Page 1 of 1 (1 items)