Background
More and more companies are going green these days, for one reason or another. Telecommunication companies are no exception and phone bills are a great starting point for them. Instead of graciously inking carbon on the glossy sheets of paper produced from the fine Canadian pine trees, some of them have switched to electronic bills to reduce their carbon footprints. Vodafone New Zealand is a sterling example of such cases.
However, the format we receive the bills in is plain old .csv files (a comma separated text based file format) and it appears to come from a transactional billing database because all it contains are raw calling details that run several pages long. One of the problems is that most people don’t have the time to check through each phone call to identify any incorrect tolls. Some are also not visible as many calls can be part of a calling plan and therefore cost zero dollars.
Belo is an example of the raw call history in the bills. Please note, names and phone numbers are anonymous.
Caller Number
Caller Name
Call Date
Call Time
Duration
Function
Calling Number
Type
Feature
Cost
232422503
Abercrombie, Kim
15/04/2009
8:19:54
0:00
Off-Net TXT Message
717608688
TXT
FR
0.17
8:47:58
9:10:18
18/04/2009
10:15:56
10:00
Toll Free
800033067
0
Tim Coulter, the project owner of ExtremeML, came up with this brilliant idea of creating a call analysis spreadsheet using Excel and ExtremeML, making an intuitive and easy to use analysis interface for end users to make more sense out of their phone bills. In this article, we will take you through a sample application that does exactly this.
What is ExtremeML?
ExtremeML is an open source project founded by Tim Coulter and hosted on Codeplex.com. It is an extension library to Open XML SDK that allows document content to be created and maintained in a more intuitive manner, relieving developers of the burden of complying with the semantic rules of Open XML. In this sample application we will demonstrate how easy it is to use ExtremeML to write data to a spreadsheet.
For more information on ExtremeML, visit http://extrememl.codeplex.com/
What we want to achieve?
Moving away from the dull paper copy bills that only contain call histories, we ask ourselves; what would be a good approach to informatively present call information to users? We start by looking at what sorts of information are represented in the bills. In the sample bill table above, you will find the following information:
Caller number The phone number for the billed customer Caller name Billed customer’s name Call date The date the call was made Call time The time in the day the call was made Call duration The length in time the phone call lasted Place called The geographical location the call was made to. Possible values for this might be: Mobile Call, On-Net TXT Message, Off-Net TXT Message, VoiceMail Call, National Call etc… Number called The number dialed for the call entry Feature Used Call Type Call types that determine the pricing of the call. Some possible values are PK (Peak), OP (Off Peak), TZ (Corporate Calls) Charge The cost amount in dollars
Caller number
The phone number for the billed customer
Caller name
Billed customer’s name
Call date
The date the call was made
Call time
The time in the day the call was made
Call duration
The length in time the phone call lasted
Place called
The geographical location the call was made to.
Possible values for this might be: Mobile Call, On-Net TXT Message, Off-Net TXT Message, VoiceMail Call, National Call etc…
Number called
The number dialed for the call entry
Feature Used
Call Type
Call types that determine the pricing of the call. Some possible values are PK (Peak), OP (Off Peak), TZ (Corporate Calls)
Charge
The cost amount in dollars
The key message the phone bill is to delivers to the customer is cost. This is a classic textbook scenario for using an OLAP cube. For those of you haven’t worked with an OLAP cube before, it is a very simple concept with a convoluted name. Some basic terms for a cube are:
Facts / measures Numeric value that we are trying to make some sense out of. In this case, the fact / measure is the cost. Dimensions Ways by which facts / measures are grouped / categorized In this case, everything else can be a dimension for the cost.
Facts / measures
Numeric value that we are trying to make some sense out of.
In this case, the fact / measure is the cost.
Dimensions
Ways by which facts / measures are grouped / categorized
In this case, everything else can be a dimension for the cost.
A pivot table in Excel is a good way to present OLAP cubes. So the end result we are trying to achieve looks like the following:
For rows, we have a few hiararchical levels that we can drill down to. Start from Customer Name, then placed called and the phone number dialed under each place called. Further, we have the date and time each of the phone calls were made. Across the top, we have feature call type as well.
This way end users can very intuitively drill down to the appropriate level to find out what the aggregated cost is. For example, it is very easy to find out how much money a customer has spent calling one particular phone number over the billing period.
The Thinking Process – How is it done?
The thinking behind the implementation is very straightforward. We first create an empty table with the neccesary column names from the orginal .csv bill, then we create a pivot table based on the columns of this table. Save this away and use it as a template. In code, we read and parse the raw phone call history data from the source .csv files and populate the table with this data using ExtremeML. The pivot table will be automatially populated for us.
1. Create a template using Excel
Starting with a template is standard practice for most Open XML development, as it saves a lot of time for developers from writing repetitive code for generating static document elements.
Tim Coulter has a very succinct article "The importance of Templates" in his ExtremeML Guide, which can be downloaded from http://extrememl.codeplex.com/. It is well worth the read.
There are two things we create in this template, a table and a pivot table. The table in the sample application looks like the following:
And the pivot table looks the the following:
For details on how to add a pivot table based on an empty table in Excel, refer to Tim Coulter’s "Adding a PivotTable" in the ExtremeML Guide from http://extrememl.codeplex.com/
2. Use the template in code
// 1. Copy templateSystem.IO.File.Copy(@"Template.xlsx", DOCNAME, true);
First thing we do is make use of the template file we just prepared.
3. We then open the .csv file with a TextReader to read in the data. Also open the document we just created with ExtremeML, getting ready to write data to it.
// 2. Read and write datausing (var tgtPkg = SpreadsheetDocumentWrapper.Open(DOCNAME))using (System.IO.TextReader reader = System.IO.File.OpenText(args[0])){...}
4. From the .csv file, we read one line at a time, parse them out on the comma delimiter, then write them into an in-memory storage.
// Open the data tablevar tgtPart = (tgtPkg.WorkbookPart.WorksheetParts.Count > 0) ? tgtPkg.WorkbookPart.WorksheetParts["Data"] : null;var table = tgtPart.TableDefinitionParts["Data"].Table;
// Create a in memory data store to write data to from the .csv filevar data = new List<Object[]>();
string currentRowStr;while(!string.IsNullOrEmpty(currentRowStr = reader.ReadLine())){ // Parse data row string[] rowArr = currentRowStr.Split(new char[] { ',' });
DateTime callDate; if (string.IsNullOrEmpty(rowArr[SRCDATECOL]) || !DateTime.TryParse(rowArr[SRCDATECOL], CultureInfo.CreateSpecificCulture("en-NZ"), DateTimeStyles.AssumeLocal, out callDate)) { continue; }
try { // Read and write data row data.Add( new Object[] { DateTime.Parse(rowArr[SRCBILLINGDATECOL], CultureInfo.CreateSpecificCulture("en-NZ"), DateTimeStyles.AssumeLocal).ToShortDateString() // Bill date , rowArr[SRCCALLERNUMBERCOL] // callerNumber , rowArr[SRCNAMECOL] // callerName , callDate.ToShortDateString() , rowArr[SRCTIMECOL] // Call Time , rowArr[SRCDURATIONCOL] // duration , rowArr[SRCPLACECALLEDCOL] // place Called , rowArr[SRCNUMBERCALLEDCOL] //numberCalled , null // Feature used , rowArr[SRCCALLTYPECOL] //call Type , double.Parse(rowArr[SRCCHARGECOL]) // Charge });}catch (Exception ex) { continue; }}
5. Finally, in one line of code with ExtremeML we are able to popluate the data table in the target .xlsx file with the data in the in memory object arrays.
// Fill the target tabletable.Fill(data.ToArray());
6. Build the project, and run the application via a command prompt.
7. In the same directory that we ran the executable. Open the ExtremeMLPivotTable.xlsx file and we’ll see the table is now populated with data.
Click on Pivot sheet and you will see the pivot table is populated as well.
Conclusion
With 80 lines of code we are able to turn a phone bill with plain call histories into a very informative pivot table that allows users to slice and dice the data to provide some useful insights into their phone call usages. This was made easily utilising ExtremeML as it takes care of the packaging side of the Open XML document. Its API is pretty well designed and very intuitive to use.
As shown in some of the previous articles, using a template definitely reduces development effort significantly because it abstracts away all the static document elements. By doing so, it lets developers focus on the real business problems at hand.
With my tests of only 1100 rows it took almost ten minutes to load the array into the table.