Join
Sign in
Search Options
Search Everything
Search Formats
Home
Blog
Resources
Forums
About Open XML
More ...
Home
»
Forums
»
Formats
»
SpreadsheetML
»
Performance issue when deleting a first row in 20,000+ rows excel file
Performance issue when deleting a first row in 20,000+ rows excel file
Formats
Discussions about working with different Open XML Formats
Get this RSS feed
Details
2
Replies
0
Subscribers
Posted
over 3 years ago
Options
Subscribe via RSS
Share this
SpreadsheetML
Performance issue when deleting a first row in 20,000+ rows excel file
rated by 0 users
This post has
2 Replies |
0
Followers
codetao
25
Posted by
codetao
on
Wed, Dec 9 2009 11:25 AM
rated by 0 users
Post Points: 20
Performance issue when deleting a first row in 20,000+ rows excel file
Do anyone come across a performance issue when deleting a first row in a 20,000+ rows excel file?
I am using the delete row coding suggested in the openxml sdk document. It tooks me serveral minutes just to delete the first row using openxml sdk, But it only takes just a second in excel applicaton.
I eventually found out that the bottle-neck is actually on the buble-up approach in dealing with row deletion. There are many row updating after the deleted row. So in my case, there are around 20,000 rows to be updated, shifting up the data row by row.
I wonder if there is any faster way to do the row deletion.
Do anybody have an idea?
bral
95
Posted by
bral
on
Wed, Dec 16 2009 5:19 PM
rated by 0 users
Post Points: 20
Re: Performance issue when deleting a first row in 20,000+ rows excel file
How many columns of data are in your test spreadsheet? I did the same thing (roughly, I copied the examples to Linqpad and ran them from there) and used an XLSX file with 8 columns of data and 20k rows.
Deleting row 1 (or 2, or 3, etc) never took more than about 5 seconds.
Note that the example doesn't seem to handle cases with calculations.
codetao
25
Posted by
codetao
on
Wed, Jan 20 2010 11:02 PM
rated by 0 users
Post Points: 5
Re: Performance issue when deleting a first row in 20,000+ rows excel file
Thanks for the reply and taking this into consideration.
Actually the excel has 39 columns of data. So the total cells involved are: 20, 000 x 39 = 780, 000 cells.
I have actually made an experiment to discover how the deletion is done.
Here is the procedure:
I created a sample excel file with small amount of data, using Excel 2007. I deleted several top rows, then save the modified version as another file. I then compare these two files using an openxml sdk tool called "OpenXmlDiff".
Here is the founding:
I found that each cells data has to be modified after the top row is deleted. The reason for this is that the position of cell is hardcoded in the file format. Therefore, when the top row is deleted, the positions of the cells below the deleted row
ALL
need to be updated.
That is counterintuitive. I expect the row or cell is represented as just an xml node. Simply delete a node doesn't take long. And rendering the document in relative position is straight-forward.
Knowing that the current OpenXML version is still in CTP, I hope that the performance for heavy processing can be improved.
Page 1 of 1 (3 items)