I have been given the following task: unzip an xlsx file, creating a directory structure of xml files, and then read one or more of those xml files back into Excel.
The result should be the same as if I directly opened the xlsx file. The XML file that seems to hold the data is sharedStrings.xml, but when I read that into Excel 2010, it does not produce the proper rows/columns. I suspect that information from the other files is needed, or a schema/stylesheet needs to be defined. The XML files point to schema.openxmlformats.org, but that page is inaccessible. Any ideas as to how to make this work, or if it is even possible?
Thanks,
Sanford Stein, CyberTools Inc.
I don't see why importing the file should be the same as opening the xlsx. The entire structure of files and folders is needed to define the spreadsheet. All strings are usually stored in sharedStrings.xml, but the information about what cells those strings appear in and their formatting appears in other xml files.
Although the zip file format does compress the data, it also provides the file and folder structure needed to define the entire spreadsheet. No one file contains enough data to "recreate" the whole spreadsheet.
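To see that structure for yourself, something like the following should list every file stored in the package. This is only a minimal sketch using Python's standard zipfile module; the function name list_parts is made up for illustration.

```python
import zipfile


def list_parts(xlsx):
    """Return the sorted names of all files stored inside an .xlsx package.

    `xlsx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as zf:
        return sorted(zf.namelist())
```

For a typical workbook the list should include entries such as xl/workbook.xml, xl/sharedStrings.xml and one xl/worksheets/sheet{x}.xml per sheet, alongside styles and relationship files.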
Perhaps you could explain the task a little more and I might be able to help.
Bob,
Thanks for your response. It was painstaking, but I think I figured out how the unzipped files can be used to recreate the data. xl/worksheets/sheet{x}.xml defines the cells, either as literal values or as pointers into the strings array defined in xl/sharedStrings.xml. Finally, xl/workbook.xml defines the sheets. For my purposes, this should be sufficient. I can read these files, recreate the rows of data and put them in an internal data structure, or dump them to comma-separated files.
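For anyone who finds this later, here is a rough sketch of that approach in Python (standard library only). The function names and the default part name xl/worksheets/sheet1.xml are my own assumptions, and several simplifications apply: every value comes back as text, dates stay as raw serial numbers, formatting is ignored, and rows that are missing from the sheet XML are simply skipped.

```python
import csv
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts (workbook, sheets, shared strings).
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def shared_strings(zf):
    """Return the strings array from xl/sharedStrings.xml ([] if absent)."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    # Simplification: join every <t> under each <si>, including rich-text runs.
    return ["".join(t.text or "" for t in si.iter(NS + "t"))
            for si in root.findall(NS + "si")]


def sheet_names(xlsx):
    """Return the sheet names declared in xl/workbook.xml."""
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return [s.get("name") for s in root.iter(NS + "sheet")]


def column_index(cell_ref):
    """Convert a cell reference such as 'C2' to a zero-based column index."""
    letters = re.match(r"[A-Z]+", cell_ref).group()
    n = 0
    for ch in letters:
        n = n * 26 + ord(ch) - ord("A") + 1
    return n - 1


def sheet_rows(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return one worksheet as a list of rows, each a list of strings."""
    with zipfile.ZipFile(xlsx) as zf:
        strings = shared_strings(zf)
        root = ET.fromstring(zf.read(sheet_part))
    rows = []
    for row in root.iter(NS + "row"):
        values = []
        for c in row.findall(NS + "c"):
            v = c.find(NS + "v")
            text = v.text if v is not None and v.text is not None else ""
            if c.get("t") == "s" and text:
                text = strings[int(text)]  # pointer into the strings array
            elif c.get("t") == "inlineStr":
                text = "".join(t.text or "" for t in c.iter(NS + "t"))
            ref = c.get("r")
            col = column_index(ref) if ref else len(values)
            values.extend([""] * (col - len(values)))  # pad skipped cells
            values.append(text)
        rows.append(values)
    return rows


def write_csv(rows, out_path):
    """Dump the rows to a comma-separated file."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Calling sheet_rows on the workbook and passing the result to write_csv should give the comma-separated dump. Mapping a sheet name from xl/workbook.xml to its actual part name properly goes through xl/_rels/workbook.xml.rels, which this sketch leaves out.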
SS