OpenXML Developer

Getting error "File contains corrupted data" when trying to open an Excel file using OpenXML

  • Hi,
    I have an application developed in .NET 2.0 that uses OpenXML to import an Excel file into the application, but I get the following exception on this line of code:

    SpreadsheetDocument myWorkbook = SpreadsheetDocument.Open(Path, true);

    Can anybody please help me with this? It is very urgent for me.
    The exception details follow.

    --------------------------------- Error message -------

    System.IO.FileFormatException: File contains corrupted data.
    at MS.Internal.IO.Zip.ZipIOEndOfCentralDirectoryBlock.FindPosition(Stream archiveStream)
    at MS.Internal.IO.Zip.ZipIOEndOfCentralDirectoryBlock.SeekableLoad(ZipIOBlockManager blockManager)
    at MS.Internal.IO.Zip.ZipIOBlockManager.LoadEndOfCentralDirectoryBlock()
    at MS.Internal.IO.Zip.ZipArchive..ctor(Stream archiveStream, FileMode mode, FileAccess access, Boolean streaming, Boolean ownStream)
    at MS.Internal.IO.Zip.ZipArchive.OpenOnFile(String path, FileMode mode, FileAccess access, FileShare share, Boolean streaming)
    at System.IO.Packaging.ZipPackage..ctor(String path, FileMode mode, FileAccess access, FileShare share, Boolean streaming)
    at System.IO.Packaging.Package.Open(String path, FileMode packageMode, FileAccess packageAccess, FileShare packageShare, Boolean streaming)
    at System.IO.Packaging.Package.Open(String path, FileMode packageMode, FileAccess packageAccess, FileShare packageShare)
    at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(String path, Boolean readWriteMode)
    at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(String path, Boolean isEditable, OpenSettings openSettings)
    at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(String path, Boolean isEditable)
    at Banking_ImportStatements.ImportOnlineTemplate(String name, Int64 len) in e:\Projects\Siondo ERP\2 Siondo ERP EXECUTION\3 Source Code\SiondoERP\SiondoERPUI\Banking\ImportStatements.aspx.cs:line 1121

    Thanks & Regards,
    Vikas Jaigude.
  • Hi Vikas

    Are you sure the xlsx file is in correct OpenXML format?

    A simple test is to open it in Office.  Next is to validate it using the OpenXML SDK v2.0 Productivity Tool.

    To me this looks like the xlsx is not a valid zip archive. Could you please try renaming the .xlsx file's extension to .zip and then extracting it?
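
The rename-to-.zip test suggested above can also be done in code by looking at the first bytes of the file. The following is a minimal sketch, not part of the Open XML SDK: `PackageSniffer` and `ContainerKind` are hypothetical names introduced here. It relies on two well-known signatures: ZIP archives normally begin with `PK\x03\x04`, while OLE compound files (the container used by binary .xls/.doc files and by password-encrypted Office packages) begin with `D0 CF 11 E0 A1 B1 1A E1`. A matching ZIP signature does not prove the archive is intact; it only tells you which kind of container you are dealing with.

```csharp
using System;
using System.IO;

// Hypothetical helper types for illustration; they are not SDK APIs.
public enum ContainerKind { Zip, OleCompoundFile, Unknown }

public static class PackageSniffer
{
    // First four bytes of a ZIP local file header: "PK" 0x03 0x04.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // First eight bytes of an OLE compound file (binary .xls/.doc, encrypted packages).
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Reads up to eight bytes from the current position and classifies the container.
    public static ContainerKind Detect(Stream stream)
    {
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream reached before eight bytes
            read += n;
        }

        if (StartsWith(header, read, OleSignature)) return ContainerKind.OleCompoundFile;
        if (StartsWith(header, read, ZipSignature)) return ContainerKind.Zip;
        return ContainerKind.Unknown;
    }

    // Convenience overload that opens the file read-only.
    public static ContainerKind Detect(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return Detect(fs);
        }
    }

    private static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Calling `PackageSniffer.Detect(path)` before `SpreadsheetDocument.Open(path, true)` lets the application report "this is not a ZIP-based package" instead of surfacing a `FileFormatException` from deep inside `System.IO.Packaging`.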
  • Hi, several years later this still seems to persist... We are experiencing the very same issue with Excel 2007 files (xlsx). Does anybody have a solution for this? We have checked the file (also the above mentioned renaming to ZIP and then unpacking) and the file itself is not corrupted. If we save this file with Excel 2010, the problem is gone. So it must be something with 2007 Excel format. PLEASE HELP!!!

  • Hi,

    Where did the file come from? Was it generated by an application or created by Excel? Does the file validate with the Open XML SDK Productivity Tool?

    -Eric

  • Hi, thanks for answering! The file was created with Excel. It was actually created with Excel 2010, but then it was opened and saved in Excel 2007, and after this the problem started. I suspected that the file was somehow corrupted, so I created a new demo file the same way, and the behaviour was exactly the same. I can send you the file if you are willing to give it a try :-) The thing is that the file contains some locked and some unlocked cells on several worksheets. The worksheets are all password-protected, and the whole file is again password-protected. Data can be entered in unlocked cells. This is a requirement that must be met. Again, when saved in Excel 2010 it works; when saved with Excel 2007 it doesn't work. Any ideas? :-)

  • Found this: social.msdn.microsoft.com/.../3adab393-4bb7-467f-becd-127583aa925f. It sheds some light on the problem, but I would really expect Open XML to adapt to such anomalies and offer the same functionality through the same APIs regardless of the file version (as long as it is XML, of course).

  • Additional question (that might bypass the problem exposed in this thread): is there any programmatic way to convert a file from Excel 2007 to Excel 2010 format? Having Excel 2010 installed on the machine is a viable option, but the conversion must still be done programmatically from C# code... Clues?

  • I am getting the same error while converting a Word document to HTML. The Word document has some images embedded in it. I followed the "Transforming Open XML WordprocessingML to HTML" article to convert the Word document to HTML.

  • Hi Yogesh, are you saying that the Word document will not open using the Open XML SDK?  Are you able to open the DOCX in Word?

    -Eric

  • Hi Eric,

    Yes, I can open the DOCX file in word.

    I am facing some problems while using Open XML and PowerTools to convert a Word document to HTML.

    I have used the DocumentFormat.OpenXml (v2.0) and OpenXmlPowerTools (v2.0) assemblies in the project.

    1. I cannot use a .DOC file for conversion to HTML. It throws the exception "File contains corrupted data".

    So I have to use .DOCX files for conversion to HTML. Is this a limitation?

    2. If the file contains images, it is not converted correctly. When I open the HTML file in a browser, I cannot see the images.

    I am using the following code:

    public static void ConvertToHTMLUsingOpenXml(string fUpload, string htmlFile, string imagePath)
    {
        string sourceDocumentFileName = fUpload;
        string imageDirectoryName = Path.GetFileNameWithoutExtension(sourceDocumentFileName) + "_files";
        int imageCounter = 0;

        // Copy the file into an expandable MemoryStream so the document can be opened read/write.
        byte[] byteArray = File.ReadAllBytes(sourceDocumentFileName);
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(byteArray, 0, byteArray.Length);

            // The using statement disposes the document when the conversion is done.
            using (WordprocessingDocument doc =
                WordprocessingDocument.Open(memoryStream, true))
            {
                HtmlConverterSettings settings = new HtmlConverterSettings()
                {
                    PageTitle = Path.GetFileNameWithoutExtension(sourceDocumentFileName),
                    ConvertFormatting = true,
                };

                // The callback runs once per image: it saves the image to disk and returns the <img> element.
                XElement html = HtmlConverter.ConvertToHtml(doc, settings,
                    imageInfo =>
                    {
                        DirectoryInfo localDirInfo = new DirectoryInfo(imagePath + imageDirectoryName); //Server.MapPath(HTML_FILE_PATH + imageDirectoryName));
                        if (!localDirInfo.Exists)
                            localDirInfo.Create();
                        ++imageCounter;

                        // Map the image content type to a System.Drawing image format.
                        string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                        ImageFormat imageFormat = null;
                        if (extension == "png")
                        {
                            // PNG images are saved as JPEG.
                            extension = "jpeg";
                            imageFormat = ImageFormat.Jpeg;
                        }
                        else if (extension == "bmp")
                            imageFormat = ImageFormat.Bmp;
                        else if (extension == "jpeg")
                            imageFormat = ImageFormat.Jpeg;
                        else if (extension == "tiff")
                            imageFormat = ImageFormat.Tiff;

                        // Unsupported image types are skipped.
                        if (imageFormat == null)
                            return null;

                        string imageFileName = imagePath + imageDirectoryName + "\\image" + // Server.MapPath(HTML_FILE_PATH + imageDirectoryName) + "/image" +
                            imageCounter.ToString() + "." + extension;
                        try
                        {
                            imageInfo.Bitmap.Save(imageFileName, imageFormat);
                        }
                        catch (System.Runtime.InteropServices.ExternalException)
                        {
                            return null;
                        }

                        // Note: src is set to the full file-system path of the saved image.
                        XElement img = new XElement(Xhtml.img,
                            new XAttribute(NoNamespace.src, imageFileName),
                            imageInfo.ImgStyleAttribute,
                            imageInfo.AltText != null ?
                                new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                        return img;
                    });

                File.WriteAllText(htmlFile, html.ToString());
            }
        }
    }

    Thanks,

    Yogesh
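
In the code above, the `src` attribute is set to `imageFileName`, which is built from `imagePath` and uses a backslash separator. If `imagePath` is an absolute file-system path, a browser may not be able to resolve it, which is one possible (unconfirmed) reason the images do not show. A minimal sketch of one alternative is to keep saving the image to the full path but write a path relative to the HTML file into `src`. `HtmlImagePaths.RelativeSrc` is a hypothetical helper, not part of OpenXmlPowerTools, and it assumes the HTML file and the image directory are on the same drive.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of OpenXmlPowerTools.
public static class HtmlImagePaths
{
    // Returns imageFilePath expressed relative to the folder that contains
    // htmlFilePath, with forward slashes, for use as an <img src="..."> value.
    public static string RelativeSrc(string htmlFilePath, string imageFilePath)
    {
        // Uri needs absolute paths, so resolve both against the current directory.
        Uri htmlUri = new Uri(Path.GetFullPath(htmlFilePath));
        Uri imageUri = new Uri(Path.GetFullPath(imageFilePath));

        // MakeRelativeUri computes the path from the HTML file's folder to the image.
        return htmlUri.MakeRelativeUri(imageUri).ToString();
    }
}
```

Inside the image callback this could be used as `new XAttribute(NoNamespace.src, HtmlImagePaths.RelativeSrc(htmlFile, imageFileName))`, while `imageInfo.Bitmap.Save` still receives the full `imageFileName`. If the two files are on different drives, `MakeRelativeUri` cannot produce a relative path and the result is an absolute URI instead.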

  • Hi Yogesh,

    First of all, it certainly is a limitation that you cannot use the Open XML SDK with the binary (.DOC) formats.  It isn't designed to do so.  However, there are a variety of options to convert from .DOC to .DOCX and then you can write code to transform to HTML or whatever.

    With regard to your issues around images, are you able to get the HtmlConverter examples that come with the core OpenXmlPowerTools to work?  Go to powertools.codeplex.com, go to the Downloads tab, download PowerTools Core 2.2.9, build and run "ExampleHtmlConverter03Images", and then look at the result.  It creates an HTML document and a directory to hold the images, puts all of the images in that directory, and the HTML can display the images from that directory.

    Are you able to get that example to work?

  • Hi Eric,

    Thanks for reply.

    I got the examples from the site. Now the images are working fine.

    I have one DOCX file. It has some content in the "Trebuchet MS" font. It is not converted correctly.

    I would like to attach the DOCX file and the HTML file to this post, but I cannot find an attach option. If you can provide your email ID, I can send you those files.

    Thanks,

    Yogesh

  • Hi Eric,

    I am facing one problem.

    I have used b2xtranslator (doc2x.exe) to convert the DOC files to DOCX. I downloaded the files from http://b2xtranslator.sourceforge.net/

    After this, I processed the resulting DOCX file to HTML using the HtmlConverter from OpenXmlPowerTools. In that case I get an index out of bounds exception in the file 'MarkSimplifier.cs', in the method 'MergeAdjacentInstrText'. Please help.

    Thanks,

    Yogesh

  • Hi Yogesh,

    I have never used that b2xtranslator, so I don't know of its quality.

    Instead of using that, I would recommend the bulk DOC to DOCX converter that is published by Microsoft:

    blogs.msdn.com/.../bulk-convert-doc-to-docx.aspx

    There is a COM interface to that DLL.  I am searching around for that interface.  I've asked a member of the Word team regarding that.  I'll post the link when I get it.

    Depending on the architecture of your application, you may be able to use Word Automation Services, which is the absolute highest quality converter.  It is part of SharePoint 2010 and SharePoint 2013.

  • Hi, I'm using OpenXML to read an Excel spreadsheet into a SharePoint 2010 Document List. The upload process works on my local VM (it uploads the Excel spreadsheet), but when I attempt the upload on our development box it gives the error 'File Not Found'. Is there a limitation that the file has to be on the same machine as SharePoint? Is there a workaround?

    Thanks!

    Phillip Mitchem - Georgia Perimeter College
