Re: Determine Validity & Client Application from OPC - Open Packaging Convention - Formats - OpenXML Developer

Determine Validity & Client Application from OPC

  • Does anyone know of a simple and straightforward way to determine:

    1. Whether a package is a valid Open XML file (.xlsx, .pptx, etc.), judging from inside the package rather than by file extension
    2. What type of file we're dealing with: a Word, Excel or PowerPoint file. I'm not looking for subtypes (e.g. an .xltx or a .pptx or a .dotm, etc.), just whether it is a W, E or PP file (and potentially 2007 vs. 2010).

    #2 is more important than #1.

     

    I'm looking at all the parts from package.GetParts and trying to figure out which ContentType is the one I should be looking for. Or is there a different/better way?

  • This is how I determine the document type for a package:

                using (Package package = Package.Open(stream, FileMode.Open))
                {
                    PackageRelationship relationship = package.GetRelationshipsByType("http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument").FirstOrDefault();
                    if (relationship != null)
                    {
                        PackagePart part = package.GetPart(PackUriHelper.ResolvePartUri(relationship.SourceUri, relationship.TargetUri));
                        switch (part.ContentType)
                        {
                            case "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml":
                                return typeof(WordprocessingDocument);
                            case "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml":
                                return typeof(SpreadsheetDocument);
                            case "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml":
                                return typeof(PresentationDocument);
                        }
                        return typeof(Package);
                    }
                    return null;
                }
    

    You can find this in the PowerTools for OpenXML core code.

    Validation of the XML can be done using the OpenXmlValidator (from DocumentFormat.OpenXml.Validation). An example of this can also be found in the PowerTools for OpenXML for the Confirm-OpenXmlValid cmdlet. (See ValidateXml in the PowerToolsExtensions.cs file.)

    Note that if the file is not a valid package, it will fail to open, so you would need to catch that exception as well.
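
    To illustrate the note about catching the exception, here is a minimal sketch of an "is this even a package?" check. The class and method names (PackageProbe, IsOpcPackage) are made up for this example, and the exception types caught are an assumption: which one is thrown seems to depend on the framework version, so treat the list as a starting point. It only tells you the container opens; it says nothing about whether the XML inside is valid.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

// Hypothetical helper, not part of PowerTools or the Open XML SDK.
static class PackageProbe
{
    // Returns true when the stream can be opened as an OPC package.
    // The stream must be readable and seekable.
    public static bool IsOpcPackage(Stream stream)
    {
        try
        {
            using (Package package = Package.Open(stream, FileMode.Open, FileAccess.Read))
            {
                return true;
            }
        }
        catch (FileFormatException)
        {
            // Assumed: thrown when the data is not a usable package.
            return false;
        }
        catch (InvalidDataException)
        {
            // Assumed: thrown by the underlying ZIP reader on some frameworks.
            return false;
        }
        catch (IOException)
        {
            // Truncated or unreadable stream.
            return false;
        }
    }
}
```

    A file that passes this check would still need the OpenXmlValidator pass mentioned above before you could call it valid.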

  • Thanks Bob. This is the path I started down, almost identical, but then I started running into files that don't have those MIME types, like a .pptm (which uses "application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml" instead of "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml").

    So I guess I'll just create files in all the different formats, locate the "...main+xml" MIME type in each, and put those into a dictionary for lookup. Thanks very much for your help here.
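
    A rough sketch of what that dictionary lookup could look like. The class and method names (OfficeContentTypes, Classify) are invented for this example. The first three content types and the .pptm one come from this thread; the .docm and .xlsm entries are from memory and should be checked against real files. The table is deliberately incomplete (no templates, slideshows or add-ins).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table; extend it after inspecting real files of each format.
static class OfficeContentTypes
{
    private static readonly Dictionary<string, string> MainPartTypes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            // Content types quoted in this thread
            { "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml", "Word" },
            { "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml", "Excel" },
            { "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml", "PowerPoint" },
            { "application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml", "PowerPoint" },
            // Assumed from memory, not confirmed in this thread (.docm, .xlsm)
            { "application/vnd.ms-word.document.macroEnabled.main+xml", "Word" },
            { "application/vnd.ms-excel.sheet.macroEnabled.main+xml", "Excel" }
        };

    // Returns "Word", "Excel" or "PowerPoint", or null for an unknown content type.
    public static string Classify(string contentType)
    {
        if (contentType == null)
        {
            return null;
        }

        string application;
        return MainPartTypes.TryGetValue(contentType, out application) ? application : null;
    }
}
```

    Returning null for unknown types keeps "not in my table yet" separate from "definitely not an Office document", which matters while the table is still being filled in.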

  • Ah, it gets worse. If I have a .PPTM file with an embedded Excel document (from a chart or otherwise), the package contains both "application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml" and "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" among its content types (MIME types). This means that if I scan every part for a known content type, my PPTM could be reported as an XLSX. I need to look into this further.
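
    One way around the embedded-workbook problem, sketched here under the assumption that the package-level officeDocument relationship (the one used in the C# snippet earlier in the thread) always points at the main part: read the content type of that one part only and never scan the full part list. MainPartResolver and GetMainPartContentType are names made up for this example.

```csharp
using System;
using System.IO.Packaging;
using System.Linq;

// Hypothetical helper: looks only at the part targeted by the package-level
// officeDocument relationship, so embedded documents are never considered.
static class MainPartResolver
{
    private const string OfficeDocumentRelationship =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Returns the content type of the main part, or null if there is none.
    public static string GetMainPartContentType(Package package)
    {
        PackageRelationship relationship =
            package.GetRelationshipsByType(OfficeDocumentRelationship).FirstOrDefault();
        if (relationship == null || relationship.TargetMode != TargetMode.Internal)
        {
            return null;
        }

        Uri partUri = PackUriHelper.ResolvePartUri(relationship.SourceUri, relationship.TargetUri);
        if (!package.PartExists(partUri))
        {
            return null;
        }

        return package.GetPart(partUri).ContentType;
    }
}
```

    With this, a .pptm that embeds a workbook should still come back with the PowerPoint content type, because the embedded workbook hangs off a relationship of some other part rather than off the package root.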

  • This is what I was able to work out, using URIs instead:
        Sub SetPackageType(package As Package)
            Dim isError As Boolean = False
            Dim openXMLUris As New Dictionary(Of String, String)
            With openXMLUris
                .Add("/ppt/presentation.xml", "PowerPoint")
                .Add("/word/document.xml", "Word")
                .Add("/xl/workbook.xml", "Excel")
            End With
            Dim parts As PackagePartCollection = package.GetParts
            If parts.Any(Function(f) openXMLUris.Keys.Contains(f.Uri.OriginalString)) Then
                Dim applicationURI = parts.Where(Function(f) openXMLUris.Keys.Contains(f.Uri.OriginalString)).SingleOrDefault.Uri.OriginalString
                Dim applicationType = openXMLUris(applicationURI)
                Select Case applicationType
                    Case Is = "Word"
                        Type = PackageType.Word
                    Case Is = "PowerPoint"
                        Type = PackageType.PowerPoint
                    Case Is = "Excel"
                        Type = PackageType.Excel
                    Case Else
                        Type = PackageType.Unknown
                End Select
     
                If Type = PackageType.Unknown Then
                    isError = True
                End If
            Else
                isError = True
            End If
     
            If isError Then
                Throw New System.Exception("This is not a valid Office document.")
            End If
        End Sub