Does anyone know of a simple and straightforward way to determine:
#2 is more important than #1.
I'm looking at all the parts from package.GetParts and trying to figure out which ContentType is the one I should be looking for. Or is there a different/better way?
This is how I determine the document type for a package:
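```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

// The wrapper method name is illustrative.
public static Type GetDocumentType(Stream stream)
{
    // Detection never needs write access, so open the package read-only
    using (Package package = Package.Open(stream, FileMode.Open, FileAccess.Read))
    {
        // The package-level officeDocument relationship points at the main part
        PackageRelationship relationship = package
            .GetRelationshipsByType("http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument")
            .FirstOrDefault();
        if (relationship != null)
        {
            PackagePart part = package.GetPart(
                PackUriHelper.ResolvePartUri(relationship.SourceUri, relationship.TargetUri));
            switch (part.ContentType)
            {
                case "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml":
                    return typeof(WordprocessingDocument);
                case "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml":
                    return typeof(SpreadsheetDocument);
                case "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml":
                    return typeof(PresentationDocument);
            }
            // Some other kind of OPC package
            return typeof(Package);
        }
        return null;
    }
}
```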
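For comparison with the System.IO.Packaging code above, here is a minimal sketch of the same lookup using only System.IO.Compression and System.Xml.Linq. It reads the package-level relationships from `_rels/.rels`, takes the target of the officeDocument relationship, and resolves that part's content type from `[Content_Types].xml` (an `<Override>` for the part name wins, otherwise the `<Default>` for its extension). The class and method names are hypothetical, and it assumes the relationship target is a plain relative or absolute part name (no `..` segments or percent-escapes), which is what Office writes but not something the sketch verifies.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class MainPartSketch
{
    const string OfficeDocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
    static readonly XNamespace RelNs = "http://schemas.openxmlformats.org/package/2006/relationships";
    static readonly XNamespace CtNs = "http://schemas.openxmlformats.org/package/2006/content-types";

    // Returns the content type of the part targeted by the package-level
    // officeDocument relationship, or null if it cannot be determined.
    public static string GetMainPartContentType(Stream stream)
    {
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
        {
            XDocument rels = LoadXml(zip, "_rels/.rels");
            XDocument types = LoadXml(zip, "[Content_Types].xml");
            if (rels == null || types == null) return null;

            string target = rels.Root.Elements(RelNs + "Relationship")
                .Where(r => (string)r.Attribute("Type") == OfficeDocumentRelType)
                .Select(r => (string)r.Attribute("Target"))
                .FirstOrDefault();
            if (target == null) return null;

            // Package-level targets are relative to the package root (simplified resolution).
            string partName = "/" + target.TrimStart('/');

            // An <Override> for this exact part name takes precedence...
            string overrideType = types.Root.Elements(CtNs + "Override")
                .Where(o => string.Equals((string)o.Attribute("PartName"), partName, StringComparison.OrdinalIgnoreCase))
                .Select(o => (string)o.Attribute("ContentType"))
                .FirstOrDefault();
            if (overrideType != null) return overrideType;

            // ...otherwise fall back to the <Default> for the part's extension.
            string extension = Path.GetExtension(partName).TrimStart('.');
            return types.Root.Elements(CtNs + "Default")
                .Where(d => string.Equals((string)d.Attribute("Extension"), extension, StringComparison.OrdinalIgnoreCase))
                .Select(d => (string)d.Attribute("ContentType"))
                .FirstOrDefault();
        }
    }

    // Loads a ZIP entry as XML, or returns null if the entry does not exist.
    static XDocument LoadXml(ZipArchive zip, string entryName)
    {
        ZipArchiveEntry entry = zip.GetEntry(entryName);
        if (entry == null) return null;
        using (Stream s = entry.Open())
        {
            return XDocument.Load(s);
        }
    }
}
```

Because it only looks at the part the relationship points to, other parts that happen to carry a "...main+xml" content type do not affect the result.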
You can find this in the PowerTools for OpenXML core code.
Validation of the XML can be done using the OpenXmlValidator (from DocumentFormat.OpenXml.Validation). An example can also be found in the PowerTools for OpenXML implementation of the Confirm-OpenXmlValid cmdlet (see ValidateXml in the PowerToolsExtensions.cs file).
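A minimal sketch of running OpenXmlValidator over a document that is already open, assuming the DocumentFormat.OpenXml NuGet package is referenced. The wrapper class name and the choice of FileFormatVersions.Office2010 are illustrative assumptions, not taken from PowerTools:

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class OpenXmlValidationSketch
{
    // Validates every part of the package and returns one readable line per error:
    // the part URI, the XPath of the offending node, and the validator's description.
    public static List<string> Validate(OpenXmlPackage package)
    {
        var validator = new OpenXmlValidator(FileFormatVersions.Office2010);
        return validator.Validate(package)
            .Select(e => $"{e.Part?.Uri} {e.Path?.XPath}: {e.Description}")
            .ToList();
    }
}
```

Since WordprocessingDocument, SpreadsheetDocument and PresentationDocument all derive from OpenXmlPackage, the same method works for whichever type the detection code returns, once the file has been opened with that type.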
Note that if the file is not a valid package, it will fail to open, so you would need to catch that exception as well.
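To make that failure explicit up front, here is a minimal sketch that probes the stream first. It swaps System.IO.Packaging for System.IO.Compression, whose ZipArchive throws InvalidDataException when the input is not a ZIP archive at all (a legacy binary .doc/.xls/.ppt, for example), and it treats a missing `[Content_Types].xml` entry as "not a package". The class and method names are hypothetical, and passing this check does not guarantee that Package.Open will succeed.

```csharp
using System.IO;
using System.IO.Compression;

public static class PackageProbeSketch
{
    // True if the stream is a ZIP archive containing the [Content_Types].xml
    // entry that every OPC package has; false otherwise.
    public static bool LooksLikeOpcPackage(Stream stream)
    {
        try
        {
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                return zip.GetEntry("[Content_Types].xml") != null;
            }
        }
        catch (InvalidDataException)
        {
            // Not a ZIP archive (or a corrupt one)
            return false;
        }
    }
}
```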
Thanks Bob. This is the path I started down, almost identical, but then started running into files that don't have those MIME types, like a .pptm (which uses "application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml" instead of "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml").
So I guess I'll just create a sample file in each of the different formats, locate its "...main+xml" MIME type, and put those into a dictionary for lookup. Your help here is much appreciated.
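Instead of harvesting the strings from sample files, the lookup table can also be written out directly. The content types below are the main-part types for the common Word, Excel and PowerPoint variants as I recall them from the Open XML SDK; treat the list as an assumption and check it against your own files (the .pptm entry is the one quoted above). The enum and class names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public enum OfficeApp { Unknown, Word, Excel, PowerPoint }

public static class MainPartContentTypes
{
    // Assumed list of main-part content types; compared case-insensitively.
    static readonly Dictionary<string, OfficeApp> Map =
        new Dictionary<string, OfficeApp>(StringComparer.OrdinalIgnoreCase)
        {
            // Word: .docx, .docm, .dotx, .dotm
            ["application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"] = OfficeApp.Word,
            ["application/vnd.ms-word.document.macroEnabled.main+xml"] = OfficeApp.Word,
            ["application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml"] = OfficeApp.Word,
            ["application/vnd.ms-word.template.macroEnabledTemplate.main+xml"] = OfficeApp.Word,
            // Excel: .xlsx, .xlsm, .xltx, .xltm, .xlam
            ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"] = OfficeApp.Excel,
            ["application/vnd.ms-excel.sheet.macroEnabled.main+xml"] = OfficeApp.Excel,
            ["application/vnd.openxmlformats-officedocument.spreadsheetml.template.main+xml"] = OfficeApp.Excel,
            ["application/vnd.ms-excel.template.macroEnabled.main+xml"] = OfficeApp.Excel,
            ["application/vnd.ms-excel.addin.macroEnabled.main+xml"] = OfficeApp.Excel,
            // PowerPoint: .pptx, .pptm, .potx, .potm, .ppsx, .ppsm, .ppam
            ["application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"] = OfficeApp.PowerPoint,
            ["application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml"] = OfficeApp.PowerPoint,
            ["application/vnd.openxmlformats-officedocument.presentationml.template.main+xml"] = OfficeApp.PowerPoint,
            ["application/vnd.ms-powerpoint.template.macroEnabled.main+xml"] = OfficeApp.PowerPoint,
            ["application/vnd.openxmlformats-officedocument.presentationml.slideshow.main+xml"] = OfficeApp.PowerPoint,
            ["application/vnd.ms-powerpoint.slideshow.macroEnabled.main+xml"] = OfficeApp.PowerPoint,
            ["application/vnd.ms-powerpoint.addin.macroEnabled.main+xml"] = OfficeApp.PowerPoint,
        };

    // Maps a main-part content type to its application, or Unknown if not listed.
    public static OfficeApp Classify(string contentType)
    {
        return contentType != null && Map.TryGetValue(contentType, out OfficeApp app)
            ? app
            : OfficeApp.Unknown;
    }
}
```

This is meant to be applied to the content type of the main part only; feeding it every part's content type runs into the ambiguity described next.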
Ah, it gets worse. If I have a .PPTM file with an embedded Excel document (from a chart or otherwise), it reports both "application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml" and "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" in the content types (MIME). This means that with just the code above, my PPTM would be reported as an XLSX. Need to search further on this.
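One way to sidestep that ambiguity is to look only at the part named by the package-level officeDocument relationship (its Target in `_rels/.rels`), which is what the relationship lookup in the first answer does, rather than at every part's content type. The sketch below contrasts the two on plain in-memory data; the class, method names and part list are hypothetical, and it assumes you have already read the part-name-to-content-type map and the root relationship target from the package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RootPartPicker
{
    // Ambiguous: every part whose content type ends in "main+xml".
    // Parts other than the real main part can show up here too.
    public static List<string> AllMainContentTypes(IDictionary<string, string> partContentTypes)
    {
        return partContentTypes.Values
            .Where(ct => ct.EndsWith("main+xml", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    // Unambiguous: only the part the officeDocument relationship points to counts.
    public static string RootContentType(IDictionary<string, string> partContentTypes, string rootTarget)
    {
        string partName = "/" + rootTarget.TrimStart('/');
        return partContentTypes
            .Where(kv => string.Equals(kv.Key, partName, StringComparison.OrdinalIgnoreCase))
            .Select(kv => kv.Value)
            .FirstOrDefault();
    }
}
```

With a part list containing both the macro-enabled presentation type and the spreadsheet type, AllMainContentTypes returns both, while RootContentType with the target "ppt/presentation.xml" returns only the presentation type.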
This is what I was able to work out, using URIs instead:
```vb
' Needs (at file level): Imports System.IO.Packaging, Imports System.Linq
' "Type" is a property of the enclosing class and PackageType is its enum (not shown).
Sub SetPackageType(package As Package)
    ' Conventional main-part URIs; part names are compared case-insensitively
    Dim openXMLUris As New Dictionary(Of String, PackageType)(StringComparer.OrdinalIgnoreCase) From {
        {"/ppt/presentation.xml", PackageType.PowerPoint},
        {"/word/document.xml", PackageType.Word},
        {"/xl/workbook.xml", PackageType.Excel}
    }

    ' Exactly one known main part is expected (SingleOrDefault throws if there are several)
    Dim mainPart As PackagePart = package.GetParts().SingleOrDefault(
        Function(p) openXMLUris.ContainsKey(p.Uri.OriginalString))

    If mainPart Is Nothing Then
        Throw New System.Exception("This is not a valid Office document.")
    End If

    Type = openXMLUris(mainPart.Uri.OriginalString)
End Sub
```
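The same conventional-name check can be sketched in C# directly over the ZIP entries, without System.IO.Packaging. Two hedged caveats: ZIP entry names have no leading "/" (unlike the part URIs Package reports), and these names are the ones Office uses by convention rather than something the format guarantees, so the relationship lookup in the first answer is the more authoritative source. The enum and class names are hypothetical, and unlike the VB code this returns Unknown instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public enum OfficeApp { Unknown, Word, Excel, PowerPoint }

public static class ConventionalPartNameSketch
{
    // Conventional main-part names as ZIP entry names (no leading "/").
    static readonly Dictionary<string, OfficeApp> KnownMainParts =
        new Dictionary<string, OfficeApp>(StringComparer.OrdinalIgnoreCase)
        {
            ["ppt/presentation.xml"] = OfficeApp.PowerPoint,
            ["word/document.xml"] = OfficeApp.Word,
            ["xl/workbook.xml"] = OfficeApp.Excel,
        };

    // Returns the single application whose conventional main part is present,
    // or Unknown when none (or more than one) is found.
    public static OfficeApp Detect(Stream stream)
    {
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
        {
            List<OfficeApp> hits = zip.Entries
                .Where(e => KnownMainParts.ContainsKey(e.FullName))
                .Select(e => KnownMainParts[e.FullName])
                .Distinct()
                .ToList();
            return hits.Count == 1 ? hits[0] : OfficeApp.Unknown;
        }
    }
}
```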