Does anyone know of a simple and straightforward way to determine:
#2 is more important than #1.
I'm looking at all the parts from package.GetParts and trying to figure out which ContentType is the one I should be looking for. Or is there a different/better way?
This is how I determine the document type for a package:
using (Package package = Package.Open(stream, FileMode.Open))
{
    PackageRelationship relationship = package.GetRelationshipsByType("http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument").FirstOrDefault();
    if (relationship != null)
    {
        PackagePart part = package.GetPart(PackUriHelper.ResolvePartUri(relationship.SourceUri, relationship.TargetUri));
    }
}
You can find this in the PowerTools for OpenXML core code.
Validation of the XML can be done using the OpenXmlValidator (from DocumentFormat.OpenXml.Validation). An example of this can also be found in the PowerTools for OpenXML for the Confirm-OpenXmlValid cmdlet. (See ValidateXml in the PowerToolsExtensions.cs file.)
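As a rough illustration of the validator mentioned above, here is a minimal sketch. It assumes the Open XML SDK (the DocumentFormat.OpenXml package) is referenced and that the file is a Word document; the `ValidationSketch` class and `IsValidWordDocument` method are made-up names for this example, not something taken from PowerTools.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

static class ValidationSketch
{
    // Hypothetical helper: returns true when the validator reports no errors.
    public static bool IsValidWordDocument(string path)
    {
        // Open read-only (isEditable: false) so validation cannot modify the file.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            var validator = new OpenXmlValidator();
            bool valid = true;

            // Validate returns one ValidationErrorInfo per problem found.
            foreach (ValidationErrorInfo error in validator.Validate(doc))
            {
                valid = false;
                Console.WriteLine($"{error.Part?.Uri}: {error.Description}");
            }

            return valid;
        }
    }
}
```

For spreadsheets or presentations, SpreadsheetDocument.Open or PresentationDocument.Open would presumably take the place of WordprocessingDocument.Open.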
Note that if the file is not a valid package, it will fail to open, so you would need to catch that exception as well.
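A minimal sketch of catching that failure follows. Which exception is thrown seems to depend on the runtime (FileFormatException on .NET Framework, possibly InvalidDataException on newer runtimes), so both are caught here as an assumption. `PackageProbe` and `TryOpenPackage` are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

static class PackageProbe
{
    // Hypothetical helper: true if the stream opens as an OPC package.
    public static bool TryOpenPackage(Stream stream)
    {
        try
        {
            // FileAccess.Read lets this work with read-only streams as well.
            using (Package package = Package.Open(stream, FileMode.Open, FileAccess.Read))
            {
                return true;
            }
        }
        catch (FileFormatException)
        {
            // Not a well-formed package.
            return false;
        }
        catch (InvalidDataException)
        {
            // Assumption: some runtimes report a non-ZIP stream this way.
            return false;
        }
    }
}
```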
Thanks, Bob. This is almost identical to the path I started down, but then I ran into files that don't have those MIME types, like a .pptm (which uses "application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml" instead of "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml").
So I guess I'll just create a file in each of the different formats, locate its "...main+xml" MIME type, and put those into a dictionary for lookup. Your help here is much appreciated.
Ah, it gets worse. If I have a .PPTM file with an embedded Excel document (from a chart or otherwise), it reports both "application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml" and "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" in the content types (mime). This means that with just the code above, my PPTM would be reported as an XLSX. Need to search further on this.
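For what it's worth, following the package-level officeDocument relationship (as in the C# snippet earlier in the thread) should sidestep the embedded-workbook problem, because only the main part's content type is examined, never those of embedded parts. Below is a minimal sketch of that idea combined with the dictionary lookup. The `MainPartLookup` class, the `GetApplication` method, and the "Word"/"Excel"/"PowerPoint" labels are illustrative names, and the content-type table is a partial list that should be checked against real files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;

static class MainPartLookup
{
    const string OfficeDocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Partial, illustrative table of main-part content types; extend as needed.
    static readonly Dictionary<string, string> MainContentTypes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"] = "Word",
            ["application/vnd.ms-word.document.macroEnabled.main+xml"] = "Word",
            ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"] = "Excel",
            ["application/vnd.ms-excel.sheet.macroEnabled.main+xml"] = "Excel",
            ["application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"] = "PowerPoint",
            ["application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml"] = "PowerPoint",
        };

    // Hypothetical helper: classifies a package by its main document part only.
    public static string GetApplication(Stream stream)
    {
        using (Package package = Package.Open(stream, FileMode.Open, FileAccess.Read))
        {
            PackageRelationship rel =
                package.GetRelationshipsByType(OfficeDocumentRelType).FirstOrDefault();
            if (rel == null)
            {
                return "Unknown";
            }

            // Resolve the relationship target to the URI of the main part.
            Uri partUri = PackUriHelper.ResolvePartUri(rel.SourceUri, rel.TargetUri);
            PackagePart mainPart = package.GetPart(partUri);

            return MainContentTypes.TryGetValue(mainPart.ContentType, out string app)
                ? app
                : "Unknown";
        }
    }
}
```

With this approach, a .pptm containing an embedded workbook would be classified from the content type of its presentation part alone.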
This is what I was able to work out, using URIs instead:
Sub SetPackageType(package As Package)
    Dim isError As Boolean = False
    Dim openXMLUris As New Dictionary(Of String, String)
    Dim parts As PackagePartCollection = package.GetParts
    If parts.Any(Function(f) openXMLUris.Keys.Contains(f.Uri.OriginalString)) Then
        Dim applicationURI = parts.Where(Function(f) openXMLUris.Keys.Contains(f.Uri.OriginalString)).SingleOrDefault.Uri.OriginalString
        Dim applicationType = openXMLUris(applicationURI)
        Select Case applicationType
            Case Is = "Word"
                Type = PackageType.Word
            Case Is = "PowerPoint"
                Type = PackageType.PowerPoint
            Case Is = "Excel"
                Type = PackageType.Excel
            Case Else
                Type = PackageType.Unknown
        End Select
        If Type = PackageType.Unknown Then
            isError = True
        End If
    Else
        isError = True
    End If
    If isError Then
        Throw New System.Exception("This is not a valid Office document.")
    End If
End Sub