Well, Dennis, if you're a "protocols and formats standards junky" then you're probably in the right place. :-)
I appreciate your attempt to come up with a conceptual model for understanding Office Open XML packages. I'm trying to formulate some abstractions to aid my own understanding of the formats, and it's good to hear another perspective. On another thread, Stephane and Sanjay were discussing some of the distinctions between what constitutes a part (i.e., is a chunk of XML a part, or just one of the tangible artifacts of a part?), so it looks like we're all trying to pin down these concepts more specifically.
I'm still mulling it all over, but here are a couple of observations after a first read of your thoughts ...
At the highest level, we just have parts, relationships, and content types, all bundled into a package. The fact that there is a hierarchical structure to the package is just an implementation detail, really, and in fact the draft Ecma spec strongly emphasizes the importance of not writing code to the hierarchy and writing it instead to the defined relationships.
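To make "write to the relationships, not the hierarchy" a bit more concrete, here is a minimal Python sketch of how a consumer might locate the main document part. It rests on some assumptions: the namespace and relationship-type URIs are the ones I know from the published standard (a draft may use different ones, which is why the code matches only on the "/officeDocument" suffix), and find_main_part, build_sample_package, and the "stuff/main.xml" location are hypothetical names made up for illustration.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace of relationship parts in the published packaging standard.
# Assumption: a draft of the spec may use a different URI.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def find_main_part(package):
    """Return the part name of the main document, found via relationships."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        # Match on the relationship type, never on a hard-coded path.
        if rel.get("Type", "").endswith("/officeDocument"):
            # Package-level targets are resolved against the package root.
            return posixpath.normpath(posixpath.join("/", rel.get("Target")))
    return None


def build_sample_package():
    """Build a tiny in-memory package with the main part in an unusual place."""
    rels = (
        f'<Relationships xmlns="{RELS_NS}">'
        '<Relationship Id="rId1" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
        'relationships/officeDocument" Target="stuff/main.xml"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("_rels/.rels", rels)
        zf.writestr("stuff/main.xml", "<document/>")
    buf.seek(0)
    return buf


print(find_main_part(build_sample_package()))  # prints /stuff/main.xml
```

The sample package deliberately puts the main part somewhere odd. Code that assumes a fixed folder layout would miss it, while code that follows the relationship finds it without caring where it lives.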
I'm starting to see that's the key to understanding the formats, in general: thinking in terms of parts, relationships, and content types, rather than the physical implementations of those concepts. I started thinking about OOX at a concrete, "what's in the ZIP archive?" level, but last week Stephen Peront (who contributed the embedded-objects sample) pointed out to me that we really shouldn't think at that level. It's the first thing you see when you crack open a package, and of course as a developer you need to know the implementation details, but the part/relationship/content-type abstractions are at the core of what an "Open XML Format" is all about.
Another aspect of learning about Open XML is to divide it into the document markup languages and the packaging conventions. Many developers already know a lot about WordprocessingML, SpreadsheetML, and PresentationML from prior experience with Office 2003 and related products. But the packaging conventions are new to everyone. (Well, except Brian and a few of his cohorts.) So I think it's worth concentrating on the packaging conventions first, and viewing the document MLs as essentially blobs for now. That may not be an absolutely necessary abstraction, but anything that lets me ignore 90% of the spec feels like a good short-term tactic at this point on the learning curve. :-)
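As a rough illustration of the "treat the document MLs as blobs" tactic, the Python sketch below inventories a package using only the packaging layer: it reads [Content_Types].xml and reports each part's content type and size without ever parsing a part body. Again, some assumptions: the content-types namespace URI is the one I know from the published standard, list_parts and build_demo_package are hypothetical helper names, the "application/x-example-main+xml" content type is invented for the demo, and the lookup is simplified (real part-name matching is case-insensitive).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the content-types stream in the published packaging standard.
# Assumption: a draft of the spec may use a different URI.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def list_parts(package):
    """Return {part name: (content type, size in bytes)} for a package."""
    with zipfile.ZipFile(package) as zf:
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        defaults = {
            d.get("Extension").lower(): d.get("ContentType")
            for d in types.findall(f"{{{CT_NS}}}Default")
        }
        overrides = {
            o.get("PartName"): o.get("ContentType")
            for o in types.findall(f"{{{CT_NS}}}Override")
        }
        parts = {}
        for info in zf.infolist():
            if info.filename == "[Content_Types].xml" or info.is_dir():
                continue
            name = "/" + info.filename
            ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
            # An Override for the exact part name wins over the extension Default.
            ctype = overrides.get(name, defaults.get(ext))
            # The part body is never parsed; only its size is recorded.
            parts[name] = (ctype, info.file_size)
        return parts


def build_demo_package():
    """Build a tiny in-memory package; the main content type is made up."""
    content_types = (
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="rels" ContentType='
        '"application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/stuff/main.xml" '
        'ContentType="application/x-example-main+xml"/>'
        "</Types>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("stuff/main.xml", "<document/>")
    buf.seek(0)
    return buf


for name, (ctype, size) in sorted(list_parts(build_demo_package()).items()):
    print(name, ctype, size)
```

Nothing here needs to know which markup language a part contains, so this much can be written and tested before reading any of the document-ML portions of the spec.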
- Doug
- Doug Mahugh
Technical Evangelist, Microsoft