We need to split a large Word (docx) file (300-1000+ pages) without loading the whole thing into memory. The split has to happen at page level, preserving page numbers, page formatting, and so on.
After splitting, we have to convert each page to PDF and then reassemble the document, in order to avoid the memory issue. Can we do that with the Open XML SDK, using OpenXmlReader/OpenXmlWriter or other classes or methods? (A blog post suggested this approach for large Excel files.)
Yes, you can do this.
You do not need to implement a streaming approach. I have used the Open XML SDK directly on very large documents - some up to 5000 or 6000 pages. You will not come close to using all available memory. In contrast, some spreadsheets contain millions of rows. When dealing with spreadsheets of that size, you must use a streaming approach, but this is not true for WordprocessingML.
You can use DocumentBuilder to do the document splitting for you. DocumentBuilder handles much of what you want: it carries over the section formatting, headers, and footers. However, you will need to tweak the resulting documents to preserve page numbering. One of the DocumentBuilder examples demonstrates shredding a document (and then recombining it). It breaks on headings instead of pages, but if your document has already been paginated, the difference in code is not too great. A paginated document contains w:lastRenderedPageBreak elements, which mark where each page started the last time the document was laid out. See the DocumentBuilder Resource Center for more info.
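To give an idea of how the w:lastRenderedPageBreak markers can drive a page-level split, here is a minimal sketch in Python using only the standard library. It is not DocumentBuilder and not the Open XML SDK; it only shows how to locate the page boundaries. The function names are made up for illustration, the main part is assumed to live at word/document.xml (the usual location, though the package relationships decide it formally), and a paragraph that contains a break is simply treated as the start of a new page, even though the break may really fall in the middle of that paragraph.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PAGE_BREAK = "{%s}lastRenderedPageBreak" % W_NS
BODY = "{%s}body" % W_NS


def read_document_xml(docx):
    """Read the main document part from a .docx package (path or file-like object).

    Assumption: the main part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx) as package:
        return package.read("word/document.xml")


def group_body_elements_by_page(document_xml):
    """Group the indexes of the body's child elements by rendered page.

    Coarse approximation: a child element that contains a
    w:lastRenderedPageBreak anywhere inside it starts a new page.
    """
    root = ET.fromstring(document_xml)
    body = root.find(BODY)
    pages = [[]]
    for index, child in enumerate(body):
        has_break = child.find(".//" + PAGE_BREAK) is not None
        # Open a new page only if the current one already has content,
        # so a break in the very first element does not create an empty page.
        if has_break and pages[-1]:
            pages.append([])
        pages[-1].append(index)
    return pages


if __name__ == "__main__":
    import sys

    xml_bytes = read_document_xml(sys.argv[1])
    for number, indexes in enumerate(group_body_elements_by_page(xml_bytes), 1):
        print("page %d: body elements %s" % (number, indexes))
```

Each group of indexes corresponds to the range of body elements you would hand to a splitter as one source range. Keep in mind that the markers only reflect the last layout Word performed, so they can be stale or missing if the file was produced or edited by other tools.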
When converting to PDF, you will want to use Word Automation Services.