Importing HTML that contains Numbering using altChunk - OpenXML Developer - Blog - OpenXML Developer

It is possible to import HTML that contains bullets or numbering by using altChunk. When Word 2007 or Word 2010 opens the document, it imports the numbered items and creates the appropriate WordprocessingML markup, as well as the necessary numbering styles, so that the resulting document looks as close as possible to the original HTML. The following example modifies an existing document by adding an altChunk element at the end of the document, directly after its last paragraph. The imported HTML contains an ordered list.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;
class Program
{
    static void Main(string[] args)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open("Test3.docx", true))
        {
            string html =
@"<html>
<head/>
<body onbeforeunload=""RunOnBeforeUnload()"">
<h1>Html Heading</h1>
<ol>
<li>one.</li>
<li>two.</li>
<li>three.</li>
</ol>
</body>
</html>";

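            string altChunkId = "AltChunkId1";
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            // Store the HTML in an alternative format import part; altChunkId is
            // the relationship id that the w:altChunk element will refer to.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                "application/xhtml+xml", altChunkId);
            using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
            using (StreamWriter stringStream = new StreamWriter(chunkStream))
                stringStream.Write(html);
            // <w:altChunk r:id="AltChunkId1"/> tells Word to import the content of
            // the part at this position when the document is opened.
            XElement altChunk = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );
            XDocument mainDocumentXDoc = GetXDocument(myDoc);
            // Add the altChunk after the last paragraph of the body, which keeps it
            // ahead of the final w:sectPr. Last() requires at least one w:p.
            mainDocumentXDoc.Root
                .Element(w + "body")
                .Elements(w + "p")
                .Last()
                .AddAfterSelf(altChunk);
            SaveXDocument(myDoc, mainDocumentXDoc);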
        }
    }
    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(
            FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }
    private static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        // Load the main document part into an XDocument
        XDocument mainDocumentXDoc;
        using (Stream str = myDoc.MainDocumentPart.GetStream())
        using (XmlReader xr = XmlReader.Create(str))
            mainDocumentXDoc = XDocument.Load(xr);
        return mainDocumentXDoc;
    }
}
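The example above finds its insertion point with .Elements(w + "p").Last(), which throws if the body contains no paragraphs. Below is a minimal sketch of a more defensive placement, assuming the same LINQ to XML approach: the altChunk is inserted immediately before the body-level w:sectPr (which WordprocessingML requires to be the last child of w:body), or appended to the body when there is no sectPr. The class AltChunkPlacement and the method AppendAltChunk are hypothetical names, and the document parsed in Main is a made-up minimal body, not a real Word file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class; not part of the Open XML SDK.
static class AltChunkPlacement
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Adds <w:altChunk r:id="..."/> as the last block-level element of the body:
    // immediately before the body-level w:sectPr if there is one, else at the end.
    public static XElement AppendAltChunk(XDocument mainDocument, string relationshipId)
    {
        XElement body = mainDocument.Root.Element(W + "body")
            ?? throw new InvalidOperationException("w:body not found");
        var altChunk = new XElement(W + "altChunk",
            new XAttribute(R + "id", relationshipId));

        // The body-level sectPr must remain the last child of w:body.
        XElement sectPr = body.Elements(W + "sectPr").LastOrDefault();
        if (sectPr != null)
            sectPr.AddBeforeSelf(altChunk);
        else
            body.Add(altChunk);
        return altChunk;
    }

    static void Main()
    {
        // Hypothetical main document part with no paragraphs, only a sectPr.
        var doc = XDocument.Parse(
            "<w:document xmlns:w='" + W.NamespaceName + "'>" +
            "<w:body><w:sectPr/></w:body></w:document>");

        AppendAltChunk(doc, "AltChunkId1");

        var names = doc.Root.Element(W + "body").Elements()
            .Select(e => e.Name.LocalName);
        Console.WriteLine(string.Join(",", names)); // altChunk,sectPr
    }
}
```

In the original program, the statements that build the altChunk element and add it after the last paragraph could be replaced by a call such as AppendAltChunk(mainDocumentXDoc, altChunkId) just before SaveXDocument is called.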

  • Eric, that's great. But I have a problem when the HTML contains characters from complex scripts such as Arabic, Hindi, etc. In the final document I am getting garbage.

  • Hi Eric,

    I appreciate all the blog posts you've written about Open XML, and they've been a real asset during the implementation of my new project, but for some reason this code sample does absolutely nothing in my case.

    What could be the cause of this?

    It doesn't throw an exception and the file is valid, but nothing happens whatsoever.

    Thanks in advance,

    - Yannick
