
How can I retrieve text from docx file

Last post 07-21-2008, 3:39 AM by broshni. 2 replies.
  •  07-20-2008, 5:26 AM 3485

    How can I retrieve text from docx file

    I have a simple docx file in which I typed just the following:

    Apple

    Orange

    Papaya

    Now I have an XSL file, written like this:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/3/main" exclude-result-prefixes="w">
      <xsl:output method="html" />
      <xsl:template match="/">
        <html>
          <body>
            <p>Before Loop</p>
            <xsl:for-each select="w:p/w:r">
              <p>Within Loop</p>
              <p>
                <xsl:value-of select="w:t"/>
              </p>
            </xsl:for-each>
            <p>After Loop</p>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

    My problem is that the transform never gets into the for-each loop.

    I am using Java to parse. I am sure my parser works correctly, but it only prints "Before Loop" and "After Loop" in the HTML output file.

    Please help me out.

    Thanks in advance.

    broshnikanta@gmail.com

  •  07-20-2008, 8:36 PM 3487 in reply to 3485

    Re: How can I retrieve text from docx file

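    If you are applying that to the main document part (e.g. word/document.xml), you'll still need to get your XPath right.

    The w:p/w:r elements live inside w:document/w:body, so a select evaluated from the "/" template has to start at w:document (or use a descendant path) to reach them.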
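
    To make that concrete, here is a minimal sketch in Java using only the JDK's javax.xml.transform classes. The class name, the inline sample document and the stylesheet string are all made up for illustration; they are not taken from the original poster's files. It also assumes the document uses the namespace http://schemas.openxmlformats.org/wordprocessingml/2006/main, which is what release versions of Word 2007 write. If your document.xml declares that namespace while the stylesheet binds w: to the older .../2006/3/main URI, the for-each will match nothing even with the right path, so check the xmlns:w value in your own file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DocxXPathSketch {
    // Assumption: namespace written by release versions of Word 2007.
    static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, trimmed-down stand-in for word/document.xml.
    static final String DOCUMENT_XML =
        "<w:document xmlns:w='" + W + "'><w:body>"
        + "<w:p><w:r><w:t>Apple</w:t></w:r></w:p>"
        + "<w:p><w:r><w:t>Orange</w:t></w:r></w:p>"
        + "<w:p><w:r><w:t>Papaya</w:t></w:r></w:p>"
        + "</w:body></w:document>";

    // Same idea as the stylesheet in the question, but the select
    // starts at w:document/w:body instead of at w:p.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='" + W + "' exclude-result-prefixes='w'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<html><body>"
        + "<xsl:for-each select='w:document/w:body/w:p'>"
        // One HTML paragraph per w:p, joining the text of all its runs.
        + "<p><xsl:for-each select='w:r/w:t'>"
        + "<xsl:value-of select='.'/>"
        + "</xsl:for-each></p>"
        + "</xsl:for-each>"
        + "</body></html>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies the given XSLT source to the given XML source. */
    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(DOCUMENT_XML, STYLESHEET));
    }
}
```

    With match="/" the context node is the document root, whose only element child is w:document, which is why a relative w:p/w:r selects nothing there.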

    It can be convenient to run the transform on a single xmlPackage file (so your transform can use the styles part).

    Given that you are using Java, you might like to try docx4j.

    See for example the transform method in this class.

    If all you want is to extract the text from the document.xml, non-XSLT approaches may work for you as well. For example, docx4j has a P class representing the paragraph, from which you can get the text via a method call.
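
    As a rough illustration of a non-XSLT route that needs no extra library, the sketch below opens the docx as a zip with the JDK, parses word/document.xml with a namespace-aware DOM parser, and joins the w:t text of each w:p. This is not docx4j's API; the class and method names are made up for the example, the namespace is assumed to be the release one (.../2006/main), and readAllBytes assumes Java 9 or later. It ignores tabs, line breaks and paragraphs nested in text boxes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DocxTextSketch {
    // Assumption: namespace written by release versions of Word 2007.
    static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one string per w:p found in word/document.xml. */
    static List<String> paragraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (e.getName().equals("word/document.xml")) {
                    // Reads only the current entry (Java 9+).
                    return parseParagraphs(zip.readAllBytes());
                }
            }
        }
        throw new IllegalArgumentException("word/document.xml not found");
    }

    static List<String> parseParagraphs(byte[] xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed for the *NS lookups below
        Document doc = f.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml));
        List<String> out = new ArrayList<>();
        NodeList ps = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < ps.getLength(); i++) {
            NodeList ts = ((Element) ps.item(i)).getElementsByTagNameNS(W, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < ts.getLength(); j++) {
                sb.append(ts.item(j).getTextContent());
            }
            out.add(sb.toString());
        }
        return out;
    }

    /** Builds a tiny in-memory zip holding only word/document.xml (demo data). */
    static byte[] sampleDocx() throws Exception {
        String xml = "<w:document xmlns:w='" + W + "'><w:body>"
            + "<w:p><w:r><w:t>Apple</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Orange</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Papaya</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Pass a path to a real .docx, or run without arguments for the demo data.
        InputStream in = args.length > 0
            ? new FileInputStream(args[0])
            : new ByteArrayInputStream(sampleDocx());
        for (String p : paragraphs(in)) {
            System.out.println(p);
        }
    }
}
```

    The in-memory sample is only a stand-in so the sketch runs on its own; a real docx contains more parts, but the lookup of word/document.xml works the same way.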


  •  07-21-2008, 3:39 AM 3489 in reply to 3487

    Re: How can I retrieve text from docx file

    As far as I know, it's not required to specify 'w:document/w:body' explicitly in the XSL, since the template I am using is

    <xsl:template match="/">

    Anyway, thanks for the response. With some minor changes, everything works fine.

    But I am not able to retrieve the images and put them in the appropriate positions in the generated HTML file. Looking forward to a quick response from your end again. Thanks in advance.

     

