Tuesday, January 31, 2012

Using Saxon to embed an image's binary into WordprocessingML (XML)

We have a process that takes an XML document (representing content only) and generates a Microsoft Word document.  We use XSLT transforms to create a flat, single-file WordprocessingML document, and then use code to convert that into an OPC package (the zip archive behind DOCX).

One of our greatest challenges was getting images embedded into the document.  The binaryData element told us it was possible, and our users were asking for it.

Because Word can open either the flat file or the archive, we felt it was important to get the image binary into the XML itself, encoded as base64.

Using Saxon and XSLT 2.0, we were able to accomplish this.  (This functionality worked with both Saxon.Net and the Java edition.)

Below you’ll see how the stylesheet binds Java classes to namespace prefixes, reads the image data through them, and converts it to base64.

********** Stylesheet Snippet Below **********

<xsl:stylesheet
  version="2.0"
 
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xlink="http://www.w3.org/1999/xlink"
 
  xmlns:msp="urn:us:gov:ic:msp:v4"
  xmlns:ism="urn:us:gov:ic:ism:v2"
         
  xmlns:file="java.io.File"
  xmlns:uri="java.net.URI"
  xmlns:url="java.net.URL"
  xmlns:javatype="http://saxon.sf.net/java-type"
  xmlns:is="java.io.InputStream"
  xmlns:bis="java.io.BufferedInputStream"
  xmlns:bos="java.io.ByteArrayOutputStream"
  xmlns:b64="sun.misc.BASE64Encoder"
   
  exclude-result-prefixes="xs xsl xlink msp file uri url javatype is bis bos b64" >

<!-- Additional xsl here -->

  <xsl:template match="myImageNode">
    <!-- Additional xsl here; the opening <ns1:part> tag and the ns1 namespace binding are not shown in this snippet -->
      <ns1:binaryData>
        <!-- Using the href attribute, try and embed the image binary into the xml -->
        <xsl:call-template name="javaReadFile">
          <xsl:with-param name="fileNamePath" select="@xlink:href" />
        </xsl:call-template>
      </ns1:binaryData>
    </ns1:part>
  </xsl:template>

  <!-- Template used to embed images into a Wordml document -->
  <xsl:template name="javaReadFile">
    <xsl:param name="fileNamePath" />

    <!-- Capture the target image file into a java.net.URL object -->
    <xsl:variable name="fileUrl">
      <xsl:choose>
        <xsl:when test="not(contains($fileNamePath, '://'))"><!-- Test for cases where it is not a URL -->
          <xsl:variable name="fileObj" select="file:new(string($fileNamePath))" /><!-- Capture in a java.io.File object -->
          <xsl:variable name="fileUri" select="file:toURI($fileObj)" /><!-- Convert to a java.net.URI object -->
          <xsl:value-of select="uri:toURL($fileUri)" /><!-- Convert to a java.net.URL object -->
        </xsl:when>
        <xsl:otherwise>
          <!-- Already a URL: build the java.net.URL with its single-string constructor -->
          <xsl:value-of select="url:new(string($fileNamePath))"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- Open the image file for reading -->
    <xsl:variable name="inStream" select="url:openStream($fileUrl) treat as javatype:java.io.InputStream" />
   
    <!-- Wrap the image file stream with a buffer stream for performance reasons -->
    <xsl:variable name="bInStream" select="bis:new($inStream) treat as javatype:java.io.BufferedInputStream" />
   
    <!-- Create a stream to hold the image binary and for encoding later  -->
    <xsl:variable name="bOutStream" select="bos:new() treat as javatype:java.io.ByteArrayOutputStream" />

    <!-- Read the image contents -->
    <xsl:call-template name="javaReadByte">
      <xsl:with-param name="inStream" select="$bInStream" />
      <xsl:with-param name="outStream" select="$bOutStream" />
    </xsl:call-template>

    <!-- Create a sun.misc.BASE64Encoder -->
    <xsl:variable name="encoder" select="b64:new()" />
   
    <!-- Output the image bytes into an encoded base64 string -->
    <xsl:value-of select="b64:encode($encoder, bos:toByteArray($bOutStream))" />

    <!-- Close the working streams -->
    <xsl:value-of select="bos:close($bOutStream)" />
    <xsl:value-of select="bis:close($bInStream)" />
  </xsl:template>

  <!-- Recursive template that reads one byte at a time from one stream and writes it into another -->
  <xsl:template name="javaReadByte">
    <xsl:param name="inStream" />
    <xsl:param name="outStream" />

    <!-- Capture the next byte into a local variable -->
    <xsl:variable name="byte" select="bis:read($inStream)" />

    <!-- Check to see if the byte represents the end of the file -->
    <xsl:if test="not($byte = -1)">
      <!-- Write the byte onto the new stream -->
      <xsl:value-of select="bos:write($outStream, $byte)"/>

      <!-- Call this template again (recursive) for reading the next byte -->
      <xsl:call-template name="javaReadByte">
        <xsl:with-param name="inStream" select="$inStream" />
        <xsl:with-param name="outStream" select="$outStream" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
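
For readers less familiar with the Java classes involved, here is a rough plain-Java sketch of what the extension calls in the stylesheet do: resolve a path or URL, read the stream into a byte buffer, and base64-encode the result. It is an illustration under assumptions, not the code we ran. The class name ImageBase64Sketch and its method names are hypothetical. It also swaps in java.util.Base64 (Java 8 and later) for sun.misc.BASE64Encoder, which is an internal class and not available on newer JDKs, and it reads in chunks rather than one byte per call.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.util.Base64;

// Hypothetical helper; the class and method names are illustrative only.
final class ImageBase64Sketch {

    // Mirrors the xsl:choose in javaReadFile: a plain path goes through
    // java.io.File, anything containing "://" is treated as a URL already.
    static URL toUrl(String location) throws IOException {
        if (!location.contains("://")) {
            return new File(location).toURI().toURL();
        }
        return URI.create(location).toURL();
    }

    // Reads the whole resource and returns it as one base64 string
    // (no line breaks), ready to be placed inside a binaryData element.
    static String encodeImage(String location) throws IOException {
        URL url = toUrl(location);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the input stream even if reading fails
        try (InputStream in = new BufferedInputStream(url.openStream())) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        // Assumption: java.util.Base64 stands in for sun.misc.BASE64Encoder
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(encodeImage(args[0]));
        }
    }
}
```

One difference worth knowing: Base64.getEncoder() emits a single unbroken line, whereas the older sun.misc encoder wrapped its output into lines. If wrapped output is wanted, Base64.getMimeEncoder() is the closest standard equivalent.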
