Generating output from content that includes entities

Out of the box, versions 4.0 and later of the Output Generator do not support entities in DITA content: entities are simply omitted from the output. Earlier versions of the Output Generator did support them.

The issue lies not in the Output Generator itself but in the DITA Open Toolkit (DITA-OT), which does not support entities out of the box either. The underlying problem comes from the way Xerces processes entities. Xerces has a memory leak that was never fixed. Output Generator 3.4 included a patch for the leak, which was possible because we controlled which jar file was loaded.

To explain that last statement, here's a little history. In versions 3.4 and earlier of the Output Generator, we launched a single Java Virtual Machine (JVM) that ran all jobs. A memory leak in this one JVM was a serious problem, so we created our own patch for Xerces to resolve the leak. This patch also included a grammar caching fix, which allowed the DITA-OT to output entities.

Beginning with version 4.0 of the Output Generator, we changed its infrastructure to launch each job in its own JVM, which made our memory leak patch unnecessary. The Output Generator now includes an unmodified version of the DITA-OT that does not include our Xerces patch. Without the patch, the unmodified DITA-OT still has the grammar caching issue, so entities are not output.

While we no longer deliver the patched Xerces within the DITA-OT, we still deliver it within the Output Generator itself, so it is available to you. In the /libs/xerces folder of the Output Generator, there are three files:

  • resolver.jar
  • xercesImpl.jar
  • xml-apis.jar

Copy these three files to the /lib folder of the DITA-OT. Restart the Output Generator, and entities should then be included in the output.
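
If you prefer to script the copy step, it could look something like the following minimal Python sketch. It is an illustration only, not a tool delivered with the Output Generator. The function name and its two parameters (the installation folders of the Output Generator and of the DITA-OT) are hypothetical, and the sketch assumes that /libs/xerces and /lib sit directly under those folders. Saving any existing jar as a .orig backup is an added precaution, not part of the procedure above.

```python
import shutil
from pathlib import Path

# The three patched files named in the list above.
JARS = ("resolver.jar", "xercesImpl.jar", "xml-apis.jar")


def copy_patched_xerces(output_generator_home, dita_ot_home):
    """Copy the patched jars from <Output Generator>/libs/xerces to <DITA-OT>/lib.

    Both arguments are hypothetical installation paths; adjust them to
    match your own setup.
    """
    source_dir = Path(output_generator_home) / "libs" / "xerces"
    target_dir = Path(dita_ot_home) / "lib"

    # Stop before changing anything if a source file or the target is missing.
    missing = [name for name in JARS if not (source_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {source_dir}: {', '.join(missing)}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"DITA-OT lib folder not found: {target_dir}")

    copied = []
    for name in JARS:
        target = target_dir / name
        backup = target_dir / (name + ".orig")
        # Precaution (an assumption, not a documented step): keep one
        # backup of whatever jar the DITA-OT shipped with.
        if target.exists() and not backup.exists():
            shutil.copy2(target, backup)
        shutil.copy2(source_dir / name, target)
        copied.append(target)
    return copied
```

Call the function with your two installation folders, for example copy_patched_xerces("/path/to/OutputGenerator", "/path/to/DITA-OT"), where both paths are placeholders. The Output Generator still needs to be restarted afterwards.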