About indexed content

When present, indexed content is indexed in place of the document's content.

A document can contain many kinds of data, including content, custom properties, and indexed content. Indexed content is optional: your application can store it for all, some, or none of the documents in a docbase.

If indexed content is present in a document, the indexed content can be retrieved whenever the document is retrieved from its docbase.

Whenever a docbase is indexed:

  • If indexed content is present in a document, the indexed content will be indexed (in place of the content).
  • If not, the content will be indexed (provided that the content is in XML).
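The rule above can be modeled as a small decision function. This is a minimal sketch in Python, not the TEXTML Server API: the `Document` class, its `content_type` field, `XML_TYPES`, and `data_to_index` are hypothetical names used only to illustrate which data would be indexed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of media types treated as XML; not a TEXTML Server list.
XML_TYPES = {"text/xml", "application/xml"}


@dataclass
class Document:
    """Hypothetical model of a document; not the TEXTML Server API."""
    content: bytes
    content_type: str                        # e.g. "application/pdf"
    indexed_content: Optional[bytes] = None  # optional; always XML when present


def data_to_index(doc: Document) -> Optional[bytes]:
    """Return the data that would be indexed for doc, or None if nothing is."""
    if doc.indexed_content is not None:
        # Indexed content is indexed in place of the content.
        return doc.indexed_content
    if doc.content_type in XML_TYPES:
        # No indexed content: the content itself is indexed, since it is XML.
        return doc.content
    # Non-XML content with no indexed content cannot be indexed.
    return None
```

For example, a PDF document that carries indexed content yields its indexed content, an XML document without indexed content yields its content, and a JPEG document without indexed content yields `None`.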

Indexed content must be valid XML, so indexed content can always be indexed.

A document’s content, by contrast, can be in XML or in PDF, JPEG, MP3, or any other binary or text format. TEXTML Server can index the content only if it is in XML.
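An application that builds indexed content may want to confirm that it is XML before storing it. The following sketch assumes Python's standard `xml.etree.ElementTree` parser and checks well-formedness only; it does not validate against a DTD or schema. The function names `is_well_formed_xml` and `check_indexed_content` are hypothetical and are not part of TEXTML Server.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if data parses as well-formed XML (no DTD/schema validation)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


def check_indexed_content(data: bytes) -> bytes:
    """Hypothetical guard: reject indexed content that is not XML before storing it."""
    if not is_well_formed_xml(data):
        raise ValueError("indexed content must be XML")
    return data
```

Binary content such as the start of a PDF file fails this check, which mirrors why such content cannot be indexed directly and needs XML indexed content instead.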

Indexed content is application-specific. TEXTML Server does not provide a DTD for indexed content: the structure of indexed content is the responsibility of the application programmer.

How indexed content can be used

Indexed content is often used when the content of a document is not in XML.

Suppose that the content is in PDF. The application program can read the PDF file, extract from it the data to be indexed, and format the extracted data as XML. The program then:

  • Sets the PDF as the document’s content.
  • Sets the XML as the document’s indexed content.
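The formatting step above can be sketched as follows. This minimal Python example assumes the data has already been extracted from the PDF into a dictionary of field names and values (extraction depends on a PDF library and is not shown). The root element name `document` and the function `build_indexed_content` are hypothetical: as noted above, the structure of indexed content is up to the application. Setting the PDF and the XML on a document is done through the TEXTML Server API, which this sketch does not call.

```python
import xml.etree.ElementTree as ET


def build_indexed_content(fields: dict[str, str]) -> bytes:
    """Format data extracted from a PDF as XML for use as indexed content.

    Keys must be valid XML element names; the structure is hypothetical.
    """
    root = ET.Element("document")          # hypothetical root element
    for name, value in fields.items():
        child = ET.SubElement(root, name)  # one element per extracted field
        child.text = value                 # ElementTree escapes &, <, > on output
    # With encoding="utf-8", tostring returns bytes without an XML declaration.
    return ET.tostring(root, encoding="utf-8")
```

For example, `build_indexed_content({"title": "Q3 Report", "author": "R & D"})` returns `b"<document><title>Q3 Report</title><author>R &amp; D</author></document>"`. Building the XML with a library rather than by string concatenation keeps special characters in the extracted text from producing ill-formed indexed content.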

As a result, whenever TEXTML Server indexes the docbase, index entries for the document will be based on the indexed content.