Extraction parameters

The parameters in the <Extraction> element calibrate the size of the document batches that TEXTML Server indexes.

Starting at the oldest unindexed document, TEXTML Server takes a batch of documents, parses them, and writes all the values to the different indexes in parallel (according to the IndexationThreadCount value).
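As a rough illustration of this parse-then-write cycle, the following Python sketch parses a batch and then writes the values for each index on a pool of worker threads. It is not TEXTML Server code: `parse`, `index_batch`, the in-memory `indexes` dictionary, and the `thread_count` argument (standing in for the IndexationThreadCount value) are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def parse(document):
    """Hypothetical parser: returns {index name: values} for one document."""
    return {"words": document.split(), "length": [len(document)]}


def index_batch(batch, indexes, thread_count):
    """Parse every document in the batch, then write to each index in parallel."""
    # Step 1: parse the whole batch and group the extracted values by index.
    pending = {}
    for document in batch:
        for index_name, values in parse(document).items():
            pending.setdefault(index_name, []).extend(values)

    # Create missing indexes up front so worker threads never add dictionary keys.
    for index_name in pending:
        indexes.setdefault(index_name, [])

    def write(index_name):
        # Each task owns exactly one index, so no two threads touch the same list.
        indexes[index_name].extend(pending[index_name])

    # Step 2: one write task per index, capped at thread_count concurrent workers.
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        list(pool.map(write, pending))
```

Calling `index_batch(["a b", "c"], indexes, 2)` on an empty `indexes` dictionary fills one list per index. Giving each task a single index is a design choice of the sketch that avoids locking; the server's actual concurrency model may differ.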

Extraction parameters should not be changed unless IXIASOFT Customer Support expressly recommends it.

Table 1. Docbase extraction parameters
Parameter name Default value Description
StopUpdatePeriod 1 Specifies the interval, in seconds, at which the indexing and deindexing tasks will check for pending read operations and permit interruption.
MaximumUpdateSize 67108864 Specifies the size (in bytes) of parsed documents that an indexing batch can contain.
MaximumSourceSize 10485760 Specifies the size (in bytes) of content that an indexing batch can contain.
OccAllocManagerBlockSize 4194304 Specifies the size (in bytes) of the blocks used to minimize memory allocation requests to the OS during indexing and deindexing.
ClusterIndexDocuments (not specified) Specifies the maximum number of documents that an indexing batch can contain. When not specified, the maximum is 500 documents.
ClusterDeindexDocuments (not specified) Specifies the maximum number of documents that a deindexing batch can contain. When not specified, the maximum is 500 documents.
IgnoreInvalidCharacters False  
LongXPathEvaluationTreshold 10 Length of time (in seconds) an XPath evaluation can take before triggering a warning.
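
One way to picture how the batch limits in Table 1 interact is the Python sketch below, which closes a batch as soon as adding the next document would exceed any of the three limits. It is only a model built on the default values listed above: the `PendingDocument` class, the `take_batch` function, and the rule that a batch always accepts at least one document are assumptions made for the example, not documented TEXTML Server behavior.

```python
from dataclasses import dataclass

# Default values taken from Table 1.
MAXIMUM_UPDATE_SIZE = 67108864   # bytes of parsed documents per batch
MAXIMUM_SOURCE_SIZE = 10485760   # bytes of content per batch
DEFAULT_MAX_DOCUMENTS = 500      # used when ClusterIndexDocuments is not specified


@dataclass
class PendingDocument:
    """Hypothetical record for a document waiting to be indexed."""
    name: str
    source_size: int  # size of the content, in bytes
    parsed_size: int  # size once parsed, in bytes


def take_batch(queue, max_documents=DEFAULT_MAX_DOCUMENTS,
               max_source=MAXIMUM_SOURCE_SIZE, max_update=MAXIMUM_UPDATE_SIZE):
    """Return the leading documents of `queue` (oldest first) that fit the limits."""
    batch = []
    source_total = 0
    parsed_total = 0
    for document in queue:
        over_limit = (
            len(batch) >= max_documents
            or source_total + document.source_size > max_source
            or parsed_total + document.parsed_size > max_update
        )
        # Assumption: a batch always takes at least one document, so a single
        # oversized document cannot block the queue indefinitely.
        if batch and over_limit:
            break
        batch.append(document)
        source_total += document.source_size
        parsed_total += document.parsed_size
    return batch
```

With these defaults, a queue of 600 documents of 1 KB each yields a batch of 500 documents (the document-count limit), while a queue of documents with 4 MB of content each yields a batch of 2 (the content-size limit).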