Understanding the DateTime_Formats.xml document

The DateTime_Formats.xml document lists the date and time patterns for each locale. Each format is declared as follows:

<format NAME="en_US" LANG="en" DELIM="; - / \ , .'" 
COMB="DateTime" INFO="English/United States" STRICT="No">
  • NAME contains the name of the format, which is a locale. Each format defines a set of patterns.
  • LANG contains the language of the format.
  • DELIM lists the recognized separators.
  • COMB indicates the order in which TEXTML Server evaluates times and dates when they appear in the same element.
  • INFO describes the locale.
  • STRICT indicates whether the evaluated date must match the length of the pattern exactly.

    STRICT is set to Yes for the ISO format only.
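
As an illustration only, the attributes described above can be read with a standard XML parser. The following Python sketch parses the example element in isolation; the element is self-closed here so that it parses on its own, and the helper name read_format is hypothetical. It does not show how TEXTML Server itself loads the document.

```python
import xml.etree.ElementTree as ET

# The example element from this section, self-closed so it parses on its own.
FORMAT_XML = r"""<format NAME="en_US" LANG="en" DELIM="; - / \ , .'"
COMB="DateTime" INFO="English/United States" STRICT="No"/>"""


def read_format(xml_text):
    """Return the attributes of a single <format> element as a dict."""
    element = ET.fromstring(xml_text)
    attrs = dict(element.attrib)
    # Assumption: STRICT only ever holds "Yes" or "No".
    attrs["STRICT"] = attrs.get("STRICT") == "Yes"
    return attrs


fmt = read_format(FORMAT_XML)
print(fmt["NAME"], fmt["LANG"], fmt["COMB"], fmt["STRICT"])
```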

About Locales (NAME attribute)

The format name is a four-letter code of the form xx_YY, where xx is the language code and YY is the country code. For example, en_US is the locale for American English, and en_CA is the locale for Canadian English.

The language code follows the ISO-639 standard, while the country code follows the ISO-3166 standard.

For the complete ISO-639 standard, go to http://www.w3.org/WAI/ER/IG/ert/iso639.htm. For the complete ISO-3166 standard, go to http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html.
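
The xx_YY form can be checked and split with a short helper. This is a minimal sketch; the function name split_locale is hypothetical, and the check covers only the shape of the name (two lowercase letters, an underscore, two uppercase letters), not whether the codes actually exist in ISO-639 or ISO-3166.

```python
import re

# Two lowercase letters, an underscore, two uppercase letters.
LOCALE_RE = re.compile(r"^([a-z]{2})_([A-Z]{2})$")


def split_locale(name):
    """Split a format name such as "en_US" into (language, country)."""
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not an xx_YY locale name: {name!r}")
    return match.group(1), match.group(2)


print(split_locale("en_US"))
print(split_locale("en_CA"))
```

Names that do not have this form are rejected with a ValueError.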

Three additional formats are also defined:

  • IXIA_ym_num, for numeric dates that contain only the year and the month. This format specifies that the year always appears first.

    For example, 01/02 is read as February 2001.

  • IXIA_my_num, for numeric dates that contain only the year and the month. This format specifies that the month always appears first.

    For example, 01/02 is read as January 2002.

  • ISO 8601, for dates written according to the ISO 8601 standard.
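
The difference between IXIA_ym_num and IXIA_my_num can be illustrated with Python's datetime module. This is a sketch under two assumptions that are not stated in this document: the fields are separated by a slash, and a two-digit year is expanded using Python's own rule (values 00 to 68 map to 2000 to 2068). The function name read_year_month is hypothetical.

```python
from datetime import datetime


def read_year_month(text, format_name):
    """Interpret a two-field numeric date such as "01/02"."""
    if format_name == "IXIA_ym_num":
        layout = "%y/%m"  # the year always appears first
    elif format_name == "IXIA_my_num":
        layout = "%m/%y"  # the month always appears first
    else:
        raise ValueError(f"unsupported format: {format_name!r}")
    parsed = datetime.strptime(text, layout)
    return parsed.year, parsed.month


print(read_year_month("01/02", "IXIA_ym_num"))  # (2001, 2): February 2001
print(read_year_month("01/02", "IXIA_my_num"))  # (2002, 1): January 2002
```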

About Patterns

The symbols used in the date and time format patterns are the following:

  • y for the year.
  • M for the month.
  • d for the day in the month.
  • h for the hour from 1 to 12 (am/pm).
  • H for the hour from 0 to 23.
  • m for the minutes.
  • s for the seconds.
  • E for the day of the week spelled out.
  • a for the am/pm marker.
  • w for the week in a year.
  • e for the number of the day in a week.

The repetition of each symbol describes how the date is written:

  • M, MM, d, and dd represent the month and the day as numbers; yy represents the year written with two digits.

    For example, MM/dd/yy represents dates written as 06/18/02.

  • EEE and MMM represent the abbreviated day and month.

    For example, EEE, MMM dd, yy represents dates written as Tue, Jun 18th, 02.

  • EEEE, MMMM, and yyyy represent the complete day, month, and year.

    For example, EEEE, MMMM d, yyyy represents dates written as Tuesday, June 18th, 2002.

  • H:mm:ss represents times written as 14:35:30.
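
To make the symbol rules concrete, the sketch below translates a pattern written with these symbols into a Python strptime layout and uses it to parse a value. It is an approximation for illustration, not the TEXTML Server parser: the function name to_strptime is hypothetical, the w and e symbols are not handled, and Python's month and day names depend on the active locale.

```python
import re
from datetime import datetime

# Symbols whose strptime directive does not depend on the repetition count.
SIMPLE = {"d": "%d", "H": "%H", "h": "%I", "m": "%M", "s": "%S", "a": "%p"}


def to_strptime(pattern):
    """Translate a pattern such as "MM/dd/yy" into a strptime layout."""
    out = []
    # A token is either a run of one repeated letter or a single other character.
    for match in re.finditer(r"([A-Za-z])\1*|[^A-Za-z]", pattern):
        token, symbol = match.group(0), match.group(1)
        count = len(token)
        if symbol is None:
            out.append("%%" if token == "%" else token)  # literal separator
        elif symbol == "y":
            out.append("%Y" if count >= 4 else "%y")
        elif symbol == "M":
            out.append("%B" if count >= 4 else "%b" if count == 3 else "%m")
        elif symbol == "E":
            out.append("%A" if count >= 4 else "%a")
        elif symbol in SIMPLE:
            out.append(SIMPLE[symbol])
        else:
            raise ValueError(f"unsupported symbol: {symbol!r}")
    return "".join(out)


print(to_strptime("MM/dd/yy"))            # %m/%d/%y
print(to_strptime("EEEE, MMMM d, yyyy"))  # %A, %B %d, %Y
print(datetime.strptime("06/18/02", to_strptime("MM/dd/yy")).date())
print(datetime.strptime("14:35:30", to_strptime("H:mm:ss")).time())
```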

About Separators (DELIM attribute)

For each date or time format, TEXTML Server recognizes a set of separators. It can therefore recognize and properly index virtually any date or time, whatever the separator used.
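
One way to picture separator handling is to treat every character of the DELIM value as interchangeable when splitting a date into fields. The sketch below does this for the DELIM value of the en_US example. It rests on assumptions that this document does not confirm: each non-space character in DELIM is taken as one separator, white space also separates fields, and the function name split_fields is hypothetical.

```python
import re

# DELIM value of the en_US example format.
DELIM = "; - / \\ , .'"


def split_fields(text, delim=DELIM):
    """Split a date string into fields on any recognized separator."""
    # Assumption: every non-space character of DELIM is a separator.
    separators = sorted(set(delim) - {" "})
    char_class = "".join(re.escape(char) for char in separators)
    parts = re.split("[" + char_class + r"\s]+", text)
    return [part for part in parts if part]


print(split_fields("06/18/02"))
print(split_fields("06-18-02"))
print(split_fields("Tuesday, June 18, 2002"))
```

With this approach, 06/18/02 and 06-18-02 produce the same three fields, so a single pattern can cover both spellings.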

About Date and Time Combinations (COMB attribute)

TEXTML Server can read date and time combinations, such as Thursday, October 31, 2002, 17:03.

The COMB attribute indicates the order in which the date and time appear so that the indexing engine can recognize them.

For example, assume you specify a format that accepts EEEE d MMMM yy and MMM yy as patterns for dates, and h:mm:ss a and H mm ss as patterns for times.

If the COMB attribute is set to TimeDate, the indexing engine tries to match the content of the XML element against the following patterns, in this order (the date patterns alone first, then each time pattern followed by a date pattern):

  • EEEE d MMMM yy
  • MMM yy
  • h:mm:ss a EEEE d MMMM yy
  • H mm ss EEEE d MMMM yy
  • h:mm:ss a MMM yy
  • H mm ss MMM yy

Once a pattern is recognized, the content of the element is indexed.
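
The evaluation order listed above can be reproduced with a short Python sketch. The TimeDate branch follows the list in this section. The DateTime branch, which places the date before the time, and the single space used to join the two parts are assumptions made by analogy. The function name candidate_patterns is hypothetical.

```python
def candidate_patterns(date_patterns, time_patterns, comb):
    """List the patterns to try, in order, for a given COMB value."""
    # The date patterns are tried on their own first.
    candidates = list(date_patterns)
    # Then each date pattern is combined with each time pattern.
    for date_pattern in date_patterns:
        for time_pattern in time_patterns:
            if comb == "TimeDate":
                candidates.append(f"{time_pattern} {date_pattern}")
            elif comb == "DateTime":
                # Assumption: DateTime mirrors TimeDate with the date first.
                candidates.append(f"{date_pattern} {time_pattern}")
            else:
                raise ValueError(f"unknown COMB value: {comb!r}")
    return candidates


dates = ["EEEE d MMMM yy", "MMM yy"]
times = ["h:mm:ss a", "H mm ss"]
for pattern in candidate_patterns(dates, times, "TimeDate"):
    print(pattern)
```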