Custom Business Rules for DITA Projects

What Are Business Rules

Business rules define or constrain some aspect of business, and always resolve to true or false. They assert business structure or control or influence the behavior of the business. Most importantly, they help a business achieve its goals.

Business rules apply to DITA projects as well. Examples of business rules for DITA include:

  • Titles must be uppercase
  • Short descriptions should not exceed 50 characters
  • Lists should contain more than one item
  • <codeblock> elements must have an @outputclass attribute value

If you’ve worked with oXygen, either on the desktop or within the IXIASOFT CCMS, you might have noticed messages that appear at the bottom of the topic, telling you things like, “Value xyz is not in the set of allowed values defined in the subject scheme…” or “Attribute ‘module’ is not allowed to appear in element ‘p’.” Schematron, working in real time behind the scenes while you write, is responsible for these messages.

These types of errors are structural. That is, they are caused by violations of the DITA structural rules defined in the DITA DTDs. However, there are many other rules that cannot be captured by the DTDs. The four examples above are such rules. There is nothing in the DITA DTDs to require that titles be uppercase, nor is there a way to capture that requirement in a DTD, or any other kind of schema such as XSD or RNG.

You might rely on a style guide and your writers’ memories to enforce these rules, or you might have internal peer reviews or an editor who can correct any deviations from these rules. But wouldn’t it be nice to provide writers with real-time reminders of these rules?

Schematron

Schematron fits the bill nicely. Schematron is an ISO standard (ISO/IEC 19757-3) used by hundreds of projects; it’s not unique to oXygen or any other application. It’s a natural language for making assertions about documents: each rule pairs an XPath test with a plain-language message. The structural rules mentioned earlier are expressed as Schematron rules, but the language is robust enough to express custom rules as well.

Schematron can also be used to verify data interdependency, for example, that a start date is before an end date. You can also check data cardinality or perform algorithmic checks. Many industries use Schematron, including finance, insurance, and government. Writers are probably most interested in using Schematron to enforce their style guide rules, although if you work in a highly regulated industry such as healthcare or aviation, you might find that you need Schematron’s data integrity checks as well.

Schematron is actually quite simple. There are five basic elements:

  • assert
  • report
  • rule
  • pattern
  • schema
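To see how the five elements fit together, here is a minimal sketch of a complete rules file. It assumes the XSLT 2.0 query binding and a made-up pattern ID (codeblock-rules). The assert is one possible way to express the <codeblock> business rule listed earlier, and the report is an extra rule invented purely to show the element in use.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
   <!-- A schema holds one or more patterns -->
   <sch:pattern id="codeblock-rules">
      <!-- A rule names the context: the elements it applies to -->
      <sch:rule context="codeblock">
         <!-- An assert shows its message when the test is false -->
         <sch:assert test="normalize-space(@outputclass) != ''">
         A codeblock must have an outputclass attribute value</sch:assert>
         <!-- A report shows its message when the test is true
              (illustrative rule, not from the list above) -->
         <sch:report test="normalize-space(.) = ''">
         A codeblock should not be empty</sch:report>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

Reading from the outside in: the schema groups patterns, a pattern groups related rules, and each rule holds the asserts and reports that apply to its context.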

Here is an example of a Schematron rule that enforces the business rule “Lists should contain more than one item”:

<sch:rule context="ul">
   <sch:assert test="count(li)>1">
   A list must have more than one item</sch:assert>
</sch:rule>

This rule first establishes the context to which it applies: unordered lists, or <ul> elements. Next, the assert tests that the number (count) of items (<li> elements) in the unordered list is greater than one. If the test fails, the error to be displayed is “A list must have more than one item.” In addition to unordered lists, there are also ordered lists (<ol> elements) and simple lists (<sl> elements), so you would need additional rules to cover those contexts as well.
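One way to cover the other list types is sketched below, under the assumption that your topics use the base DITA element names. A single rule can name both <ul> and <ol> in its context, while simple lists need their own rule because their items are <sli> elements rather than <li> elements. The pattern ID is made up for the example.

```xml
<sch:pattern id="list-items">
   <!-- Unordered and ordered lists both use li for their items -->
   <sch:rule context="ul | ol">
      <sch:assert test="count(li) > 1">
      A list must have more than one item</sch:assert>
   </sch:rule>
   <!-- Simple lists use sli for their items -->
   <sch:rule context="sl">
      <sch:assert test="count(sli) > 1">
      A simple list must have more than one item</sch:assert>
   </sch:rule>
</sch:pattern>
```

Because these contexts match on element names, specialized list elements (such as <steps> in a task) would not be checked by this sketch.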

The previous rule displays an error if the condition is not true. You can also have Schematron rules that display an error if the condition is true. These rules use sch:report rather than sch:assert. Here is an example of a Schematron rule that displays an error if a <title> element includes any inline bolding (<b> elements):

<sch:rule context="title">
   <sch:report test="b">
   Bold is not allowed in the title element</sch:report>
</sch:rule>

Again, you could write additional rules for other inline styling elements such as <i> or <u> or you could expand this rule to check all three. You could also be more specific with the context, to apply this rule only to topic titles and not to section titles, table titles, or figure titles. If you have worked with XSL or XPath, the syntax of Schematron rules probably looks familiar to you.
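For illustration, here is how an expanded and narrowed version of that rule might look. The context is an assumption: it selects titles by their parent element, listing the four standard topic types, so that section, table, and figure titles are left alone. Unlike the rule above, which looks only at direct children of the title, this test also catches styling elements nested deeper inside it.

```xml
<sch:rule context="topic/title | concept/title | task/title | reference/title">
   <!-- True if the title contains any b, i, or u element at any depth -->
   <sch:report test=".//b | .//i | .//u">
   Bold, italic, and underline are not allowed in topic titles</sch:report>
</sch:rule>
```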

Don’t worry though…you don’t necessarily have to create Schematron rules from scratch! If you already have a style guide, you can annotate it in such a way that you can automatically extract Schematron rules from it.

To get you started, there is also an intelligent style guide available as an open source project on GitHub: github.com/oxygenxml/integrated-styleguide. It allows you to enforce the rules in the style guide but is also annotated so that you can auto-generate Schematron rules from it.

Generated rules are based on generic rules, which you can create using a library of rules. Generic rules consist of abstract patterns such as:

  • Avoid a word in a certain element
  • Limit the number of characters
  • Avoid duplicate content

Then, to the generic “Avoid a word…” rule, for example, you add as a parameter the specific word to avoid. In the “Limit the number of characters” generic rule, you add as parameters the specific element and the character limit.

When generating Schematron rules from the abstract rules, you might first choose to generate a “Limit the number of characters” rule for <shortdesc> with an appropriate character limit. Then you might reuse the abstract rule with a different character limit to generate another “Limit the number of characters” rule for <title>.
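The mechanism behind this kind of reuse is Schematron’s abstract pattern. As a rough sketch, the generic rule is written once with placeholders and then instantiated once per element. The pattern name limitChars, the parameter names, the pattern IDs, and the 60-character title limit are all invented for this example; they are not taken from the style guide project.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
   <!-- Generic rule: $element and $limit are placeholders
        that each instance fills in -->
   <sch:pattern abstract="true" id="limitChars">
      <sch:rule context="$element">
         <sch:assert test="string-length(normalize-space(.)) &lt;= $limit">
         The <sch:name/> element exceeds its character limit</sch:assert>
      </sch:rule>
   </sch:pattern>

   <!-- Generated rule 1: short descriptions, 50 characters -->
   <sch:pattern is-a="limitChars" id="limitShortdesc">
      <sch:param name="element" value="shortdesc"/>
      <sch:param name="limit" value="50"/>
   </sch:pattern>

   <!-- Generated rule 2: same generic rule, different limit for titles -->
   <sch:pattern is-a="limitChars" id="limitTitle">
      <sch:param name="element" value="title"/>
      <sch:param name="limit" value="60"/>
   </sch:pattern>
</sch:schema>
```

Changing a limit later means editing one parameter value rather than rewriting the test.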

You can also use a chatbot to generate Schematron rules. Natural language statements such as, “Paragraphs should have less than 100 words” can be fed through Dialogflow (software for creating chatbots for websites, mobile applications, messaging platforms, and IoT devices) to create a Schematron rule such as:

<sch:pattern is-a="limitElement">
   <sch:param name="element" value="p"/>
   <sch:param name="limit" value="maximum"/>
   <sch:param name="size" value="100"/>
   <sch:param name="unit" value="words"/>
   <sch:param name="message" value="p should contain a maximum of 100 words"/>
</sch:pattern>
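For comparison, here is roughly what that instantiated pattern amounts to when written out by hand as a concrete rule. This is a sketch, not the actual definition of limitElement in any library. It assumes the XSLT 2.0 query binding (needed for the tokenize function) and counts words by splitting the text on whitespace.

```xml
<sch:rule context="p">
   <!-- Split the normalized text on whitespace and count the pieces -->
   <sch:assert test="count(tokenize(normalize-space(.), '\s+')) &lt;= 100">
   p should contain a maximum of 100 words</sch:assert>
</sch:rule>
```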

Implementing Schematron Rules

Now that you understand what a Schematron rule looks like, let’s look at how you implement them. You can specify a set of Schematron rules to apply to a specific topic by declaring the rules file and the namespace in the topic itself:

<?xml-model href="rules.sch" type="application/xml"
       schematypens="http://purl.oclc.org/dsdl/schematron"?>
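To show where the declaration goes, here is a sketch of a small topic file. The processing instruction sits after the XML declaration and before the root element. The topic content is a placeholder, and rules.sch is assumed to live in the same folder as the topic.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="rules.sch" type="application/xml"
       schematypens="http://purl.oclc.org/dsdl/schematron"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="sample_topic">
   <title>SAMPLE TOPIC</title>
   <shortdesc>A placeholder short description.</shortdesc>
   <body>
      <!-- A one-item list like this would trigger the
           "more than one item" rule if rules.sch contains it -->
      <ul>
         <li>Only item</li>
      </ul>
   </body>
</topic>
```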

But it’s (hopefully) unlikely you have individual topics or maps that require a unique set of rules; and besides, the approach is not really practical on a large scale.

Much more practically, you can associate Schematron rules with all DITA topics and maps, or you can develop project-specific rules to associate only with DITA files in a specific project. You can combine these as well and have some rules that apply across the board and additional project-specific rules.

If you’re already using XSD or RNG schemas to validate your content, you can add Schematron directly in the XSD appinfo element or in any element on any level of an RNG schema. The examples here are all for DITA content but you can use Schematron with any XML tagset, including custom XML tagsets. You can even use Schematron to validate Schematron rules!
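As a sketch of the XSD case, the schema below embeds a Schematron pattern in the appinfo of an element declaration. Everything in it is hypothetical: the event element, its start and end attributes, and the pattern ID are invented to illustrate the start-date and end-date check mentioned earlier. The test assumes dates written as YYYY-MM-DD with no time zone suffix, and compares them as numbers after removing the hyphens.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">
   <xs:element name="event">
      <xs:annotation>
         <xs:appinfo>
            <!-- The Schematron rule travels with the element it checks -->
            <sch:pattern id="event-dates">
               <sch:rule context="event">
                  <!-- 2024-01-15 becomes 20240115 before comparison -->
                  <sch:assert test="number(translate(@start, '-', ''))
                                    &lt;= number(translate(@end, '-', ''))">
                  The start date must not be later than the end date</sch:assert>
               </sch:rule>
            </sch:pattern>
         </xs:appinfo>
      </xs:annotation>
      <xs:complexType>
         <xs:attribute name="start" type="xs:date" use="required"/>
         <xs:attribute name="end" type="xs:date" use="required"/>
      </xs:complexType>
   </xs:element>
</xs:schema>
```

The XSD checks that each attribute is a valid date. The embedded rule checks the relationship between the two dates, which the XSD alone cannot express.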

Within oXygen, Schematron rules are implemented via frameworks. For example, the dita.framework uses several Schematron rules files, including dita-1.2-for-xslt2-mandatory.sch, dita-1.2-for-xslt2-other.sch, styleguide.sch, and accessibility.sch. To expand on these rules, you can create your own oXygen framework that uses rules in addition to these, or even instead of these.

Schematron Quick Fixes

Identifying issues via Schematron business rules is important but even more important is resolving those issues. That’s where Schematron Quick Fixes (SQF) can come in.

The assertion message alone (such as “p should contain a maximum of 100 words”) is not always enough to help the writer find and fix the problem. If possible and logical, you should also provide some proposals or suggestions to help the writer, similar to the way that spellcheck offers suggested words when it finds a misspelling.

Generally, an XML expert (someone who knows the XML structure inside and out) creates Quick Fixes that help a Subject Matter Expert (SME), who knows the content but is naive about XML, find and fix the problem. The XML expert’s work can create a much less frustrating authoring experience for the SME and enable him or her to simply get on with creating the content.

SQF is an extension of the ISO Schematron standard; that is, Quick Fixes are an extension of Schematron rules. Even if you are using an application that doesn’t recognize the Quick Fixes, you can still use the business rules. The Schematron Quick Fixes specification is still in draft with the W3C, so if you decide to use it, you can offer feedback at schematron-quickfix.github.io/sqf.

As an example: say you have a business rule that requires that all sections have an ID. As a solution, you could offer “Add an ID to the current section” or “Add an ID to all sections in the document.” The writer can select the appropriate action and behind the scenes, a unique ID is created (using some predetermined pattern) and added to the current section or all sections.
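Here is a sketch of how those two Quick Fixes might be written, based on the draft SQF specification. The fix IDs and the ID pattern (the prefix section_ followed by a generated value) are assumptions made for the example. The generate-id function comes from XSLT, so the sketch assumes the XSLT 2.0 query binding.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
            queryBinding="xslt2">
   <sch:pattern id="section-ids">
      <sch:rule context="section">
         <!-- sqf:fix lists the fixes offered when this assert fails -->
         <sch:assert test="@id" sqf:fix="addId addIdAll">
         Every section must have an ID</sch:assert>

         <!-- Fix 1: add an id attribute to the section being checked -->
         <sqf:fix id="addId">
            <sqf:description>
               <sqf:title>Add an ID to the current section</sqf:title>
            </sqf:description>
            <sqf:add node-type="attribute" target="id"
                     select="concat('section_', generate-id())"/>
         </sqf:fix>

         <!-- Fix 2: add an id attribute to every section that lacks one -->
         <sqf:fix id="addIdAll">
            <sqf:description>
               <sqf:title>Add an ID to all sections in the document</sqf:title>
            </sqf:description>
            <sqf:add match="//section[not(@id)]" node-type="attribute"
                     target="id"
                     select="concat('section_', generate-id())"/>
         </sqf:fix>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

In an editor that supports SQF, the writer sees the two titles as choices next to the error message. In one that does not, the assert still works as an ordinary business rule.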

Similarly, you can use SQF to convert text links to xref so that they are clickable, or to add missing cells to tables. There’s really no limit to the solutions you can provide with SQF.

You can also add SQF options to ignore errors found by the business rules. Obviously, you would only do this for rules that can be optionally applied, where the rule is mainly informational.

Blog Author

Leigh White

DITA Specialist at IXIASOFT

This blog was originally presented as an IXIAtalks webinar by Philipp Baur, Keith Schengili-Roberts and Sydney Jones. You can find the webinar here.



