
Technical Quality Assurance in Structured Content



It ain’t art. At the very latest, when moving from unstructured to structured authoring, the last lone ranger in the technical publications department will realize that technical writing is not an art, but a craft—an engineering craft. Artistic attitudes, flights of fancy, and strokes of genius must give way to more abstract, collaborative, and technical approaches. Structured content has many benefits and plenty of implications for the technical documentation process. Apart from a shift in mindset from art to engineering and a plethora of new tools and standards that need to be mastered along the way, two facts stand out:

First, structured content is transparent. It can be dissected into ever smaller units. Gone are the large legacy monoliths in proprietary formats, with uncounted hidden parameters, individual settings, and case-by-case layout choices. Instead we have dozens, hundreds, or even thousands of XML files following a uniform, open, and well-defined structure that can be used to systematically analyze their content and components.
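For illustration, here is a minimal DITA concept topic (the id and wording are invented). Because every topic follows the same open grammar, a tool can systematically count and compare elements, attributes, and their combinations across an entire library:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
    <concept id="beam_safety">
      <title>Beam Safety</title>
      <conbody>
        <p>Never look directly into the beam aperture.</p>
      </conbody>
    </concept>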

Second, structured content increases complexity and interdependence. Chunks of information that used to be duplicated, processed, and compiled over and over again in separate manual steps are now automatically processed, transformed, and fed into different processes. The artistic part is gone and with it the ability to adjust and intervene ad libitum. Single sourcing promises many gains in efficiency but it also means that we have to get things right, and we need to get them right the first time.
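As a minimal sketch of what single sourcing looks like in DITA (the file name and ids are invented), one maintained chunk is pulled by reference into every publication that needs it:

    <!-- warnings.dita: one topic collecting reusable admonitions -->
    <topic id="warning_lib">
      <title>Warning library</title>
      <body>
        <note id="laser_warning" type="warning">Laser radiation. Avoid direct eye exposure.</note>
      </body>
    </topic>

    <!-- Any other topic reuses the identical content by reference; an edit
         to the source automatically propagates to every output: -->
    <note conref="warnings.dita#warning_lib/laser_warning"/>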

The first point means that a new and very different approach to quality assurance is possible. The second point means that it is absolutely necessary.

Need for Reliable Output

In the medical device industry, accurate, complete, and compliant documentation is essential for the approval and release of new products. A missing warning statement or an incorrect plus or minus sign can not only cause the loss of millions of dollars, but can ultimately and literally have lethal consequences. It is therefore vital that the documentation is not only authored correctly, but also results in complete and correct output. This, of course, is true for any company, even if the possible damage may not be quite that high. Unfortunately, the transformation from the source material to the deliverables (manuals and help systems) is tricky.

XML environments that strictly separate form and content often do not come with full-fledged WYSIWYG support. For the writers, the transformation from sources to deliverables remains essentially a black box—a process full of pitfalls and outside their control. At this point, technical quality assurance (QA) enters the picture. By helping writers check their source files and align them with business rules and the information architecture, it dramatically reduces the risk of generating flawed output.

Extending the V & V Framework

After migrating from unstructured FrameMaker to DITA, our company immediately experienced repeated and often difficult-to-detect issues with output that did not correspond to the writers’ intentions. Moreover, those discrepancies were often detected much later in the process and then triggered a multitude of last-minute corrections and new publications. Some cases were due to defects in the transformation (stylesheets), others to insufficient understanding of the information model or writers’ incorrect use of certain elements and attributes. If structured authoring is intended to make us more efficient and create a tangible return on investment, especially in localization, then we are forced to rethink our methods of quality assurance. The result of this process is depicted in Figure 1, an adaptation of the V-Model from Systems Engineering.

The top and bottom layers remained, but in between a new level was added: the Technical QA Review. In the documentation lifecycle, the Technical QA Review is the formal step to make sure that the files “work.”

Files that Work

It soon turned out that incorrect output was only the most obvious and immediate problem. But quality issues can also manifest themselves further down the road. And, as usual, the longer the distance between the creation of the problem and its discovery, the more expensive it becomes to fix.

The scope of the Technical QA Review was therefore augmented. While it is still concerned with technical aspects (XML tagging), “files that work” now has a broader meaning:

  • The files work for all present purposes.
    • Each transformed output is complete, correct, and faithful to the intentions of the writer.
    • Each output uses up-to-date templates.
  • The files work as well as possible for all future purposes.
    • A future release of the same publication can use the same files with minimal extra effort (ideally none at all).
    • If the content lends itself to reuse in a different publication or format in the future, the files can be reused as far as possible without any alterations.
    • The files are suitably marked up for future translations and their correct rendering, without requiring any manual adjustments by the localization team.
    • Future stylesheet changes and developments of the information model do not have unintended side effects on the output. Such side effects can arise, for example, from elements that are used for the wrong purpose when the formatting of those elements changes in a future stylesheet (see the markup sketch after this list).
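A hypothetical DITA sketch of that last point: markup chosen for its current formatting rather than for its meaning works today but can break silently when the stylesheets evolve.

    <!-- Fragile: <cite> misused only because it currently renders in italics -->
    <p>Press the <cite>Start</cite> button.</p>

    <!-- Robust: the element names the writer's intent, so a future change to
         citation formatting cannot affect user interface labels -->
    <p>Press the <uicontrol>Start</uicontrol> button.</p>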

Because of the high level of automation with DITA, the localization process is particularly prone to surprises. Combinations of elements and text that work well in the English original sometimes turn out to be impossible to replicate in a foreign language or require the manual addition or removal of some elements. Layouts, particularly for tables, that look fine in the original can become impossibly convoluted in the translation because of text expansion or other text re-flow effects. And last but not least, inconsistent tagging can significantly reduce the benefit of translation memories.

Tool and Process Requirements

No one can open a document set and review hundreds of XML files by hand. The Technical QA Review must therefore rely on an automated support tool that analyzes the code and provides the reviewer with data to quickly identify potential problems. A QA report developed according to company-specific needs fills this gap: it highlights the issues that can make the difference between “files that work” and “files that do not work,” allowing the technical review to focus quickly and efficiently on problematic code passages.
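As a simplified sketch of such a check (an illustration, not the actual Varian implementation), an XSLT fragment can scan a topic for one known pattern, figures without images, and report the location of each hit:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="text"/>
      <!-- List every figure that contains no image, together with the id of
           the enclosing topic so the writer can locate the passage quickly -->
      <xsl:template match="/">
        <xsl:for-each select="//fig[not(.//image)]">
          <xsl:text>FINDING: figure without image in topic '</xsl:text>
          <xsl:value-of select="ancestor-or-self::*[@id][last()]/@id"/>
          <xsl:text>'&#10;</xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>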

The best QA report will be worthless, though, if it is not embedded in a suitable process. With very tight deadlines for writers and many last-minute content corrections, it is vital to the success of a technical QA initiative that the review is clearly defined within the authoring process. The following factors have proven critical in our implementation of the Technical QA Review:

  • Four-eye principle—The reviewer is not the same person as the writer.
  • Intermediate reviews—Writers use the QA report to review their work at the content development stage to eliminate as many issues as possible as early as possible.
  • Easy access—The QA report is easy and fast to produce and process. All writers are able to produce it themselves.
  • Explanations—The QA report does not only list possible issues, but also contains explanations and advice on how to analyze and resolve them.
  • Accountability—The Technical QA Review is a formal review, and the results are recorded. Writer and reviewer are held accountable for the review and the actions taken.

The Varian QA Report

At Varian Medical Systems, we have developed a QA report and a corresponding technical review process that are now applied in all DITA documentation projects.

It all started with a number of Perl scripts that required the DITA bookmap and its content to be exported from the content management system. At first, these scripts could be run and interpreted by only a few expert users. This approach worked in the early stages of the DITA adoption, but as the number of users and the volume of documentation grew steadily, the reviewers were soon overwhelmed.

Subsequently, a second version of the QA report was developed using XSLT. It is integrated into the transformation framework of the content management system, and, with additional explanations and advice, it is now a tool that is regularly used by writers to prepare for the technical review. Moreover, both the improved structure of the report and the increased expertise of the writers allow more and more of them to act as technical QA reviewers. The Varian QA report is generated as a Microsoft Excel workbook and has evolved from the original report into a form that fulfills four main functions:

  • Issues report with findings overview—Based on past experience, the report lists any occurrences of patterns in the XML code that are known to be detrimental or at least often problematic. For example, a figure that does not contain an image usually points to a problem. Each finding is reported together with the exact location in the code, so that the writer can analyze the passage and correct it if appropriate.
    In addition, a findings overview lists each finding once, together with its severity and frequency in the book.
  • Statistics—The report extracts several statistics from the DITA code, including the frequency of all characters used in the text and whether they are supported by the standard corporate fonts.

    It includes a list of all conditional processing attributes (product, audience, and so on) with their values as used in the book; furthermore, it shows on which elements they occur and in which combinations (see the example after this list). This information allows the advanced reviewer to gauge the complexity of the book and spot possible issues with reuse and localization.

    Among other things, the report also lists all conrefs (and their resolution) as well as all inline elements. These lists help us review the consistency of tagging and spelling. For example, a sorted list of all the <uicontrol> or <cite> elements enables us to identify inconsistencies in spelling and find obsolete references to other publications.

  • Review form—The report provides fields for writer and reviewer to document the diligent execution of the Technical QA Review. In particular, the findings overview requires a statement from the writer on what action was taken or why a particular finding was considered uncritical and left unaddressed. The reviewer is then required to review the action taken and, if necessary, request rework.
  • Writer’s checklist—Since the findings overview requires a commitment from the writer on each finding, it was also found to be a suitable means for a technical checklist. For example, an entry in the findings overview will remind the writer to correctly identify all images that need to be localized.
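To illustrate the conditional processing attributes inventoried by the Statistics function (the attribute values below are invented), the report would record that the following fragment uses audience on step and product on note:

    <step audience="service">
      <cmd>Remove the service panel.</cmd>
    </step>
    <note product="modelA modelB" type="caution">This step applies only to the listed models.</note>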

Customizing the QA Report

Widening the scope, Technical QA Reviews need not be confined to the authoring process; they can occur in a wider set of circumstances. Different situations call for different information in the QA report. In particular, the QA report should only contain those findings that are relevant and actionable in the context in which they are generated. The scenarios in Table 1 can be distinguished, each with its own requirements. A customizable QA report, driven by parameters and tailored to specific needs, will increase efficiency and improve acceptance considerably.

Adoption and Difficulties

Taking stock after two years of using a Technical QA Review process, we see a number of issues and challenges:

  • Time pressure. It is notoriously difficult to secure enough resources for reviews. Adding a new type of review is challenging, especially one that must be executed at the very last moment, after all other content and text changes have been implemented.
  • Finding the right balance. Available time, especially when very scarce, is difficult to allocate sensibly. The writer should not neglect major content issues because of minor technical issues, and vice versa. Finding the golden mean remains a challenge even for very experienced authors.
  • Agency problem and awareness. Numerous issues in the XML code do not necessarily cause an issue for the initial publication and manifest themselves only in later process steps, typically when files are translated or reused in future versions or in other publications and formats. As there is no immediate benefit for the initial writer in fixing those findings, they tend to be underestimated or ignored, especially if the potential consequences (and benefits from avoiding them) are not fully understood.
  • Roles and burdens. The responsibilities of writer and reviewer are sometimes unclear. Because the reviewer is typically a more experienced user, writers might rely on the reviewer to analyze or even fix issues that they could have resolved themselves.
  • Judgment calls and false positives. The QA report is a tool that reports on patterns that are typically problematic. But exceptions and borderline cases exist for many of these. Judging them correctly and recognizing false positives is very hard without intimate knowledge of the output transformations, localization processes, and future reuse scenarios.
  • Acceptance problems. Acceptance of the QA report depends decisively on the accuracy of the report and the explanations. If the report has too many false positives or if the report fails to provide clear explanations and advice, users are not prepared to spend valuable time sorting out all the questions that arise.
  • False sense of security. Having a tool that lists (potential) issues and guides the writer through addressing them creates the illusion that the DITA code is automatically fine if no findings or irregularities remain in the QA report. It is often forgotten that the report only supports the writer in getting the code right. In particular, because it is based on patterns known from previous incidents, the QA report will often not be able to help writers or reviewers detect novel issues.
  • Repetition of the same findings. The QA report in its current form is agnostic to findings from previous reviews. Any suspicious patterns that are analyzed by writer and reviewer and found unproblematic will still turn up in the next report, creating unnecessary overhead.

None of these challenges is easy to overcome. But with ongoing training, steady communication, and team building, as well as technological advancement, there is hope of solving them.

In addition, we have started to improve the automated authoring support by converting rules and patterns into Schematron rules wherever possible. Schematron rules help writers detect issues while they type and are expected to significantly reduce the later effort needed to complete the Technical QA Review.
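For example, a rule of the kind meant here (a simplified sketch, not a rule from the actual Varian rule set) can flag empty figures directly in the editor:

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <pattern id="figure-checks">
        <!-- Surface a known QA-report pattern while the writer types:
             a figure that does not contain an image usually points to a problem -->
        <rule context="fig">
          <assert test=".//image">A figure should contain an image; add one or use a more suitable element.</assert>
        </rule>
      </pattern>
    </schema>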

Benefits

Notwithstanding the challenges just mentioned, the technical QA initiative is a success. Over the last two years we have benefited in many ways:

  • Improved quality. Enforcing coding standards has led to documents with markedly fewer output issues and inconsistencies in the text. Templates are applied consistently, and the share of cases in which content was unintentionally omitted from or added to certain outputs has fallen from 30 percent to less than 2 percent.
  • Reduced number of re-publications. As a direct result, the percentage of documents that had to be re-published because of faulty outputs has decreased significantly, saving us a considerable amount of time and money.
  • Improved understanding. Working on the Technical QA Reviews has prompted many writers to actively immerse themselves in the details of DITA and learn more about its principles and the company’s coding guidelines. It has proven an efficient means of encouraging writers and reviewers to talk to each other and share experience.
  • Reduced translation effort. The improved quality of the source material and of the translated XML files has led to clear improvements in the quality of the translated outputs. In addition, by providing the translation team with valuable information on the structure of the source material and allowing the team to efficiently analyze files received from translation vendors, the technical QA process has helped the team save at least 15 percent of the time needed to process the projects.
  • Joy. While making writers painfully aware of issues in their DITA code, the QA report has also given many of them encouragement and joy by helping them gain more confidence in their XML writing abilities as more and more often they get things right—and get them right the first time. Moreover, it gradually shows that in the end, consistent and correct tagging is not eating away time from content work, but by removing many technical obstacles it actually helps the writers concentrate on what matters most: the quality of the content.

About the Author

Richard Forster

Information Architect and Documentation Specialist, DITA