Technology and QA in localization: frenemies with benefits

In the first article of this series we looked at what exactly quality assurance is in the context of localization, and we examined some of the current practices in the industry. It is now time to turn our focus to the technology available for QA. This is a rather important chapter: in the almost 20 years since the first QA checks were incorporated in CAT tools, a lot has changed in the industry, both in terms of quality demands and in terms of technological capabilities. Let’s have a brief look at the history and the current state of affairs.

The evolution of QA technology

When the first translation tools were introduced in the mid-1980s, the only means of quality assurance was effectively a human proofreader. Spellcheckers then slowly became more diverse and more popular in word-processing applications. Later on, terminology management tools, which started emerging as companions to translation memories, provided a second layer of quality assurance checks, and in the late 1990s all these functions were incorporated in the first CAT tool to offer this kind of range. Other tools followed this example for more than a decade; CAT tools would develop QA functionality and include it in their suite of applications and plug-ins. The first tool designed and developed as a stand-alone QA-focused application was officially launched only in 2004. (Not that long ago!)

So, to summarise:
– There is a gap of nearly 15 years between the creation of the first CAT tool ever and the incorporation of QA checks in the first CAT tool to do that.
– There is a much shorter gap, of well under 10 years, between the time QA checks were introduced in a CAT tool for the first time (the late 1990s) and the launch of the first ever stand-alone QA tool (2004).
– From that launch until today, only thirteen years have passed.

This staggered evolution of quality assurance technology presents an interesting dynamic. The need for QA checks clearly became more and more pressing over time, as the automated processes supported by the continuously developing CAT tools provided the conditions for such functions to be developed. When the first stand-alone QA tools came about, CAT tools were already well ahead in terms of development. However, QA had never been part of the core business for CAT software developers. In the early days, QA checks were a nice thing to have, and it took years before they were considered essential. Nowadays the situation is different: more and more tools (both CAT and QA) have emerged, online CAT systems are becoming common and the demand for more efficient technology is growing fast.

Where we stand now

Today we could classify QA technologies in three broad groups:
– built-in QA functionality in CAT tools (offline and online),
– stand-alone QA tools (offline),
– custom QA tools developed by LSPs and translation buyers (offline).

Built-in QA checks in CAT tools range from the completely basic to the quite sophisticated, depending on which CAT tool you’re looking at. Stand-alone QA tools are mainly designed with error detection/correction in mind, but there are some that use translation quality metrics for assessment purposes (so they’re not quite QA tools as such). Custom tools are usually developed in order to address specific needs for a client or a vendor who happens to be using a proprietary translation management system or something similar. This obviously presupposes that the technical and human resources are available to develop such a tool, so this practice is rather rare and exclusive to large companies that can afford it.

Regardless of which of these three types of QA tool we examine, in an average localization workflow there are issues of integration which are worth looking at in more detail – and we will do exactly that in our next article. For now, let’s focus on what this technology can do for us.

Consistency, consistency, consistency. (And nothing else?)

Terminology and glossary/wordlist compliance, empty target segments, untranslated target segments, segment length, segment-level inconsistency, different or missing punctuation, different or missing tags/placeholders/symbols, different or missing numeric or alphanumeric structures – these are the most common checks that one can find in a QA tool. On the surface at least, this looks like a very diverse range that should cover the needs of most users. Yet all of these are effectively consistency checks: if a certain element is present in the source segment, then it should also exist in the target segment. It is easy to see why this kind of “pattern matching” lends itself to automation, and translators/reviewers certainly appreciate a tool that can do this for them a lot more quickly and accurately than they can.
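
To make the “pattern matching” idea concrete, here is a minimal sketch in Python of three such checks (empty target, untranslated target, missing tags/placeholders). It is not taken from any particular QA tool: the placeholder syntax covered by TOKEN_RE and the function names missing_tokens and check_segment are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Assumed token syntax (illustrative only): {0} / {name} placeholders,
# %s / %d format codes and simple XML-like tags such as <b> or </b>.
TOKEN_RE = re.compile(r"\{\w+\}|%[sd]|</?\w+>")


def missing_tokens(source: str, target: str) -> list[str]:
    """Return tokens that occur more often in the source than in the target."""
    diff = Counter(TOKEN_RE.findall(source)) - Counter(TOKEN_RE.findall(target))
    return sorted(diff.elements())


def check_segment(source: str, target: str) -> list[str]:
    """Run a few source-versus-target consistency checks on one segment."""
    issues = []
    if not target.strip():
        issues.append("empty target")
    elif target.strip() == source.strip():
        issues.append("untranslated target")
    for token in missing_tokens(source, target):
        issues.append(f"missing in target: {token}")
    return issues


issues = check_segment(
    "Hello <b>{name}</b>, you have %d messages",
    "Bonjour <b>{name}, vous avez %d messages",
)
print(issues)  # ['missing in target: </b>']
```

Note that nothing in this sketch knows which language the target is written in. That is precisely what makes such checks easy to automate.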

Despite the obvious benefits of these checks, the methodology on which they run has significant drawbacks. Consistency checks are effectively locale-independent and that creates noise, i.e. false positives (the tool detects an error when there is none) and false negatives (the tool doesn’t detect an error when there is one). Let’s look at an example:

Source (en-GB): Could we meet on 3/4/2017 at 2:30pm?
Target (fr-FR): Est-ce qu’on peut se rencontrer le 3 avril 2017 à 14h30?

With the exception of a couple of systems (after substantial customisation by the user), QA tools which rely on consistency checks would produce no fewer than four instances of noise in this plain-looking segment:
– 3/4/2017 – 3 avril 2017: number ‘4’ is missing from the target, so that would be marked as an error – however we know the date is correctly localized, so that’s a false positive.
– 2:30pm – 14h30: number ‘2’ doesn’t exist in the target and number ‘14’ doesn’t exist in the source, so both of these would be marked as errors – however we know the time has been correctly localized, so these are both false positives.
– 2:30pm? – 14h30?: the required space which is missing before the question mark in the target would not be marked as an error – however we know that’s an error in the target locale, so that’s a false negative. (Interestingly, this would not be a false negative if the target locale were fr-CA.)
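
The noise described above can be reproduced with a few lines of Python. This is a deliberately naive sketch of a locale-independent number and punctuation check, not the implementation of any real QA tool; the function names naive_number_check and final_punctuation_matches are hypothetical.

```python
import re

NUMBER_RE = re.compile(r"\d+")


def naive_number_check(source: str, target: str):
    """Compare the digit sequences of source and target, ignoring locale."""
    src = NUMBER_RE.findall(source)
    tgt = NUMBER_RE.findall(target)
    missing = [n for n in src if n not in tgt]  # in source, not in target
    extra = [n for n in tgt if n not in src]    # in target, not in source
    return missing, extra


def final_punctuation_matches(source: str, target: str) -> bool:
    """Check only that both segments end with the same character."""
    return source.rstrip()[-1:] == target.rstrip()[-1:]


source = "Could we meet on 3/4/2017 at 2:30pm?"
target = "Est-ce qu’on peut se rencontrer le 3 avril 2017 à 14h30?"

missing, extra = naive_number_check(source, target)
print(missing)  # ['4', '2']  -> two false positives
print(extra)    # ['14']      -> one more false positive
print(final_punctuation_matches(source, target))  # True -> spacing error goes unnoticed
```

Three reported “errors” that are not errors, plus one real error that is never reported: the four instances of noise listed above.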

What’s that noise?

One can imagine how many issues such as the above can show up in a QA error report. Noise is one of the biggest shortcomings of the QA tools currently available, and that is because of the lack of locale specificity in the checks provided. It is in fact rather ironic that the benchmark for QA in localization doesn’t involve locale-specific checks. To be fair, in some cases users are allowed to configure the tool in greater depth and define such focused checks on their own (either through existing options in the tools or with regular expressions). But this makes the process more labour-intensive for the user, and it comes as no surprise that the majority of users of QA tools never bother to do that. Instead they perform their QA duties relying on the sub-optimal consistency checks which are available by default.
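
As an illustration of what a user-defined, regular-expression-based check might look like, the Python sketch below encodes one locale-specific rule: in fr-FR a space (normally a no-break space) is expected before a question mark, exclamation mark or semicolon, whereas fr-CA has no such requirement. The rule table LOCALE_RULES and the function locale_issues are assumptions made for this example and do not reflect the configuration format of any actual tool.

```python
import re

# Hypothetical rule table: locale code -> list of (description, pattern).
# A pattern match in the target text signals a likely error for that locale.
LOCALE_RULES = {
    "fr-FR": [
        (
            "missing space before ? ! or ;",
            # One of ? ! ; directly preceded by a character that is neither
            # whitespace (\s also covers no-break spaces) nor another such mark.
            re.compile(r"[^\s?!;][?!;]"),
        ),
    ],
    "fr-CA": [],  # no space is required before these marks
}


def locale_issues(target: str, locale: str) -> list[str]:
    """Return the descriptions of all rules for this locale that the target breaks."""
    rules = LOCALE_RULES.get(locale, [])
    return [desc for desc, pattern in rules if pattern.search(target)]


target = "Est-ce qu’on peut se rencontrer le 3 avril 2017 à 14h30?"
print(locale_issues(target, "fr-FR"))  # ['missing space before ? ! or ;']
print(locale_issues(target, "fr-CA"))  # []
```

Even this single rule needs care: a semicolon inside an HTML entity such as &amp;, for instance, would trigger it. That gives a sense of the effort involved, and of why so few users go to the trouble.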

All these issues aside, and acknowledging the fact that no QA tool is meant to replace the reviewer (but, still, it should be there to help, not obstruct), the technology currently available can perform better and therefore it should perform better. In a future instalment we will review what the options are for improving performance and how the tools can cope with the expectations for greater accuracy.

For now, however, stay tuned for our next article, in which we will examine the challenges of linguistic QA with respect to workflow, process integration and management in the context of localization.

Vassilis Korkas
COO at lexiQA