Tuesday 31 March 2015

Translation quality measurement in practice



Riccardo Schiaffino, Aliquantum
Franco Zearo, Lionbridge Technologies
Abstract: This paper provides an overview of the Translation Quality Index (TQI), a measurement methodology that can be used as a reliable indicator of translation quality. The authors have been developing the TQI methodology for the past five years. The TQI was first implemented for commercial use in 2004.  
1. TRANSLATION QUALITY MEASUREMENT 
1.1 How did you start working on translation quality measurement? 
We have known each other for years—we both graduated in translation from the same university and currently live in the same area—but we had never worked together. In 2000, we were both giving presentations at the ATA Conference in Orlando, Florida. We started to talk about how people generally assumed that translation quality could not be measured. One thing led to another, and we decided to see if we could find a way to measure the quality of translation as a basis for process improvement. 
In the following three years, we developed our research and gave three presentations on how to measure and control translation quality. At the same time, we applied our work at the companies for which we worked: Lionbridge in Franco’s case and J.D. Edwards in Riccardo’s. Riccardo’s career at J.D. Edwards ended on December 31, 2003, following PeopleSoft’s acquisition. However, he continued researching translation quality measurement, developing tools to help the assessment and measurement of translation quality, and developing what we called the Translation Quality Index, or TQI. In the meantime, Franco proceeded to implement the translation quality measurement methodology at Lionbridge. The spreadsheet that Lionbridge uses to measure translation quality is the result of our joint collaboration. 
1.2 Why try to measure translation quality? 
Some people say, “You cannot manage what you cannot measure.” Applied to translation, this means that without some means to assess the quality of translation, it is not possible to improve translation quality, nor is it possible to know if the translation quality is good; and, if it is good, how to keep it that way. 
1.3 Is it possible to measure translation quality? 
We believe that it is possible to measure translation quality, although perhaps not directly: When measuring translation quality, we really measure the incidence of various types of errors and defects in the translated material; for example, errors of terminology, grammar, spelling, meaning, and others. Therefore, a good translation is one in which fewer errors are made.
Experienced translators would summarize the criteria for recognizing a poor translation as follows: “I know it when I see it” (Note 1). However, this simplistic approach does not meet the demands of today’s fast-paced business environment. As in many business processes where the desired outcome is a product or service, quality measurements are not only possible, but necessary. Without objective ways to measure the quality of our work, we are left at the mercy of fickle evaluations by lay people who can be highly subjective and not entirely fair.
We believe it is the translation profession’s responsibility to develop criteria that constitute an objective and fair evaluation of translation quality. Having said that, we heed the warning of the ATA: “Although the use of points may impart a certain impression of objectivity, it is in truth still subjective” (Doyle, 2003).
1.4 Why measure errors when measuring translation quality? 
One important thing to consider is that the assessment of translation quality should be as objective as possible. What I like and what you like may be very different, but we should have some means to agree on certain standards.  
We believe it is easier to agree on what constitutes an error rather than on what constitutes “quality” in the abstract, and that an important factor in quality is the absence of errors. 
We also believe that summarizing all of the error points in a single index value will help us to synthesize the translation quality of a given text. Moreover, we can use statistical methods to determine if a translation process is in statistical control, if special causes are present, or even if we are improving our translation process. 
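As a rough illustration of the statistical idea, the sketch below computes control limits for a series of TQI scores using an individuals (moving-range) control chart and flags new scores that fall outside them. The choice of chart, the function names, and the sample scores are assumptions made for this example; the text does not say which statistical method is used.

```python
from statistics import mean


def control_limits(scores):
    """Return (center, lower, upper) for an individuals control chart."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate variation")
    center = mean(scores)
    # Average absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(scores, scores[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / 1.128, the usual constant for individuals charts.
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar


def out_of_control(scores, limits):
    """Return the indices of scores that fall outside the control limits."""
    _, lower, upper = limits
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


# Hypothetical TQI scores from earlier samples (illustrative values only).
baseline = [92, 95, 90, 94, 93, 91, 96, 92]
limits = control_limits(baseline)

# Hypothetical new measurements checked against the baseline limits.
new_scores = [93, 70, 95]
flagged = out_of_control(new_scores, limits)
print(flagged)  # [1]: the score of 70 may point to a special cause
```

A flagged score does not prove that a translation is bad; it only suggests that something other than ordinary variation may be at work and is worth investigating.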
1.5 Do you believe there is one “ideal” translation process to ensure the best possible quality level? 
The process does not really matter as long as it yields the desired result. We believe that the very purpose of translation measurement is to obtain useful information for benchmarking the relative merits of various translation processes. 
The real question then is, “What is the most efficient process in terms of quality versus cost?” We believe that the ingredients of good quality translations are fairly reasonable, but very seldom found all together. These ingredients include the following:
• Good translators with a sound linguistic and specific technical background
• Detail-oriented editors and knowledgeable proofreaders
• Thorough terminology work up front
• Sufficient time to provide a good translation
• Meaningful feedback and support from the customer
1.6 How does translation quality measurement differ from other methods of translation quality assessment? 
Over the past 30 years, many methods of evaluating translation quality have been developed and proposed. Malcolm Williams (2004) classifies these methods into two categories: quantitative-centered systems and argumentation-centered systems. Williams characterizes quantitative-centered methods by some form of error counting, while argumentation-centered methods take a more holistic approach. Each method has its advantages and disadvantages, which we cannot elaborate on here. Suffice it to say, the advantage of the quantitative-centered methods is that they lend themselves to quantifying errors and, therefore, make measurements possible.
2. THE TRANSLATION QUALITY INDEX (TQI) 
2.1 What is the TQI methodology? 
The TQI methodology (along with similar initiatives such as the LISA QA Model and SAE J2450) is a quantitative-based method of translation quality assessment. It measures the number and type of errors found in a text and calculates a score, or TQI, which is indicative of the quality of a given translation.  
The distinctive traits of the TQI methodology are as follows: 
• Translation Quality Index. The Translation Quality Index is a number that is indicative of the quality of a given translation. It is obtained by the rigorous application of a quality assurance methodology. 
The Translation Quality Index attributes a value to a translated text, with 100 being an “error-free” translation. It is based on the number of error points in a given text or sample. Negative values are possible. The TQI is analogous to a temperature scale. We all have subjective interpretations of “cold,” “warm,” and “hot.” The use of a temperature scale (Fahrenheit, Celsius, or Kelvin) makes it possible to move from subjective perceptions to objective measurements. 
• Separation between error type and severity. There are no pre-assigned penalties for the different error categories. Each error can be marked as critical, major, or minor, depending on its consequences. Sometimes, an error can be classified in different ways; for example, if I type “car” instead of “cab”, it could be classified as a mistranslation, a terminology error, or even a typo. While a precise classification of translation errors might be of interest in an academic setting, such as translation training programs, it is often unnecessary in a business environment. 
• Strict criteria for the severity levels of errors. A TQI measurement should be objective, reproducible, and repeatable. To achieve these criteria, the evaluator has to follow certain rules when marking errors. 
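The text does not give the formula behind the index, so the following is only a minimal sketch of how such a score could behave. It assumes that error points are normalized per 1,000 words and subtracted from 100; the function name, the normalization, and the sample figures are all hypothetical.

```python
def translation_quality_index(error_points, word_count, per_words=1000):
    """Hypothetical TQI: 100 minus the error points per `per_words` words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100 - error_points * per_words / word_count


# Illustrative samples of 2,000 words each.
clean = translation_quality_index(0, 2000)     # no error points
flawed = translation_quality_index(30, 2000)   # some error points
poor = translation_quality_index(250, 2000)    # many error points

print(clean, flawed, poor)  # 100.0 85.0 -25.0
```

Under these assumptions an error-free sample scores 100, and a sample with enough error points drops below zero, which matches the two properties of the scale described above.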
2.2 What are error points and how do error points differ from errors? 
Using a typo as an example: if we find five typos, we count five errors. That is a rather simple form of error measurement, but not all errors are equal. There is a difference between a typo on the front cover of a manual and the same typo in a footnote. There are also typos that alter the meaning of a word, and typos that do not lead to confusion; for example, the word “*atttention” spelled with three ’t’s. This observation prompts us to assign different weights to errors depending on their consequences. In our previous example, we can decide to give minor typos a weight of “1,” and major typos a weight of “5,” “100,” or any other value that reflects their impact. We call these weights “error points.”
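The difference between counting errors and counting error points can be shown with a short sketch. The weights below are placeholders chosen for illustration (the text deliberately leaves the actual values open), and the function and variable names are hypothetical.

```python
# Hypothetical weights per severity level; real values are a policy decision.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def error_points(errors, weights=SEVERITY_WEIGHTS):
    """Sum the weights of all marked errors.

    `errors` is a list of (category, severity) pairs.
    """
    return sum(weights[severity] for _category, severity in errors)


# Five typos: three that cause no confusion and two with larger consequences.
marked = [
    ("typo", "minor"),
    ("typo", "minor"),
    ("typo", "minor"),
    ("typo", "major"),
    ("typo", "major"),
]

total_errors = len(marked)           # plain error count
total_points = error_points(marked)  # weighted count: 3 * 1 + 2 * 5

print(total_errors, total_points)  # 5 13
```

The error category is stored but plays no part in the weighting, which reflects the separation between error type and severity described earlier.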
2.3 What were the difficulties when you started to put the TQI into practice? 
The purpose of the TQI and its ancillary tools is to make translation assessment as objective as possible. However, when we started to use the TQI tool, we realized that how we configured the score was not always a true representation of the translation quality. It is easy to form an idea about how good or bad a translation is and then semiconsciously try to convince oneself that a major error is minor, or a minor error is only a “preference,” so as not to push the TQI below the threshold that would make the translation fail. Also, accuracy errors are difficult to evaluate when there is a slight loss in meaning. Even grammatical errors are sometimes not as straightforward as one would think. Language, after all, is not a precise science.  
2.4 What makes a good evaluator? 
A good evaluator must be able to be as objective as possible. He or she must be able to distinguish between factual, tangible errors and stylistic preferences. We all have our pet peeves when it comes to translation choices. An objective evaluator realizes that he or she might have translated a sentence differently, but that the version chosen by the original translator is also acceptable. 
You can roughly classify evaluators into purists and descriptivists. The purists are those who like to think of language in terms of how it ought to be used. Descriptivists, on the other hand, take into account how people use the language in their daily lives. Each point of view has its pros and cons, and they each lead to very different interpretations of what is considered “right” and “wrong.”  
Moreover, if you give the same translation to two different evaluators, chances are that they will find a different number of errors or mark the same errors differently. A better solution would be to have the translation evaluated by a group of evaluators, in the same way that gymnastics competitions rely on a panel of judges. Unfortunately, this solution proves to be too expensive in most commercial settings.
What would be helpful is a certification program for evaluators, possibly sponsored by an independent, not-for-profit organization such as the ATA. This not-for-profit organization might create standards regarding error classification, severity levels, error points, and others. 
2.5 How do you distinguish between errors and stylistic preferences? 
Bruno Osimo (2004) says that translation is a process with one entry point and multiple exit points. As discussed earlier, there is more than one way to translate a given sentence, each version being roughly equivalent and any differences being a matter of style and personal preference.
By definition, stylistic preferences are not errors and are ignored in the computation of the quality score. Therefore, it is necessary to establish clear rules that define what is an error and what is not an error. 
We have developed a three-pronged rule to determine whether a marked error is preferential or not. Basically, the evaluator has to answer the following three questions:
1. Is it grammatically correct?
2. Is the translation accurate?
3. Is the translation compliant with the glossary, style guide, guidelines, and client instructions?
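The three questions can be read as a simple decision rule: if all three answers are yes, the evaluator's objection is a stylistic preference and not an error. The sketch below encodes that reading; the class and field names are hypothetical, and answering each question still requires human judgment.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """An evaluator's answers to the three questions for one marked passage."""
    grammatically_correct: bool
    accurate: bool
    compliant: bool  # glossary, style guide, guidelines, client instructions


def is_preferential(review):
    """Return True when the marked passage passes all three checks."""
    return review.grammatically_correct and review.accurate and review.compliant


# A passage the evaluator would have worded differently, but with no real fault.
style_choice = Review(grammatically_correct=True, accurate=True, compliant=True)

# A passage that is correct and accurate but ignores the client glossary.
glossary_miss = Review(grammatically_correct=True, accurate=True, compliant=False)

print(is_preferential(style_choice))   # True: ignored in the score
print(is_preferential(glossary_miss))  # False: counted as an error
```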
Answering the first two questions is not as easy as it might seem. In the case of grammatical correctness, some languages have authoritative language bodies, such as the Real Academia Española in Spain, the Académie française in France, the Nederlandse Taalunie in the Netherlands, and so forth. Other languages that lack such language authorities, such as American English, might have to rely on commonly accepted language conventions as described in authoritative reference books; for example, Merriam-Webster’s Dictionary, The Chicago Manual of Style, and others. A third group of languages does not have established language conventions, as is the case with many languages in India. In such cases, it is important to develop glossaries and style manuals.
Evaluating the degree of accuracy is another challenging task. We have developed flow charts similar to those created by the ATA for test evaluation for certification purposes. The intent is to see if there have been significant deviations in meaning. 
The last question serves the business purpose of delivering quality that conforms to the client’s specifications. This is generally more straightforward: Either the term is in the translation glossary or it is not. Either the translators followed the style guide and the instructions, or they did not. 
2.6 Can the TQI help in assessing the quality of machine translation? 
Absolutely. Some argue that human-based evaluations are too subjective and that MT should therefore not be evaluated using human-based methods. We do not agree. The TQI is a sort of Turing test. The Turing test was developed to indicate whether a machine was “intelligent” by testing its capability to perform human-like conversation. If a user cannot tell the difference between a text translated by a human and one translated by MT, then we could say that the two texts are equivalent. The TQI can help with this evaluation. If we agree that a TQI score of 80 or above is the mark of a good translation, it does not matter which localization process we used to obtain the score. In our experience, raw MT outputs have TQI scores below zero. Processes that combine MT with human post-editing can elevate the TQI scores to levels that are more acceptable.
2.7 Is there anything that the TQI methodology cannot measure?  
Yes. In our experience, there are a couple of cases where relying on the TQI methodology would be inappropriate. 
Because the TQI methodology is designed to measure tangible, factual errors, it shows its shortcomings when it comes to evaluating so-called “literal” or “word-for-word” translations. A literal translation might pass all three prongs of the preferential rule (grammaticality, accuracy, and compliance) and still be regarded as a poor translation.
Another case where the TQI methodology proves to be ineffective is when a high degree of creativity is expected on the translator’s part, which is often the case with translations for marketing and advertising. In these types of text, translators and copyeditors might have a certain degree of freedom. It is an acceptable practice to deviate from the source text as long as the translator maintains the core message. The TQI system, however, penalizes such deviations as accuracy errors, because in other circumstances a translator is not allowed to depart from the source text.
In our experience with marketing texts, the translation might contribute 60-75% of the final version, the remainder coming from additions, deletions, and textual changes as deemed appropriate.  
3. ADDITIONAL RESOURCES
Copies of our previous presentations, a translation quality web blog, and other materials can be found on our website, www.translationquality.com.  
4. NOTES
1. We are referring to U.S. Supreme Court Justice Stewart’s remark about the difficulty in finding an objective definition for an obscene motion picture. In Jacobellis v. Ohio, 378 U.S. 184 (1964), he remarked:
“I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.”   
5. REFERENCES
1. Doyle, Michael Scott (2003). “Translation Pedagogy and Assessment: Adopting ATA's Framework for Standard Error Marking”, in The ATA Chronicle, November/December 2003.
2. Osimo, Bruno (2004). Traduzione e qualità: la valutazione in ambito accademico e professionale, Hoepli, Milano, p. 25.
3. Williams, Malcolm (2004). Translation Quality Assessment: An Argumentation-Centred Approach, University of Ottawa Press, Ottawa.
