May 4, 2021 by Alison Tunley
Recently I have been comparing post-editing with human translation as a way of evaluating the overall quality of MT engines before they are used. These evaluations can also help a translation agency estimate the likely effort required to post-edit machine-translated text: each segment in a sample text is given a score on a scale ranging from “zero editing required” up to “complete retranslation”.
This work led me to an interesting research paper on the phenomenon of “post-editese”, which compares the output of PEMT (post-edited machine translation) with human translation and describes some of the linguistic differences found in these texts. The author, Antonio Toral, works in the Computational Linguistics group at the Center for Language and Cognition at the University of Groningen. Below I have linked to a blog post, which gives an accessible summary of Toral’s research in this area.
Three datasets were used in the research, all involving news and subtitles. Toral reflects that different findings may result if the focus is on more technical texts, particularly where terminology and consistency are key. What I found interesting was the nature of the differences found between post-edited and human-translated text (PE versus HT). Firstly, in terms of lexical variety (i.e. size of vocabulary), HT comes out on top, followed by PE, with pure machine translation (MT) bringing up the rear. Lexical density reflects the amount of information conveyed in a text and, once again, HT scores more highly, this time with PE and MT being roughly equivalent.
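To make these two measures more concrete, here is a minimal sketch in Python. It assumes that lexical variety is approximated by the type-token ratio (distinct words divided by total words) and lexical density by the share of content words; these are common proxies and not necessarily the exact formulas used in the paper. The tokenizer and the tiny `FUNCTION_WORDS` list are hypothetical stand-ins for proper linguistic tooling.

```python
import re

# Hypothetical, deliberately tiny stop list standing in for "function words".
# A real study would use a part-of-speech tagger instead.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "or", "is", "it", "that"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_variety(text):
    """Type-token ratio: distinct words divided by total words."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(text):
    """Share of tokens that are content words (i.e. not in FUNCTION_WORDS)."""
    tokens = tokenize(text)
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens) if tokens else 0.0
```

Running both functions over an HT, a PE and an MT version of the same source text would give a rough feel for the ranking described above, although a short sample is unlikely to be representative.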
The researchers also investigated the overall length of the translated text compared to the source text, capturing the result in a score they refer to as the length ratio. The higher the score, the more the translated text diverges from the source in length. Once again, HT has the highest score and MT the lowest, with PE somewhere in between. This is interpreted as reflecting the greater freedom a human translator may have in producing the target text.
The final measure used to evaluate the translations looks at part-of-speech sequences and reflects potential interference from the source text on the translated output. Once again, a significant difference was found between PE and HT, suggesting residual interference from the source language on PE output.
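As a rough illustration of how such a score might be calculated, the sketch below assumes the length ratio is the absolute difference in character length between source and target, divided by the length of the source. That formula is my assumption based on the description above (a higher score means greater divergence), and the function name is hypothetical.

```python
def length_ratio(source, target):
    """Absolute difference in character length, normalised by source length.

    Assumed formula: |len(source) - len(target)| / len(source).
    A score of 0.0 means the two texts are exactly the same length.
    """
    if not source:
        raise ValueError("source text must not be empty")
    return abs(len(source) - len(target)) / len(source)
```

On this definition a translation that hugs the source length scores close to zero, whether it is shorter or longer, while a freer rendering that condenses or expands the text scores higher.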
The overall conclusions are clear: post editing may resolve some of the issues involved in machine translation, but the resulting text still exhibits features that distinguish it from human translation with no machine input. This will inevitably impact the subjective quality of the translated output; the question is whether this is an acceptable trade-off given the greater speed of PEMT and its relatively good score when it comes to errors. The answer will naturally depend on the context and application, but it is important that clients understand the nature of the translation options available to them, and if possible, perform their own comparison of post-editing and human translation.
Alison is a seasoned freelance translator with over 15 years of experience, specialising in translating from German to English. Originally from Wales, she has been a Londoner for some time, and she holds a PhD in Phonetics and an MPhil in Linguistics from the University of Cambridge, where she also completed her First Class BA degree in German and Spanish…
Sources
http://kv-emptypages.blogspot.com/2019/10/post-editese-is-real.html