This Is the Speed at Which We Are Approaching the Singularity in AI

– During the opening keynote of the AMTA conference in Orlando, Translated quantified the progress towards the singularity in artificial intelligence using data from machine translation quality improvement.
– The discovery was made possible by the analysis of a large amount of post-editing tasks performed by thousands of professional translators over many years.

Language translation was one of the first problems investigated by researchers in the domain of artificial intelligence. Yet it remains one of the most complex and challenging problems for a machine to perform at a human skill level. "That’s because language is the most natural thing for humans. Nonetheless, the data Translated collected clearly show that machines are not that far from closing the gap," said Translated’s CEO Marco Trombetti, while showing a preview of our discovery in the field during the Association for Machine Translation in the Americas 2022 conference, where he was invited to present the opening keynote speech.

Many AI researchers even say that solving the language translation problem is equivalent to producing Artificial General Intelligence (AGI). Therefore, the evidence Translated provided about the narrowing gap between what expert human translators produce and what a properly optimized machine translation (MT) system can produce is quite possibly the most compelling evidence of success at scale seen in either the MT community or the AI community in general.

The finding relies on data representing a concrete sample of the translation production demand. It consists of records of the time taken to edit over 2 billion MT suggestions by tens of thousands of professional translators worldwide working across multiple subject domains, ranging from literature to technical translation and including fields in which MT is still struggling, such as speech transcription.

Time to Edit as a KPI for MT quality

Over the years, Translated has continually worked to evaluate and monitor machine translation quality. In 2011, we finally standardized our methodology and settled on a metric we call “Time to Edit,” the time required by the world’s highest-performing professional translators to check and correct MT-suggested translations. Since then, we've been tracking the average Time to Edit (TTE) per word in the source language.

Time to Edit is calculated as the total time that a translator spends post-editing a segment of text divided by the number of words making up that segment. We consider TTE the best measure of translation quality as there is no concrete way to define it other than measuring the average time required to check and correct a translation in a real working scenario.
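The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Translated's implementation: the `Segment` record, the whitespace-based word count, and the choice to average as total time over total words are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One post-edited MT suggestion (hypothetical record layout)."""
    source_text: str     # source-language segment shown to the translator
    edit_seconds: float  # total time spent post-editing the segment


def word_count(text: str) -> int:
    # Assumption: whitespace tokenization stands in for a real word counter.
    return len(text.split())


def time_to_edit(segment: Segment) -> float:
    """TTE for one segment: editing time divided by source word count."""
    words = word_count(segment.source_text)
    if words == 0:
        raise ValueError("segment has no words")
    return segment.edit_seconds / words


def average_tte(segments: list[Segment]) -> float:
    """Average TTE per source word: total editing time over total words."""
    total_words = sum(word_count(s.source_text) for s in segments)
    if total_words == 0:
        raise ValueError("no words to average over")
    return sum(s.edit_seconds for s in segments) / total_words
```

Averaging over total words weights long segments more heavily than short ones. A per-segment mean would be an equally plausible reading of the text.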

Averaged over many text segments, TTE gives an accurate estimate with low variance. Machine translation researchers have not yet had the opportunity to work with such large quantities of data collected from production settings. It is for this reason that they have had to rely on automated estimates such as the edit distance, or the number of operations required to make a correction.

By switching from automated estimates to measurements of human cognitive effort, we reassign the quality evaluation to those traditionally in charge of the task: professional translators.

This way, for example, a sentence with a single-character mismatch that takes a professional translator a significant amount of time to understand and correct would not receive an unduly high quality estimate, as could happen with metrics like BLEU (Bilingual Evaluation Understudy).
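To make the contrast concrete, the sketch below computes a character-level edit distance (Levenshtein distance), one common form of the "number of operations" estimate. The function is a standard textbook formulation written for this example, and the two sentences in the usage note are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]
```

Called on "The valve must now be closed." and "The valve must not be closed.", the function returns 1, even though spotting and fixing that kind of error may require the translator to reread the source carefully. The edit distance cannot capture that effort, whereas the time spent does.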

Additionally, neither edit distance nor semantic difference measurements can be used as a consistent and accurate indication of MT quality in a production scenario. Quality in production is greatly influenced by content type, translator competence and turnaround time expectations, none of which these methods take into account.

In over 20 years of business, Translated has gathered evidence that TTE is a much more reliable indicator of progress in MT quality than automated metrics like BLEU or COMET (Crosslingual Optimized Metric for Evaluation of Translation), as it represents a more accurate approximation of the cognitive effort required to correct a translation.

According to data collected across billions of segments, TTE has been regularly shrinking since Translated started monitoring it as an operational KPI.

When plotted, the TTE data show a surprisingly linear trend. If TTE continues to decline at the same rate as it has since 2011, it is projected to reach one second within the next several years, approaching a point where MT would provide what could be called “a perfect translation.” One second is the time top professionals spend checking a translation produced by a colleague that requires no editing. The exact date of this outcome may vary, but the trend is evident.
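A projection of this kind can be sketched as an ordinary least-squares line fit followed by solving for the year at which the line reaches the target value. The function names below are hypothetical, and the sketch makes no claim about the fitting procedure Translated actually used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def year_reaching(target_tte, years, tte_values):
    """Year at which the fitted line crosses target_tte.

    Returns None when the fitted trend is flat or rising, since the
    line then never reaches a lower target.
    """
    slope, intercept = fit_line(years, tte_values)
    if slope >= 0:
        return None
    return (target_tte - intercept) / slope
```

The function would be called with a list of years and the matching yearly average TTE values, for example `year_reaching(1.0, years, tte_values)`. A linear extrapolation is only as good as the assumption that the trend continues, which is why the date it yields should be read as an estimate.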

Our initial hypothesis to explain the surprisingly consistent linearity in the trend is that every unit of progress towards closing the quality gap requires exponentially more resources than the previous unit, and we accordingly deploy those resources: computing power (doubling every two years), data availability (the number of words translated increases at a compound annual growth rate of 6.2% according to Nimdzi Insights), and machine learning algorithms’ efficiency (computation needed for training, 44x improvement from 2012-2019, according to OpenAI).
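One way to see why this hypothesis would produce a straight line is the short derivation below. It is an illustrative formalization, not a model taken from the data: the symbols q (cumulative progress, measured as the reduction in TTE), R (resources), and the constants a, b, and R_0 are assumptions introduced here.

```latex
% Assumption 1: reaching cumulative progress q requires resources that
% grow exponentially in q.
R_{\text{needed}}(q) = R_0 \, e^{a q}, \qquad a > 0

% Assumption 2: the resources actually deployed grow exponentially in time t.
R_{\text{deployed}}(t) = R_0 \, e^{b t}, \qquad b > 0

% Progress at time t is the q for which needed and deployed resources match:
R_0 \, e^{a q(t)} = R_0 \, e^{b t}
\;\Longrightarrow\;
q(t) = \frac{b}{a} \, t

% Hence TTE declines linearly in time:
\mathrm{TTE}(t) = \mathrm{TTE}(0) - \frac{b}{a} \, t
```

Under these assumptions, the exponential growth in deployed resources cancels the exponential growth in cost per unit of progress, leaving a constant rate of decline b/a.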

Another surprising aspect of the trend is how smoothly it progresses. We expected drops in TTE with every introduction of a new major model, from statistical MT to RNN-based architectures to the Transformer and Adaptive Transformer. The impact of introducing each new model has likely been distributed over time because translators were free to adopt the upgrades when they wanted.

About the Data and Process

Translated has collected over 2 billion edits on sentences effectively translated in a production setting. These edits and corrections were made by 136,000 of the best-performing freelancers worldwide working with our computer-assisted translation (CAT) tool Matecat. We began working on this software as a research project funded by the European Union, developed by a consortium consisting of Translated, Fondazione Bruno Kessler (led by Marcello Federico), the University of Edinburgh (led by Philipp Koehn), and the Université du Maine (led by Holger Schwenk). Matecat was finally released as open-source software in 2014. The European Commission included Matecat among the projects considered to have the highest potential for innovation funded by the Seventh Framework Program.

Translated relies on a proprietary AI-based technology called T-Rank to pick the best-performing professional translator for a given task. This system gathers work performance and qualification data on over 300,000 freelancers who have worked with the company over the last two decades. The AI ranking system considers over 30 factors, including resume match, quality performance, on-time delivery record, availability, and expertise in domain-specific subject areas.

Working in Matecat, translators check and correct translation suggestions provided by the MT engine of their choice. The data were initially collected using Google's statistical MT (2015-2016), then Google's neural MT, and most recently ModernMT's adaptive neural MT, which was introduced in 2018 and quickly became the preferred choice of almost all our translators.

To refine the sample, we only considered the following:

  • Completed jobs delivered at a high level of quality.
  • Sentences with MT suggestions that had no match from translation memories.
  • Jobs in which the target language has a vast amount of data available along with proven MT efficiency (English, French, German, Spanish, Italian and Portuguese).

From the resulting pool of sentences, we removed the following:

  • Sentences that didn’t receive any edits, since they don’t provide information about TTE, and sentences that took more than 10 seconds per word to be edited, as they suggest interruptions and/or unusually high complexity. This refinement was required to enable TTE comparison across multiple years.
  • Locale adaptation work, i.e. translations between variants of a single language (e.g., en-GB to en-US), as it is not representative of the problem at hand.
  • Large customer jobs, as they leverage highly customized language models and translation memories, where TTE performance is far better than average.
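The inclusion and exclusion rules above can be expressed as a single filter. The sketch below is illustrative only: the `EditedSegment` record and its field names are hypothetical and do not reflect Matecat's data schema, and flags such as "high quality" or "large customer job" are assumed to have been computed upstream. The 10-seconds-per-word threshold and the list of target languages come from the text.

```python
from dataclasses import dataclass

MAX_SECONDS_PER_WORD = 10.0  # threshold stated in the text
# English, French, German, Spanish, Italian and Portuguese
ELIGIBLE_TARGET_LANGUAGES = {"en", "fr", "de", "es", "it", "pt"}


@dataclass
class EditedSegment:
    """Hypothetical record; field names are illustrative only."""
    source_lang: str                  # locale code, e.g. "en-GB"
    target_lang: str                  # locale code, e.g. "it-IT"
    word_count: int                   # words in the source segment
    edit_seconds: float               # total post-editing time
    was_edited: bool                  # translator changed the MT suggestion
    had_tm_match: bool                # a translation memory match existed
    job_completed_high_quality: bool  # assumed to be decided upstream
    is_large_customer_job: bool       # assumed to be decided upstream


def base_language(locale: str) -> str:
    # "en-GB" -> "en"; assumes hyphen-separated locale codes.
    return locale.split("-")[0].lower()


def keep_segment(seg: EditedSegment) -> bool:
    """Apply the inclusion rules, then the exclusion rules."""
    if not seg.job_completed_high_quality or seg.had_tm_match:
        return False
    if base_language(seg.target_lang) not in ELIGIBLE_TARGET_LANGUAGES:
        return False
    if not seg.was_edited or seg.word_count <= 0:
        return False
    if seg.edit_seconds / seg.word_count > MAX_SECONDS_PER_WORD:
        return False
    # Locale adaptation: source and target are variants of one language.
    if base_language(seg.source_lang) == base_language(seg.target_lang):
        return False
    return not seg.is_large_customer_job


def filter_segments(segments):
    """Return only the segments that pass every rule."""
    return [seg for seg in segments if keep_segment(seg)]
```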

Time to Edit is impacted by two main variables beyond MT quality: the evolution of the editing tool (Matecat) and the quality delivered by translators. The impact of the first has a smaller order of magnitude than the typical TTE value and the stability of perfect translations’ TTE further confirms its low impact. The second variable has an even smaller impact because the quality of the translations delivered by professionals, measured in Errors per Thousand (see below), has not significantly changed during monitoring. As such, the influence of these two factors can be considered negligible when considering the long-term trend of improvement we observe in Time to Edit.

Impact on Translators and Industry

Our progress in machine translation is a collaborative achievement built on a perfect symbiosis between humans and machines.

Translated has always recognized and valued the contribution of translators. Ever since we started using MT, we've been paying freelancers for both the words they translated and those processed by the MT. This approach has resulted in an average increase of 25% in translator compensation.

To provide better quality translation in less time, Translated has always focused on removing redundant, productivity-hampering tasks from the translator’s workflow. We have developed AI-powered tools, combined them with highly responsive and adaptive neural machine translation (ModernMT), and given the professionals working with us access to them. The more corrective feedback translators give the machine, the better the suggestions it supplies in return, on a continuing, dynamic basis that pairs human creativity with machine intelligence.

Machines won't ever replace humans. Instead, AI is already proving to be a valuable tool for translation professionals, helping them translate more content at high quality levels.

Quality management is an important aspect of MT use at Translated. To measure the overall quality of an MT suggestion, Translated uses a measurement called Errors per Thousand (EPT) words. Currently, translations performed by MT regularly score at an EPT rate of around 50, meaning there are about 50 linguistic errors in a thousand translated words. After review by top translators, the EPT decreases to around 10 on average. An additional review by a second professional further reduces the EPT to 5.
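The EPT figure is a simple rate, sketched below. The function name is hypothetical, and how linguistic errors are identified and counted is outside the scope of this example.

```python
def errors_per_thousand(error_count: int, word_count: int) -> float:
    """Errors per Thousand words: linguistic errors per 1,000 translated words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    if error_count < 0:
        raise ValueError("error_count must not be negative")
    return 1000.0 * error_count / word_count
```

For instance, 5 errors found in a 500-word translation correspond to an EPT of 10.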

As the average quality of MT output continues to improve, as highlighted by the overall TTE trend, MT suggestions start to be comparable with work produced by a top translator. Therefore, the same double review process described above can reduce the EPT from 10 to 2 within the same budgetary constraints. Continuously improving MT allows more content to be translated at higher quality levels without increasing budgets.

Translated has noted that its clients are doubling down on making their content available in more languages as they see the increased return on investment driven by this strategy. This enhanced momentum is directly connected to and enabled by the progress in AI-powered translation capabilities.

Based on the observed trend, we estimate that we will soon witness at least a tenfold increase in requests for professional translations and at least 100 times higher demand for machine translation. This estimate is based on our observations of the growing demand for translation in an increasingly global world, and on the growing awareness of improving machine translation quality, which enables the translation of more content at lower cost. We envision a future in which an increasing number of new global business opportunities will emerge.

"All of us understand that we are approaching singularity in AI. For the first time, we have been able to quantify the speed at which we are progressing towards it."
Marco Trombetti – Translated CEO
