This Is the Speed at Which We Are Approaching Singularity in AI

During the opening keynote of the AMTA conference in Orlando, Translated quantified progress toward singularity in AI through data that highlighted quality improvements in machine translation

Language translation was one of the first challenges taken on by researchers in the domain of artificial intelligence (AI). It remains one of the most complex and difficult problems for a machine to perform at the level of a human. "That’s because language is the most natural thing for humans. Nonetheless, the data Translated collected clearly shows that machines are not that far from closing the gap," said Translated’s CEO Marco Trombetti at the Association for Machine Translation in the Americas 2022 conference, where he previewed our discovery during his opening keynote speech.

Many AI researchers believe that solving the language translation problem is the closest thing to producing Artificial General Intelligence (AGI). This is because natural language is by far the most complex problem we have in AI. It requires accurate modeling of reality in order to work, more so than any other narrow AI. Thus, the evidence that Translated has provided regarding the closing of the gap between what expert human translators and an optimized machine translation (MT) system can produce is quite possibly the most compelling evidence of success at scale seen in both the MT and AI communities.

The finding relies on data representing a concrete sample of production translation demand. The data consists of records of the time taken to edit over 2 billion MT suggestions by tens of thousands of professional translators worldwide. These translations span multiple subject domains, ranging from literature to technical translation, and include fields in which MT still struggles, such as speech transcription.

Time to Edit as a KPI for MT quality

Over the years, Translated has continually worked to evaluate and monitor machine translation quality. In 2011, we finally standardized our methodology and settled on a metric we call “Time to Edit,” the time required by the world’s highest-performing professional translators to check and correct MT-suggested translations. Since then, we've been tracking the average Time to Edit (TTE) per word in the source language.

Time to Edit is calculated as the total time that a translator spends post-editing a segment of text divided by the number of words making up that segment. We consider TTE the best measure of translation quality as there is no concrete way to define it other than measuring the average time required to check and correct a translation in a real working scenario.
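As a minimal sketch of the calculation described above, the Python below divides post-editing time by the number of source words. The names EditedSegment, segment_tte and average_tte are hypothetical, and counting words by splitting on whitespace is an assumption; it is not Translated's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class EditedSegment:
    source_text: str     # source-language text of the segment
    edit_seconds: float  # total time the translator spent post-editing it


def word_count(text: str) -> int:
    # Assumption: whitespace tokenization. Real word counting is
    # language-dependent and may differ in production tools.
    return len(text.split())


def segment_tte(segment: EditedSegment) -> float:
    """Time to Edit for one segment, in seconds per source word."""
    words = word_count(segment.source_text)
    if words == 0:
        raise ValueError("segment has no words")
    return segment.edit_seconds / words


def average_tte(segments: list[EditedSegment]) -> float:
    """Average TTE per source word: total editing time / total words."""
    total_words = sum(word_count(s.source_text) for s in segments)
    if total_words == 0:
        raise ValueError("no words to average over")
    return sum(s.edit_seconds for s in segments) / total_words
```

Dividing total time by total words weights long segments more heavily than averaging per-segment values would; which of the two the article's averages use is not stated, so this choice is also an assumption.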

Averaged over many text segments, TTE gives an accurate estimate with low variance. Machine translation researchers have not yet had the opportunity to work with such large quantities of data collected from production settings. For this reason, they have had to rely on automated estimates such as edit distance, that is, the number of operations required to make a correction.

By switching from automated estimates to measurements of human cognitive effort, we reassign the quality evaluation to those traditionally in charge of the task: professional translators.

Thus, a sentence with a single-character mismatch that nonetheless takes a translator significant time to understand and correct would not receive an unduly high quality estimate. Metrics like edit distance and BLEU (Bilingual Evaluation Understudy) miss the cognitive effort such corrections require and tend to inflate quality estimates.
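To illustrate the point, here is a small sketch of character-level edit distance (the standard Levenshtein algorithm). The example sentences are invented for illustration: the two strings differ by one character, so the distance is minimal, yet the meaning is changed in a way a translator has to notice and think about.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Invented example: one wrong character changes the meaning of the sentence.
mt_output = "Do now press the red button"
corrected = "Do not press the red button"
distance = levenshtein(mt_output, corrected)
```

A metric based only on this count treats the segment as nearly perfect, while the time a translator spends spotting and verifying the error is what TTE would record.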

Additionally, neither edit distance nor semantic difference scores can be used as a consistent and accurate indication of MT quality in a production scenario. Consistent scoring and quality measurement are challenging in production because results are greatly influenced by varying content types, translator competence, and turnaround time expectations. These factors are not considered by the aforementioned methods.

In over 20 years of business, Translated has gathered evidence that TTE is a much more reliable indicator of progress in MT quality than automated metrics like BLEU or COMET (Crosslingual Optimized Metric for Evaluation of Translation), as it represents a more accurate approximation of the cognitive effort required to correct a translation.

According to data collected across billions of segments, TTE has been continuously improving since Translated started monitoring it as an operational KPI.

When plotted graphically, the TTE data shows a surprisingly linear trend. If TTE continues to decline at the same rate as it has since 2014, it is projected to reach one second within the next several years, approaching a point where MT would provide what could be called “a perfect translation.” This is the point of singularity: the time top professionals spend checking a translation produced by MT is no different from the time they spend checking a translation produced by their colleagues that doesn't require any editing. The exact date when this point will be reached may vary, but the trend is clear.
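The kind of projection described above can be sketched as an ordinary least-squares line fitted to yearly average TTE values and extended until it crosses a target value. The function names are hypothetical and no real measurements are included; this is one plausible way to extrapolate a linear trend, not necessarily the method used for the article's chart.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit. Returns (slope, mean_x, mean_y), so the
    fitted line is y = mean_y + slope * (x - mean_x)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, mean_x, mean_y


def year_reaching_target(years: list[float], tte_values: list[float],
                         target_tte: float) -> float:
    """Year at which the fitted line reaches target_tte (seconds per word)."""
    slope, mean_x, mean_y = fit_line(years, tte_values)
    if slope >= 0:
        raise ValueError("TTE is not decreasing; no crossing to project")
    return mean_x + (target_tte - mean_y) / slope
```

Calling year_reaching_target with the yearly averages and a target of 1.0 gives the projected crossing year. A straight-line extrapolation says nothing about whether the trend will hold, which is why the exact date remains uncertain.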

Our initial hypothesis to explain the surprisingly consistent linearity in the trend is that every unit of progress toward closing the quality gap requires exponentially more resources than the previous unit, and we accordingly deploy those resources: computing power (doubling every two years), data availability (the number of words translated increases at a compound annual growth rate of 6.2% according to Nimdzi Insights), and machine learning algorithms’ efficiency (computation needed for training, 44x improvement from 2012-2019, according to OpenAI).
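The three growth figures quoted above are stated over different time spans, so it can help to put them on a common per-year basis. The sketch below does only that arithmetic; treating 2012-2019 as a seven-year span is an assumption, and the function names are hypothetical.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a constant yearly multiplier."""
    return math.log(2.0) / math.log(annual_factor)


# Computing power: doubling every two years.
compute_per_year = annual_growth_factor(2.0, 2.0)

# Data availability: 6.2% compound annual growth rate.
data_per_year = 1.0 + 0.062

# Algorithmic efficiency: 44x over 2012-2019 (assumed to be 7 years).
algorithms_per_year = annual_growth_factor(44.0, 7.0)
```

All three multipliers are greater than one, which means each resource grows exponentially, consistent with the hypothesis that linear progress in TTE is being bought with exponentially growing inputs.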

Another surprising aspect of the trend is how smoothly it progresses. We expected drops in TTE with every introduction of a new major model, from statistical MT to RNN-based architectures to the Transformer and Adaptive Transformer. The impact of introducing each new model has likely been distributed over time because translators were free to adopt the upgrades when they wanted.

About the Data and Process

Translated has collected over 2 billion edits on sentences effectively translated in a production setting. These edits and corrections were made by 136,000 of the highest-performing freelancers worldwide working with our computer-assisted translation (CAT) tool Matecat. We began working on this software as a research project funded by the European Union, developed by a consortium consisting of Translated, Fondazione Bruno Kessler (led by Marcello Federico), the University of Edinburgh (led by Philipp Koehn), and the Université du Maine (led by Holger Schwenk). Matecat was finally released as open-source software in 2014. The European Commission included Matecat among the projects considered to have the highest potential for innovation funded by the Seventh Framework Program.

Translated relies on a proprietary AI-based technology called T-Rank to pick the highest-performing professional translator for a given task. This system gathers work performance and qualification data on over 300,000 freelancers who have worked with the company over the last two decades. The AI ranking system considers over 30 factors, including resume match, performance quality, on-time delivery record, availability, and expertise in domain-specific subject areas.

Working in Matecat, translators check and correct translation suggestions provided by the MT engine of their choice. The data was initially collected using Google's statistical MT (2015-2016), then Google's neural MT, and most recently using ModernMT's adaptive neural MT, which was introduced in 2018 and quickly became the preferred choice of almost all our translators.

To refine the observation sample, we only considered the following:

  • Completed jobs delivered at a high level of quality.
  • Sentences with MT suggestions that had no match from translation memories.
  • Jobs in which the target language has a vast amount of data available along with proven MT efficiency (English, French, German, Spanish, Italian and Portuguese).

From the resulting pool of sentences, we removed the following:

  • Sentences that didn’t receive any edits, since they don’t provide information about TTE, and sentences that took more than 10 seconds per word to be edited, as they suggest interruptions and/or unusually high complexity. This refinement was required to enable TTE comparison across multiple years.
  • Locale adaptation work, i.e. translations between variants of a single language (e.g., en-GB to en-US), as it is not representative of the problem at hand.
  • Large customer jobs, as they leverage highly customized language models and translation memories, where TTE performance is far better than average.
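The inclusion and exclusion rules above can be sketched as a single filter over segment records. Every field name here is hypothetical, as are the two-letter language codes and the rule that locale adaptation means source and target share the same base language; the real data schema is not described in the article.

```python
from dataclasses import dataclass

# Assumption: codes for English, French, German, Spanish, Italian, Portuguese.
ELIGIBLE_TARGET_LANGUAGES = {"en", "fr", "de", "es", "it", "pt"}
MAX_TTE_SECONDS_PER_WORD = 10.0


@dataclass
class SegmentRecord:
    source_locale: str               # e.g. "en-GB"
    target_locale: str               # e.g. "it-IT"
    job_completed_high_quality: bool
    has_tm_match: bool               # a translation memory match was available
    was_edited: bool                 # translator changed the MT suggestion
    tte_seconds_per_word: float
    is_large_customer_job: bool


def base_language(locale: str) -> str:
    """Language part of a locale code: 'en-GB' -> 'en'."""
    return locale.split("-")[0].lower()


def keep_for_tte_analysis(r: SegmentRecord) -> bool:
    """True if the segment passes every rule listed in the text."""
    source, target = base_language(r.source_locale), base_language(r.target_locale)
    return (
        r.job_completed_high_quality
        and not r.has_tm_match
        and target in ELIGIBLE_TARGET_LANGUAGES
        and r.was_edited
        and r.tte_seconds_per_word <= MAX_TTE_SECONDS_PER_WORD
        and source != target              # drop locale adaptation work
        and not r.is_large_customer_job
    )
```

Keeping the rules in one predicate makes it easy to apply the same refinement to every year of data, which matters because the stated goal is comparability across years.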

Time to Edit is impacted by two additional variables beyond MT quality: the evolution of the editing tool (Matecat) and the quality delivered by translators. The impact of the first is an order of magnitude smaller than the typical TTE value, and the stability of perfect translations’ TTE further confirms its low impact. The second variable has an even smaller impact because the quality of the translations delivered by professionals, measured in Errors per Thousand (see below), has not significantly changed during monitoring. As such, the influence of these two factors can be considered negligible when examining the long-term trend of improvement we observe in Time to Edit.

Impact on the Translators and the Industry

Our progress in machine translation is a collaborative achievement built on a perfect symbiosis between humans and machines.

Translated has always recognized and valued the contribution of translators. Ever since we started using MT, we've been paying freelancers for both the words they translated and those processed by the MT. This approach has resulted in an average increase of 25% in translator compensation.

To provide higher quality translations in less time, Translated has always focused on removing redundant, productivity-hampering tasks from the translator’s workflow. We have developed AI-powered tools combined with highly responsive and adaptive neural machine translation (ModernMT) and given professionals working with us access to these powerful assistive tools: the more corrective feedback given to the machine, the better the translation suggestion supplied to the translator on a continuing, dynamic basis. It’s a perfect symbiosis between human creativity and machine intelligence.

Machines won't ever replace humans. AI is already proving to be a valuable tool for translation professionals, helping them translate more content at a higher level.

At Translated, quality measurement is an important aspect of MT. To measure the overall quality of an MT suggestion, Translated uses a measurement called Errors per Thousand (EPT) words. Currently, translations performed by MT regularly score at an EPT rate of around 50, meaning there are about 50 linguistic errors in 1000 translated words. After review by top translators, the EPT decreases to around 10 on average. An additional review by a second professional further reduces the EPT to 5.
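As a minimal sketch, Errors per Thousand is the error count normalized to 1,000 translated words. The function name is hypothetical, and how errors are identified and counted is outside what this example covers.

```python
def errors_per_thousand(error_count: int, word_count: int) -> float:
    """Linguistic errors normalized to a 1,000-word basis."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    if error_count < 0:
        raise ValueError("error_count cannot be negative")
    return error_count * 1000 / word_count
```

For instance, 50 errors found in 1,000 translated words gives an EPT of 50, matching the figure quoted for raw MT output, and the same normalization lets texts of different lengths be compared.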

As the average quality of MT output continues to improve, highlighted by the overall TTE trend, the MT suggestions start to be comparable with work produced by a top translator. Therefore, the same double review process described can reduce the EPT from 10 to 2 within the same budgetary constraints. Continuously improving MT enables more content to be translated at a higher quality without increasing budgets.

Translated has noted that its clients are doubling down on making their content available in more languages as they see an increased return on investment driven by this strategy. This increased momentum is directly connected to and enabled by the progress of AI-powered translation capabilities.

Based on this trend, we estimate that we will soon witness at least a tenfold increase in requests for professional translations and at least 100 times higher demand for machine translation. This estimate is based on observing the growing demand for translation in an increasingly global world, and on awareness of the evolving quality of machine translation, which enables more content to be translated while reducing costs. We envision a future where an increasing number of new global business opportunities will emerge.

"All of us understand that we are approaching singularity in AI. For the first time, we have been able to quantify the speed at which we are progressing toward it."
Marco Trombetti – Translated CEO
