Toward the Universal Translator

Translated and Cineca partner to create the world's most advanced translation model, trained on a unique dataset using one of the world's most powerful supercomputers.

Rome – November 12, 2024

Translated and Cineca, one of the world's largest computing centers, today announced a research project to create an AI that can provide translations of the same quality as the best professional translators. The project represents a significant step forward in the evolution of translation technologies and the global accessibility of languages. It aims to train the most advanced language model, developed by Translated, using a unique translation dataset and the powerful Leonardo supercomputer, which is managed by Cineca at the Tecnopolo di Bologna and ranks as the seventh most powerful supercomputer globally and the third in Europe. The result will be the most advanced language model in the world. Its Italian-English and English-Italian language pairs will be released as "open-source" with "open weights", marking a major milestone in developing a universal translator.

"Language is at the core of everything we do as humans. This is why we’ve made it our mission to ensure that one day, everyone will be able to understand and be understood in their own language. We are excited that, through the combination of our 25 years of AI research and Cineca’s immense computational power, we can create a translation technology with the potential to profoundly impact society and bring us a step closer to achieving this goal."
Marco Trombetti – founder and CEO of Translated

Cineca will support the training of Translated's language model with 10 million GPU hours. This immense computational capacity will significantly accelerate model training, ensuring rapid progress. Built over fifteen years of meticulous collection, the dataset provided by Translated goes beyond other commonly used datasets: it contains entire documents with rich context, along with incorrect translations, revisions, and the reasoning behind translators’ and reviewers’ decisions during disagreements. By harnessing this wealth of sophisticated and complex data, Translated, with Cineca’s support, is making significant strides in training artificial intelligence models that possess a deep understanding of languages.

"We are excited to be engaged in a project that makes such a significant contribution to research on AI applied to language for the benefit of both national and global communities. This collaboration is an example of how supercomputing can drive innovation and have a considerable social impact to enhance people's lives worldwide."
Francesco Ubertini – President of Cineca

The research project, presented during the event ‘The Power of Languages - Toward the Universal Translator,’ builds on public-private collaborations with universities and research centers, supported by European Union grants, in which Translated has participated since the early 2000s. These collaborations have facilitated the development of increasingly advanced technologies to support translation, making them accessible to all members of society.

The language model to be trained on the Leonardo supercomputer was developed by Translated using an innovative Chain-of-Thought technique. It was already tested during 2024 with a small group of companies in real production contexts, where it translated conversational data with an error rate of fewer than three in every thousand words. That rate is lower than the error rate of professional translators on the same content, and four times lower than that of the most advanced machine translation systems. Thanks to the computing power of Leonardo, Translated and Cineca expect to reduce the margin of error to one error per thousand words, achieving accuracy comparable to the top 1% of professional translators.

After the initial release of the Italian-English and English-Italian pairs, Translated plans to extend the new model to all 200 languages supported by its current AI, broadening its geographical footprint and extending the reach of its technological developments as it continues to break down barriers in the field of translation.