The European Commission has translated a collection of around one million sentences into 22 of the 23 official languages of the European Union, in an initiative designed to promote computer-assisted translation.

This, according to the Commission, is the biggest ever collection in so many languages and is now freely available over the internet.

Through co-operation between its translators and its in-house scientists, the Commission is releasing large collections of sentences from legal documents covering technical, political and social issues in 22 languages.

According to the EC, this kind of data is highly sought after by developers of machine translation systems, in which automatic translation software ‘learns’ from manually translated texts how words and phrases are correctly translated in context.

The data will help with the development of other linguistic software tools such as grammar and spell checkers, online dictionaries and multilingual text classification systems.

Leonard Orban, European Commissioner for multilingualism at the EC, said: “By this initiative the European Commission intends to boost human language technologies, support multilingualism and make computer-assisted translation easier, cheaper and more accessible.”

The EU institutions have more multilingual texts than any other organisation because of the requirement that EU law exist in each of the EU’s 23 official languages.

Although large numbers of translations of English or French texts can already be found on the internet, the EU says resources are scarce for languages such as Latvian or Romanian, and practically nonexistent for combinations of two such languages.

The website will make it possible to find sentences with their equivalent in all other official languages. Only Irish translations are not yet available.

The EC says that this release of language data is a good example of its open policy on the re-use of its information resources, and follows the opening of the EU’s documentary and terminological databases Eur-Lex and IATE.

The Commission already offers publicly accessible news search sites covering up to 35 languages via its European Media Monitoring tool. 

Links

EC Directorate-General for Translation

 

Joe Fernandez