The DGT Multilingual Translation Memory of the Acquis Communautaire (DGT-TM), a parallel corpus of European Union legislation (the Acquis Communautaire) translated into 22 official EU languages, has been expanded to include EU legislation from 2004-2010, according to an April 2012 announcement on the DGT-TM website. The updated corpus is called DGT-TM-2011.
The new content comes from the EU Official Journal Series L, 2004-2010.
According to the announcement, DGT-TM-2011 is the largest parallel corpus in the world, and is intended to be used for the following purposes:
- training automatic systems for statistical machine translation (SMT);
- producing monolingual or multilingual lexical and semantic resources such as dictionaries and ontologies;
- training and testing multilingual information extraction software;
- checking translation consistency automatically;
- testing and benchmarking alignment software (for sentences, words, etc.).
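To make one of these uses concrete, here is a minimal sketch of an automatic translation consistency check. It is not taken from the announcement or from any DGT tooling: the function name `find_inconsistent_translations` and the sample segment pairs are hypothetical, and the sketch assumes the translation memory has already been loaded into a list of (source, target) pairs. It flags every source segment that appears with more than one distinct target translation.

```python
from collections import defaultdict


def find_inconsistent_translations(pairs):
    """Return source segments that map to more than one distinct target.

    `pairs` is an iterable of (source, target) strings. This is an
    illustrative helper, not part of any DGT-TM tool.
    """
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        # Collapse whitespace and lowercase the source so that trivial
        # formatting differences do not hide a repeated segment.
        key = " ".join(source.split()).lower()
        targets_by_source[key].add(" ".join(target.split()))
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }


# Hypothetical English-French segment pairs, invented for illustration.
sample_pairs = [
    ("This Regulation shall enter into force.", "Le présent règlement entre en vigueur."),
    ("This Regulation shall enter into force.", "Ce règlement entre en vigueur."),
    ("Done at Brussels.", "Fait à Bruxelles."),
    ("Done at  Brussels.", "Fait à Bruxelles."),
]

inconsistent = find_inconsistent_translations(sample_pairs)
for source, targets in inconsistent.items():
    print(source, "->", targets)
```

A real check would likely need fuzzier matching (punctuation, numbers, inflection), since legitimately different contexts can justify different translations of the same source segment.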
DGT-TM-2011 should be a valuable resource for legal informatics and legal linguistics research and development.
For more information, please see:
- The official announcement;
- A new paper about DGT-TM-2011: Steinberger, R., et al. (2012). DGT-TM: A Freely Available Translation Memory in 22 Languages, to be presented at LREC 2012;
- A new post about DGT-TM-2011: Super-European language translation corpus, at Corpus Linguistics.
HT @moximer.