DGT-TM-2011, Parallel Corpus of All EU Legislation in Translation, Expanded to Include Data from 2004-2010

The DGT Multilingual Translation Memory of the Acquis Communautaire (DGT-TM), a parallel corpus of the body of European Union legislation known as the Acquis Communautaire, translated into 22 languages of the EU member states, has been expanded to include EU legislation from 2004-2010, according to an April 2012 announcement on the DGT-TM website. The updated corpus is called DGT-TM-2011.

The new content comes from the EU Official Journal Series L, 2004-2010.

According to the announcement, DGT-TM-2011 is the largest parallel corpus in the world and is intended for the following purposes:

  • training automatic systems for statistical machine translation (SMT);
  • producing monolingual or multilingual lexical and semantic resources such as dictionaries and ontologies;
  • training and testing multilingual information extraction software;
  • checking translation consistency automatically;
  • testing and benchmarking alignment software (for sentences, words, etc.).
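
To make the consistency-checking use case in the list above more concrete, here is a minimal sketch in Python. It assumes the aligned segments have already been loaded as (source, target) pairs. The function name `find_inconsistent_translations` and the sample sentences are hypothetical, invented for illustration, and are not taken from DGT-TM-2011 or its tooling.

```python
from collections import defaultdict


def find_inconsistent_translations(pairs):
    """Return source segments that have more than one distinct translation.

    `pairs` is an iterable of (source, target) strings. This is an
    illustrative helper, not part of any DGT-TM tool.
    """
    translations = defaultdict(set)
    for source, target in pairs:
        # Collapse whitespace and lowercase the source so that trivial
        # formatting differences do not split one segment into two keys.
        key = " ".join(source.split()).lower()
        # Collapse whitespace in the target, but keep its case.
        translations[key].add(" ".join(target.split()))
    # Keep only sources with two or more distinct translations.
    return {src: sorted(tgts) for src, tgts in translations.items() if len(tgts) > 1}


# Hypothetical English-French segment pairs, invented for this example.
sample_pairs = [
    ("This Regulation shall enter into force.", "Le présent règlement entre en vigueur."),
    ("This Regulation shall enter into force.", "Ce règlement entre en vigueur."),
    ("Done at Brussels.", "Fait à Bruxelles."),
    ("Done at  Brussels.", "Fait à Bruxelles."),
]

inconsistent = find_inconsistent_translations(sample_pairs)
for source, targets in inconsistent.items():
    print(source, "->", targets)
```

Exact matching after light normalisation is only one possible design. A real check over legal text would probably also need fuzzy matching and some way to tell legitimate context-dependent variants from genuine inconsistencies.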

DGT-TM-2011 should be a valuable resource for research and development in legal informatics and legal linguistics.

HT @moximer.
