Participation is invited for Domain Adaptation for Dependency Parsing, which is “an evaluation campaign for the adaptation of dependency parsers to legislative texts,” according to Giulia Venturi of l’Istituto di Linguistica Computazionale del CNR di Pisa (ILC-CNR). According to Ms. Venturi, “the domain adaptation task aims to investigate techniques for adapting state-of-the-art dependency parsing systems to domains outside of the data from which they were trained or developed.”
This campaign is being held as part of EVALITA 2011, the third evaluation campaign of Natural Language Processing and Speech tools for Italian.
Organizers of the Domain Adaptation for Dependency Parsing campaign include Dr. Simonetta Montemagni of ILC-CNR and Ms. Venturi.
Registration for the campaign is open through 4 October 2011. Development data for the campaign is available as of 20 May 2011.
According to the Domain Adaptation for Dependency Parsing campaign announcement:
The goal of this task is to learn how to increase the accuracy of a parsing system when dealing with out-of-domain texts. In particular, the task will consist in learning how to derive labelled dependency relations for Italian by means of a parser developed for general language. The following data sets (in CoNLL format) will be distributed:
- for the source domain:
- a training set represented by the ISST-TANL corpus jointly developed by the Istituto di Linguistica Computazionale “Antonio Zampolli” (ILC-CNR) and the University of Pisa (UniPi) and already used in the dependency parsing track of EVALITA 2009 (pilot sub-task);
- a development set of about 5,000 tokens;
- for the target domain:
- a target corpus drawn from an Italian legislative corpus, gathering laws enacted by different issuing bodies (the European Commission, the Italian State and the Regions) and regulating a variety of domains, ranging from the environment, human rights, and disability rights to freedom of expression. The target corpus includes automatically generated sentence splitting, tokenization and PoS tagging;
- a manually annotated development set of about 5,000 tokens, also including labeled dependency relations.
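The CoNLL format mentioned above is a plain-text, tab-separated layout with one token per line and blank lines between sentences. As a rough sketch (assuming the CoNLL-X column order of ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, …; the campaign's exact layout may differ, and the sample sentence below is invented for illustration), a minimal reader could look like:

```python
def read_conll(lines):
    """Yield sentences as lists of (form, head, deprel) tuples.

    Assumes CoNLL-X-style columns: ID FORM LEMMA CPOSTAG POSTAG
    FEATS HEAD DEPREL ... (tab-separated, blank line between
    sentences). HEAD is the 1-based index of the governing token,
    with 0 marking the root.
    """
    sent = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:            # blank line ends the sentence
            if sent:
                yield sent
                sent = []
            continue
        cols = line.split("\t")
        sent.append((cols[1], int(cols[6]), cols[7]))
    if sent:                    # flush a trailing sentence
        yield sent

# A tiny invented Italian example ("Il parlamento approva"):
sample = """1\tIl\til\tR\tRD\t_\t2\tdet\t_\t_
2\tparlamento\tparlamento\tS\tS\t_\t3\tsubj\t_\t_
3\tapprova\tapprovare\tV\tV\t_\t0\troot\t_\t_
""".splitlines()

for sentence in read_conll(sample):
    print(sentence)
```

Each sentence comes back as a list of (form, head, deprel) triples, which is the minimum needed to score labeled and unlabeled attachments.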
Evaluation will be carried out in terms of standard dependency parsing accuracy measures (labeled attachment score, unlabeled attachment score, label accuracy) with respect to a test set of about 5,000 tokens of texts from the target domain, including manually revised PoS tags.
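These three measures all compare predicted (head, label) pairs against the gold standard, token by token: labeled attachment score (LAS) requires both head and label to be correct, unlabeled attachment score (UAS) only the head, and label accuracy only the label. A minimal sketch of the computation (the function name and toy data are ours, not the campaign's official scorer):

```python
def parsing_scores(gold, pred):
    """Compute (LAS, UAS, label accuracy) over aligned token lists.

    gold, pred: lists of (head, deprel) pairs, one per token.
    LAS  = fraction of tokens with correct head AND label;
    UAS  = fraction with correct head;
    label accuracy = fraction with correct label, head ignored.
    """
    assert len(gold) == len(pred), "token sequences must be aligned"
    n = len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / n
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    return las, uas, la

# Toy example: one token has the right head but the wrong label,
# so UAS is perfect while LAS and label accuracy drop.
gold = [(2, "det"), (3, "subj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (3, "obj")]
print(parsing_scores(gold, pred))  # (0.75, 1.0, 0.75)
```

The toy example shows why the three scores are reported together: an attachment error and a labeling error lower different measures.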
For more information, please see the campaign announcement.
HT Giulia Venturi.