Surdeanu, Nallapati, & Manning on Legal Claim Identification: Information Extraction with Hierarchically Labeled Data

Dr. Mihai Surdeanu, Dr. Ramesh Nallapati, and Professor Christopher Manning, all of the Stanford University Department of Computer Science Natural Language Processing Group, will present a paper entitled "Legal Claim Identification: Information Extraction with Hierarchically Labeled Data" at SPLeT 2010: The 3rd Workshop on Semantic Processing of Legal Texts, to be held 23 May 2010 in Malta. The full text of the paper appears in the conference proceedings (PDF), starting at the page numbered 22.

The workshop is part of LREC 2010: The 7th International Conference on Language Resources and Evaluation.

Here is the abstract of the paper:

This paper introduces a novel Information Extraction problem, where only parts of documents have relevance and linguistic annotations are available only for these segments. The data is hierarchical: the top layer marks the relevant text segments and the bottom layer annotates domain-specific entity mentions, but only in the segments marked as relevant in the top layer. We investigate this problem in the legal domain, where we extract the text corresponding to litigation claims and entity mentions such as patents and laws in each claim. Because entity mentions are not labeled outside claims in training data, a top-down approach that extracts claims first and entity mentions next seems the most natural. However, we show that other models are superior. Using a simple semi-supervised approach we implement a bottom-up Conditional Random Field model; we also implement a joint hierarchical CRF using a combination of pseudo-likelihood and Gibbs sampling. We show that both these models significantly outperform the top-down approach.
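
To make the two-layer labeling described in the abstract more concrete, here is a minimal Python sketch. It is not taken from the paper: the names `Claim` and `layer_labels`, the label strings, and the sample sentence are all hypothetical, chosen only for illustration. The point it tries to show is that tokens outside a claim have no bottom-layer label at all (`None`), which is different from being labeled as a non-entity (`"O"`).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Claim:
    """A top-layer segment (hypothetical structure, not the paper's format)."""
    start: int  # first token index of the claim, inclusive
    end: int    # last token index of the claim, exclusive
    # Bottom-layer mentions as (start, end, type), in document token offsets.
    entities: List[Tuple[int, int, str]] = field(default_factory=list)


def layer_labels(n_tokens: int, claims: List[Claim]):
    """Return per-token labels for the top and bottom layers.

    Top layer: "CLAIM" inside a claim segment, "O" elsewhere.
    Bottom layer: an entity type or "O" inside a claim, and None
    outside claims, because those tokens were never annotated.
    """
    top: List[str] = ["O"] * n_tokens
    bottom: List[Optional[str]] = [None] * n_tokens
    for claim in claims:
        for i in range(claim.start, claim.end):
            top[i] = "CLAIM"
            bottom[i] = "O"  # annotated, but not an entity (may be overwritten)
        for start, end, entity_type in claim.entities:
            for i in range(start, end):
                bottom[i] = entity_type
    return top, bottom


# Invented sample text: one claim containing one patent mention.
tokens = "The parties met . Plaintiff alleges infringement of Patent 123 .".split()
claims = [Claim(start=4, end=len(tokens), entities=[(8, 10, "PATENT")])]
top, bottom = layer_labels(len(tokens), claims)
for token, t, b in zip(tokens, top, bottom):
    print(f"{token:15}{t:8}{b}")
```

Under this representation, a top-down system would predict the `top` labels first and look for entities only inside predicted claims, while a bottom-up system has to cope with the `None` entries, which is presumably where the semi-supervised step mentioned in the abstract comes in.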
