Posts Tagged ‘Computational linguistics and law’

Quaresma on Legal Information Extraction ← Machine Learning Algorithms + Linguistic Information

June 3, 2012

Professor Dr. Paulo Quaresma of the Departamento de Informática, Universidade de Évora, has published Legal Information Extraction ← Machine Learning Algorithms + Linguistic Information, in LREC 2012 Conference Proceedings: Semantic Processing of Legal Texts (SPLeT-2012) Workshop, pp. 37-38.

Here is the abstract:

In order to automatically extract information from legal texts we propose the use of a mixed approach, using linguistic information and machine learning techniques. In the proposed architecture, lexical, syntactical, and semantical information is used as input for specialized machine learning algorithms, such as support vector machines. This approach was applied to collections of legal documents and the preliminary results were quite promising.

Lazari and Zarco-Tejada: JurWordNet and FrameNet Approaches to Meaning Representation: A Legal Case Study

June 2, 2012

Antonio Lazari of Scuola Superiore Sant’Anna and Dr. María Ángeles Zarco-Tejada of Universidad de Cádiz have published JurWordNet and FrameNet Approaches to Meaning Representation: A Legal Case Study, in LREC 2012 Conference Proceedings: Semantic Processing of Legal Texts (SPLeT-2012) Workshop, pp. 21-26.

Here is the abstract:

This paper describes JurWordNet, FrameNet and LOIS approaches towards meaning representation regarding the concept ‘State Liability’ from a cross-linguistic and comparative perspective. Our starting point has been the lexical and conceptual mismatching of legal terms that the process of harmonization in the European Union has manifested. Our study analyzes such concept in Italian, Spanish, French and English and shows how a deeper sub-language based representation of meaning is needed to account for such phenomena. We examine the most important computational-lexical models in an attempt to identify the most suitable and appropriate approach towards lexical-conceptual mismatching of the concept ‘State liability’ in the European legal tradition. Our proposal shows a formalization of the concept in the four systems mentioned and uses semantic features to represent lexical mismatching and cultural differences. With this study we show in a systematic way the differences in legal tradition and the reasons for divergence in the judicial use of related concepts.

Dell’Orletta et al. on The SPLeT–2012 Shared Task on Dependency Parsing of Legal Texts

June 1, 2012

Felice Dell’Orletta of the Istituto di Linguistica Computazionale del CNR di Pisa (ILC-CNR), and colleagues, have published The SPLeT–2012 Shared Task on Dependency Parsing of Legal Texts, in LREC 2012 Conference Proceedings: Semantic Processing of Legal Texts (SPLeT-2012) Workshop, pp. 42-51.

Here is the abstract:

The 4th Workshop on “Semantic Processing of Legal Texts” (SPLeT–2012) presents the first multilingual shared task on Dependency Parsing of Legal Texts. In this paper, we define the general task and its internal organization into sub–tasks, describe the datasets and the domain–specific linguistic peculiarities characterizing them. We finally report the results achieved by the participating systems, describe the underlying approaches and provide a first analysis of the final test results.

Papers Available for SPLeT 2012: Workshop on Semantic Processing of Legal Texts

May 27, 2012

Full text papers have been posted for SPLeT 2012: Workshop on Semantic Processing of Legal Texts, being held 27 May 2012 in Istanbul, Turkey.

Here is the list of papers:

  • Giulia Venturi: Design and Development of TEMIS: a Syntactically and Semantically Annotated Corpus of Italian Legislative Texts
  • Guido Boella, Luigi Di Caro, Llio Humphreys, Livio Robaldo: Using Legal Ontology to Improve Classification in the Eunomos Legal Document and Knowledge Management System
  • Antonio Lazari, Mª Ángeles Zarco-Tejada: JurWordNet and FrameNet Approaches to Meaning Representation: a Legal Case Study
  • Lorenzo Bacci, Enrico Francesconi, Maria Teresa Sagri: A Rule-based Parsing Approach for Detecting Case Law References in Italian Court Decisions
  • Adam Wyner, Wim Peters: Semantic Annotations for Legal Text Processing using GATE Teamware
  • Paulo Quaresma: Legal Information Extraction ← Machine Learning Algorithms + Linguistic Information
  • Adam Wyner: Problems and Prospects in the Automatic Semantic Analysis of Legal Texts
  • Felice Dell’Orletta, Simone Marchi, Simonetta Montemagni, Barbara Plank, Giulia Venturi: The SPLeT–2012 Shared Task on Dependency Parsing of Legal Texts
  • Giuseppe Attardi, Daniele Sartiano and Maria Simi: Active Learning for Domain Adaptation of Dependency Parsing on Legal Texts
  • Alessandro Mazzei, Cristina Bosco: Simple Parser Combination
  • Niklas Nisbeth, Anders Søgaard: Parser combination under sample bias

Bommarito: Visualization of Reading Level Frequency by Congressional Bill Stage

April 15, 2012

Michael J. Bommarito II of Computational Legal Studies has posted Visualization of Reading Level Frequency by Congressional Bill Stage, on his blog.

Here are excerpts from the post:

Here’s a fun example of how you might use my data on Congressional bill length and complexity. Imagine you want to understand the empirical distribution of Flesch-Kincaid reading level for Congressional bills and how this distribution is related to bill stage. A first step might be to visualize this relationship. [...]

Based on this visualization, you might infer that engrossed bills tend to have less right-skew and have a lower mean reading level. The story behind this might be that Senators and Representatives are less likely to accept legislation they do not understand. To test this, you might run a simple [Kolmogorov-Smirnov] test to see if the introduced bill reading levels are greater than engrossed bill reading levels.

For graphs and sample code, please see the complete post.
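
As a rough illustration of the kind of comparison the excerpt describes (this is not Mr. Bommarito's code), the two-sample Kolmogorov-Smirnov statistic can be computed with the Python standard library alone. The function name and the sample reading levels below are hypothetical, chosen only for illustration. The sketch returns the two-sided statistic D, the largest gap between the two empirical distribution functions. A directional test of "introduced is greater than engrossed" would use a one-sided variant, which statistical packages such as SciPy (`scipy.stats.ks_2samp`) provide together with p-values.

```python
from bisect import bisect_right


def ks_two_sample_statistic(a, b):
    """Return D = max |F_a(x) - F_b(x)| over all observed values x.

    F_a and F_b are the empirical cumulative distribution functions of
    the two samples. Because both are step functions that only jump at
    observed values, checking the observed values is sufficient.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sorted_a, sorted_b = sorted(a), sorted(b)
    d = 0.0
    for x in sorted_a + sorted_b:
        # Fraction of each sample that is <= x.
        f_a = bisect_right(sorted_a, x) / len(sorted_a)
        f_b = bisect_right(sorted_b, x) / len(sorted_b)
        d = max(d, abs(f_a - f_b))
    return d


if __name__ == "__main__":
    # Hypothetical reading levels, invented for this example only.
    introduced = [14.2, 16.8, 19.5, 22.1, 25.0]
    engrossed = [12.0, 13.5, 15.1, 16.0, 17.3]
    print(ks_two_sample_statistic(introduced, engrossed))
```

A value of D near 0 means the two samples have similar distributions, while a value near 1 means they barely overlap. Turning D into a significance level requires the sample sizes as well.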

Mouritsen on Assessing Corpus Linguistics as an Empirical Path to Plain Meaning

March 7, 2012

Stephen C. Mouritsen, M.A., Esq., of Cravath, Swaine and Moore LLP, has published Hard Cases and Hard Data: Assessing Corpus Linguistics as an Empirical Path to Plain Meaning, Columbia Science and Technology Law Review, 13, 156-205 (2011).

Here is the abstract:

The Plain Meaning Rule is often assailed on the grounds that it is unprincipled — that it substitutes for careful analysis an interpreter’s ad hoc and impressionistic intuition about the meaning of legal texts. But what if judges and lawyers had the means to test their intuitions about plain meaning systematically? Then initial linguistic impressions about the meaning of a legal text might be viewed as hypotheses to be tested, rather than determinative criteria upon which to base important decisions.

There exists very little legal scholarship on corpus linguistics — the study of language function and use through large, electronic linguistic databases called corpora — and the role that corpus methods might play in legal interpretation. This omission becomes more and more striking as scholars and jurists (and even the United States Supreme Court) have found themselves persuaded by corpus-based arguments.

This Article argues that the plain or ordinary meaning of a given term in a given context is an empirical matter that may be quantified through corpus-based methods. These methods, when applied to questions of legal ambiguity, present significant advantages over existing empirical approaches to plain meaning and over the prevailing intuition-based interpretive approach of many courts. Because large, sophisticated linguistic corpora are widely available and easy to use, and because corpus methods offer a more principled and systematic alternative to the impressionistic interpretation of legal texts, corpus linguistics may one day revolutionize the process of legal interpretation.

HT @aabibliographer.

Bommarito: Statistics on the Length and Linguistic Complexity of Bills

February 13, 2012

Michael J. Bommarito II of Computational Legal Studies has posted Statistics on the length and linguistic complexity of bills on his blog.

This post presents a table of statistics on word count, word and sentence length, and Flesch-Kincaid reading level scores for the bills introduced in the 112th U.S. Congress, and a histogram showing the distribution of word counts in those bills.
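
For readers unfamiliar with the score, here is a minimal sketch of the standard Flesch-Kincaid grade-level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is not the code behind Mr. Bommarito's table. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for brevity, so its scores will differ somewhat from tools that use a pronunciation dictionary.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (heuristic assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of `text`, using naive tokenization."""
    # Naive sentence split on terminal punctuation; abbreviations such as
    # "U.S.C." will be over-split, which matters for real bill text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    sample = "The Secretary shall submit a report. The report shall be public."
    print(round(flesch_kincaid_grade(sample), 2))
```

Longer sentences and longer words both push the score up, which is why dense statutory language tends to score well above ordinary prose.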

Mr. Bommarito says that he will “be adding more automated analysis and figures over the next few weeks.”

HT @mbommar.

Katz, Bommarito et al., Legal Language Explorer

December 14, 2011

[NOTE: Updated 19 December 2011 to link to Mr. Bommarito's post describing the development of Legal Language Explorer.]

Professor Dr. Daniel Martin Katz of Michigan State University College of Law, Michael J. Bommarito II of Computational Legal Studies, and colleagues, have launched Legal Language Explorer, a new, free, Web-based software application that performs Google N-gram word counts on U.S. Supreme Court decisions.

The JURIX 2011 presentation slides describe the service.

Mr. Bommarito has described the development of the service in a new post, entitled Building Legal Language Explorer: Interactivity and drill-down, noSQL and SQL.

One of the notable features of Legal Language Explorer is that it analyzes full-text court decisions published free on the Web by Public.Resource.Org, as part of the Law.gov legal open government data movement. Katz and Bommarito have previously argued that making more full-text legal resources available free on the Web would enable researchers to build new software tools for processing those resources, and to generate new knowledge through innovative analysis of those resources. Legal Language Explorer exemplifies this kind of software innovation fostered by open legal data, while the authors’ new paper, entitled Legal N-Grams? A Simple Approach to Track the ‘Evolution’ of Legal Language, illustrates the kinds of original research that may arise from analysis of such data.
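
To give a sense of what counting n-grams over a corpus of decisions involves, here is a minimal sketch. It is not the implementation of Legal Language Explorer. The function names, the `(year, text)` input shape, and the sample snippets are all assumptions made for illustration. Each text is lowercased, split into word tokens, and every run of n consecutive tokens is tallied under the decision's year.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Return all runs of n consecutive word tokens in `text`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts_by_year(decisions, n: int) -> dict[int, Counter]:
    """Tally n-grams per year from an iterable of (year, text) pairs."""
    counts: dict[int, Counter] = {}
    for year, text in decisions:
        counts.setdefault(year, Counter()).update(ngrams(text, n))
    return counts


if __name__ == "__main__":
    # Invented snippets standing in for full-text opinions.
    sample = [
        (1950, "No person shall be deprived of due process of law."),
        (1950, "Due process requires notice."),
        (2000, "The equal protection claim fails."),
    ]
    counts = ngram_counts_by_year(sample, 2)
    for year in sorted(counts):
        print(year, counts[year]["due process"])
```

Plotting a phrase's count per year, usually normalized by the total number of n-grams in that year, gives the kind of time series that n-gram viewers display.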

