Posts Tagged ‘Legal text analysis’

Legal Document Cloud

March 15, 2013

There has been some discussion recently of a legal document cloud: a version, specifically for legal texts, of DocumentCloud, the online document repository for journalists that uses OpenCalais to perform semantic analysis and annotation of documents.

[Here is a recent example of the use of DocumentCloud to annotate a legal text, in this instance the U.S. federal district court decision in the National Security Letters case.]

As he was leaving the Open Data Day DC 2013 hackathon, Alan deLevie tweeted about a legal document cloud.

In a Twitter discussion of this topic at the end of Open Data Day DC 2013, Jonathan Stray said that Docracy is a legal document cloud service with version control. [Docracy has just opened a beta version of a new technology called The Document Genome, which performs legal document comparison, summarization, and versioning for a number of applications, including compliance.]

Stray also suggested using the Associated Press’s Overview platform to do classification (tagging) of legal document collections.

Then, on March 5, 2013, Alan deLevie posted on GitHub a readme for a proposed legal document cloud. Here are excerpts of the readme:

What?

I’m trying to build a set of standardized tools for one basic task: Looping through lots of law-related text, processing it, and saving the results. [...]

Why?

Under the hood, you’ll get parallelism and remote code execution from IronWorker. This has several advantages over running this code on your laptop:

1. Performance. Splitting up the work into chunks is an obvious win.

2. Reliability. In the middle of a large processing job, and the power goes out and your laptop battery is about to die? No worries. Your job continues to run, with results stored safely.

3. Curation. The legal informatics/open government/open data communities are coalescing in a great way. Many standalone scripts are emerging for specific text processing tasks. I’d like this repo to be a central place where anyone can quickly make use of these great tools. Batteries included will lower barriers to entry.

4. Standardization. The legal informatics community could gain by adopting a standard project structure.

5. Verification. This builds off of point 4. Need to show how you arrived at a certain set of findings? This could be done in maybe ~20 lines of code.

I envision something as simple as installing a Ruby gem, adding some API keys, mixing and matching text processors to suit your needs, then running your corpus through in a simple loop. [...]
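
To make the workflow in the readme concrete, here is a minimal sketch of "mixing and matching text processors" and running a corpus through a simple loop. It is not taken from deLevie's repository: the readme envisions a Ruby gem backed by IronWorker, whereas this sketch uses plain Python with no remote execution, and every name in it (`word_count`, `usc_citations`, `run_corpus`, the sample documents) is hypothetical.

```python
import json
import re
from typing import Callable, Dict, List

# A "processor" is any function that takes a document's text and returns
# a JSON-serializable result. All names below are hypothetical.
Processor = Callable[[str], object]


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def usc_citations(text: str) -> List[str]:
    """Find simple U.S. Code citations such as '18 U.S.C. § 2709'."""
    return re.findall(r"\b\d+\s+U\.S\.C\.\s+§\s*\d+\w*", text)


def run_corpus(corpus: Dict[str, str],
               processors: Dict[str, Processor]) -> Dict[str, dict]:
    """Loop through the corpus, apply every processor, collect the results."""
    results = {}
    for doc_id, text in corpus.items():
        results[doc_id] = {name: fn(text) for name, fn in processors.items()}
    return results


# Made-up sample documents, for illustration only.
corpus = {
    "doc-1": "The letter was issued under 18 U.S.C. § 2709 and 12 U.S.C. § 3414.",
    "doc-2": "No statute is cited here.",
}

results = run_corpus(corpus, {"words": word_count, "citations": usc_citations})

# "Saving the results" is reduced here to printing JSON; a real job would
# write to a file or a data store.
print(json.dumps(results, indent=2, ensure_ascii=False))
```

Because each processor is just a function keyed by name, adding or removing a processing step changes one dictionary entry and leaves the loop untouched, which is roughly the kind of standard project structure the readme argues for.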

A related resource: in October 2012 Elmer Masters of CALI described his proposal for a new cloud-based repository of court decisions, called CourtCloud.

If you know of other information regarding a legal document cloud, please share it in the comments to this post.

[NOTE: Edited on 18 March 2013 to clarify that the idea of a legal document cloud was not discussed aloud at Open Data Day DC 2013 but was instead mentioned on Twitter by Alan deLevie as he was leaving Open Data Day DC 2013. HT @adelevie here and here.]

Grimmer and Stewart on Text as Data: Automatic Content Analysis Methods for Legislative Texts

February 2, 2013

Professor Dr. Justin Grimmer of Stanford University and Brandon M. Stewart of Harvard University have published Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts, forthcoming in Political Analysis.

Here is the abstract:

Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods—they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.

The paper includes discussion of automated content analysis of legislative texts.

HT @treycausey

Hinkle et al.: Theory & Empirical Analysis of Strategic Word Choice in District Court Opinions

December 2, 2012

Rachael K. Hinkle, JD, and Professor Dr. Andrew D. Martin, both of Washington University Department of Political Science, and colleagues, have published A Positive Theory and Empirical Analysis of Strategic Word Choice in District Court Opinions, Journal of Legal Analysis, 4(2), 402-444 (2012).

Here is the abstract:

Supported by numerous empirical studies on judicial hierarchies and panel effects, Positive Political Theory (PPT) suggests that judges engage in strategic use of opinion content—to further the policy outcomes preferred by the decision-making court. In this study, we employ linguistic theory to study the strategic use of opinion content at a granular level—investigating whether the specific word choices judges make in their opinions is consistent with the competitive institutional story of PPT regarding judicial hierarchies. In particular, we examine the judges’ pragmatic use of the linguistic operations known as “hedging”—language serving to enlarge the truth set for a particular proposition, rendering it less definite and therefore less assailable—and “intensifying”—language restricting the possible truth-value of a proposition and making a statement more susceptible to falsification. Our principal hypothesis is that district court judges not ideologically aligned with the majority of the overseeing circuit judges use more hedging language in their legal reasoning in order to insulate these rulings from reversal. We test the theory empirically by analyzing constitutional criminal procedure, racial and sexual discrimination, and environmental opinions in the federal district courts from 1998 to 2001. Our results demonstrate a statistically significant increase in the use of certain types of language as the ideological distance between a district court judge and the overseeing circuit court judges increases.
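
As a rough illustration of what measuring "hedging" and "intensifying" language can look like in practice, the sketch below counts hedge words and intensifiers in a passage and reports them as a share of all tokens. This is not the authors' method: the two word lists are small, invented examples, and the function name `language_profile` is hypothetical.

```python
import re

# Tiny illustrative lexicons (assumptions, not the lists used in the paper).
HEDGES = {"may", "might", "could", "arguably", "perhaps", "appears", "suggests"}
INTENSIFIERS = {"clearly", "plainly", "certainly", "obviously", "undoubtedly"}


def language_profile(text):
    """Count hedges and intensifiers and express each as a share of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hedges = sum(tok in HEDGES for tok in tokens)
    intensifiers = sum(tok in INTENSIFIERS for tok in tokens)
    total = len(tokens)
    return {
        "tokens": total,
        "hedges": hedges,
        "intensifiers": intensifiers,
        "hedge_rate": hedges / total if total else 0.0,
        "intensifier_rate": intensifiers / total if total else 0.0,
    }


# Two made-up sentences, one hedged and one intensified.
hedged = language_profile("The search arguably may have been unlawful.")
intensified = language_profile("The search was clearly unlawful.")
print(hedged)
print(intensified)
```

A simple word-list count like this ignores context (for example, "may" as a grant of permission rather than a hedge), so any real study would need a more careful coding scheme and validation.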

Chandler on Machine Learning Judicial Behavior Using a Mathematica to Weka Interface

June 5, 2012

Professor Seth J. Chandler of the University of Houston Law Center will present a paper entitled Machine Learning Judicial Behavior Using a Mathematica to Weka Interface, at IMS 2012: The International Mathematica Symposium, to be held 11-13 June 2012 in London, England, UK.

Here is the abstract:

Weka is a comprehensive and powerful Java library subject to a GNU General Public License that implements a large number of modern machine learning classification and other methods. These classification methods include Bayesian techniques, nearest neighbor voting, support vector machines, neural nets, and decision trees, meta-methods such as “dagging” as well as simple methods such as “OneR” and “ZeroR.” While Weka can be used both from a command line and using a variety of respectable GUI interfaces provided by its designers, the ability to further manipulate its output or conduct structured experiments can be challenging. This presentation will show how one can use a Mathematica foreign language interface (J/Link) to conduct structured experiments using Weka’s capabilities and to extract information produced by Weka algorithms as the basis for further analysis and visualization using Mathematica. The domain in which the matter will be presented should be topical: an effort to predict the behavior of United States Supreme Court Justices. Using the Mathematica to Weka interface both for construction of machine learning algorithms and as an engine for deriving real-time results, the presentation (a) predicts the results of important pending Supreme Court cases, (b) creates a “Fantasy Supreme Court” that predicts the results of imagined cases to imagined panels of justices, and (c) creates a kind of “time machine” that shows how actual cases might have come out – and might have changed legal and cultural history – had they been decided by different panels of justices.
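
For readers unfamiliar with the two "simple methods" named in the abstract, the following sketch shows the idea behind ZeroR (always predict the most common class) and OneR (build a rule from the single most predictive attribute). It is written in plain Python rather than Weka or Mathematica, it is not Professor Chandler's interface, and the toy data, attribute names, and labels are invented purely for illustration.

```python
from collections import Counter, defaultdict


def zero_r(labels):
    """ZeroR: ignore all attributes and predict the most common label."""
    return Counter(labels).most_common(1)[0][0]


def one_r(rows, labels):
    """OneR: choose the one attribute whose value -> majority-label rule
    makes the fewest errors on the training data."""
    best = None
    for attr in rows[0]:
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        # For each attribute value, predict the label seen most often.
        rule = {value: c.most_common(1)[0][0] for value, c in by_value.items()}
        errors = sum(rule[row[attr]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best[1], best[2]


# Invented toy data: NOT real cases and NOT real judicial behavior.
rows = [
    {"issue": "criminal", "lower_ruling": "A"},
    {"issue": "criminal", "lower_ruling": "B"},
    {"issue": "civil", "lower_ruling": "A"},
    {"issue": "civil", "lower_ruling": "B"},
    {"issue": "civil", "lower_ruling": "A"},
]
labels = ["reverse", "affirm", "reverse", "affirm", "reverse"]

baseline = zero_r(labels)
attribute, rule = one_r(rows, labels)
print("ZeroR predicts:", baseline)
print("OneR uses attribute:", attribute, "with rule:", rule)
```

Baselines like these are useful mainly as a yardstick: a more elaborate classifier that cannot beat them is not learning much from the data.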

For the full text of the paper or the slides, please contact the author.

Thanks to Professor Chandler for allowing me to post the abstract.

Katz and Quantitative Legal Prediction Discussed by Law Technology News

June 4, 2012

The work of Professor Dr. Daniel Martin Katz of the Michigan State University College of Law on quantitative legal prediction (QLP) is the topic of a new article by Tam Harbert entitled Big Data Meets BigLaw: Will algorithms be able to predict trial outcomes? in Law Technology News, 1 June 2012.

In the article, Professor Katz discusses the application of QLP to eDiscovery (in a method called “predictive coding”), and states that “some percentage of tasks that lawyers do are going to be replaced by machines and/or technology.”

The article also discusses several organizations that have developed, or are developing, QLP technologies, including:

  • TyMetrix, which offers technology to analyze legal fees
  • Harlan Institute, which is exploring combining “crowd-sourced data with data from publicly available court filings, [and] then us[ing] an algorithm and decision engine to make predictions” about judicial decisions
  • LexMachina, which has developed a large database of intellectual property (IP) litigation, and uses technology to analyze and make predictions about IP cases
  • The law firm of Seyfarth Shaw LLP, which has created and automated “a client service model based on Lean Six Sigma principles.”

For more information, please see the complete article.

Dell’Orletta et al. on The SPLeT–2012 Shared Task on Dependency Parsing of Legal Texts

June 1, 2012

Felice Dell’Orletta of l’Istituto di Linguistica Computazionale del CNR di Pisa (ILC-CNR), and colleagues, have published The SPLeT–2012 Shared Task on Dependency Parsing of Legal Texts, in LREC 2012 Conference Proceedings: Semantic Processing of Legal Texts (SPLeT-2012) Workshop, pp. 42-51.

Here is the abstract:

The 4th Workshop on “Semantic Processing of Legal Texts” (SPLeT–2012) presents the first multilingual shared task on Dependency Parsing of Legal Texts. In this paper, we define the general task and its internal organization into sub–tasks, describe the datasets and the domain–specific linguistic peculiarities characterizing them. We finally report the results achieved by the participating systems, describe the underlying approaches and provide a first analysis of the final test results.

Papers Available for SPLeT 2012: Workshop on Semantic Processing of Legal Texts

May 27, 2012

Full text papers have been posted for SPLeT 2012: Workshop on Semantic Processing of Legal Texts, being held 27 May 2012 in Istanbul, Turkey.

Here is the list of papers:

  • Giulia Venturi: Design and Development of TEMIS: a Syntactically and Semantically Annotated Corpus of Italian Legislative Texts
  • Guido Boella, Luigi Di Caro, Llio Humphreys, Livio Robaldo: Using Legal Ontology to Improve Classification in the Eunomos Legal Document and Knowledge Management System
  • Antonio Lazari, Mª Ángeles Zarco-Tejada: JurWordNet and FrameNet Approaches to Meaning Representation: a Legal Case Study
  • Lorenzo Bacci, Enrico Francesconi, Maria Teresa Sagri: A Rule-based Parsing Approach for Detecting Case Law References in Italian Court Decisions
  • Adam Wyner, Wim Peters: Semantic Annotations for Legal Text Processing using GATE Teamware
  • Paulo Quaresma: Legal Information Extraction ← Machine Learning Algorithms + Linguistic Information
  • Adam Wyner: Problems and Prospects in the Automatic Semantic Analysis of Legal Texts
  • Felice Dell’Orletta, Simone Marchi, Simonetta Montemagni, Barbara Plank, Giulia Venturi: The SPLeT–2012 Shared Task on Dependency Parsing of Legal Texts
  • Giuseppe Attardi, Daniele Sartiano and Maria Simi: Active Learning for Domain Adaptation of Dependency Parsing on Legal Texts
  • Alessandro Mazzei, Cristina Bosco: Simple Parser Combination
  • Niklas Nisbeth, Anders Søgaard: Parser combination under sample bias

Mouritsen on Assessing Corpus Linguistics as an Empirical Path to Plain Meaning

March 7, 2012

Stephen C. Mouritsen, M.A., Esq., of Cravath, Swaine & Moore LLP, has published Hard Cases and Hard Data: Assessing Corpus Linguistics as an Empirical Path to Plain Meaning, Columbia Science and Technology Law Review, 13, 156-205 (2011). Here is the abstract:

The Plain Meaning Rule is often assailed on the grounds that it is unprincipled — that it substitutes for careful analysis an interpreter’s ad hoc and impressionistic intuition about the meaning of legal texts. But what if judges and lawyers had the means to test their intuitions about plain meaning systematically? Then initial linguistic impressions about the meaning of a legal text might be viewed as hypotheses to be tested, rather than determinative criteria upon which to base important decisions.

There exists very little legal scholarship on corpus linguistics — the study of language function and use through large, electronic linguistic databases called corpora — and the role that corpus methods might play in legal interpretation. This omission becomes more and more striking as scholars and jurists (and even the United States Supreme Court) have found themselves persuaded by corpus-based arguments.

This Article argues that the plain or ordinary meaning of a given term in a given context is an empirical matter that may be quantified through corpus-based methods. These methods, when applied to questions of legal ambiguity, present significant advantages over existing empirical approaches to plain meaning and over the prevailing intuition-based interpretive approach of many courts. Because large, sophisticated linguistic corpora are widely available and easy to use, and because corpus methods offer a more principled and systematic alternative to the impressionistic interpretation of legal texts, corpus linguistics may one day revolutionize the process of legal interpretation.
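
One basic corpus method is to look at which words tend to appear near a term of interest (its collocates), as a clue to how the term is ordinarily used. The sketch below is a minimal, assumption-laden illustration of that idea and is not drawn from the article: the three-sentence "corpus" is invented, the function names are hypothetical, and real work would rely on a large, balanced corpus and a proper corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def collocates(tokens, keyword, window=2):
    """Count the words that occur within `window` tokens of `keyword`.
    Sentence boundaries are ignored in this simplified version."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


# Invented mini-corpus, for illustration only.
text = (
    "He carried a pistol in his coat. "
    "She carried the groceries in her car. "
    "They carried a rifle on the trail."
)

neighbors = collocates(tokenize(text), "carried", window=2)
print(neighbors.most_common(5))
```

With a large corpus, the relative frequencies of such neighbors give an interpreter something to test an intuition against, which is the shift from impression to hypothesis that the abstract describes.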

HT @aabibliographer.

Call for Participation: First Shared Task on Dependency Parsing of Legal Texts, SPLeT 2012

January 14, 2012

A call for participation, with a registration deadline of 30 January 2012, has been issued for the First Shared Task on Dependency Parsing of Legal Texts, part of SPLeT 2012: The “Semantic Processing of Legal Texts” Workshop, to be held 27 May 2012 in Istanbul, Turkey. (SPLeT 2012 is being held in conjunction with LREC-2012: The Eighth International Conference on Language Resources and Evaluation.)

According to the call:

[T]he goal of the shared task at SPLeT 2012 is to provide common and consistent task definitions and evaluation criteria for dependency parsing of legal texts in order to identify specific challenges posed by the analysis of this type of texts, to obtain a clearer idea of the current state-of-the-art, and to develop and share multilingual domain specific resources.

The languages dealt with will be English and Italian. Participants are expected to submit parsing results for at least one of the two languages involved, but they are strongly encouraged to submit results for both languages.

The task will be organized into two subtasks:

  • a basic subtask (mandatory) focusing on dependency parsing of legal texts, aimed at testing the performance of general parsing systems on legal texts;
  • a more challenging subtask (optional) focusing on the adaptation of general purpose dependency parsers to the legal domain, aimed at investigating methods and techniques for automatically extracting knowledge from large unlabelled target domain corpora to improve the performance of general parsing systems on legal texts.
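
The excerpt above does not say which evaluation criteria the shared task uses. As background only, dependency parsers are commonly scored with unlabeled and labeled attachment scores (UAS and LAS): the share of tokens assigned the correct head, and the share assigned both the correct head and the correct relation label. The sketch below assumes those two metrics; the example sentence, its annotation, and the function name are invented for illustration.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one sentence.

    Each argument is a list with one (head_index, relation_label) pair per
    token, where head_index 0 denotes the root of the sentence.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Invented example: "The court dismissed the appeal"
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (5, "det"), (3, "obj")]
# A hypothetical parser output with one wrong head and one wrong label.
predicted = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "det"), (3, "nmod")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

Comparing a general-purpose parser's scores on newswire text with its scores on legal text is one simple way to quantify the domain gap that the optional adaptation subtask targets.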

For all deadlines, and for other information, please see the call for participation.

HT Dr. Giulia Venturi.

Abstracts for Papers Presented at Current Legal Issues Colloquium 2011: Law and Language

July 10, 2011

Abstracts have been posted for the papers presented at Current Legal Issues Colloquium 2011 – Law and Language, held 4-5 July 2011 at University College London Faculty of Laws. The papers concern a range of current issues in the fields of linguistics, text analysis, rhetoric, and textual interpretation, all as applied to legal texts.

