Posts Tagged ‘Legal text processing’

Legal Document Cloud

March 15, 2013

There has been some discussion recently of a legal document cloud: a version, specifically for legal texts, of DocumentCloud, the online document repository for journalists that uses OpenCalais to perform semantic analysis and annotation of documents.

[Here is a recent example of the use of DocumentCloud to annotate a legal text, in this instance the U.S. federal district court decision in the National Security Letters case.]

As he was leaving the Open Data Day DC 2013 hackathon, Alan deLevie tweeted about a legal document cloud.

In a Twitter discussion of this topic at the end of Open Data Day DC 2013, Jonathan Stray said that Docracy is a legal document cloud service with version control. [Docracy has just opened a beta version of a new technology called The Document Genome, which performs legal document comparison, summarization, and versioning for a number of applications, including compliance.]

Stray also suggested using the Associated Press’s Overview platform to do classification (tagging) of legal document collections.

Then, on March 5, 2013, Alan deLevie posted on GitHub a readme for a proposed legal document cloud. Here are excerpts from the readme:

What?

I’m trying to build a set of standardized tools for one basic task: Looping through lots of law-related text, processing it, and saving the results. [...]

Why?

Under the hood, you’ll get parallelism and remote code execution from IronWorker. This has several advantages over running this code on your laptop:

  1. Performance. Splitting up the work into chunks is an obvious win.
  2. Reliability. In the middle of a large processing job, and the power goes out and your laptop battery is about to die? No worries. Your job continues to run, with results stored safely.
  3. Curation. The legal informatics/open government/open data communities are coalescing in a great way. Many standalone scripts are emerging for specific text processing tasks. I’d like this repo to be a central place where anyone can quickly make use of these great tools. Batteries included will lower barriers to entry.
  4. Standardization. The legal informatics community could gain by adopting a standard project structure.
  5. Verification. This builds off of point 4. Need to show how you arrived at a certain set of findings? This could be done in maybe ~20 lines of code.

I envision something as simple as installing a Ruby gem, adding some API keys, mixing and matching text processors to suit your needs, then running your corpus through in a simple loop. [...]
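To make the "mix and match text processors, then loop over the corpus" idea concrete, here is a minimal sketch in Python. It is not code from the readme: the proposed project targets a Ruby gem and IronWorker, whereas this sketch uses a local thread pool as a stand-in for remote parallel workers. The processor functions (`word_count`, `mentions_statute`) and the helper names are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

# Assumption: a "processor" is any function that takes a document's text
# and returns some result. These two are hypothetical examples.
Processor = Callable[[str], object]


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def mentions_statute(text: str) -> bool:
    """Crude check for a U.S. Code citation marker."""
    return "U.S.C." in text


def process_document(text: str, processors: Dict[str, Processor]) -> dict:
    """Run every selected processor over one document."""
    return {name: fn(text) for name, fn in processors.items()}


def run_corpus(corpus: Iterable[str],
               processors: Dict[str, Processor],
               workers: int = 4) -> List[dict]:
    """Process documents in parallel; results keep the corpus order.

    A local thread pool stands in for a remote worker service here.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: process_document(t, processors), corpus))


corpus = [
    "See 18 U.S.C. 2709 for details.",
    "The parties agree as follows.",
]
results = run_corpus(corpus, {"word_count": word_count,
                              "mentions_statute": mentions_statute})
```

Each entry in `results` is a dictionary keyed by processor name, so swapping processors in or out only changes the dictionary passed to `run_corpus`.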

A related resource: in October 2012 Elmer Masters of CALI described his proposal for a new cloud-based repository of court decisions, called CourtCloud.

If you know of other information regarding a legal document cloud, please share it in the comments to this post.

[NOTE: Edited on 18 March 2013 to clarify that the idea of a legal document cloud was not discussed aloud at Open Data Day DC 2013 but was instead mentioned on Twitter by Alan deLevie as he was leaving Open Data Day DC 2013. HT @adelevie here and here.]

Jaquith: Two Mini-Projects spun off from The State Decoded: Subsection Identifier & Definition Scraper

March 2, 2013

Waldo Jaquith has posted Two Mini-Projects: Subsection Identifier and Definition Scraper, at The State Decoded blog.

Here are excerpts from the post:

The State Decoded project has spun off a couple of sub-projects, components of the larger project that can be useful for other purposes, and that deserve to stand alone. (Both are found on our GitHub repository.)

The first is Subsection Identifier, which turns theoretically structured text into actually structured text. It is common for documents in outline form (contracts, laws, and other documents that need to be able to cross-reference specific passages) to be provided in a format in which the structural labels flow into the text. [...]
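As a rough illustration of what turning "theoretically structured text into actually structured text" can involve, the following Python sketch splits a passage on inline labels such as "(a)" or "(1)". It is not the Subsection Identifier's actual algorithm; the label pattern and the function name `split_subsections` are assumptions, and the sketch produces a flat list rather than a nested outline.

```python
import re
from typing import List, Tuple

# Assumption: labels are a single lowercase letter or a number in parentheses,
# followed by whitespace. Real legal texts use many more label styles.
LABEL = re.compile(r"\(([a-z]|\d+)\)\s+")


def split_subsections(text: str) -> List[Tuple[str, str]]:
    """Return (label, body) pairs in document order.

    Caveat: a cross-reference such as "see subsection (a) above" would be
    mistaken for a label by this simple pattern.
    """
    matches = list(LABEL.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # The body runs from the end of this label to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((match.group(1), text[match.end():end].strip()))
    return sections


sample = ("(a) The owner shall register the vehicle. "
          "(b) The fee is ten dollars. "
          "(1) Payment is due annually.")
sections = split_subsections(sample)
```

Rebuilding the hierarchy (for example, nesting "(1)" under "(b)") would require knowing the ordering of label styles in the particular code being parsed, which this sketch deliberately leaves out.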

The second mini-project is Definition Scraper, which extracts defined terms from passages of text. Many legal documents begin by defining words that are then used throughout the document, and knowing those definitions can be crucial to understanding that document. So it can be helpful to be able to extract a list of terms and their definitions. Definition Scraper needs only be handed a passage of text, and it will determine whether it contains defined terms and, if it does, it will return a dictionary of those terms and their definitions. [...]
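The behavior described above (hand over a passage, get back a dictionary of terms and definitions, or nothing if none are defined) can be sketched in Python as follows. This is not the Definition Scraper's own code; the pattern assumes definitions of the form `"Term" means ...` or `"Term" includes ...` ending at the first period, and the function name `scrape_definitions` is hypothetical.

```python
import re
from typing import Dict

# Assumption: a defined term appears in straight double quotes, followed by
# "means" or "includes", and the definition ends at the next period.
DEFINITION = re.compile(r'"([^"]+)"\s+(?:means|includes)\s+([^.]+)\.')


def scrape_definitions(passage: str) -> Dict[str, str]:
    """Return a dictionary mapping each defined term (lowercased) to its definition.

    An empty dictionary means the passage contains no recognizable definitions.
    """
    return {term.lower(): definition.strip()
            for term, definition in DEFINITION.findall(passage)}


passage = ('"Vehicle" means any device used for transport on a highway. '
           '"Owner" includes a person holding legal title to a vehicle. '
           'No owner shall operate an unregistered vehicle.')
definitions = scrape_definitions(passage)
```

Definitions that contain abbreviations with periods, or that use typographic quotation marks, would need a more forgiving pattern than this one.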

For more details, please see the complete post.

The State Decoded is Waldo’s free and open legal data and e-participation platform for U.S. states.

Click here for other posts about The State Decoded.

HT @StateDecoded here and here

Grimmer and Stewart on Text as Data: Automatic Content Analysis Methods for Legislative Texts

February 2, 2013

Professor Dr. Justin Grimmer of Stanford University and Brandon M. Stewart of Harvard University have published Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts, forthcoming in Political Analysis.

Here is the abstract:

Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods—they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.

The paper includes discussion of automated content analysis of legislative texts.

HT @treycausey

Awad et al.: An Iterative Approach to Synthesize Business Process Templates from Compliance Rules

October 26, 2012

Professor Dr. Ahmed Awad of Cairo University Faculty of Computers and Information, and colleagues, have published An iterative approach to synthesize business process templates from compliance rules [paywalled version; preprint version], forthcoming in Information Systems, 37(8), 714–736 (2012).

Here is the abstract:

Companies have to adhere to compliance requirements. The compliance analysis of business operations is typically a joint effort of business experts and compliance experts. Those experts need to create a common understanding of business processes to effectively conduct compliance management. In this paper, we present a technique that aims at supporting this process. We argue that process templates generated out of compliance requirements provide a basis for negotiation among business and compliance experts. We introduce a semi-automated and iterative approach to the synthesis of such process templates from compliance requirements expressed in Linear Temporal Logic (LTL). We show how generic constraints related to business process execution are incorporated and present criteria that point at underspecification. Further, we outline how such underspecification may be resolved to iteratively build up a complete specification. For the synthesis, we leverage existing work on process mining and process restructuring. However, our approach is not limited to the control-flow perspective, but also considers direct and indirect data-flow dependencies. Finally, we elaborate on the application of the derived process templates and present an implementation of our approach.

Call for Proposals: ReInventLaw Dubai 2012: An ‘Un’conference on Law, Technology, Innovation, and Entrepreneurship

September 23, 2012

A call for presentation proposals — with submission deadline of 15 October 2012 — has been issued for ReInventLaw Dubai 2012: “an ‘un’conference devoted to law, technology, innovation, and entrepreneurship” — to be held 10 December 2012 at Media City in Dubai.

The organizers particularly welcome presentations about innovations in legal services or legal education. Presentations can take the form of 6 Minute Ignite Style Presentations or 12 Minute “TED Style” Presentations.

Registration is free.

The event Website describes the event as follows:

ReInvent Law Dubai is an “un”conference devoted to law, technology, innovation, and entrepreneurship.

Anyone interested in the future of law or technology or entrepreneurship will want to participate. Come hear about the innovative ideas generated by the highly-engaging atmosphere of the event!

The event is being sponsored by The ReInventLaw Laboratory at Michigan State University College of Law, and is modeled on the LawTechCamp London 2012 event held last summer.

For more information, please see the ReInventLaw Dubai 2012 Website.

HT @computational.

Call for Papers: ICAIL 2013: International Conference on Artificial Intelligence and Law

September 23, 2012

A call for papers — with paper submission deadline of 18 January 2013 — has been issued for ICAIL 2013: 14th International Conference on Artificial Intelligence and Law, to be held 10-14 June 2013 in Rome, Italy.

The Twitter account for the conference is @ICAIL2013. The Twitter hashtag for the conference is #ICAIL2013. The conference organizers invite those interested to follow the Twitter account and hashtag and to comment and contribute with the latest news.

The conference features two tracks: one for “regular papers” and one for “innovative applications papers.”

Here is the complete list of deadlines:

  • Mentoring program request deadline: November 9, 2012
  • Mentoring program paper deadline: November 16, 2012
  • Submission of workshop and tutorial proposals: December 7, 2012
  • Submission of abstracts (optional): January 11, 2013
  • Submission of papers deadline: January 18, 2013
  • Notification of acceptance: March 20, 2013
  • Final revised and formatted papers due: April 19, 2013
  • Conference: June 10 – June 14, 2013

Papers are invited on the following topics:

  • Formal and computational models of legal reasoning
  • Knowledge acquisition techniques for the legal domain, including natural language processing and data mining
  • Computational models of argumentation and decision making
  • Legal knowledge representation including legal ontologies and common sense knowledge
  • Automatic legal text classification and summarization
  • Automated information extraction from legal databases and texts
  • Machine learning and data mining applied to legal databases
  • Conceptual or model-based legal information retrieval
  • E-discovery and e-disclosure
  • E-government and e-justice
  • Computational models of evidential reasoning
  • Modeling norms for multi-agent systems
  • Modeling negotiation and contract formation
  • Computational models of case-based legal reasoning
  • Online dispute resolution
  • Intelligent legal tutoring systems
  • Intelligent support systems for the legal domain
  • Interdisciplinary applications of legal informatics methods and systems

For more information, please see the call for papers.

HT Anne Gardner

[NOTE: Updated 23 November 2012 to add the Twitter account and hashtag. HT Enrico Francesconi]

Oldfather et al. on Automated Content Analysis, Court Opinions, and Legal Scholarly Methodology

September 8, 2012

Professor Chad M. Oldfather of Marquette University School of Law, Professor Dr. Joseph P. Bockhorst of the University of Wisconsin Milwaukee Department of Electrical Engineering and Computer Science, and Brian P. Dimmer, Esq., have published Triangulating Judicial Responsiveness: Automated Content Analysis, Judicial Opinions, and the Methodology of Legal Scholarship, Florida Law Review, 64, 1189-1242 (2012).

Here is the abstract:

The increasing availability of digital versions of court documents, coupled with increases in the power and sophistication of computational methods of textual analysis, promises to enable both the creation of new avenues of scholarly inquiry and the refinement of old ones. This Article advances that project in three respects. First, it examines the potential for automated content analysis to mitigate one of the methodological problems that afflicts both content analysis and traditional legal scholarship — their acceptance on faith of the proposition that judicial opinions accurately report information about the cases they resolve and courts’ decisional processes. Because automated methods can quickly process large amounts of text, they allow for assessment of the correspondence between opinions and other documents in the case, thereby providing a window into how closely opinions track the information provided by the litigants. Second, it explores one such novel measure — the responsiveness of opinions to briefs — in terms of its connection to both adjudicative theory and existing scholarship on the behavior of courts and judges. Finally, it reports our efforts to test the viability of automated methods for assessing responsiveness on a sample of briefs and opinions from the United States Court of Appeals for the First Circuit. Though we are focused primarily on validating our methodology, rather than on the results it generates, our initial investigation confirms that even basic approaches to automated content analysis provide useful information about responsiveness, and generates intriguing results that suggest avenues for further study.
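To give a sense of what a "basic approach" to measuring how closely an opinion tracks a brief might look like, here is a minimal Python sketch using cosine similarity over word counts. This is a generic bag-of-words technique offered as an assumption, not the method the authors report; the function names and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two texts' word-count vectors (0.0 to 1.0)."""
    counts_a, counts_b = term_counts(a), term_counts(b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm = (math.sqrt(sum(v * v for v in counts_a.values()))
            * math.sqrt(sum(v * v for v in counts_b.values())))
    return dot / norm if norm else 0.0


# Hypothetical one-sentence stand-ins for a brief and two opinions.
brief = "The search violated the Fourth Amendment because no warrant was issued."
opinion_on_point = "We hold that the search violated the Fourth Amendment absent a warrant."
opinion_off_point = "The contract claim fails for lack of consideration."

score_on_point = cosine_similarity(brief, opinion_on_point)
score_off_point = cosine_similarity(brief, opinion_off_point)
```

Under this measure the opinion that shares more vocabulary with the brief scores higher. Real use would need stop-word handling, validation against human judgments, and far longer texts, as the abstract's emphasis on validating the methodology suggests.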

ReInvent Law Dubai 2012: Unconference on Law, Technology, Innovation, and Entrepreneurship

July 9, 2012

ReInvent Law Dubai 2012: Unconference on Law, Technology, Innovation, and Entrepreneurship will be held 10 December 2012 at Dubai Knowledge Village, Dubai, UAE, according to an announcement at Computational Legal Studies.

The event’s organizers will be Professor Dr. Daniel Martin Katz and Professor Renee Newman Knake, both of the Michigan State University College of Law and its new ReInvent Law Laboratory.

According to the event brochure:

ReInvent Law Dubai is an (un)conference focusing on law, technology, innovation and entrepreneurship. Building upon the success of the recent London event, leaders in the fields of law, technology and beyond will come together to share ideas about innovation in the delivery of legal services.

This event is Free, Open and Participatory. Anyone can propose a topic. Entrepreneurs, new media/technology enthusiasts, legal professionals, social networkers, and those curious about future innovation in law and technology will want to attend.

The Michigan State University College of Law Graduate Program at MSU Dubai is a primary sponsor.

For more information, please see the announcement.

HT @computational.

June 29: LawTechCamp London

June 28, 2012

LawTechCamp London 2012 — “a BarCamp-style community UnConference for new media and technology enthusiasts and legal professionals” — will be held 29 June 2012 in London, England, UK.

The Twitter hashtag for the conference is #lawtechcamplondon.

Click here for archived tweets from the event, in .csv format.

Click here for the conference program.

A notable characteristic of this event is that it gathers together in one place individuals from most of the different subgroups of the legal informatics community.

The event’s organizers include:

HT @reneeknake.

