Legal Document Cloud

There has been some discussion recently of a legal document cloud: a version, specifically for legal texts, of DocumentCloud, the online document repository for journalists that uses OpenCalais to perform semantic analysis and annotation of documents.

[Here is a recent example of the use of DocumentCloud to annotate a legal text, in this instance the U.S. federal district court decision in the National Security Letters case.]

As he was leaving the Open Data Day DC 2013 hackathon, Alan deLevie tweeted about a legal document cloud.

In a Twitter discussion of this topic at the end of Open Data Day DC 2013, Jonathan Stray said that Docracy is a legal document cloud service with version control. [Docracy has just opened a beta version of a new technology called The Document Genome, which performs legal document comparison, summarization, and versioning for a number of applications, including compliance.]

Stray also suggested using the Associated Press’s Overview platform to do classification (tagging) of legal document collections.

Then, on March 5, 2013, Alan deLevie posted on GitHub a readme for a proposed legal document cloud. Here are excerpts of the readme:


I’m trying to build a set of standardized tools for one basic task: Looping through lots of law-related text, processing it, and saving the results. […]


Under the hood, you’ll get parallelism and remote code execution from IronWorker. This has several advantages over running this code on your laptop:

1. Performance. Splitting up the work into chunks is an obvious win.

2. Reliability. In the middle of a large processing job, and the power goes out and your laptop battery is about to die? No worries. Your job continues to run, with results stored safely.

3. Curation. The legal informatics/open government/open data communities are coalescing in a great way. Many standalone scripts are emerging for specific text processing tasks. I’d like this repo to be a central place where anyone can quickly make use of these great tools. Batteries included will lower barriers to entry.

4. Standardization. The legal informatics community could gain by adopting a standard project structure.

5. Verification. This builds off of point 4. Need to show how you arrived at a certain set of findings? This could be done in maybe ~20 lines of code.

I envision something as simple as installing a Ruby gem, adding some API keys, mixing and matching text processors to suit your needs, then running your corpus through in a simple loop. […]
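
To make the workflow in these excerpts concrete, here is a minimal sketch of "mixing and matching text processors" and running a corpus through them in a simple loop. It is not code from deLevie's repository: it is written in Python rather than as a Ruby gem, it runs locally instead of on IronWorker, and every name in it (`word_count`, `usc_citations`, `run_corpus`, `save_results`, the sample documents) is a hypothetical chosen for illustration.

```python
import json
import re
from typing import Callable, Dict, Iterable, List

# Assumption: a "processor" is any function that takes raw text and
# returns a JSON-serializable result. Both processors are hypothetical.
Processor = Callable[[str], object]


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def usc_citations(text: str) -> List[str]:
    """Find simple U.S. Code citations such as '18 U.S.C. § 2709'."""
    return re.findall(r"\b\d+ U\.S\.C\. § ?\d+[a-z]?", text)


def run_corpus(corpus: Dict[str, str],
               processors: Dict[str, Processor]) -> List[dict]:
    """Apply every processor to every document; one record per document."""
    records = []
    for doc_id, text in corpus.items():
        record = {"id": doc_id}
        for name, process in processors.items():
            record[name] = process(text)
        records.append(record)
    return records


def save_results(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line so results are easy to inspect later."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Made-up sample corpus, keyed by document id.
corpus = {
    "doc-1": "The court cited 18 U.S.C. § 2709 and 18 U.S.C. § 3511 in its order.",
    "doc-2": "No statutory citations appear in this short memo.",
}
processors = {"word_count": word_count, "usc_citations": usc_citations}
results = run_corpus(corpus, processors)
```

Calling `save_results(results, "results.jsonl")` would then store one record per document. Because the set of processors is an ordinary dictionary, the script itself records which steps produced a given set of findings, which is in the spirit of the readme's "Verification" point.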

A related resource: in October 2012 Elmer Masters of CALI described his proposal for a new cloud-based repository of court decisions, called CourtCloud.

If you know of other information regarding a legal document cloud, please share it in the comments to this post.

[NOTE: Edited on 18 March 2013 to clarify that the idea of a legal document cloud was not discussed aloud at Open Data Day DC 2013 but was instead mentioned on Twitter by Alan deLevie as he was leaving Open Data Day DC 2013. HT @adelevie here and here.]
