Kuhlman: Legal Synonyms Project, and word cloud of U.S. Code

Casey Kuhlman of the U.S. Open Data Institute has posted a word cloud of the U.S. Code.

Casey says the data are “actual word counts piped into a JQuery lib,” and that he is also working on “N grams and POS tags” for the U.S. Code.
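To illustrate the kind of data behind such a visualization, here is a minimal Python sketch of word counts and n-gram counts over a text. It is not Casey's code: the function names, the simple letters-only tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase the text and keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())

def word_counts(text):
    # Count how often each token occurs.
    return Counter(tokenize(text))

def ngram_counts(text, n=2):
    # Slide a window of n tokens across the text and count each tuple.
    tokens = tokenize(text)
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Hypothetical sample text, not taken from the U.S. Code.
sample = ("The term person includes a corporation. "
          "The term person includes a partnership.")

counts = word_counts(sample)
bigrams = ngram_counts(sample, 2)
```

Counts like these could then be passed to any word cloud library as word and weight pairs.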

This visualization is an outcome of his Legal Synonyms Project (HT @benbalter). Here is a description of that project, from the readme:

A synonym.txt for Solr Instances. Solr is a great search engine but it is even better with a bit of training. One of the most used ways to train Solr is to add a synonyms.txt file. Building a synonyms.txt file for a particular corpus of language is not an easy exercise. This repository is an attempt to build a synonyms.txt file for a legal corpus so that Solr can be used to search a corpus of documents of a legal nature.

The results of this effort rather than being strictly and traditionally versioned are contained in different synonyms.txt files. […]
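For readers unfamiliar with the format the readme mentions, a Solr synonyms file typically holds one rule per line: either a comma-separated list of equivalent terms, or an explicit mapping written with `=>`. The Python sketch below builds a few such lines. The legal terms and helper names are hypothetical examples chosen for illustration, not entries from the project's files.

```python
def equivalence_line(terms):
    # Comma-separated terms are treated as equivalent to one another.
    return ", ".join(terms)

def mapping_line(sources, targets):
    # An explicit mapping: terms on the left are replaced by those on the right.
    return f"{', '.join(sources)} => {', '.join(targets)}"

# Hypothetical legal synonyms, for illustration only.
lines = [
    equivalence_line(["attorney", "lawyer", "counsel"]),
    mapping_line(["atty"], ["attorney"]),
]

# Join the rules into the text of a synonyms file.
synonyms_txt = "\n".join(lines) + "\n"
```

The resulting text could be saved as synonyms.txt and referenced from a synonym filter in the Solr schema.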

It will be interesting to compare Casey's n-gram work on the U.S. Code with Daniel Martin Katz, Michael Bommarito, and colleagues’ Legal Language Explorer, which displays n-gram data for U.S. federal court decisions.

For more details, please see the Legal Synonyms Project repository.

HT @compleatang

