Kuhlman: Legal Synonyms Project, and word cloud of U.S. Code

Casey Kuhlman of the U.S. Open Data Institute has posted a word cloud of the U.S. Code.

Casey says the data are “actual word counts piped into a JQuery lib,” and that he is also working on “N grams and POS tags” for the U.S. Code.
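For readers new to these terms, here is a minimal sketch of how word counts and n-grams can be computed from raw text. It is not Casey's code: the sample sentence, the simple regex tokenizer, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def ngram_counts(tokens, n=2):
    """Count runs of n consecutive tokens.

    This simple version ignores sentence boundaries, so an n-gram
    may span the end of one sentence and the start of the next.
    """
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample text, not taken from the U.S. Code.
sample = (
    "The Secretary shall submit a report. "
    "The Secretary shall publish the report."
)

tokens = tokenize(sample)
counts = word_counts(tokens)
bigrams = ngram_counts(tokens, n=2)

print(counts.most_common(3))
print(bigrams.most_common(3))
```

Counts like these are the kind of input a word cloud library scales into font sizes; part-of-speech tagging would need a separate NLP library and is not shown here.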

This visualization is an outcome of his Legal Synonyms Project (HT @benbalter). Here is a description of that project, from the readme:

A synonym.txt for Solr Instances. Solr is a great search engine but it is even better with a bit of training. One of the most used ways to train Solr is to add a synonyms.txt file. Building a synonyms.txt file for a particular corpus of language is not an easy exercise. This repository is an attempt to build a synonyms.txt file for a legal corpus so that Solr can be used to search a corpus of documents of a legal nature.

The results of this effort rather than being strictly and traditionally versioned are contained in different synonyms.txt files. […]
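To make the idea concrete, here is a rough sketch of what building a Solr synonyms file can look like. In Solr's synonyms format, comma-separated terms on one line are treated as equivalent, `=>` defines an explicit mapping, and `#` starts a comment. The legal terms and helper names below are hypothetical examples, not entries or code from the Legal Synonyms Project.

```python
def equivalent_line(terms):
    """Comma-separated terms are treated as interchangeable."""
    return ", ".join(terms)


def mapping_line(sources, targets):
    """An explicit mapping rewrites the left-hand terms to the right-hand terms."""
    return ", ".join(sources) + " => " + ", ".join(targets)


def build_synonyms(groups, mappings):
    """Assemble the text of a synonyms file from groups and mappings."""
    lines = ["# Hypothetical legal synonyms, for illustration only"]
    lines += [equivalent_line(group) for group in groups]
    lines += [mapping_line(src, dst) for src, dst in mappings]
    return "\n".join(lines) + "\n"


# Made-up entries; a real file would be derived from the corpus.
groups = [["lessor", "landlord"], ["lessee", "tenant"]]
mappings = [(["atty"], ["attorney"])]

synonyms_text = build_synonyms(groups, mappings)
print(synonyms_text)
```

The resulting text would be saved as synonyms.txt and referenced from a synonym filter in the Solr schema; that configuration step is not shown here.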

It will be interesting to compare this n-gram application to Daniel Martin Katz, Michael Bommarito, and colleagues’ Legal Language Explorer, which displays n-gram data for U.S. federal court decisions.

For more details, please see the Legal Synonyms Project repository.

HT @compleatang

