Waldo Jaquith has posted "Two Mini-Projects: Subsection Identifier and Definition Scraper" at The State Decoded blog.
Here are excerpts from the post:
The State Decoded project has spun off a couple of sub-projects, components of the larger project that can be useful for other purposes, and that deserve to stand alone. (Both are found on our GitHub repository.)
The first is Subsection Identifier, which turns theoretically structured text into actually structured text. It is common for documents in outline form (contracts, laws, and other documents that need to be able to cross-reference specific passages) to be provided in a format in which the structural labels flow into the text. [...]
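To make the idea concrete, here is a minimal Python sketch of turning inline labels into explicit structure. It is not the Subsection Identifier code and does not use its API. The function names, the label pattern, and the assumed nesting order (lowercase letter, then number, then uppercase letter) are all hypothetical choices made for illustration.

```python
import re

# Hypothetical label pattern: markers such as "(a)", "(1)", "(A)".
# Assumption: single letters only, so "(i)" is read as a letter, not a roman numeral.
LABEL = re.compile(r"\(([a-z]|\d+|[A-Z])\)\s*")


def label_depth(label):
    """Map a label to a nesting depth under the assumed order a < 1 < A."""
    if label.isdigit():
        return 1
    return 0 if label.islower() else 2


def identify_subsections(text):
    """Return a list of (path, body) pairs, e.g. (("a", "1"), "Sub one.")."""
    # With one capture group, re.split yields:
    # [text before first label, label, body, label, body, ...]
    parts = LABEL.split(text)
    path = []
    result = []
    for label, body in zip(parts[1::2], parts[2::2]):
        depth = label_depth(label)
        # Keep the ancestors above this depth, then append the current label.
        path = path[:depth] + [label]
        result.append((tuple(path), body.strip()))
    return result


flat = "(a) First. (1) Sub one. (2) Sub two. (b) Second."
for path, body in identify_subsections(flat):
    print(".".join(path), body)
```

A sketch this simple would also split on cross-references such as "see subsection (a)" in running text, which is one reason a real tool needs more careful rules.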
The second mini-project is Definition Scraper, which extracts defined terms from passages of text. Many legal documents begin by defining words that are then used throughout the document, and knowing those definitions can be crucial to understanding that document. So it can be helpful to be able to extract a list of terms and their definitions. Definition Scraper needs only be handed a passage of text, and it will determine whether it contains defined terms and, if it does, it will return a dictionary of those terms and their definitions. [...]
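The behavior described above (hand over a passage, get back a dictionary of terms and definitions) can be sketched in a few lines of Python. This is an illustration only, not the Definition Scraper code. The function name and the pattern are assumptions, and the pattern covers just one common phrasing: a term in straight double quotes followed by "means" or "shall mean".

```python
import re

# Assumed phrasing: "Term" means <definition>.  or  "Term" shall mean <definition>.
# Assumption: straight double quotes, and a definition that ends at the first period.
DEFINITION = re.compile(r'"([^"]+)"\s+(?:means|shall mean)\s+([^.]+)\.')


def scrape_definitions(passage):
    """Return {term: definition} for every match; an empty dict if none are found."""
    return {
        term.lower(): definition.strip()
        for term, definition in DEFINITION.findall(passage)
    }


passage = (
    '"Vehicle" means any device used for transport. '
    '"Owner" shall mean the person holding title. '
    "The owner must register the vehicle."
)
print(scrape_definitions(passage))
```

Returning an empty dictionary when nothing matches lets the caller test for the presence of defined terms with a plain truthiness check.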
For more details, please see the complete post.
The State Decoded is Waldo’s free and open legal data and e-participation platform for U.S. states.
HT @StateDecoded