Here are excerpts from the post:
The State Decoded project has spun off a couple of sub-projects, components of the larger project that can be useful for other purposes, and that deserve to stand alone. (Both are found on our GitHub repository.)
The first is Subsection Identifier, which turns theoretically structured text into actually structured text. It is common for documents in outline form (contracts, laws, and other documents that need to be able to cross-reference specific passages) to be provided in a format in which the structural labels flow into the text. […]
The second mini-project is Definition Scraper, which extracts defined terms from passages of text. Many legal documents begin by defining words that are then used throughout the document, and knowing those definitions can be crucial to understanding that document. So it can be helpful to be able to extract a list of terms and their definitions. Definition Scraper needs only be handed a passage of text, and it will determine whether it contains defined terms and, if it does, it will return a dictionary of those terms and their definitions. […]
For more details, please see the complete post.
The State Decoded is Waldo’s free and open legal data and e-participation platform for U.S. states.