Eric O. Scott, Haleh Vafaie, Zelal Gungordu, Charles E. Horowitz, and Bradford C. Brown are scheduled to present a paper entitled "Text Mining for Quality Control of Court Records" at SemADoc 2014: Semantic Analysis of Documents Workshop, to be held 16 September 2014 in Fort Collins, Colorado, USA. The workshop is being held in conjunction with the DocEng 2014 conference.
Here is the abstract, from the event program:
Attorneys across the United States use government-provided electronic databases to submit docket entries and associated case files for processing and archival in public judicial records. Data entry errors in these repositories, while rare, can disrupt the court process, confuse the public record, or breach privacy and confidentiality. Docket quality assurance is thus a high priority for the courts, but manual review remains resource-intensive.
We have developed a prototype application of text mining and human language technologies to partially automate quality assurance review of electronic court documents. This solution uses document classification and named entity recognition to extract metadata directly from documents. Discrepancies between the extracted metadata and the user-provided metadata indicate a possible data entry error. On two independent samples of publicly available court documents, we find that for a small number of classes with a sufficient number of training documents, the document class can be automatically classified with greater than 94% accuracy in one case, but only 81% in the other. Our attempts to extract case numbers and the names of parties from documents via a conditional random field model met with less success. Future work with more extensive training data is necessary to more accurately evaluate both applications.
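The discrepancy check described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a classifier and a named entity recognizer have already produced the extracted values. The names `DocketMetadata`, `normalize`, and `find_discrepancies`, as well as the sample case number and party names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocketMetadata:
    """Hypothetical container for the fields the abstract mentions."""
    document_class: str
    case_number: str
    parties: frozenset


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def find_discrepancies(user, extracted):
    """Return the names of fields where the user-provided and extracted values differ.

    A non-empty result marks the docket entry as a candidate for manual review.
    """
    flagged = []
    if normalize(user.document_class) != normalize(extracted.document_class):
        flagged.append("document_class")
    if normalize(user.case_number) != normalize(extracted.case_number):
        flagged.append("case_number")
    user_parties = {normalize(p) for p in user.parties}
    extracted_parties = {normalize(p) for p in extracted.parties}
    if user_parties != extracted_parties:
        flagged.append("parties")
    return flagged


# Illustrative values only: what the filer entered versus what was
# (hypothetically) extracted from the document text.
user_entry = DocketMetadata(
    document_class="Motion",
    case_number="1:14-cv-00123",
    parties=frozenset({"Acme Corp", "Jane Doe"}),
)
extracted_entry = DocketMetadata(
    document_class="Order",
    case_number="1:14-cv-00123",
    parties=frozenset({"ACME  Corp", "jane doe"}),
)

flagged_fields = find_discrepancies(user_entry, extracted_entry)
print(flagged_fields)
```

In this sketch only the document class differs after normalization, so only that field is flagged. A real system would also need to account for extraction errors, since the abstract reports that extraction of case numbers and party names was less reliable than document classification.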