Posts Tagged ‘Privacy and court documents’

Lee: What Gets Redacted in Pacer?

June 19, 2011

Timothy B. Lee of the Princeton University Department of Computer Science and Center for Information Technology Policy (CITP) has posted What Gets Redacted in Pacer?, on the CITP’s blog, Freedom to Tinker.

In this post, Mr. Lee reports on research concerning documents from the U.S. federal courts' PACER database. Applying customized software to a non-random sample of 1.8 million PACER documents, of which 11,000 appeared to contain redactions, Mr. Lee identifies the types of information most frequently redacted in PACER documents. In this sample, Social Security numbers were the most frequently redacted type of information. Mr. Lee summarizes:

[...][O]ut of 6208 redacted documents, there are 4315 Social Security [numbers] that can be redacted automatically by machine, 449 addresses whose redaction doesn't seem to be required by the rules of procedure, and 419 "trade secrets" whose release will typically only harm the party who fails to redact it.

That leaves around 1000 documents that would expose risky confidential information if not properly redacted, or about 0.05 percent of the 1.8 million documents I started with. A thousand documents is worth taking seriously (especially given that there are likely to be tens of thousands in the full PACER corpus). The courts should take additional steps to monitor compliance with the redaction rules and sanction parties who fail to comply with them, and they should explore techniques to automate the detection of redaction failures in these categories.
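The "around 1000" figure follows from the counts in the first quoted paragraph. The short calculation below reproduces it, assuming (as the quoted passage seems to) that the three categories do not overlap, so that each counted item corresponds to a distinct redacted document. The variable names are illustrative only.

```python
# Counts quoted from Mr. Lee's post; the categorization is his.
# Assumption: the three categories are disjoint, so they can simply be subtracted.
redacted_docs = 6208
ssns = 4315            # Social Security numbers, machine-redactable
addresses = 449        # redaction apparently not required by the rules
trade_secrets = 419    # release mainly harms the party that failed to redact
sample_size = 1_800_000

# Documents left over once the three lower-risk categories are set aside
residual = redacted_docs - ssns - addresses - trade_secrets
# Residual as a percentage of the full 1.8-million-document sample
share_pct = 100 * residual / sample_size

print(residual)               # 1025
print(f"{share_pct:.3f}%")    # 0.057%
```

The residual of 1,025 documents, or roughly 0.06 percent of the sample, is consistent with the post's approximate "around 1000" documents and "about 0.05 percent".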

Mr. Lee's post doesn't appear to explain the difference between the 11,000 documents found to contain redactions and the 6,208 redacted documents described in his statistical analysis.

Mr. Lee concludes:

This tiny fraction of PACER documents with confidential information in them is a cause for concern, but it probably isn’t a good reason to limit public access to the roughly 99.9 percent of documents that contain no sensitive information and may be of significant benefit to the public.

For more information, please see the complete post.

Lee on Redaction Failures in PACER

May 28, 2011

Timothy B. Lee of the Princeton University Department of Computer Science and Center for Information Technology Policy (CITP) has posted Studying the Frequency of Redaction Failures in PACER, on the CITP’s blog, Freedom to Tinker.

In this post, Mr. Lee reports on research concerning documents from the U.S. federal courts' PACER database. Using customized software, he found that in some of these documents redactions had been attempted but had failed. The information left unredacted included:

trade secrets such as sales figures and confidential product information. Other improperly redacted documents contain sensitive medical information, addresses, and dates of birth. Still others contain the names of witnesses, jurors, plaintiffs, and one minor.
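This post does not describe how the software decides that a redaction has failed. One common heuristic for PDF filings is to check whether text that can still be extracted lies underneath a filled dark rectangle: a box drawn over text hides it on screen but leaves it in the file. The sketch below illustrates that heuristic only. It is not Mr. Lee's code. It assumes that word bounding boxes and dark-rectangle bounding boxes have already been pulled from the PDF by some parser (not shown). The names `Box`, `hidden_words`, and `min_fraction`, along with the sample coordinates, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page coordinates, with x0 < x1 and y0 < y1."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def overlap_area(self, other: "Box") -> float:
        # Width and height of the intersection; non-positive means no overlap.
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return w * h if w > 0 and h > 0 else 0.0


def hidden_words(words, black_boxes, min_fraction=0.5):
    """Return extractable words that are mostly covered by a dark rectangle.

    words: list of (text, Box) pairs for text still present in the file.
    black_boxes: list of Box for filled dark rectangles drawn on the page.
    A word is flagged if at least min_fraction of its area lies under a
    single rectangle (min_fraction is a hypothetical tuning threshold).
    """
    flagged = []
    for text, box in words:
        a = box.area()
        if a == 0:
            continue  # skip degenerate boxes to avoid dividing by zero
        if any(box.overlap_area(r) / a >= min_fraction for r in black_boxes):
            flagged.append(text)
    return flagged


# Made-up example: one word sits under a black bar, two do not.
words = [("Plaintiff", Box(72, 700, 130, 712)),
         ("123-45-6789", Box(140, 700, 210, 712)),
         ("filed", Box(220, 700, 250, 712))]
bars = [Box(135, 698, 215, 714)]
print(hidden_words(words, bars))  # ['123-45-6789']
```

A real checker would also need to confirm that the rectangle is actually dark and drawn above the text, and to handle scanned images, where no extractable text exists.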

Mr. Lee then offers recommendations to the U.S. federal judiciary on how to avoid this problem. He also links to a letter setting out many of these recommendations, which he recently sent to a committee of the Judicial Conference of the United States.

Mr. Lee has also posted the software code that he used to identify the unsuccessfully redacted documents.

Mr. Lee says that this research was funded by Public.Resource.Org.

For more information on CITP’s PACER-related research, please see Stephen Schultze’s recent VoxPopuLII post, PACER, RECAP, and the Movement to Free American Case Law.

