The workshop entitled Open Government: Defining, Designing, and Sustaining Transparency (POGW), held 21-22 January 2010 at Princeton University’s Center for Information Technology Policy (CITP), featured much valuable discussion about legal information.
(The Twitter hashtag for the workshop is #pogw. An apparently complete collection of tweets from the workshop is available here. My tweets from the workshop are available here.)
The POGW Law.gov panel participants were:
- Tom Bruce, Director of Cornell’s Legal Information Institute;
- John Joergensen, creator of the Rutgers University Camden Law Library Digital Collections;
- Stephen Schultze, Associate Director of Princeton’s CITP; and
- Carl Malamud of Public.Resource.Org, as moderator.
Here are my notes on the legal-information-related discussion at POGW:
During the panel on “Defining Transparency,” Professor Helen Nissenbaum of New York University argued in favor of editing court records before they are released online to the public. Invoking a “principle of reduction,” under which government information should be rationalized and streamlined before public dissemination in order to prevent information overload and violations of privacy rights, Nissenbaum urged the adoption of information policies requiring each government agency to:
- identify its purposes and functions;
- identify its needs and the needs of its public users respecting the information the agency produces; and
- before disseminating the information the agency produces, edit that information in light of the agency’s purposes and functions and the information needs of the agency and its public users.
According to Nissenbaum, a judicial system applying this principle to court records might identify such concepts as “dispute resolution” and “access to justice” among its purposes; might count “protecting privacy rights,” “facilitating accountability of judges,” and “enabling self-represented litigants to navigate the court system” among its information needs; and might require redaction of personally identifying information from court records prior to online dissemination, in order to protect those privacy rights. Nissenbaum suggested that a court system might also edit court documents in other ways before online dissemination, but she did not specify what those edits would be.
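As a rough illustration of the redaction step Nissenbaum described, here is a minimal Python sketch, assuming a hypothetical policy that masks U.S. Social Security numbers written in the common NNN-NN-NNNN form down to their last four digits. The pattern and the `redact_ssns` name are illustrative only; real court redaction would need to catch many more identifier formats and would still call for human review.

```python
import re

# Hypothetical redaction policy: mask SSN-like strings written as
# NNN-NN-NNNN, keeping only the last four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def redact_ssns(text: str) -> str:
    """Replace each SSN-like string with 'XXX-XX-' plus its last four digits."""
    return SSN_PATTERN.sub(lambda m: "XXX-XX-" + m.group(1), text)


if __name__ == "__main__":
    sample = "Plaintiff (SSN 123-45-6789) filed on 2010-01-21."
    # prints: Plaintiff (SSN XXX-XX-6789) filed on 2010-01-21.
    print(redact_ssns(sample))
```

Note that the ISO date in the sample is left alone because it does not fit the three-two-four digit pattern; a pattern-based approach like this trades recall for a low rate of false positives.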
During the panel entitled “Designing Transparency,” Joshua Tauberer of GovTrack.us announced the availability of a new service called govtrackinsider.com, which combines journalism with the legislative data from GovTrack.us.
During the panel entitled “Sustaining Transparency,” Mike Wash, Chief Information Officer of the U.S. Government Printing Office (GPO), discussed a number of issues respecting digital legal information, with a focus on GPO’s new FDsys content management system. Wash noted that FDsys has been developed to serve four key functions: versioning, preservation, permanent public access, and authentication. He said that FDsys uses Public Key Infrastructure and the digital signature tool in Adobe Acrobat to authenticate digital legal information. He said that, although the XML bulk data versions of the Federal Register and the Code of Federal Regulations (CFR) now available for download from FDsys are not currently authenticated, GPO plans to implement authentication of those sources in 2010. He stated that GPO and other federal agencies currently use an outdated locator-code composition system to write federal legal documents, and that GPO is transitioning to a new system that enables composition of legal documents in XML. In addition, he observed that GPO is converting the SGML documents contained in its legacy GPO Access system into XML.
The Law.gov panel on 22 January began with an overview of the Law.gov project given by panel moderator Carl Malamud of Public.Resource.Org. Malamud described the history of digital legal databases in the U.S., beginning with the U.S. Air Force’s FLITE database and the U.S. Department of Justice’s JURIS database, and continuing through the development of the Mead Data Central/LexisNexis and Westlaw databases. He then traced the rise of free access to law services, beginning with the Legal Information Institute at Cornell University Law School, followed by low-cost legal databases such as FastCase and Justia, and new free services such as AltLaw, the Rutgers Camden Digital Collections, and Princeton’s RECAP service.
Malamud stated that the Law.gov project will proceed in three stages:
- 1) hold meetings (including this one) across the U.S.;
- 2) write a report;
- 3) lobby the U.S. federal government to create a national registry of federal law, and to require each law-making federal entity to authenticate all digital legal information it produces.
Malamud identified two rationales for Law.gov:
- 1) enabling innovation: he noted the high cost of acquiring a complete collection of U.S. primary law, which he estimated at $10 million to $30 million, and he characterized this cost as a barrier to innovation; and
- 2) democracy: he said that the current system of requiring U.S. citizens to purchase access to primary law restricted their ability to participate in politics and in society.
Tom Bruce, Director of the Legal Information Institute (LII) at Cornell University Law School, began his presentation by recounting the history of the free access to law movement. He noted that in 1992, he and Cornell’s Peter Martin founded the LII as the first legal information site on the Internet. A short time later, the Canadian CanLII service and the Australian AustLII service began operations. He stated that today, there are more than 20 free access to law services around the world.
Bruce identified several factors that foster demand for free access to law, including globalization, which requires millions of market participants to know the laws of foreign jurisdictions; and the dissatisfaction of government employees with the quality of access to their own governments’ laws available through print sources or fee-based legal information services.
Bruce then identified several normative arguments in favor of free access to law, including:
Bruce noted that free access to law services differ markedly in the scope of information that they publish, with most offering selected content for their jurisdictions, while a few offer comprehensive collections of primary law. He also observed that some free legal information services publish only national legal material, while others publish some combination of national and either local or international legal information.
Bruce asserted as well that some free access to law services view themselves as permanent, while others, such as SAFLII, consider at least some of their functions to be temporary, and to be transferred to other entities at some point in the future.
Bruce stated that free access to law services exhibit a variety of models of sustainability, with some services, such as CanLII and AustLII, imposing a tax on users, and others, including Cornell’s LII, relying on grants and donations. He stated that for most free legal information services, sustainability has been challenging, because, although grant funding is readily available for starting a free access to law service, few institutional resources are available to fund ongoing operations, and few public users of these services are willing to furnish financial support.
Bruce observed that a number of standards for free-access legal publishing have been developed, including URN:LEX and MetaLex, but that free access to law services have been reluctant to embrace these standards. He suggested that resistance to standards has been based largely on the claims that legal resources feature too many exceptions to be standardized, and that implementing standards would divert scarce time and resources from the more important task of publishing the law. He asserted that these concerns were overstated, and that legal document attributes around the world are much more similar than has usually been acknowledged.
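To make Bruce’s point about shared attributes concrete, here is a minimal Python sketch that builds an identifier loosely modeled on the URN:LEX pattern (jurisdiction, issuing authority, type of measure, date, number). It is a simplification under stated assumptions: the `LegalDocument` record, the `urn_lex_like` helper, and the lowercase-and-join-with-dots normalization are illustrative choices, not an implementation of the actual URN:LEX specification, which has further rules this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalDocument:
    """Hypothetical record of attributes common to legal documents."""
    jurisdiction: str  # e.g. a country code such as "it"
    authority: str     # issuing body
    measure: str       # type of act
    date: str          # ISO 8601 date of the act
    number: str        # official number assigned by the authority


def normalize(part: str) -> str:
    # Simplified assumption: lowercase and join words with dots.
    return ".".join(part.lower().split())


def urn_lex_like(doc: LegalDocument) -> str:
    """Build a simplified, URN:LEX-style identifier for one document."""
    return "urn:lex:{}:{}:{}:{};{}".format(
        normalize(doc.jurisdiction),
        normalize(doc.authority),
        normalize(doc.measure),
        doc.date,
        doc.number,
    )


if __name__ == "__main__":
    statute = LegalDocument("it", "stato", "legge", "2006-05-14", "22")
    print(urn_lex_like(statute))
```

The same five fields can describe a statute, a regulation, or a court decision in most jurisdictions, which is the kind of commonality a shared identifier scheme relies on.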
Stephen Schultze, Associate Director of Princeton’s CITP, then discussed PACER, the U.S. federal court document database, and CITP’s RECAP tool. Schultze explained that PACER, which was developed and is run by the Administrative Office of the U.S. Courts (AOUSC), is built to gather filings from the electronic case filing system now used by all U.S. federal courts.
Schultze identified several shortcomings of PACER:
- Because Congress requires PACER to be self-sustaining, the AOUSC has placed PACER behind a paywall, and charges 8 cents per page for access to all PACER documents, including dockets (with an 8-cent charge for each PACER search that retrieves no documents); yet charging for PACER may be inconsistent with other federal law;
- PACER’s search functionality is poor;
- PACER documents, including dockets, are not structured, and this characteristic inhibits interoperability and reuse of those documents;
- PACER documents are not authenticated;
- PACER features a poor user interface design; and
- Many court records in PACER display personally identifying information, contrary to law.
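The per-page pricing Schultze described adds up quickly for anyone reading long dockets. This minimal Python sketch estimates a bill using only the two rates mentioned above (8 cents per page and 8 cents per search that retrieves no documents); the `pacer_fee` name is hypothetical, and the sketch deliberately ignores any per-document caps, exemptions, or billing thresholds the real fee schedule may apply.

```python
from decimal import Decimal

# Rates as described in the notes above (January 2010).
PER_PAGE = Decimal("0.08")
EMPTY_SEARCH = Decimal("0.08")


def pacer_fee(pages_viewed: int, empty_searches: int = 0) -> Decimal:
    """Estimate a PACER bill from pages viewed and searches returning nothing."""
    if pages_viewed < 0 or empty_searches < 0:
        raise ValueError("counts must be non-negative")
    return PER_PAGE * pages_viewed + EMPTY_SEARCH * empty_searches


if __name__ == "__main__":
    # 250 pages of dockets and filings plus 3 fruitless searches.
    print(pacer_fee(250, 3))
```

`Decimal` is used instead of `float` so that cent amounts are computed exactly.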
Schultze noted that recently passed Congressional language urges the AOUSC to take steps to offer free access to PACER.
Schultze then described RECAP, a Firefox plugin that takes downloaded PACER documents, uploads them to an open online repository, marks them up in XML, and alerts PACER users when a document they are seeking from PACER is available free of charge from another source.
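The core of the RECAP workflow is a check-before-you-buy lookup: if someone has already paid for a document, later users get the shared copy. The Python sketch below is a toy model of that logic, not RECAP’s actual code (RECAP itself runs as a browser extension). `FreeRepository`, `fetch`, the document key layout, and the `archive.example` URLs are all hypothetical.

```python
from typing import Dict, Optional, Tuple

DocKey = Tuple[str, str, int]  # (court, case number, document number)


class FreeRepository:
    """Toy in-memory model of an open archive of PACER documents."""

    def __init__(self) -> None:
        self._docs: Dict[DocKey, str] = {}

    def upload(self, key: DocKey, url: str) -> None:
        # Record where the shared copy of a purchased document lives.
        self._docs[key] = url

    def lookup(self, key: DocKey) -> Optional[str]:
        return self._docs.get(key)


def fetch(key: DocKey, repo: FreeRepository) -> Tuple[str, str]:
    """Prefer a free copy; otherwise 'buy' from PACER and share the copy."""
    free_url = repo.lookup(key)
    if free_url is not None:
        return "free", free_url
    # Hypothetical purchase from PACER, followed by an upload to the archive.
    court, case_number, doc_number = key
    archive_url = f"https://archive.example/{court}/{case_number}/{doc_number}.pdf"
    repo.upload(key, archive_url)
    return "purchased", archive_url


if __name__ == "__main__":
    repo = FreeRepository()
    key = ("njd", "1:10-cv-00001", 5)
    print(fetch(key, repo))  # first user pays
    print(fetch(key, repo))  # second user gets the shared copy
```

Each purchase benefits every later user, which is why the value of such a repository grows with the number of people using the plugin.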
Schultze explained that the RECAP team at CITP is now urging the AOUSC to reform PACER in the following ways:
- As soon as possible:
  - make PACER searches and dockets free of charge;
  - lower PACER page charges;
  - authenticate all PACER documents;
- By mid-2010:
  - make all PACER documents available in a structured format, such as XML; and
  - offer RSS feeds to alert users of new PACER content;
- By the end of 2010:
  - make all PACER content available to the public free of charge.
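Two of the mid-2010 requests, structured XML and RSS alerts, fit together naturally: once docket entries are structured, publishing them as a feed is straightforward. Here is a minimal Python sketch that renders docket entries as an RSS 2.0 feed with the standard library; the `docket_rss` function, the case title, and the `dockets.example` URLs are hypothetical, and a production feed would carry more metadata (GUIDs, descriptions) than this does.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (title, link, RFC 822 publication date)


def docket_rss(case_title: str, case_link: str, entries: List[Entry]) -> str:
    """Render docket entries as a minimal RSS 2.0 feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = case_title
    ET.SubElement(channel, "link").text = case_link
    ET.SubElement(channel, "description").text = "New docket entries"
    for title, link, pub_date in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "pubDate").text = pub_date
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    feed = docket_rss(
        "Doe v. Roe (hypothetical)",
        "https://dockets.example/doe-v-roe",
        [
            ("Complaint", "https://dockets.example/doe-v-roe/1",
             "Thu, 21 Jan 2010 09:00:00 -0500"),
            ("Answer", "https://dockets.example/doe-v-roe/2",
             "Fri, 22 Jan 2010 09:00:00 -0500"),
        ],
    )
    print(feed)
```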
John Joergensen, creator of the Rutgers University Camden Law Library Digital Collections, began his presentation by describing the development of those digital collections. He recalled admiring Cornell’s LII, and desiring to offer a similar service for New Jersey law. Eventually, he succeeded in gaining access first to New Jersey Supreme Court decisions, owing to the willingness of the Chief Justice of that court to provide long-term, free-of-charge public access to the court’s case law. Next, Joergensen arranged to publish New Jersey’s administrative decisions, and then slip opinions of the U.S. District Court for the District of New Jersey.
Joergensen stated that this digital publishing effort has been sustained because of the support of the dean of the Rutgers Camden Law School and the director of the Rutgers Camden Law Library, and the inclusion of funding for the effort in the law library’s regular operating budget. Joergensen noted that the role of these digital collections is to furnish long-term, free-of-charge online public access to legal documents, complementing the short-term online access provided by the tribunals that create these documents.
Joergensen then discussed authentication of digital legal information. He noted the tension between the values of authentication and wide public access to digital legal documents, and argued that, when these values conflict, access should prevail: authentication should not be used as an excuse to restrict public access to the law. He also observed that judges and practicing lawyers rarely use authenticated legal information: rather, they use unauthenticated documents retrieved from Westlaw, Lexis, PACER, and other online services.
Joergensen recommended abandoning the term “authentication” in favor of different terminology: he preferred the notion of a “definitive” original version of each legal document, from which multiple “accurate” copies are made. He asserted that the accuracy of a copy can be established by checksums, together with descriptions of the copy’s chain of custody and of any alterations made to it, all recorded as metadata embedded in the copy.
Joergensen argued that the key issue respecting the integrity of digital legal information concerned alterations to digital legal documents due to fraud or mistake. He contended that checksums and provenance information, embedded in each copy of a digital document, suffice to address that issue.
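Joergensen’s “definitive original plus accurate copies” model can be sketched in a few lines of Python, assuming each copy travels with embedded metadata holding the definitive version’s SHA-256 checksum and a custody log. The function names and metadata layout are hypothetical, and recording alterations (which he also mentioned) is left out for brevity.

```python
import copy
import hashlib
from typing import Any, Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


def make_copy(definitive: bytes, custodian: str) -> Dict[str, Any]:
    """Package a copy of the definitive version with embedded metadata."""
    return {
        "content": definitive,
        "metadata": {
            "definitive_sha256": sha256_hex(definitive),
            "chain_of_custody": [custodian],
        },
    }


def transfer(doc: Dict[str, Any], new_custodian: str) -> Dict[str, Any]:
    """Hand a copy to another repository, recording the hand-off."""
    moved = copy.deepcopy(doc)
    moved["metadata"]["chain_of_custody"].append(new_custodian)
    return moved


def is_accurate(doc: Dict[str, Any]) -> bool:
    """True if the content still matches the definitive version's checksum."""
    return sha256_hex(doc["content"]) == doc["metadata"]["definitive_sha256"]


if __name__ == "__main__":
    original = b"Opinion of the court"
    at_library = transfer(make_copy(original, "court"), "law library")
    print(is_accurate(at_library), at_library["metadata"]["chain_of_custody"])
```

One design caveat: a checksum stored alongside the content catches accidental corruption and careless edits, but someone able to rewrite both the content and its metadata could defeat it. Guarding against that deliberate case is the job digital signatures perform in the authentication approach GPO described.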
With respect to preservation of digital legal information, Joergensen emphasized the importance of having multiple copies of each document held by a variety of repositories, so that public access to each document is secure notwithstanding the failure of one or more repositories. He argued that, in the current, volatile economic and technological environment, characterized by the frequent “creative destruction” of public and private entities, public access to digital legal information is at risk if any one entity becomes the sole source of that information.
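The redundancy argument can be checked mechanically: given the definitive checksum, each repository’s copy can be audited, and public access survives as long as at least one accurate copy remains. A minimal Python sketch, where `audit_replicas` and the repository names are hypothetical:

```python
import hashlib
from typing import Dict, List


def audit_replicas(definitive_sha256: str, replicas: Dict[str, bytes]) -> List[str]:
    """Return the repositories whose copy matches the definitive checksum."""
    return sorted(
        name
        for name, data in replicas.items()
        if hashlib.sha256(data).hexdigest() == definitive_sha256
    )


if __name__ == "__main__":
    original = b"Slip opinion"
    digest = hashlib.sha256(original).hexdigest()
    replicas = {
        "court site": original,
        "law library": original,
        "mirror": b"Slip opinon",  # a corrupted copy
    }
    print(audit_replicas(digest, replicas))
```

If the audit ever returns a single name, that repository has become the sole source Joergensen warned about.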
In the discussion following the panelists’ presentations, the following comments were made:
- Joergensen argued that law school users and legal practitioners prefer legal information in digital format, because that format is easier to use and saves space. He argued that long-term preservation of digital legal information is a core responsibility of legal information institutes and law libraries.
- Bruce discussed the funding model for CanLII. He noted that CanLII’s board consists of representatives of the Canadian bar associations, to which Canadian lawyers are required to belong, and which tax their members to fund CanLII. He argued that this model can’t be duplicated in the U.S., where bar associations have less authority than in Canada.
- On the prevalence of fraud- or mistake-related errors in digital legal information, Bruce cited research done by Cornell’s LII showing little evidence of such errors, but he noted that cases involving such errors usually settle, resulting in a meager paper trail. Joergensen said he was unaware of any incidents of such fraud or mistake.
- Malamud noted that Law.gov seeks to require authentication of all federal digital legal information by the promulgating agency at the time of publication.
- Bruce contended that legal research is a form of risk management or insurance, and that individuals and organizations will only engage in, or pay for, a limited amount of legal research, because users “will only insure to the value of the goods.” He noted that authentication adds costs to legal information that many users will not pay for. He further observed that many users of legal information use branding as a proxy for authentication; to the extent they have confidence in the publisher’s brand, they trust that the information obtained from that publisher is reliable.
- Respecting redaction of legal documents prior to publication, Malamud noted that existing law already prohibits publication of personally identifying information in federal legal documents, but that his audit found inadequate enforcement of that law. Schultze contended that software can detect most personally identifying information. Malamud and Schultze agreed with Nissenbaum that courts should decide what legal information they release. Bruce cited http://www.whosarat.com/ as an example of problems arising from public disclosure of personally identifying information: after a firm published names of individuals (gathered from PACER) who entered into plea agreements with federal prosecutors, federal authorities considered barring public access to documents relating to those agreements. Bruce observed that privacy concerns vary considerably among legal contexts. He cited the reports of the New York State Commission on Public Access to Court Records, and the work of Peter Winn on privacy issues respecting public records. Joergensen and Malamud both said that when they receive a privacy-related complaint about a digital document in one of their collections, they block search-engine access to the document by means of a robots.txt file, but they do not withdraw the document from the collection.
- Respecting preservation, Schultze and Joergensen discussed the need to replicate the definitive original copy of each digital legal document. Joergensen expressed concern that for some U.S. digital legal information, a private firm was the sole source, and that this lack of redundancy jeopardized long-term public access to that information.
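The robots.txt approach that Joergensen and Malamud described keeps a document in the collection while asking search-engine crawlers not to index it. The Python sketch below uses the standard library’s `urllib.robotparser` to show the effect; the robots.txt contents, the document path, and the `library.example` domain are hypothetical. Keep in mind that robots.txt is an advisory convention followed by well-behaved crawlers, so anyone with the direct URL can still retrieve the document.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one decision drew a privacy complaint, so
# crawlers are asked not to fetch it. The document itself stays online.
ROBOTS_TXT = """\
User-agent: *
Disallow: /decisions/2009/smith-v-jones.html
"""


def crawler_may_fetch(url: str, user_agent: str = "Googlebot") -> bool:
    """Ask whether a compliant crawler may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://library.example/decisions/2009/"
    print(crawler_may_fetch(base + "smith-v-jones.html"))  # blocked
    print(crawler_may_fetch(base + "doe-v-roe.html"))      # allowed
```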
Apart from the panel presentations, legal information was also discussed during informal conversations at the workshop. Here are some topics that were discussed informally:
- Some participants expressed concern that rapid technological change coupled with expected severe federal government budget cuts in the coming years will lead to the elimination of a number of federal government agencies. Since some of those agencies may be publishers of digital legal information, some participants urged that steps be taken soon to achieve redundancy of digital legal information resources published directly by federal government agencies, and to prepare for the transfer of such publishing functions to other agencies, or to civil society organizations.
- Princeton’s CITP is considering adding federal legislation to its FedThread service, and is considering testing tools that would enable users to comment on each paragraph of bills currently before Congress.
- Semantic Web and Linked Data technology were much discussed at the workshop, and several more U.S.-based law-related applications of this technology may be announced in the coming months.