Professor Dr. Maarten Marx and Anne Schuth, both of the Universiteit van Amsterdam Informatics Institute, and Nelleke Aders of the Tweede Kamer der Staten-Generaal will present a paper entitled "Digital Sustainable Publication of Legacy Parliamentary Proceedings" at dg.o 2010: The 11th Annual International Conference on Digital Government Research, to be held 17-20 May 2010 in Puebla, Mexico. Here is the abstract:
We address the problem of publishing parliamentary proceedings in a digital sustainable manner. We give an extensive requirements analysis, and based on that propose a uniform XML format. We evaluated our approach by collecting and automatically processing proceedings from six parliaments spanning almost 200 years in total. Most of this data is real legacy data consisting of scanned and OCRed documents. The approach scales very well and produces high quality data.
All documents are transformed into UTF-8 encoded XML files with extensive metadata in Dublin Core Standard. The text itself is divided into pages which are divided into paragraphs. Every document, page, and paragraph has a unique URN which resolves to a Web page. Every page element in the XML files is connected to a facsimile image of that page in PDF or JPEG format. We created a viewer in which both versions can be inspected simultaneously. A search engine for the complete collection is available online.
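To make the document structure described in the abstract more concrete, here is a minimal Python sketch of how such a file might be assembled. It is an illustration only, not the authors' actual format: the element names (`proceedings`, `meta`, `page`, `paragraph`), the `urn` and `facsimile` attributes, and the URN scheme built on `urn:example:` are all assumptions made for this example. Only the Dublin Core element namespace URI is the real, published one.

```python
import xml.etree.ElementTree as ET

# The Dublin Core element set namespace (this URI is the real one).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_proceedings(doc_urn, title, date, pages):
    """Build a document -> page -> paragraph tree.

    `pages` is a list of (facsimile_url, [paragraph_text, ...]) tuples.
    All element and attribute names here are hypothetical.
    """
    root = ET.Element("proceedings", {"urn": doc_urn})

    # Descriptive metadata expressed with Dublin Core elements.
    meta = ET.SubElement(root, "meta")
    ET.SubElement(meta, f"{{{DC_NS}}}title").text = title
    ET.SubElement(meta, f"{{{DC_NS}}}date").text = date
    ET.SubElement(meta, f"{{{DC_NS}}}identifier").text = doc_urn

    for page_no, (facsimile, paragraphs) in enumerate(pages, start=1):
        # Each page gets its own URN and a link to its scanned image.
        page_urn = f"{doc_urn}:p{page_no}"
        page = ET.SubElement(
            root, "page", {"urn": page_urn, "facsimile": facsimile}
        )
        for par_no, text in enumerate(paragraphs, start=1):
            # Paragraph URNs extend the page URN, so they stay unique.
            par = ET.SubElement(
                page, "paragraph", {"urn": f"{page_urn}:par{par_no}"}
            )
            par.text = text
    return root


def to_utf8_bytes(root):
    """Serialize the tree as UTF-8 encoded XML with an XML declaration."""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    doc = build_proceedings(
        "urn:example:proceedings:1850-01",  # hypothetical identifier
        "Sample sitting",
        "1850-01-15",
        [("scans/page1.pdf", ["First paragraph.", "Second paragraph."])],
    )
    print(to_utf8_bytes(doc).decode("utf-8"))
```

Deriving each page and paragraph URN from its parent's URN is one simple way to keep every identifier unique and predictable. Resolving those URNs to Web pages, as the abstract describes, would need a separate resolver service that is not sketched here.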