[NOTE: Updated 6-1-2009 to add final paragraph.]
Professor Masato Hagiwara et al. of Nagoya University Graduate School of Information Science have published Bootstrapping-Based Extraction of Legal Terms from Unsegmented Legal Text, most of which is available here in full text on Google Books. Here is the abstract:
“Recent demands for translating Japanese statutes into foreign languages necessitate the compilation of standard bilingual dictionaries. To support this costly task, we propose a bootstrapping-based lexical knowledge extraction algorithm Monaka, to automatically extract dictionary term candidates from unsegmented Japanese legal text. The algorithm is based on the Tchai algorithm and extracts reliable patterns and instances in an iterative manner, but instead uses character n-grams as contextual patterns, and introduces a special constraint to ensure proper segmentation of the extracted terms. The experimental results show that this algorithm can extract correctly segmented and important dictionary terms with higher accuracy compared to conventional methods.”
This paper and several other papers originally delivered at the JURISIN 2008 conference have been published in New Frontiers in Artificial Intelligence: JSAI 2008 Conference and Workshops, Asahikawa, Japan, June 11-13, 2008; Revised Selected Papers (Hiromitsu Hattori et al. eds., 2009), several articles of which are available from Google Books here; the WorldCat record for this book is here. The call for papers for JURISIN 2009 is available here.
To find other recent legal informatics scholarship, see the Preprints, Articles, Indexes, Dissertations, Conferences, and Monographs sections of our sister Website, Legal Information Systems & Legal Informatics Resources.
Tags: Automated dictionary construction, Dictionaries, Japan, JURISIN, Knowledge representation, Statutes, Text analysis