Posts Tagged ‘Computational linguistics and law’
June 3, 2012
Professor Dr. Paulo Quaresma of Universidade de Évora Departamento de Informática has published Legal Information Extraction ← Machine Learning Algorithms + Linguistic Information, in LREC 2012 Conference Proceedings: Semantic Processing of Legal Texts (SPLeT-2012) Workshop, pp. 37-38.
Here is the abstract:
In order to automatically extract information from legal texts we propose the use of a mixed approach, using linguistic information and machine learning techniques. In the proposed architecture, lexical, syntactical, and semantical information is used as input for specialized machine learning algorithms, such as support vector machines. This approach was applied to collections of legal documents and the preliminary results were quite promising.
Tags:Computational linguistics and law, Deep linguistic information and legal texts, Legal computational linguistics, Legal information extraction, Legal knowledge extraction, Legal knowledge representation, Legal natural language processing, Legal ontologies, Machine learning in legal texts, Natural language processing of legal texts, Paulo Quaresma, SPLeT, SPLeT 2012, Workshop on Semantic Processing of Legal Texts
Posted in Uncategorized | Leave a Comment »
June 2, 2012
Antonio Lazari of Scuola Superiore Sant’Anna and Dr. María Ángeles Zarco-Tejada of Universidad de Cádiz have published JurWordNet and FrameNet Approaches to Meaning Representation: A Legal Case Study, in LREC 2012 Conference Proceedings: Semantic Processing of Legal Texts (SPLeT-2012) Workshop, pp. 21-26.
Here is the abstract:
This paper describes JurWordNet, FrameNet and LOIS approaches towards meaning representation regarding the concept ‘State Liability’ from a cross-linguistic and comparative perspective. Our starting point has been the lexical and conceptual mismatching of legal terms that the process of harmonization in the European Union has manifested. Our study analyzes such concept in Italian, Spanish, French and English and shows how a deeper sub-language based representation of meaning is needed to account for such phenomena. We examine the most important computational-lexical models in an attempt to identify the most suitable and appropriate approach towards lexical-conceptual mismatching of the concept ‘State liability’ in the European legal tradition. Our proposal shows a formalization of the concept in the four systems mentioned and uses semantic features to represent lexical mismatching and cultural differences. With this study we show in a systematic way the differences in legal tradition and the reasons for divergence in the judicial use of related concepts.
Tags:Antonio Lazari, Computational linguistics and law, Cross-language legal knowledge representation, FrameNet, JurWordNet, Legal computational linguistics, Legal knowledge representation, Legal lexical databases, Legal lexical mismatching, Legal linguistics, Legal ontologies, Legal translation, Lexical mismatching in law, LOIS, María Ángeles Zarco-Tejada, Multilingual legal information systems, Multilingual legal knowledge representation, SPLeT, SPLeT 2012, Workshop on Semantic Processing of Legal Texts
Posted in Articles and papers, Conference papers, Research findings | Leave a Comment »
June 1, 2012
Felice Dell’Orletta of l’Istituto di Linguistica Computazionale del CNR di Pisa (ILC-CNR), and colleagues, have published The SPLeT–2012 Shared Task on Dependency Parsing of Legal Texts, in LREC 2012 Conference Proceedings: Semantic Processing of Legal Texts (SPLeT-2012) Workshop, pp. 42-51.
Here is the abstract:
The 4th Workshop on “Semantic Processing of Legal Texts” (SPLeT–2012) presents the first multilingual shared task on Dependency Parsing of Legal Texts. In this paper, we define the general task and its internal organization into sub–tasks, describe the datasets and the domain–specific linguistic peculiarities characterizing them. We finally report the results achieved by the participating systems, describe the underlying approaches and provide a first analysis of the final test results.
Tags:Computational linguistics and law, Dependency parsing and legal texts, Dependency parsing and legislative texts, Dependency parsing of legal texts, Dependency parsing of legislative texts, Felice Dell’Orletta, Giulia Venturi, Legal computational linguistics, Legal linguistics, Legal text analysis, Legal text processing, Parsing of legal texts, Parsing of legislative texts, Simonetta Montemagni, SPLeT, SPLeT 2012, Workshop on Semantic Processing of Legal Texts
Posted in Articles and papers, Conference papers, Research findings | Leave a Comment »
May 27, 2012
Full text papers have been posted for SPLeT 2012: Workshop on Semantic Processing of Legal Texts, being held 27 May 2012 in Istanbul, Turkey.
Here is the list of papers:
- Giulia Venturi: Design and Development of TEMIS: a Syntactically and Semantically Annotated Corpus of Italian Legislative Texts
- Guido Boella, Luigi Di Caro, Llio Humphreys, Livio Robaldo: Using Legal Ontology to Improve Classification in the Eunomos Legal Document and Knowledge Management System
- Antonio Lazari, Mª Ángeles Zarco-Tejada: JurWordNet and FrameNet Approaches to Meaning Representation: a Legal Case Study
- Lorenzo Bacci, Enrico Francesconi, Maria Teresa Sagri: A Rule-based Parsing Approach for Detecting Case Law References in Italian Court Decisions
- Adam Wyner, Wim Peters: Semantic Annotations for Legal Text Processing using GATE Teamware
- Paulo Quaresma: Legal Information Extraction ← Machine Learning Algorithms + Linguistic Information
- Adam Wyner: Problems and Prospects in the Automatic Semantic Analysis of Legal Texts
- Felice Dell’Orletta, Simone Marchi, Simonetta Montemagni, Barbara Plank, Giulia Venturi: The SPLeT–2012 Shared Task on Dependency Parsing of Legal Texts
- Giuseppe Attardi, Daniele Sartiano and Maria Simi: Active Learning for Domain Adaptation of Dependency Parsing on Legal Texts
- Alessandro Mazzei, Cristina Bosco: Simple Parser Combination
- Niklas Nisbeth, Anders Søgaard: Parser combination under sample bias
Tags:Adam Wyner, Automatic classification of legal documents, Automatic classification of legal information, Computational linguistics and law, Dependency parsing and legal texts, Eunomos, FrameNet, GATE, GATE and legal documents, JurWordNet, Legal computational linguistics, Legal information extraction, Legal knowledge representation, Legal lexical databases, Legal linguistics, Legal natural language processing, Legal ontologies, Legal text analysis, Lexical databases and legal informatics, Machine learning and law, Machine learning and legal texts, Natural language processing, Natural language processing of legal texts, NLP, Parsing court decisions, Parsing judicial decisions, Parsing legal texts, Semantic analysis of legal texts, Semantic annotation of legal text, Semantic annotation of legislation, SPLeT, SPLeT 2012, TEMIS, Workshop on Semantic Processing of Legal Texts
Posted in Articles and papers, Conference papers, Conference proceedings | Leave a Comment »
April 15, 2012
Michael J. Bommarito II of Computational Legal Studies has posted Visualization of Reading Level Frequency by Congressional Bill Stage, on his blog.
Here are excerpts from the post:
Here’s a fun example of how you might use my data on Congressional bill length and complexity. Imagine you want to understand the empirical distribution of Flesch-Kincaid reading level for Congressional bills and how this distribution is related to bill stage. A first step might be to visualize this relationship. [...]
Based on this visualization, you might infer that engrossed bills tend to have less right-skew and have a lower mean reading level. The story behind this might be that Senators and Representatives are less likely to accept legislation they do not understand. To test this, you might run a simple [Kolmogorov-Smirnov] test to see if the introduced bill reading levels are greater than engrossed bill reading levels.
For graphs and sample code, please see the complete post.
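As a rough illustration of the test described in the excerpt, here is a minimal sketch of a one-sided two-sample Kolmogorov-Smirnov comparison in plain Python. It is not Mr. Bommarito's code. The function names and the reading-level values are invented for the example, and the p-value uses the standard large-sample approximation, which is only a rough guide for samples this small.

```python
import math
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of sample values less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_one_sided(introduced, engrossed):
    """One-sided two-sample KS statistic.

    Large values of D+ = max_x [F_engrossed(x) - F_introduced(x)] support
    the hypothesis that introduced reading levels tend to be greater than
    engrossed reading levels.
    """
    a = sorted(introduced)
    b = sorted(engrossed)
    # The maximum gap between two step functions occurs at a sample point.
    d_plus = max(ecdf(b, x) - ecdf(a, x) for x in a + b)
    d_plus = max(d_plus, 0.0)
    n, m = len(a), len(b)
    # Asymptotic one-sided approximation: P(D+ > d) ~ exp(-2 d^2 nm / (n + m)).
    p_approx = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_approx


if __name__ == "__main__":
    # Hypothetical Flesch-Kincaid grade levels, made up for illustration.
    introduced = [14.0, 16.5, 18.0, 21.0, 25.0]
    engrossed = [11.0, 12.5, 13.0, 15.0, 17.0]
    d, p = ks_one_sided(introduced, engrossed)
    print(f"D+ = {d:.3f}, approximate p = {p:.3f}")
```

With real data, a statistics library that offers an exact small-sample p-value would be a safer choice than this approximation.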
Tags:Complexity of legal language, Complexity of legislative language, Computational Legal Studies, Computational linguistics and law, Legal computational linguistics, Michael Bommarito, Michael James Bommarito, Reading-level of legal texts, Reading-level of legislation, Statistical analysis of legal language, Statistical analysis of legal texts, Statistical analysis of legislative language, Visualization of legal information, Word counts of legislation
Posted in Applications, Technology developments | Leave a Comment »
March 7, 2012
Stephen C. Mouritsen, M.A., Esq., of Cravath, Swaine and Moore LLP, has published Hard Cases and Hard Data: Assessing Corpus Linguistics as an Empirical Path to Plain Meaning, Columbia Science and Technology Law Review, 13, 156-205 (2011).
Here is the abstract:
The Plain Meaning Rule is often assailed on the grounds that it is unprincipled — that it substitutes for careful analysis an interpreter’s ad hoc and impressionistic intuition about the meaning of legal texts. But what if judges and lawyers had the means to test their intuitions about plain meaning systematically? Then initial linguistic impressions about the meaning of a legal text might be viewed as hypotheses to be tested, rather than determinative criteria upon which to base important decisions.
There exists very little legal scholarship on corpus linguistics — the study of language function and use through large, electronic linguistic databases called corpora — and the role that corpus methods might play in legal interpretation. This omission becomes more and more striking as scholars and jurists (and even the United States Supreme Court) have found themselves persuaded by corpus-based arguments.
This Article argues that the plain or ordinary meaning of a given term in a given context is an empirical matter that may be quantified through corpus-based methods. These methods, when applied to questions of legal ambiguity, present significant advantages over existing empirical approaches to plain meaning and over the prevailing intuition-based interpretive approach of many courts. Because large, sophisticated linguistic corpora are widely available and easy to use, and because corpus methods offer a more principled and systematic alternative to the impressionistic interpretation of legal texts, corpus linguistics may one day revolutionize the process of legal interpretation.
HT @aabibliographer.
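To make the idea of quantifying usage through a corpus a little more concrete, here is a minimal sketch that counts the words appearing near a target term (its collocates). It is not drawn from the article. The function name, the window size, and the two sample sentences are assumptions made for illustration, and a real study would use a large, balanced corpus and more careful tokenization.

```python
import re
from collections import Counter


def collocates(corpus, target, window=3):
    """Count tokens occurring within `window` positions of `target`."""
    counts = Counter()
    for text in corpus:
        # Crude tokenizer: lowercase alphabetic runs only.
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, token in enumerate(tokens):
            if token == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


if __name__ == "__main__":
    # Invented sentences standing in for a real corpus.
    corpus = [
        "He did carry a firearm in his hand.",
        "They carry the boxes in a truck.",
    ]
    for word, count in collocates(corpus, "carry").most_common(5):
        print(word, count)
```

Comparing collocate frequencies across competing senses of a term is one simple way an interpreter's intuition could be treated as a hypothesis and checked against usage data.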
Tags:Artificial intelligence and law, Computational linguistics and law, Corpus linguistics, Corpus linguistics and law, Legal corpus linguistics, Legal interpretation, Legal linguistics, Legal text analysis, Legal text corpora, Plain meaning in statutory interpretation, Plain meaning rule, Statutory interpretation, Stephen C. Mouritsen
Posted in Articles and papers, Research findings | Leave a Comment »
February 13, 2012
Michael J. Bommarito II of Computational Legal Studies has posted Statistics on the length and linguistic complexity of bills on his blog.
This post presents a table of statistics on word count, word and sentence length, and Flesch-Kincaid reading level scores for the bills introduced in the 112th U.S. Congress, and a histogram showing the distribution of word counts in those bills.
Mr. Bommarito says that he will “be adding more automated analysis and figures over the next few weeks.”
HT @mbommar.
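For readers unfamiliar with the Flesch-Kincaid reading level mentioned above, the grade-level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Below is a minimal sketch of that calculation. It is not Mr. Bommarito's code. The syllable counter and the sentence splitter are crude heuristics assumed for the example, and abbreviations common in bills, such as "Sec.", would throw off the sentence count.

```python
import re


def count_syllables(word):
    """Heuristic: count runs of consecutive vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of `text`, using heuristic counts."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    formal = ("Notwithstanding any other provision of law, "
              "the Secretary shall promulgate regulations.")
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(formal), 2))
```

Longer sentences and longer words both push the score up, which is why dense statutory language tends to score well above everyday prose.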
Tags:Complexity of legal language, Complexity of legislative language, Computational Legal Studies, Computational linguistics and law, Legal computational linguistics, Michael Bommarito, Michael James Bommarito, Reading-level of legal texts, Reading-level of legislation, Statistical analysis of legal language, Statistical analysis of legal texts, Statistical analysis of legislative language, Word counts of legislation
Posted in Applications, Others' scholarly or sophisticated blogposts, Research findings | 1 Comment »
December 21, 2011
Dr. Meritxell Fernández-Barrera of Cersa (Centre d’Études et de Recherches de Sciences Administratives et Politiques), CNRS, has successfully defended her Ph.D. thesis, entitled User-generated knowledge through legal ontologies: how to bring the law into the Semantic Web 2.0, at the European University Institute Department of Law, under the supervision of Professor Dr. Giovanni Sartor of Università di Bologna CIRSFID.
Here is the abstract:
This thesis presents a study of the epistemological and cognitive assumptions which currently underlie knowledge acquisition for legal ontology engineering. The hypothesis is that such assumptions might have a qualitative effect on the final ontological-terminological resources and therefore on the performance of the systems which use them. The first part of the thesis presents the state of the art in legal ontology engineering (the computational concept of ontology, a review of available legal ontologies and modelling methodologies). The second part of the thesis shows that currently knowledge acquisition in legal ontology learning is limited to very concrete legal genres, namely, legislation, case law and legal doctrine. The third part presents a case study in which two different legal genres are used for building a consumer law ontology: a traditional legal genre, Italian consumer regulation, and a Web 2.0 genre, namely an online corpus of citizens’ queries regarding consumer justice. Results proof the impact of legal genre variation on the construction of the domain ontology. Thus main findings suggest that Web 2.0 corpora are a rich source for the construction of ontological resources, and at the same time these new types of ontological resources might be useful in e-government applications aimed at increasing online communication with citizens.
Some parts of the thesis are summarized in Dr. Fernández-Barrera’s recent VoxPopuLII post, entitled Legal Prosumers: How Can Government Leverage User-Generated Content?
For the full text of the thesis, please contact Dr. Fernández-Barrera.
Tags:Analysis of law-related user generated content, Computational linguistics and law, Consumer law information systems, Consumer Mediation Ontology, Crowdsourcing and legal information systems, Gov 2.0, Government 2.0, Law-related user generated content, Legal knowledge representation, Legal linguistics, Legal natural language processing, Legal ontologies, Legal social media, Legal text mining, Legal text processing, Legal user generated content, Linguistics and law, Mediation-Core Ontology, Meritxell Fernández-Barrera, Natural language processing and law, ONTOMEDIA project, Semantic Web and law, Social media and law, User-generated content and legal information, User-generated knowledge through legal ontologies how to bring the law into the Semantic Web 2.0, VoxPopuLII
Posted in Applications, Dissertations and theses, Research findings | Leave a Comment »
December 14, 2011
[NOTE: Updated 19 December 2011 to link to Mr. Bommarito's post describing the development of Legal Language Explorer.]
Professor Dr. Daniel Martin Katz of Michigan State University College of Law, Michael J. Bommarito II of Computational Legal Studies, and colleagues, have launched Legal Language Explorer, a new, free, Web-based software application that performs Google N-gram word counts on U.S. Supreme Court decisions.
Click here for the JURIX 2011 presentation slides describing the service.
Mr. Bommarito has described the development of the service in a new post, entitled Building Legal Language Explorer: Interactivity and drill-down, noSQL and SQL.
One of the notable features of Legal Language Explorer is that it analyzes full-text court decisions published free on the Web by Public.Resource.Org, as part of the Law.gov legal open government data movement. Katz and Bommarito have previously argued that making more full-text legal resources available free on the Web would enable researchers to build new software tools for processing those resources, and to generate new knowledge through innovative analysis of those resources. Legal Language Explorer exemplifies this kind of software innovation fostered by open legal data, while the authors’ new paper, entitled Legal N-Grams? A Simple Approach to Track the ‘Evolution’ of Legal Language, illustrates the kinds of original research that may arise from analysis of such data.
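For a sense of what an n-gram count over court decisions involves, here is a minimal sketch in plain Python. It is not the Legal Language Explorer implementation. The function names, the `(year, text)` input shape, and the sample snippets are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts_by_year(decisions, n):
    """Map each year to a Counter of n-gram frequencies.

    `decisions` is an iterable of (year, full_text) pairs.
    """
    counts = defaultdict(Counter)
    for year, text in decisions:
        counts[year].update(ngrams(tokenize(text), n))
    return counts


def time_series(counts, phrase):
    """Frequency of a lowercase `phrase` in each year, in chronological order."""
    return {year: counts[year][phrase] for year in sorted(counts)}


if __name__ == "__main__":
    # Invented snippets standing in for full-text decisions.
    decisions = [
        (1900, "The interstate commerce clause governs interstate commerce."),
        (1950, "Due process requires notice."),
        (1950, "Interstate commerce is regulated."),
    ]
    counts = ngram_counts_by_year(decisions, 2)
    print(time_series(counts, "interstate commerce"))  # {1900: 2, 1950: 1}
```

Raw counts like these are usually normalized by the total number of n-grams in each year before plotting, so that years with more published text do not dominate the trend.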
Tags:Computational Legal Studies, Computational linguistics and law, Daniel Martin Katz, Evolutionary theory and legal information systems, Google N-Grams, Legal computational linguistics, Legal Language Explorer, Legal N-Grams, Michael J Bommarito II, Statistical analysis of legal documents, Statistical analysis of legal language, Statistical methods in legal informatics
Posted in Applications, Articles and papers, Technology developments, Technology tools | 3 Comments »
November 17, 2011
Dr. Meritxell Fernández-Barrera of Cersa (Centre d’Études et de Recherches de Sciences Administratives et Politiques), CNRS, has posted Legal Prosumers: How Can Government Leverage User-Generated Content?, on the VoxPopuLII blog, published by the Legal Information Institute at Cornell University Law School.
In this post, Dr. Fernández-Barrera describes new, innovative methods of analyzing very large quantities of law-related user-generated content.
In two recent studies described in the post, Dr. Fernández-Barrera and colleagues analyzed thousands of consumer law queries and complaints submitted by citizens to consumer protection agencies in Spain and Italy. Using a combination of automated text extraction techniques and expert input from lawyers, the researchers mapped the citizens’ lay terminology to formal legal terms. The technical legal language was expressed in legal ontologies — the Mediation Core Ontology and the Consumer Mediation Ontology — or in statutes: the Italian Consumer Code. The results of this research give us new insights about citizens’ knowledge of consumer law, and about the relationships between formal legal language and the way law is expressed in lay language.
Dr. Fernández-Barrera then describes her recent research into methods for making legal semantic analysis of user-generated content scalable. In studies of citizens’ online queries about consumer law and noise-nuisance complaints, she and her colleagues found that by focusing on language patterns involving emotions, events, and “stereotypical situations appearing in the description of legal cases by citizens,” automated techniques alone could successfully analyze very large quantities of user-generated content. Dr. Fernández-Barrera concludes by reflecting on the ethical dimensions of governments’ use of citizen comments in law- and policy making.
This post should be of interest to policy makers, the e-government and Government 2.0 communities, the Web 2.0 community, those who study legal language, and developers of legal information systems.
Tags:Analysis of law-related user generated content, Computational linguistics and law, Consumer law information systems, Consumer Mediation Ontology, Crowdsourcing and legal information systems, Gov 2.0, Government 2.0, Law-related user generated content, Legal knowledge representation, Legal linguistics, Legal natural language processing, Legal ontologies, Legal social media, Legal text mining, Legal text processing, Legal user generated content, Linguistics and law, Mediation-Core Ontology, Meritxell Fernández-Barrera, Natural language processing and law, ONTOMEDIA project, Semantic Web and law, Social media and law, User-generated content and legal information, VoxPopuLII
Posted in Applications, Others' scholarly or sophisticated blogposts, Research findings, Technology developments | Leave a Comment »