Robert Shaffer of the University of Texas will present a paper entitled “Power in Text: Grammar and Language in Comparative Delegation” this week at APSA 2013.
Here is the abstract:
Throughout the political science and legal literatures, scholars often use statutes and national constitutions as key sources of data. However, most of these analyses rely on labor-intensive coding schemes, offering precise results but requiring long hours from trained researchers. Existing quantitative measures (e.g., word counts of particular documents) have produced insightful results, but provide imprecise measures for variables of interest. Computational linguistics techniques, including natural language processing (NLP), provide an alternative approach; because legal documents are written so systematically, these texts lend themselves well to automated analysis, allowing computers to extract information in a repeatable fashion. By combining NLP tools with existing coding schemes and close readings of individual documents, scholars can identify and measure key traits of particular texts, creating powerful and innovative measurement schemes.
As a sample application of these techniques, I use NLP programming packages to develop a new measure of the level of executive discretion offered by a particular legal document. I conceptualize “discretion” as the average number of other players involved at each decision point in a statute or national constitution. I then use computational linguistics tools to develop two proposed measures, based on normalized word count and proper noun incidence, respectively. In particular, I attempt to measure both the number of powers offered by a document and the number of veto players involved in each power. Finally, I conduct validity tests on each measure, as well as on competing approaches from the literature. For these tests, I use data from Elkins, Ginsburg, and Melton’s Comparative Constitutions Project (CCP), which hand-codes national constitutions on a wide array of attributes. Using CCP data, I generate summary “discretion” statistics for a sample of post-1945 constitutions, which I treat as the “true values” for each document. I then compare the results for each of my measures against the CCP data. Generally speaking, I find that my NLP-based measures are more strongly correlated with these “true values” than the competing approaches, highlighting the potential power of these tools.
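To give a flavor of the proper-noun-incidence idea, here is a minimal sketch in Python. It is not the paper's actual pipeline: the sample clause is invented, and a naive capitalization heuristic stands in for a real part-of-speech tagger (which a production implementation would use), but it illustrates counting named actors per sentence as a rough proxy for the players involved at each decision point.

```python
import re

def proper_noun_incidence(text):
    """Average count of capitalized mid-sentence tokens per sentence.

    This is a deliberate simplification: sentence-initial words are
    skipped because their capitalization is ambiguous, and a real
    system would use an NLP POS tagger to identify proper nouns.
    """
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r'(?<=[.?!])\s+', text.strip()) if s]
    counts = []
    for sent in sentences:
        tokens = sent.split()
        # Count tokens after the first that begin with an uppercase letter.
        proper = [t for t in tokens[1:] if t[:1].isupper()]
        counts.append(len(proper))
    return sum(counts) / len(counts) if counts else 0.0

# Hypothetical constitutional clause, written for this illustration only.
clause = ("The President may appoint judges with the consent of the Senate. "
          "The Congress shall review each appointment.")
avg = proper_noun_incidence(clause)  # 2 actors in sentence 1, 1 in sentence 2
```

A higher average here would suggest more named institutions participating in each grant of power, which is the intuition behind using proper-noun incidence as a veto-player proxy.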