ABSTRACT

Title of Thesis: COMPUTATIONAL ANALYSIS OF THE CONVERSATIONAL DYNAMICS OF THE UNITED STATES SUPREME COURT

Timothy W. Hawes, Master of Arts, 2009

Thesis directed by: Professor Jimmy Lin, The iSchool, and Professor Philip Resnik, Department of Linguistics

The decisions of the United States Supreme Court have far-reaching implications in American life. Using transcripts of Supreme Court oral arguments, this work looks at the conversational dynamics of Supreme Court justices and links their conversational interaction with the decisions of the Court and individual justices. While several studies have looked at the relationship between oral arguments and case variables, to our knowledge, none have looked at the relationship between conversational dynamics and case outcomes. Working from this view, we show that the conversation of Supreme Court justices is both predictable and predictive. We aim to show that conversation during Supreme Court cases is patterned, that this patterned conversation is associated with case outcomes, and that this association can be used to make predictions about case outcomes. We present three sets of experiments to accomplish this. The first examines the order of speakers during oral arguments as a patterned sequence, showing that cohesive elements in the discourse, along with references to individuals, provide significant improvements over our "bag-of-words" baseline in identifying speakers in sequence within a transcript. The second graphically examines the association between speaker turn-taking and case outcomes. The results presented with this experiment point to interesting and complex relationships between conversational interaction and case variables, such as justices' votes. The third experiment shows that this relationship can be used in the prediction of case outcomes with accuracy ranging from 62.5% to 76.8% for varying conditions.
Finally, we offer recommendations for improved tools for legal researchers interested in the relationship between conversation during oral arguments and case outcomes, and suggestions for how these tools may be applied to more general problems.

COMPUTATIONAL ANALYSIS OF THE CONVERSATIONAL DYNAMICS OF THE UNITED STATES SUPREME COURT

by Timothy W. Hawes

Thesis submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Master of Arts 2009

Advisory Committee: Professor Jimmy Lin, Co-Chair; Professor Philip Resnik, Co-Chair; Professor Wayne McIntosh

© Copyright by Timothy W. Hawes 2009

Acknowledgments

I couldn't possibly list all the people I want to thank and all the things they have done for me. Please know, whether it is listed here or not, I am extremely grateful for everything my friends, family and acquaintances have done for me. I would like to thank: Dr. Jimmy Lin and Dr. Philip Resnik, my advisors on this project, for their continually invaluable support, feedback, encouragement and advice, not just on this project, but in general. Dr. Wayne McIntosh and Dr. Michael Evans, for their generosity with their time, opinions and ideas throughout the course of this project. It was a discussion with them that gave initial shape to the conversational view taken in this thesis. Dr. Amy Weinberg, my first official advisor in the Department of Linguistics, for her excellent guidance and understanding. Dr. Stephan Greene, for his time and ideas at the earliest stages of this work. The Department of Linguistics and all of its professors, for their support and guidance. All of my sources of funding over the past 3 years. My crack team of proof-readers: Kelly Schultz, Dan Knudsen, Mindy Watson, Mischa Bauermeister, Gordon Freeman, Indira Sriram and Brian Hawes. They noticed more typos than I'd care to admit and each provided excellent suggestions on how to improve my thesis.
All of my friends at the University of Maryland, and especially Johannes, Josh, and Greg, for their good humor, support, advice and feedback over the years; Asad for his last-minute help saving me hours of highway driving, and also his always enjoyable conversations; and the many more who should be thanked for everything from invaluable help and support to just being good friends. All of my friends who have since dispersed across the globe: Dan, Gordon, John, Tim, Mischa, Kevin, Kara and others. You have done more for me than I could ever recount. I thank Kelly, for her love and support over the years. And my family: Mom, Dad, Kendra, Tim, Gam, my aunts and uncles (especially Aunt Jane and Uncle Wayne), my cousins (especially John and Kyle) and Chris (who, while she isn't technically "family", should be listed here). I appreciate everything you all have done for me.

Table of Contents

Acknowledgments..........................................................................................................ii
Table of Contents...........................................................................................................iv
List of Tables.................................................................................................................vi
List of Figures...............................................................................................................vii
Chapter 1 Introduction.....................................................................................................1
Chapter 2 Background.....................................................................................................5
2.1 Oral Arguments/Supreme Court.............................................................................5
2.2 Discourse Analysis................................................................................................7
2.3 Conversation Analysis.........................................................................................10
2.3 Computational Conversational/Discourse Analysis..............................................12
2.4 Quantitative Oral Arguments Research................................................................13
2.5 Spaeth Supreme Court Database..........................................................................19
Chapter 3 Sequence Labeling........................................................................................20
3.1 Methods...............................................................................................................21
Data Preparation...................................................................................................21
Corpus Description................................................................................................22
Feature Extraction.................................................................................................22
Labeling................................................................................................................23
Features................................................................................................................25
3.2 Experiments.........................................................................................................30
Results...................................................................................................................30
Discussion.............................................................................................................34
Chapter 4 Visualizing Dynamics...................................................................................35
4.1 Methods...............................................................................................................36
Corpus Description................................................................................................36
Case Segmentation................................................................................................37
Labeling Description..............................................................................................38
The Rose Charts....................................................................................................39
4.2 Results.................................................................................................................41
How to Read the Charts.........................................................................................41
Vote Split Condition (VOTE)..................................................................................44
Direction Condition (DIR).....................................................................................46
Justice Direction (JDIR)........................................................................................50
4.3 Discussion...........................................................................................................54
Chapter 5 Vote Prediction..............................................................................................56
5.1 Prior Approaches..................................................................................................56
5.2 Forecasting Votes.................................................................................................59
5.3 Methods...............................................................................................................60
Corpus Description................................................................................................60
Turn Distribution...................................................................................................61
Data Preparation...................................................................................................64
Baselines...............................................................................................................66
5.4 Experiments.........................................................................................................67
Results...................................................................................................................70
Discussion.............................................................................................................74
Chapter 6 Conclusions...................................................................................................76
6.1 Future Work and Unanswered Questions..............................................................77
Appendix A Rose Charts...............................................................................................81
All Cases...................................................................................................................81
DIR Condition...........................................................................................................83
JDIR Condition.........................................................................................................86
Vote Split..................................................................................................................89
Appendix B Discourse Markers.....................................................................................97
References...................................................................................................................102

List of Tables

Table 1 Example conjunctive relation markers (Brown and Yule 1983; 191).................10
Table 2 Summary of previous studies. "Manual" indicates whether or not the study used manual methods of outcome forecasting (the alternative being automatic methods). "Cases"
indicates the number of cases tested in the study...............................................15
Table 3 Examples of non-content items from the transcript of the oral arguments from Ali v. Federal Bureau of Prisons (06-9130) with the special symbols used to identify these items in our experiments................................................................................................22
Table 4 Mean Martin-Quinn scores for the 2005-2007 terms. Note, negative scores indicate a liberal ideology and positive scores indicate a conservative ideology. The higher (lower) the number, the more conservative (liberal) the ideal point is...................36
Table 5 Comparison of "most attention given" approaches with varying interpretations of "question". "By turn" indicates that we count each turn as a "question". "By ?s" indicates we counted question marks in the transcribed justices' speech, usually indicating an interrogative statement...................................................................................................57
Table 6 Comparison of the "most attention given" rule for extreme cases (i.e. difference in words or questions is > 2 s.d. from the mean). The "Cases" column indicates how many cases met this criterion..................................................................................................58
Table 7 Speakers and their corresponding symbols. The count column identifies the frequency with which each symbol appears in the corpus..............................................61
Table 8 20 most frequent n-grams grouped by correspondence pair, ranked by most frequent n-gram in pair..................................................................................................62
Table 9 Infrequent n-grams containing 3-4 instances of justice turns.............................63

List of Figures

Figure 1 Empirical probability of each justice symbol in the corpus (Hawes et al. 2009)......................................................................................................................24
Figure 2 Diagram of a linear chain of labels, where X_i is a group of observed features and Y_i is a label............................................................................................................25
Figure 3 Example of features extracted from a transcript segment..................................29
Figure 4 1st-order CRF 10-fold cross-validation results. Annotations represent the relative improvement over the Unigram baseline for the Unigram + DM + Ref condition (Hawes et al. 2009)........................................................................................................31
Figure 5 2nd-order CRF 2-fold cross-validation results. Annotations represent the relative improvement over the Unigram baseline for the Unigram + DM + Ref condition (Hawes et al. 2009)........................................................................................................32
Figure 6 Overall accuracy of first- and second-order CRFs. Bars are annotated with the relative improvement over the Unigram baseline.
Error bars are the 95% confidence interval as calculated by the Clopper-Pearson method for inferring exact binomial confidence intervals.................................................33
Figure 7 Sequence of truncated turns, the sequence extracted from these turns and the resulting trigrams...........................................................................................................38
Figure 8 Stevens - Rose Diagram of All Cases...............................................................42
Figure 9 Kennedy - Rose Diagrams for 5-4 and 9-0 split cases......................................45
Figure 10 Alito - Rose Diagrams for the DIR Condition................................................47
Figure 11 Ginsburg - Rose Diagrams for the DIR Condition..........................................48
Figure 12 Kennedy - Rose Diagrams for the DIR Condition..........................................49
Figure 13 Alito - Rose Diagrams for the ALTODIR Condition......................................51
Figure 14 Souter - Rose Diagrams for the SOUTDIR Condition....................................53
Figure 15 Kennedy - Rose Diagrams for the KENDIR Condition..................................54
Figure 16 Examples of "Laughter" and interruptions in the transcript............................64
Figure 17 Classification results including prior approaches (Court I only), baseline, and absolute accuracy. Error bars are the 90% confidence interval as calculated by the Clopper-Pearson method for inferring exact binomial confidence intervals....................70
Figure 18 Informative sequences from Thomas decision trees with examples from transcripts......................................................................................................................73

Chapter 1 Introduction

The United States Supreme Court plays a significant role in the U.S.
Government; the decisions reached by Supreme Court justices have far-reaching implications for the entire American legal system. In this work, we aim to combine conversation analysis with computational techniques in novel approaches for the analysis of the behavior of the U.S. Supreme Court, in terms of both the justices individually and the Court as a whole.

Considerable amounts of work have been done applying computational techniques to the political domain. For example, Mosteller and Wallace (1964) utilized models based on function word counts to identify the authorship of The Federalist Papers. Laver et al. (2003) used party manifestos and legislative speeches to identify the ideological positions of political parties in Britain, Ireland and Germany. More directly related to this work is that of Thomas et al. (2006), who examined the content of congressional floor debates and the relationships between congresspersons to determine whether individuals were in support of or opposition to the legislation under discussion. Also, Evans et al. (2007) classified the ideological position of third-party briefs from the briefs' content. We leave further discussion of related work to Chapter 2.

This thesis explores justice turn-taking during United States Supreme Court oral arguments and its relationship to other aspects of justice behavior. For our purposes, we will treat each speech segment in the argument transcripts with a single speaker identifier as one turn. [Footnote 1: Due to the Courtroom reporter's handling of factors such as interruption and overlapping speech, this definition of turn is somewhat different from that used in conversation analysis, where turns are "turns at talk" composed of units that are grammatically and phonetically realized and "constitute a recognizable action in context" (Schegloff 2007; 3-4). Despite this difference, there will still be significant overlap between what we are defining as a turn and what a conversation analyst would define as a turn.] Thus, the oral arguments are organized into a series of turns produced by the justices and the attorneys before the Court.

The first experiments we discuss look at the prediction of the turn-taking behavior of justices by exploring the task of labeling turns with their speakers when this information is unavailable in an oral arguments transcript. Chapter 4 is a broad-scale analysis of the turn-taking patterns of justices in various conditions, looking at patterns of when justices typically follow up on other justices' lines of questioning. Chapter 5 discusses a group of experiments that looks at the turn-taking behavior of justices as a predictor of case outcomes.

This work will be immediately relevant to researchers exploring the behavior of the United States Supreme Court. This view of the conversational dynamics between the justices as both predictable and predictive is one that has received little attention in the literature. By applying computational models to this approach, this work will provide new tools that may be able to open up novel avenues of research for legal scholars. Moreover, this work should also have broader implications. While we have concentrated on applying existing computational tools to a new approach to understanding the Supreme Court, the methods we develop here will be applicable to similar settings where one may wish to link conversational actions to other actions with a real-world impact. If this is the case, then these methods will help to provide a deeper understanding of other social institutions and human conversational interaction in general. While the narrow focus of this work is to produce methods for classification and labeling of the oral arguments of the U.S. Supreme Court, this research was conducted with the broader goal of creating novel approaches for judicial scholars to use in examining the dynamics of the Supreme Court.
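To make the notion of a turn concrete, the sketch below groups transcript lines into (speaker, text) turns and then collects speaker trigrams of the kind used in the later sequence experiments. The transcript format, speaker labels, and regular expression here are hypothetical illustrations, not the thesis's actual preprocessing; the real Supreme Court transcripts differ in their conventions.

```python
import re
from collections import Counter

# Hypothetical convention: a turn begins with an all-caps speaker label
# followed by a colon, e.g. "JUSTICE X: ...". Continuation lines without
# a label belong to the current turn.
TURN_RE = re.compile(r"^([A-Z][A-Z .']+):\s*(.*)$")

def extract_turns(lines):
    """Group transcript lines into (speaker, text) turns."""
    turns = []
    for line in lines:
        m = TURN_RE.match(line.strip())
        if m:
            # A new speaker label starts a new turn.
            turns.append((m.group(1), m.group(2)))
        elif turns and line.strip():
            # A continuation line extends the current turn's text.
            speaker, text = turns[-1]
            turns[-1] = (speaker, text + ' ' + line.strip())
    return turns

def speaker_trigrams(turns):
    """Slide a window of three over the speaker sequence."""
    speakers = [s for s, _ in turns]
    return Counter(zip(speakers, speakers[1:], speakers[2:]))

# Invented example exchange (not from a real transcript).
transcript = [
    "MR. SMITH: The statute plainly covers this conduct.",
    "JUSTICE X: But where do you draw the line?",
    "MR. SMITH: At the point of seizure,",
    "your Honor.",
    "JUSTICE Y: So any seizure at all?",
]
turns = extract_turns(transcript)
print([s for s, _ in turns])
# → ['MR. SMITH', 'JUSTICE X', 'MR. SMITH', 'JUSTICE Y']
```

Under this definition, the two consecutive "MR. SMITH" lines merge into one turn only when a new speaker label is absent; each labeled segment counts as its own turn, mirroring how the reporter's speaker identifiers delimit turns in the transcripts.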
Our primary objective is to gain a clearer understanding of the role of the conversational dynamics of Supreme Court justices. We aim to show that: a) predictable high-level patterns exist in the conversational dynamics of the Supreme Court, b) these patterns may be associated with other areas of interest to legal scholars, such as voting patterns of the justices, and c) this association between linguistic patterns and judicial patterns may be utilized to provide both short-term insights (i.e. predicting the outcome of a particular case) and deeper insights about the behavior of the Supreme Court.

In the process of pursuing these objectives we have decided to minimize the need for specialized knowledge and training for feature identification. In order to do this, we minimize theoretical commitments, thus reducing the need for an extensive background in any particular theory of discourse. Moreover, we want to reduce reliance on features that can only be encoded with human judgment and expertise, by favoring features that can be automatically recognized. By restricting ourselves to such conditions we hope to maximize the applicability and reproducibility of our methods, as the reliance on human judgment has hampered both of these qualities in some previous work. Despite this, we expect that higher-level information from more sophisticated approaches, such as sentiment analysis, would only add to the value and power of these basic approaches.

Producing any positive result for this work is a contribution to the overall understanding of the Court. While small studies using human judgments have produced relatively large positive results, larger studies using automatic methods still achieve relatively small improvements (see Section 2.4). In one case, these automatic methods achieve comparable results to our own work with an order of magnitude more data. Also, when tested on our dataset, these methods achieve considerably lower results.
Just as these larger studies have contributed to the understanding of the relationship between one aspect of oral arguments and case outcomes, positive results in this work should contribute to the understanding of the relationship between conversational interaction and case outcomes. Moreover, given the relative simplicity of our feature sets, the fact that we are able to gain some predictive power at all from these features may be a surprising result for legal scholars (Evans, M., personal correspondence, August 28, 2009).

Thesis Organization

The remainder of this thesis is organized as follows:
• Chapter 2 discusses background on oral arguments, discourse and conversation analysis, computational approaches to discourse and conversation analysis, quantitative research on oral arguments, and the Supreme Court case database used in two of our experiments.
• Chapter 3, Chapter 4 and Chapter 5 cover our three experiment groups, dealing with turn sequence labeling, "rose diagrams" of turn-taking and case outcomes, and case outcome prediction, respectively.
• The final chapter offers conclusions from this work and suggests some future research and unanswered questions.

Chapter 2 Background

This chapter contains three main parts. The first part covers the domain knowledge regarding the area of study contained in this thesis, namely, oral arguments and the Supreme Court. The second introduces the linguistic area of study we utilize in this thesis, specifically, conversation and discourse analysis. The third part is an overview of computational studies in discourse analysis as well as a review of both computational and manual studies of the Supreme Court. We include one final section to introduce our source of Supreme Court case data (not including oral argument transcripts).

2.1 Oral Arguments/Supreme Court

As one of the last, and only public, stages a case goes through before the Supreme Court, the importance of oral arguments is often questioned.
At this stage, all briefs have been submitted by each side of a case and by amici curiae, and the justices have had time to study the details of the case. It is believed that by this time, justices have had sufficient opportunity to make up their minds regarding a case, and so it is often suggested oral arguments play little if any role in justices' decision-making process (Rohde & Spaeth 1976; Kurland & Hutchinson 1983; Segal & Spaeth 2002). Kurland and Hutchinson (1983) argue, "There are a few cases in which oral argument serves as a means of discovery by the Justices. But there is no reason why this discovery could not be conducted better by interrogatories than by oral deposition." This view is not just held by academics either: some justices have also expressed these views. Justice Thomas once said, "99 per cent of the time justices have made up their mind when they go to the bench. Also, there are so many questions you have to elbow your way in" (Rombeck 2002; 5B). Even for those justices who do view oral arguments as important, it would seem that they do not believe oral arguments typically lead a justice to change his or her mind. On the topic of whether oral argument matters, Justice Rehnquist wrote, "I think it does make a difference," though only in "a significant minority of cases": "The change is seldom a full one-hundred-and-eighty-degree swing, and I find that it is most likely to occur in cases involving areas of law with which I am least familiar" (Rehnquist 2002). In a 2009 interview, Justice Scalia (who admits that he once believed oral arguments were a "dog and pony show" (Johnson 2004)) said, "A lot of people are under the impression that [oral advocacy] is a dog and pony show. The judges have read the briefs, they come in with their minds made up, and this is just a performance for the benefit of your client. If that's the impression you have, you are just wrong. I have never met a judge who doesn't think that oral argument is important" (Duke Law 2009).
However, similar to Rehnquist, he suggested that only in cases where he has not already made up his mind do oral arguments play a role in his decision making. While the view that oral arguments are unimportant is commonly held, some scholars have also argued against it, suggesting that justices do in fact utilize information gained during oral arguments to make decisions (Johnson 2001, Johnson 2004, Shullman 2004, Johnson et al. 2006). Johnson (2001; 2) points out that up to oral arguments, the majority of information the justices have seen is that which "other actors want them to see and consider", and that justices use oral arguments as an opportunity to get at what they want to "see and consider" in order to make a decision in the case. However, even in these studies, the strongest conclusion made is that, in typical cases, oral arguments at best are used to refine a justice's opinion, thus having an important impact on the details of a case's outcome but not necessarily on the case's overall outcome. Johnston et al. (2009a) note David Frederick's observation that oral arguments are composed of conversations between a lawyer, a justice and another "potentially persuadable justice". While the above description of oral arguments should indicate that the existence of "potentially persuadable justices" may be in question, it seems natural to presume that even if justices cannot be persuaded during oral arguments, other justices will still attempt to do so.

2.2 Discourse Analysis

Discourse analysis is a fairly broad subfield of linguistics. Schiffrin et al. (2001; 1) note that discourse analysis is often not strictly defined but usually refers to one of three domains of study: "(1) anything beyond the sentence, (2) language use, and (3) a broader range of social practice that includes nonlinguistic and nonspecific instances of language." Given this broad definition of discourse analysis, it is clear that there is an open view of what exactly is meant by "discourse".
Typically, however, the term is used to indicate a language-based communication forming a "unified whole" (referred to as a text in the discourse analysis literature), and such communications can take on a variety of forms, including written, spoken or signed (Halliday and Hasan 1976, Johnstone 2007). With regard to the domains of study discourse analysis may involve, aspects of this work could fall under each of these categories; while our first experiment looks at (potentially) extra-sentential linguistic units, overall this work is looking at language use in a particular social setting, the Supreme Court, and the relationship between that language use and the overall behavior of the Supreme Court. As for our particular version of discourse, we are dealing with transcribed spontaneous speech, which inherently incorporates both written and spoken language.

Regardless of the form of communication under consideration, three of the key aspects of discourse an analyst is often concerned with are texture, cohesion, and coherence. Texture, the defining characteristic of a text, is identified by Halliday and Hasan (1976; 2) as "the property of being a text…this [texture] is what distinguishes it [a text] from something that is not a text". Take (1) for example.

(1) A: Does the store carry galvanized wire? B: Yeah, they do.

This simple exchange can be said to have texture, because it can stand alone as (or at least be a part of) a unified conversation. Contributing to the texture of (1) is the use of reference (anaphora; they refers to the store) and substitution (do stands in for carry galvanized wire) in B. Taken together, these lend cohesion to the text, creating texture. Cohesion refers to the relations that exist within a text between separate units in that text and the idea that "the INTERPRETATION of some element in the discourse is dependent on that of another" (Halliday and Hasan 1976; 4). In the example above, in order to interpret B correctly we need A.
Cohesion can take on a number of forms, falling under the headings of grammatical cohesion and lexical cohesion. Grammatical cohesion refers to the use of grammatical tools to create cohesive relations in a text, including reference and substitution as in the example above, as well as ellipsis (omission of clauses; e.g. Who stole the book? - John stole the book) and conjunction (linking of clauses; e.g. John went to the bank. Later he went to the movies). [2] We will discuss conjunction more thoroughly later in this section. Lexical cohesion includes repetition of the same word, or semantically related words such as holonyms (tree-forest), hypernyms (hat-clothing), semantically "close" terms (banana-apple), etc. (Halliday and Hasan 1976, Brown and Yule 1983).

While cohesion deals with overt relations in a text, coherence deals with relations that must be interpreted by an individual listening to or reading a text. Coherence relations are the underlying relations that hold between segments of text (Brown and Yule 1983). Returning to (1) above, while B is a cohesive response to A, we need to appeal to coherence in order to describe it as an appropriate response to A, as cohesion is no guarantee of coherence. For example, suppose we changed B in (1) as we have done in (2). While B is cohesive with A in (2), they still refers to the store, it is no longer a coherent answer to A.

(2) A: Does the store carry galvanized wire? B: They are open on Sundays.

Thus, coherence too is a necessary aspect in building an interpretable discourse. For this work, we make the assumption that the texts we are dealing with, as spontaneous conversations between multiple individuals, are in fact coherent discourses, at least for the parties involved. And, while it is not necessarily the case across all sorts of text and all relations within a text, we are making the assumption that the majority of cohesive relations existing in the text are representative of underlying coherent relations.
The connection between conjunctions and the coherence relations they signal plays a role in Chapter 3. While the collection of potential conjunctive elements in English is extensive, Brown and Yule (1983) offer several examples, as summarized in Table 1. [Footnote 2: Note that the usage of some terms, such as anaphora and ellipsis, is somewhat different in discourse analysis than in generative linguistics.]

Type         Examples
Additive     and, or, furthermore, similarly, in addition
Adversative  but, however, on the other hand, nevertheless
Causal       so, consequently, for this reason, it follows from this
Temporal     then, after that, an hour later, finally, at last

Table 1 Example conjunctive relation markers (Brown and Yule 1983; 191).

It is important to note that, because of the role of cohesion in the interpretation of discourse, these elements do not always identify the relations they are paired with in Table 1, nor are explicit elements required to mark these sorts of relations (Brown and Yule 1983). Nevertheless, overt markers of such relations are abundant in many forms of discourse, and do tend to exhibit some regularity in the relations they identify (as indicated by Table 1), even if the relationship is at times variable.

2.3 Conversation Analysis

Because this work deals with transcripts of oral arguments, it is most closely related to conversation analysis, which may be viewed as a branch of discourse analysis. Hutchby and Wooffitt (2008; 13) write that the "aim" of conversation analysis (CA, in their terms) "is to focus on the production and interpretation of talk-in-interaction as an orderly accomplishment that is oriented by the participants themselves". CA seeks to uncover the organization of talk "from the perspective of how the participants display for one another their understanding of 'what is going on'".
Because of this view, there is a focus on conversation as a sequence of "turns at talk", with each subsequent speaker turn in a conversation indicating the speaker's understanding of the preceding conversation (Hutchby and Wooffitt 2008). [Footnote 3: However, conversation analysis comes with its own tools, methods and procedures for recording and analyzing conversation that we do not make use of. Despite this, many of the topics of interest to the conversation analyst are relevant to this discussion.] In the present work we are particularly interested in this sequence of turns, how predictable that sequence is in a setting like the Supreme Court, and the relationship between this sequence and other actions taken by the Court.

The previous discussion of cohesion and coherence can be tied into conversation analysis through a particular aspect of conversational sequence organization known as adjacency pairs. Adjacency pairs include two turns that are usually, but not necessarily, adjacent in conversation, where the first turn "initiates some exchange" and the second turn is "responsive" to the first. These are treated as pairs because not all types of initiations can be followed by all sorts of responses. So while Question/Answer (e.g. (1)) and Apology/Acceptance (e.g. (3)) are typical adjacency pairs, Question/Acceptance and Apology/Answer are not (Schegloff 2007; 13-14).

(3) A: Sorry I broke your mug.
    B: That's ok.

Regardless of the pair, recognizing a pair as a member of a particular type requires a coherent interpretation of that pair. However, responses to the first part of a pair may include, or be entirely composed of, elements that are cohesive with the previous turn (4).

(4) A: When are we going to the movies?
    B: Later.

Oftentimes, as in the example given, these cohesive elements are conjunctive, linking the first turn to the second with relations related to those in Table 1.
For example, if the initiating turn is a statement, a possible response may be to disagree with the statement. In this case, the response may begin with an "adversative" element (5).

(5) A: Let's go to the movies.
    B: But I don't want to.

As stated before, this relationship between cohesive elements and coherence relations offers insight into the discussion in Chapter 3.

2.4 Computational Conversational/Discourse Analysis

Though considerable work has been done in the domain of computational discourse analysis, interest in multi-party discourse (involving more than two parties) is relatively new; work has instead favored single- and two-party discourse. Broadly speaking, much of the computational linguistics research that explores language at the document level has focused on single-party discourse, since texts typically represent a single-party discourse. The following is a sampling of representative papers for single-, two-, and multi-party discourse. We concentrate on a variety of the more popular areas of research in discourse, including coherence relation identification and topic segmentation and identification.

For single-party discourse (including text and monologue), Mann and Thompson's (1988) Rhetorical Structure Theory (RST) has been used as a framework for identifying coherence relations in texts from a single author (Marcu, 1997; Corston-Oliver, 1998). Marcu and Echihabi (2002) developed an approach to automatically identify discourse relations that hold between sentences and within sentence parts from a very large corpus of unannotated sentences drawn from textual resources. Grosz and Hirschberg (1992) used a Classification and Regression Tree analysis to identify discourse segments (building on the theory of discourse discussed in Grosz and Sidner (1986)) in Associated Press articles read aloud by news broadcasters. Morris and Hirst (1991) explored "lexical chains" (spans of related words in a discourse; in this case, text) as a means for modeling lexical cohesion.
In the area of two-party dialog, Stolcke et al. (2000) modeled "dialogue acts" in telephone conversations for automatic labeling. [Footnote 4: Dialogue acts are often one part of an adjacency pair, e.g. "STATEMENT, QUESTION, AGREEMENT, DISAGREEMENT, and APOLOGY" (Stolcke et al. 2000).] Forbes-Riley and Litman (2004) used acoustic and non-acoustic cues in spoken dialogs to predict the emotional state of students in one-on-one interaction with tutors via AdaBoost with decision trees. Gurevych and Strube (2004) used (manually disambiguated) noun senses from WordNet to summarize the content of telephone-based conversations. Finally, Williams and Young (2007) developed an approach for managing spoken human-machine dialogue.

Much of the existing research on conversation involving three or more parties has been conducted using the International Computer Science Institute (ICSI) meeting corpus (Janin et al. 2003), though other corpora are available (e.g. TalkBank, which includes U.S. Supreme Court oral arguments as a subset of its documents (MacWhinney et al. 2007)). Galley et al. (2003) use a lexical cohesion approach to create an unsupervised method of topic segmentation in multi-party ICSI meetings, while Purver et al. (2006) offer an unsupervised method for topic segmentation and identification using Bayesian inference. Galley et al. (2004) used lexical, contextual and durational cues to identify agreement and disagreement between speaker turns in ICSI meetings.

2.5 Quantitative Oral Arguments Research

To date, there have been several studies dealing with Supreme Court oral arguments. Johnson et al. (2009b) examine factors that may be involved in determining why and when justices will give a dissent from the bench, including the number of questions asked by the Court during oral arguments. This study found a small effect in the relationship between dissents from the bench and case activity as measured by the number of questions asked during oral arguments.
In work related to our first experiments, Yuan and Liberman (2008) conducted speaker identification experiments using audio transcripts of oral arguments from 78 cases from the 2001 term. [Footnote 5: Audio transcripts were accompanied by written transcripts, speaker identifications and manual word-alignment from the OYEZ project (http://www.oyez.org/) (Yuan and Liberman 2008).] For the 800 "clean" test samples used, 98% speaker identification accuracy was achieved by training 8 justice-specific speech recognition models, applying each model to a test utterance, and using the model with the highest score to identify the justice.

We will now discuss several studies aimed at forecasting case outcomes, which are summarized in Table 2. Wrightsman (2008) details several attempts to use manual quantitative and qualitative analysis to predict votes. The first of these examples recounts New York Times Supreme Court reporter Linda Greenhouse's prediction of case outcomes based solely on oral arguments, using her experience as a courtroom reporter. Of the 27 articles she prepared based on oral arguments, 17 contained predictions, 12 of which were correct (and one was held out because the case was dismissed). The second example is an analysis of 28 cases from the 1980 and 2003 terms by John Roberts. By determining which side was asked the most questions, he was able to determine the winner in 24 of the 28 cases studied. The third is a study by law student Sarah Shullman, who attended 10 argument sessions and recorded information about each question asked, including the content, the speaker, the level of "hostility", and the tone of the speaker's voice. After analyzing 7 cases, Shullman also settled on a "most questions asked" rule that predicted the winner in 6 of the 7 cases analyzed and in the 3 held-out cases. However, as Wrightsman (2008; 133) notes, "determining what constitutes a 'question' is not so simple". For example, Wrightsman (2008; 136) writes, "interaction
between advocates and justices do not follow in a discrete manner; two justices may begin to speak at the same time, a justice may interrupt an advocate, and justices may make elongated statements that may contain several questions." From an even more basic standpoint, it is not clear whether or not researchers limit questions to interrogative statements. Without explicitly identifying how questions are to be counted, the replicability of these sorts of experiments will be inherently shaky.

Study           Cases   Accuracy      Method                             Manual
Greenhouse      16      75.0%         Experience                         yes
Roberts         28      85.7%         Most Questions Asked               yes
Shullman        10      90.0%         Most Questions Asked               yes
Wrightsman      24      42%           Most Questions Asked               yes
Ruger et al.    68      75%           Case metadata                      no
Johnson et al.  ~2000   66.2%/67.5%   Most Questions Asked / Words Used  no

Table 2 Summary of previous studies. "Manual" indicates whether or not the study used manual methods of outcome forecasting (the alternative being automatic methods). "Cases" indicates the number of cases tested in the study.

The final study discussed in Wrightsman (2008) was conducted by Wrightsman and a student. It examined 24 cases from the October 2004 term, 12 of which were identified as "very ideological" and 12 of which were identified as "definitely not-ideological". For each of these cases they determined whether each justice's "overall pattern of questions" was "unsympathetic" to a particular side in the case, as well as the number of questions asked of each side. While no definition of "unsympathetic questioning" is provided, they do provide an example of an unsympathetic statement from Small v. United States: in arguing for the side of Small, Justice O'Connor said, "Congress thinks about the United States, our country, and if it means to say something will take place in other places in the world, it says so clearly". While they do not report absolute accuracy values for the "unsympathetic"
questioning approach, they do point out that 87% of the unsympathetic comments were directed at the losing side in the ideological cases and 69% of the unsympathetic comments were directed at the losing side in the non-ideological cases. [Footnote 6: Though presumably not the case, their method of reporting leaves open the extreme possibility that only two cases contained unsympathetic questioning, and for those two cases 87% and 69% of the unsympathetic questions were directed at the losing side. Of course, if this possibility is open, less extreme scenarios about the distribution of the questions are possible. In any case, this does not give a clear picture of the accuracy provided by this approach.] Perhaps more importantly, they report that the "more questions asked" rule employed by Shullman and Roberts led to 42% accuracy. In an attempt to rectify this discrepancy for the "most questions asked" rule, results remained mixed, though a potential pattern emerged; namely, this rule seems to be most useful in ideological cases and least useful in non-ideological cases.

While there has been extensive quantitative study of Supreme Court forecasting, computational work has been rather limited, with only two studies (Ruger et al. 2002, 2004 and Johnson et al. 2009a). Ruger et al. (2002, 2004) utilized classification trees built from 6 metadata features for 8 years' worth of Supreme Court cases under Rehnquist (658 cases). The metadata used include: (1) the circuit of origin for the case; (2) the issue area of the case, coded from the petitioner's brief using Spaeth's protocol; (3) the type of petitioner (e.g., the United States, an injured person, an employer); (4) the type of respondent; (5) the ideological direction of the lower court ruling, also coded from the petitioner's brief using Spaeth's protocol; and (6) whether or not the petitioner argued the constitutionality of a law or practice (Ruger et al. 2004).
The authors argued that each of these features could be identified by a non-expert, and indeed all but the 6th feature can be found in the Spaeth database (Spaeth 2009). They used the classification trees to predict cases for the 2002 term prior to each case's decision (68 cases). Finally, results from their classification trees were compared to those of legal experts, including "71 academics and 12 appellate attorneys", each of whom had "written and taught about, practiced before, and/or clerked at the Supreme Court". The model performed with an absolute accuracy of 75%, while the experts performed at only 58.8% (with results for 10.3% of cases "inconclusive"). Not reported for this timeframe is the proportion of cases decided in favor of the petitioner or respondent. However, based on the term they report using and the cases they held out, it appears that the Court reversed 69.1% of cases during this period. Note that there is generally a reversal bias, but that this varies over time.

A more recent and much more comprehensive study was conducted by Johnson et al. (2009a). This study examines all cases from 1979 to 1995 ("over 2000 hours"), testing the "most questions asked" hypothesis. Two logistic regression models are created in this study, the first utilizing the difference in the number of questions asked of each side, and the second utilizing the difference in the number of words used to discuss the case for each side. In addition to these two main features, features are included in each model to control for potentially confounding factors. These include a "measure of the ideology of the median justice on the Court", the direction of the lower court's decision, a variable to code the interaction of these two previous variables, two variables to code whether the Solicitor General participated as amicus curiae on behalf of the petitioner or the respondent, and two variables indicating whether amicus briefs were submitted on behalf of the petitioner
and/or on behalf of the respondent. (The 69.1% reversal rate noted above is somewhat higher than the typical rate of reversal, which is closer to 64%-66%.) While each of the "questions used" and "words used" variables was the least informative variable in its model, they report small but noticeable effects for these two models, with 66.2% accuracy for the question difference model and 67.5% accuracy for the word difference model. While the results show relatively low accuracy, given that the Court's tendency to reverse cases is around 64%, they do provide information to suggest that in extreme cases (>2 standard deviations from the mean difference in questions asked) the probability of a case being affirmed ranges between 18% and 39%. They report similar correlations with the distribution of the difference in words used for each side. Thus, these results do suggest that, despite the conflicting results presented by Wrightsman (2008), there is in fact some relevance to the "most questions asked" hypothesis (and, more generally, a "more attention given" hypothesis). However, as is discussed in Chapter 5, we find that for our own data set the "most questions asked" rule is not predictive across the corpus, though, as suggested by Johnson et al. (2009a), it does provide some benefit in the extreme cases.

Though not explicitly a forecasting study, the work of Johnson et al. (2006, 2007) is also closely related to this work. They used Justice Blackmun's records of the quality of arguments by individuals before the Court to examine the relationship between the quality of oral arguments and case outcomes. In addition to Justice Blackmun's records, they attempted to determine whether any other factors, such as attorney background and justice and attorney policy preferences, had an impact on the quality of arguments presented to the Court.
Their findings suggest that when the quality of one side's oral arguments is significantly better than the other's, the case is more likely to go to the side with the higher quality arguments, and that an attorney's background may be helpful in determining the quality of the arguments they will present. This advantage is as high as a 77.9% chance of reversal when the petitioner's arguments are "manifestly better" than the respondent's, and as low as a 34.9% chance of reversal in the converse situation.

2.6 Spaeth Supreme Court Database

Much of the work in this thesis utilized the Spaeth Supreme Court Database (Spaeth 2009; henceforth the Spaeth database). The Spaeth database is a comprehensive listing of Supreme Court cases and accompanying variables dealing with the "background" of the case (e.g. the origin of the case, the parties involved in the case, the issue area), "chronological variables" including important dates of the case, the identity of the chief justice and the natural court, "substantive variables" such as the issue area of the case and the direction of the decision, "outcome variables" including the winner of the case, and "voting and opinion variables" identifying the votes and opinions issued in the case. Cases can often involve multiple legal provisions or issues. In these instances, multiple listings are provided for each case. These listings separate variables that would otherwise be conflated. As suggested in Benesh (2002), we concentrate on the "case citation" listing, as we "[want] to study decisions in the aggregate and [want] to count each decision only once."

Chapter 3 Sequence Labeling

The work contained in this chapter aims to address our first objective: to demonstrate that conversational patterns exist in Supreme Court oral arguments. This is accomplished by constructing a sequence labeling task that identifies speakers from turn content.
Given a sequence labeling task, if speakers can be identified from the content of the turns, and if increasing the turn history in a model for sequence labeling improves performance, it indicates that patterns exist in the turn-taking behavior of Supreme Court justices. [Footnote 8: This work was originally published in Hawes et al. (2009). Figures in the following Sections are from this paper. Other discussion will either closely coincide with or match the content of this paper. Discussion is expanded and details are included to highlight the relevance of this work to this thesis.]

In a typical labeling task the objective is to identify present, but unobservable, information (hidden variables) from observable information (observed variables). An example of a common sequence labeling task is part-of-speech (POS) tagging. In POS tagging, the objective is to identify the parts of speech (e.g. noun, adjective, preposition, determiner, conjunction, etc.) for the words in a sentence. Framed as a sequence labeling problem, the hidden variables are the POS of each word and, in the simplest case, the observed variables are the words. Because the same words in different sequences may have different POS, one usually wants to make use not only of the words themselves, but of sequential information as well, such as the order of words or the sequence of the predicted POSs. Because of this, POS tagging is often approached with graph-based statistical models that can easily make use both of the features in a sequence (i.e. words) and the sequence itself (e.g. DeRose 1988, Lafferty et al. 2001, Toutanova et al. 2003).

Similar to POS tagging, we can construct a task where the observable information is a sequence of turns, and the hidden variables are the identities of the speaker for each turn. Supreme Court transcripts prior to 2004 offer an immediately relevant example, as justices were not uniquely identified in these cases.
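One way to see what "patterned" turn-taking means in practice is to count label bigrams in a speaker sequence: under patterned turn-taking, some transitions occur far more often than others, while random turn-taking yields roughly uniform counts. A minimal sketch (the two-symbol sequence below is invented for illustration and is not drawn from the corpus):

```python
from collections import Counter

def transition_counts(speakers):
    """Count adjacent speaker-label pairs (label bigrams).

    Skewed counts indicate patterned turn-taking; roughly uniform
    counts would suggest random turn-taking.
    """
    return Counter(zip(speakers, speakers[1:]))

# Invented toy sequence: a justice (J) and a lawyer (L) alternating.
counts = transition_counts(["J", "L", "J", "L", "J"])
```

A first-order model exploits exactly this kind of one-step history; a second-order model extends the idea to label trigrams.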
3.1 Methods

Data Preparation

Though the cases used for each experiment set vary, all experiments share a common data preparation approach. Transcripts of oral arguments are posted in PDF format the same day a case is argued. Transcription is conducted by the courtroom reporter, Alderson Reporting Company. While details of the transcription process are not given, the character and infrequency of errors would indicate that transcripts are created manually. [Footnote 9: For example, typos in speaker IDs (i.e. non-content text), such as JUSTICE KENNY instead of JUSTICE KENNEDY, or JUDGE ALITO instead of JUSTICE ALITO.] For each segment of speech by a single speaker, transcripts contain the speaker's name (i.e. speaker ID) and the content of the speech segment. For all experiments, each segment is treated as one speaker turn, and thus the transcript is treated as an approximation of the turn sequence during the entire case. [Footnote 10: Of course, this sequence can only be an approximation; there is no duration information, only coarse overlap information, and other discourse information, such as fillers (i.e. um), is often disregarded.] Finally, transcripts contain several non-content items, including opening and closing time stamps and headers for the oral and rebuttal arguments of each litigant (Table 3).

Symbol          Examples
TIME            (11:08 a.m.); (Whereupon, at 12:08 p.m., the case in the above-entitled matter was submitted.)
START-ORAL      ORAL ARGUMENT OF JEAN-CLAUDE ANDRE ON BEHALF OF THE PETITIONER; ORAL ARGUMENT OF KANNON SHANMUGAM ON BEHALF OF THE RESPONDENTS
START-REBUTTAL  REBUTTAL ARGUMENT OF JEAN-CLAUDE ANDRE ON BEHALF OF THE PETITIONER

Table 3 Examples of non-content items from the transcript of the oral arguments from Ali v. Federal Bureau of Prisons (06-9130), with the special symbols used to identify these items in our experiments.

All transcript PDFs were converted to XML format using an off-the-shelf utility, followed by custom-built automatic cleanup to remove extraneous formatting.
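The speaker ID/content structure of each segment can be illustrated with a simple line-based parser. This is only a hypothetical sketch over plain text (the actual pipeline worked from the PDF-to-XML conversions described above), and the speaker-ID pattern shown is an assumed simplification:

```python
import re

# Assumed turn-opening pattern: an all-caps speaker ID followed by a colon.
TURN_START = re.compile(r"^((?:CHIEF )?JUSTICE [A-Z]+|M[RS]\. [A-Z]+): ?(.*)$")

def extract_turns(lines):
    """Return (speaker ID, content) pairs, one per speaker segment.

    Lines that do not open a new turn are treated as continuations
    of the current speaker's segment.
    """
    turns = []
    for line in lines:
        m = TURN_START.match(line)
        if m:
            turns.append([m.group(1), m.group(2)])
        elif turns:
            turns[-1][1] += " " + line.strip()
    return [tuple(t) for t in turns]
```

Each resulting pair corresponds to one speaker turn in the approximated turn sequence.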
Cleanup code and cleaned transcripts will be made available at http://www.umiacs.umd.edu/~twhawes/oralarguments/index.html.

Corpus Description

At the beginning of this study the Court's 2007 term had not yet completed, and prior to the 2004 term justices did not have unique speaker IDs. Thus we limited the corpus to the 2004-2006 terms. For the sake of consistency, we also filtered out cases that followed an atypical format, for example, those cases that included arguments from amici curiae. [Footnote 11: Filtered-out cases include: 02-1472, 04-1067, 04-473b (Garcetti v. Ceballos (Reargued)), 04-94, 05-1342, 05-1575, 05-204, 05-705, 05-746, 05-922, 06-484, 06-5247, 06-5306, 06-593, 105 Orig. (Kansas v. Colorado) and 128 Orig. (Alaska v. United States).]

Feature extraction

From the XML-formatted cases we extracted the case content, including speaker IDs, speaker turn content and non-content items in the transcript. Turns were extracted as speaker ID/content pairs. From the content of each turn, we extracted features as shown in the Features Section (cf. Figure 3).

Labeling

We extract from each unit x_i a set of features, and our models predict the labels y_i for a sequence, yielding {(x_1, y_1), ..., (x_n, y_n)}. The labels y_i comprise a set of 15 symbols: 11 for the justices (one for each), one to represent the lawyers (whether on behalf of the petitioner or the respondent), plus one special symbol for time stamps and two additional special symbols to encode the section headings (i.e. START-ORAL and START-REBUTTAL). Figure 1 shows the frequency with which each of the justices spoke across all cases in the corpus. Not included are the non-justice parties from each side, who produce 47.4% of all turns. Also not included are the special symbols, which comprise 2.2% of the symbols in the corpus.
While the Court is only composed of 9 justices at any given time, we report 11 in Figure 1 due to changes in Court membership, including Roberts' replacement of Rehnquist and Alito's replacement of O'Connor. Because these justices do not span the entire corpus, their empirical probability should be lower than those justices' true tendency to speak during oral arguments (this, in turn, has an impact on our experimental results).

Figure 1 Empirical probability of each justice symbol in the corpus (Hawes et al. 2009).

Because we are predicting sequential labels from a collection of features, conditional random fields (CRFs; Lafferty, McCallum, & Pereira, 2001) are a straightforward choice for this task. CRFs utilize undirected graphs to model the conditional probability of an unobserved sequence of labels (Y) given some observable sequence of features (X). CRFs are preferable to Hidden Markov Models (HMMs) in many sequence-labeling tasks because they relax the stringent conditional independence assumptions made by generative models. CRFs have been empirically shown to work well for a variety of text processing tasks, including POS tagging (Lafferty et al. 2001), shallow parsing (Sha & Pereira, 2003), and named-entity recognition in the biomedical domain (Settles, 2004). Although the underlying structure of a CRF can take a variety of forms, a linear chain of labels (Figure 2) is often assumed for sequence-labeling tasks because it allows for efficient inference and decoding using the forward-backward and Viterbi algorithms (Sutton and McCallum 2006). Figure 2 corresponds to a first-order CRF, which determines probabilities using features at the current label along with the previous label; similarly, a second-order CRF corresponds to a model that determines probabilities using features at the current label along with the previous two labels. For this work we used the MALLET implementation of CRFs (http://mallet.cs.umass.edu).
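The Viterbi decoding step mentioned above can be sketched in a few lines. The toy implementation below (with invented scores standing in for what a trained model would supply; MALLET handles training and decoding internally) recovers the best label sequence for a linear chain:

```python
def viterbi(emit, trans, labels):
    """Best label sequence for a linear-chain model.

    emit[t][y]: score for label y at position t (from observed features).
    trans[(p, y)]: score for moving from label p to label y.
    """
    best = [dict(emit[0])]          # best[t][y]: best path score ending in y
    back = []                       # back-pointers for recovering the path
    for t in range(1, len(emit)):
        cur, ptr = {}, {}
        for y in labels:
            p = max(labels, key=lambda p: best[-1][p] + trans[(p, y)])
            cur[y] = best[-1][p] + trans[(p, y)] + emit[t][y]
            ptr[y] = p
        best.append(cur)
        back.append(ptr)
    y = max(labels, key=lambda l: best[-1][l])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]

# Invented toy scores: a justice (J) and a lawyer (L), with a small
# bonus for alternating speakers.
labels = ["J", "L"]
emit = [{"J": 2.0, "L": 0.0}, {"J": 0.0, "L": 2.0}, {"J": 2.0, "L": 0.0}]
trans = {("J", "J"): 0.0, ("J", "L"): 0.5, ("L", "J"): 0.5, ("L", "L"): 0.0}
decoded = viterbi(emit, trans, labels)
```

A second-order model would score transitions over the previous two labels instead of one, at the cost of a larger transition table.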
Figure 2 Diagram of a linear chain of labels, where X_i is a group of observed features and Y_i is a label.

Features

The following is a discussion of the features used for this task. Note that an additional, contentless feature (T) was also used for every turn in order to ensure that all turns had at least one feature in the sequence.

Unigrams

Unique tokens, separated by white space and punctuation, were extracted from each turn, ignoring stop-words. One feature for each token used in a particular turn was included in the feature set for that turn, indicating the presence of that token. By including unigrams in our feature set, we are essentially creating a "bag-of-words" language model. Because this is among the simplest possible approaches for this task, we treat unigrams as our baseline feature set.

Discourse Markers (DM)

All interpretable discourse is composed of discourse relations, which serve to connect each unit of discourse. Correct interpretation of these relations is necessary in order to correctly interpret a discourse. Because we can safely assume that oral arguments are an interpretable discourse (at least for all parties involved), we can infer the presence of these coherence relations, not only between an individual speaker's utterances but between the utterances of separate speakers. Instead of attempting to identify all of these relations automatically, however, we instead rely on discourse markers, which have traditionally been viewed as overt cues for underlying discourse relations (cf. conjunctive cohesive elements, Section 2.2). Both semantically and syntactically optional, discourse markers are typically viewed as pragmatic units used to link clauses in a discourse (Schiffrin 1987). As overt cues of discourse relations, discourse markers are a prime example of conjunctive cohesive elements of a discourse.
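Extracting such markers from the start of a turn can be done with simple greedy matching against a fixed list. A minimal sketch (the six-entry list here is a hypothetical stand-in for illustration only, not the marker list used in this work):

```python
# Hypothetical stand-in for a much larger discourse-marker list.
MARKERS = sorted(["well", "but", "no", "so", "okay", "on the other hand"],
                 key=len, reverse=True)   # try longer markers first

def turn_initial_markers(turn):
    """Greedily peel discourse markers off the start of a turn."""
    found = []
    text = turn.lower().lstrip()
    matched = True
    while matched:
        matched = False
        for marker in MARKERS:
            rest = text[len(marker):]
            # Require a word boundary so "no" does not match "notably".
            if text.startswith(marker) and (not rest or not rest[0].isalnum()):
                found.append(marker)
                text = rest.lstrip(" ,.")
                matched = True
                break
    return found
```

Matching longest-first lets multi-word markers like "on the other hand" win over their single-word prefixes.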
For this task, we compiled a list of approximately 700 potential discourse markers identified through manual examination of the corpus and in the literature (Marcu, 1997; Oates, 2001). [Footnote 12: Manual examination of the corpus may be seen as viewing test data prior to testing. The author readily admits this list would ideally have been compiled from out-of-sample documents. However, note that the task is to examine the impact of discourse markers, not to identify discourse markers. Because all potential discourse markers were included using this method, we view this as parallel to annotations in the test data for a task that requires such information.] Finally, we make the simplifying assumption that any turn-initial string that matches a member of this list is a discourse marker, a condition met in approximately 50% of turns. If multiple adjacent discourse markers appear at the beginning of the string, all were included. Consider an example from Kansas v. Marsh (Reargued) (2006): "JUSTICE BREYER: Okay, well, what do you say to --", from which we extract two discourse markers (okay and well). Because the discourse marker list is composed of both single- and multi-word discourse markers, and because the majority of single-word discourse markers are also stop-words, there is very little overlap between the Unigram feature set and the DM feature set.

Personal Reference (Ref)

Finally, we included a feature set for references to individuals. This feature set included features of the following types: justices' names; honorifics (i.e. "Your Honor"); second-person pronouns; a single feature for any justice mentioned; and a single feature for every non-justice name. Instances of these features were identified using simple pattern matching, which we found to be sufficient for most instances of address due to the formal nature of Supreme Court discourse. Thus, this works well as a basic model of direct address, closely related to that discussed in Jovanovic and Akker (2004).
However, one should note that as a consequence of using simple pattern matching and no additional or more sophisticated approaches, all instances of reference are included regardless of the referent. While a subset of these references are direct references to an individual who either spoke or will speak in adjacent turns, the direct address feature set also includes references to individuals present but not currently participating in the discourse, and to individuals who are not participating in the discourse at all. While each of these different classes of "individual mention" makes a distinct contribution, each contribution made is potentially useful in modeling the conversational dynamics of the Court. [Footnote 13: The second-to-last feature was included to account for highly variable mentions of justices who were not serving on the Supreme Court during the case. A single feature was used in this final case because of the high variability across cases of non-justice names. Note, however, that the majority of these latter namings within a case typically refer to the party currently presenting oral arguments or other individuals involved in the case.] Because references are typically made to someone who recently spoke or will speak (because they have been addressed), for each turn we include the reference features from the immediately adjacent turns but not the current turn. Approximately 40% of turns contained at least one instance of personal reference. Finally, as with discourse markers, because unigrams are filtered for stop-words and contain only single tokens, there was little overlap between the direct address features and the unigram features. Figure 3 provides an example of the features extracted from a sequence of turns.

Figure 3 Example of features extracted from a transcript segment. Turns from S. D. Warren Co. v. Maine Bd. of Environmental Protection (04-1527).

JUSTICE SOUTER: -- "reinforcing," and maybe it's "changing." I mean, you're characterizing it one way.
We start with a different canon of meaning, and that is that we look to the words around which, in connection with which, the word is used. In here, it's being used without certain modifiers or descriptive conditions. In other cases, it is being used with them. And that's a good reason to think that probably the word is intended to mean something different in those situations. MR. KAYATTA: Well, I would -- I would hesitate, Justice Souter, to go from taking a specific word, like "discharge," and, therefore, saying that it meant something that is both more general and much more easily set. JUSTICE SOUTER: No, but your argument, I thought, was simply this, that it uses "discharge" in, you know, X number -- I forget how many you had -- and it's perfectly clear that in most of those instances it requires an addition; and, therefore, it should be construed as requiring it here. My point was that in a great many of those instances, the statute is not merely using the word in isolation; it's using it in connection with a couple of other words, like "discharge a pollutant." And it, therefore, number one, makes sense to construe "discharge of a pollutant" differently from "discharge." That's the -- that's the only point. 
Features
Souter 1: Unigrams: cases, word, start, changing, connection, words, modifiers, meaning, reinforcing, reason, situations, intended, characterizing, good, canon, descriptive, conditions; Discourse Markers: (none); Direct Address: you
Kayatta 1: Unigrams: meant, discharge, word, set, justice, souter, easily, taking, specific, general, hesitate; Discourse Markers: well; Direct Address: Justice_Souter, JUSTICE
Souter 2: Unigrams: argument, simply, requires, sense, discharge, construe, clear, thought, construed, point, number, great, word, connection, requiring, forget, words, couple, addition, differently, perfectly, statute, instances, isolation, pollutant, makes; Discourse Markers: no, but; Direct Address: your, you

3.2 Experiments
For our experiments we utilized four combinations of features:
• Unigrams (Unigrams)
• Unigrams plus Discourse Marker Features (Unigrams + DM)
• Unigrams plus Personal Reference Features (Unigrams + Ref)
• Unigrams plus Discourse Markers plus Personal Reference (Unigrams + DM + Ref)
With these features we conducted sequence prediction using both first- and second-order CRFs. All experiments were evaluated using k-fold cross-validation, a common evaluation technique wherein data is segmented into some number k of non-overlapping subsets of instances, or folds, where k is less than or equal to the number of individual instances in the data set. For each subset s_i of the k subsets, a model is trained on the other k-1 subsets and then evaluated using s_i as a test set. Finally, results from each iteration of testing are combined, typically through averaging (as in our experiments). We used 10-fold cross-validation to evaluate our first-order models and 2-fold cross-validation to evaluate our second-order models.[14]

Results
Results are reported as the F-score for sequence prediction. F-score is the harmonic mean of precision and recall.
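The evaluation loop just described can be sketched in a few lines. This is an illustrative reconstruction with stand-in training and scoring functions, not the CRF pipeline used in the thesis; the fold construction and equally weighted F-score follow the definitions in the text.

```python
# Sketch of k-fold cross-validation with averaged scores (illustrative).
def k_fold_indices(n, k):
    """Split range(n) into k contiguous, non-overlapping folds."""
    size, rem = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rem else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def cross_validate(data, k, train_fn, eval_fn):
    """Train on k-1 folds, evaluate on the held-out fold, average results."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for held_out in folds:
        train = [data[i] for i in range(len(data)) if i not in held_out]
        test = [data[i] for i in held_out]
        model = train_fn(train)
        scores.append(eval_fn(model, test))
    return sum(scores) / len(scores)

def f_score(precision, recall):
    """Equally weighted harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f_score(0.5, 1.0)` is 2/3: the harmonic mean penalizes the lower of the two values more than an arithmetic mean would.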
We used an equally weighted F-score as the simplest combined measure of precision and recall. Figure 4 shows the 10-fold cross-validation results using first-order CRFs. We report only those justices who regularly spoke in cases during their time on the bench, and no other symbols.[15] Each justice category has been annotated with the relative improvement from Unigrams to the Unigrams + DM + Ref condition.

[14] The choice to use 2-fold cross-validation for second-order models was based on the significantly longer training time for this order of CRF as compared to first-order CRFs.
[15] Thus we do not report section headers, the TIME symbol, the L symbol, or Thomas (who spoke too infrequently to model).

Figure 4 First-order CRF 10-fold cross-validation results. Annotations represent the relative improvement over the Unigram baseline for the Unigram + DM + Ref condition (Hawes et al. 2009).

For the Unigrams + DM and Unigrams + Ref conditions we see relative improvement over Unigrams for all justices; however, there is variability across justices as to which of the two provides the greatest relative improvement. The use of both personal reference and discourse markers, in addition to unigrams, provides greater relative improvement than all other conditions for each justice. Figure 5 shows the 2-fold cross-validation results for second-order CRFs. As with the first-order graphs, justice categories have been annotated with the relative improvement from the Unigram condition to the Unigram + DM + Ref condition. For all justices but Alito and Rehnquist we see a relative improvement in all conditions as well as a similar pattern across conditions within justices. The decrease in performance for Alito and Rehnquist is to be expected given that these two justices cover the smallest portions of the corpus compared to all other justices who speak regularly.
Because of this, sequences with their symbols appear infrequently across the corpus, and so will either be less evenly distributed throughout cross-validation folds or contain less training data per fold. The overall increase in F-score for all other justices (as compared to Figure 4) in all conditions indicates that increasing speaker history is, as expected, beneficial in modeling justice turn-taking behavior. It would appear that the second-order CRF allows us to capture both complex interactions between justices as well as individual justices' tendency to continue speaking to a lawyer without interruption from other justices.

Figure 5 Second-order CRF 2-fold cross-validation results. Annotations represent the relative improvement over the Unigram baseline for the Unigram + DM + Ref condition (Hawes et al. 2009).

Figure 6 contains the overall accuracy for both first- and second-order CRFs in each condition, where accuracy is simply the proportion of correct predictions to the total number of predictions. Each bar has been annotated with its relative improvement over unigrams for its respective model order. Error bars were calculated as the 95% confidence interval as computed by the Clopper-Pearson method for inferring exact binomial confidence intervals (Clopper & Pearson, 1934). The confidence intervals indicate that for both first- and second-order models, the inclusion of discourse markers or personal reference features provides a significant improvement over unigrams alone, though these two conditions are not significantly different from each other. However, the inclusion of both feature sets does provide a significant improvement over both of these conditions for both first- and second-order models.

Figure 6 Overall accuracy of first- and second-order CRFs. Bars are annotated with the relative improvement over the Unigram baseline. Error bars are the 95% confidence interval as calculated by the Clopper-Pearson method for inferring exact binomial confidence intervals.
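The Clopper-Pearson interval used for the error bars inverts the binomial tail probabilities: the lower bound is the p at which observing k or more successes has probability α/2, and the upper bound is the p at which observing k or fewer has probability α/2. A minimal pure-Python sketch (not the code used to produce the thesis figures), solving each bound by bisection:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05, iters=60):
    """Exact (Clopper-Pearson) binomial CI via bisection on the tail sums."""
    # Lower bound: p such that P(X >= k | n, p) = alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            # P(X >= k) grows with p; too small a tail means p must grow.
            if 1 - binom_cdf(k - 1, n, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    # Upper bound: p such that P(X <= k | n, p) = alpha / 2.
    if k == n:
        upper = 1.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            # P(X <= k) shrinks as p grows; too large a tail means p must grow.
            if binom_cdf(k, n, mid) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper
```

For instance, 5 correct predictions out of 10 gives the familiar exact interval of roughly (0.187, 0.813), noticeably wider than a normal approximation would suggest at this sample size.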
Discussion
Interestingly, these results show that the inclusion of features such as discourse markers and instances of personal reference does add information that helps in identifying who was speaking when in a discourse. While the results are considerably lower than the acoustic approach to speaker identification of Yuan and Liberman (2008), it should be noted that while our tasks are related, they are also distinct. Their work focuses on the use of acoustic differences in individuals' speech and how this can be applied to speaker identification in acoustically complex environments. In contrast, our work aims to understand the turn-taking patterns of justices in the Supreme Court through the relationship between turn content and turn organization, and we use speaker identification as a task to gauge our progress towards this goal. These results provide significant improvement over a unigram baseline model, and we see significant improvement from first-order models to second-order models. This indicates the existence of high-level patterns in justice turn organization during Supreme Court oral arguments.

Though we are looking for positive results with our work, we are also looking for tools to help legal scholars. How, then, might these results or this work in general be used as such? The fact that we have identified predictable patterns in turn-taking may be of interest to legal scholars. Though they may have had such an intuition about the Court (perhaps noting that there is a pecking order amongst the justices, with the chief justice at the top, followed by the other justices organized by seniority), these results make this fact explicit. Additionally, the work presented here is a novel approach for understanding the Supreme Court. By utilizing these methods, legal scholars will have new tools for addressing questions about the Supreme Court, and a variety of new questions.
Chapter 4 Visualizing Dynamics
This chapter addresses the second goal of this thesis: to demonstrate that the patterns indicated in the previous chapter can be associated with case outcomes. To accomplish this, we explore the relationship between turn-taking patterns during oral arguments and case outcomes via a multi-dimensional charting technique. We created charts for sets of cases belonging to a variety of outcomes and case conditions, and examine the relationship between justices' voting records and their turn-taking behavior in these conditions. By comparing these charts we create a picture of the relationship between the voting and conversational behavior of justices.

In this chapter, as well as the next, we deal with justices' ideology. This is often discussed throughout the media and often held as common knowledge. However, there have been a number of studies quantitatively examining the ideology of justices. For example, Martin-Quinn scores estimate the "ideal point" (i.e. a point on an attitudinal scale, in this case ideology) for each justice (Martin and Quinn 2002). Martin-Quinn scores are regularly published at http://mqscores.wustl.edu/measures.php. On the Martin-Quinn scale, negative numbers indicate a liberal ideology while positive numbers indicate a conservative ideology. Table 4 summarizes the mean Martin-Quinn score for the justices for the three years covered in our selection of cases.

Justice     Martin-Quinn score
Thomas       4.37
Scalia       2.75
Alito        1.63
Roberts      1.6
Kennedy      0.41
Breyer      -1.41
Souter      -1.51
Ginsburg    -1.54
Stevens     -2.4

Table 4 Mean Martin-Quinn scores for the 2005-2007 terms. Note, negative scores indicate a liberal ideology and positive scores indicate a conservative ideology. The higher (lower) the number, the more conservative (liberal) the ideal point is.

4.1 Methods
Corpus description
While the source and format of documents for this corpus is the same as that in Chapter 3, we selected a different timeframe.
For this work, transcripts corresponding to cases from the February 2006 argument session (2005 Term) through the April 2008 argument session (2007 Term) were collected. This selection of cases represents a "natural court": a period of time during which the same 9 justices were in office with no changes in court membership. These justices include Chief Justice Roberts, Justice Stevens, Justice Souter, Justice Ginsburg, Justice Kennedy, Justice Thomas, Justice Alito, Justice Scalia and Justice Breyer. By using a natural court, we avoid potentially erroneous factors introduced by changes in court membership. Additionally, it increases our chances of avoiding the case where significantly less data is available for an individual justice due to factors external to that justice's behavior. While it would have been preferable to use more data, there is no longer natural court after the 2004 term, and before then individual justices were not uniquely identified in argument transcripts. Of the 179 cases argued during this period, 11 were held out due to inconsistencies in the database used for labeling each case.[16]

Case Segmentation
Cases were segmented into sequences of speaker labels. Each sequence was then divided into "speaker trigrams". Those familiar with the traditional view of trigrams will recognize our interpretation of speaker trigrams: a speaker trigram is S_i S_i+1 S_i+2, where S_i is the speaker of the i-th turn in the sequence (Manning and Schütze 1999). Figure 7 contains some example turns from the corpus (truncated for brevity), along with the sequence extracted from these turns and the resulting trigrams. We then obtained the count for each trigram across all cases and for all cases in each one of several conditions from the Spaeth database (e.g. direction of case decision, direction of Alito's votes, vote split, etc.).

[16] Held-out cases include: 04-607, 05-204, 05-259, 06-1265, 06-166, 06-618, 06-7517, 07-290, 07-30, 07-77 and 06-134 (New Jersey v.
Delaware).

Figure 7 Sequence of truncated turns, the sequence extracted from these turns, and the resulting trigrams.

Labeling description
Labels were created using the Spaeth database. We experimented with variables along several dimensions, including the direction of individual justices' and the Court's decisions in cases (liberal/conservative) and the Court's vote split (5-4, 9-0, 8-1, etc.). While we discuss only a sampling of charts in this chapter, all charts with greater than 10 cases for each variable value are included in Appendix A. In the sections that follow we will cover the Vote Split (VOTE) variable, which contains the distribution of votes for a case; the Direction (DIR) variable, which contains the ideological direction of the case outcome; and the Justice Direction variables (JDIR), which contain the ideological direction of each justice's vote in a particular case.

From Snyder v. Louisiana (06-10119)
CHIEF JUSTICE ROBERTS: Even though -- even though your theory…
MR. BRIGHT: Oh, no.
CHIEF JUSTICE ROBERTS: -- that this jury did not return a…
MR. BRIGHT: No. Let me -- let me make this quite…
CHIEF JUSTICE ROBERTS: Thank you, Mr. Bright. Mr. Boudreaux?
ORAL ARGUMENT OF TERRY M. BOUDREAUX ON BEHALF OF THE RESPONDENT
MR. BOUDREAUX: Mr. Chief Justice, and may it please…
JUSTICE SCALIA: As to life imprisonment or as to the…
MR. BOUDREAUX: As to life imprisonment, Your Honor…
JUSTICE SCALIA: Where is this? I -- 364? Show me --
MR. BOUDREAUX: Beginning at 364 of the joint appendix…
Extracted Sequence: ROBE L ROBE L ROBE START-ORAL L SCAL L SCAL L
Trigrams: ROBE L ROBE, L ROBE L, ROBE L ROBE, L ROBE START-ORAL, ROBE START-ORAL L, START-ORAL L SCAL, L SCAL L, SCAL L SCAL, L SCAL L

The Rose Charts
Though radial plots have been explored extensively, the use of radial plots for the visualization of sequential patterns and associated variables is a novel application of this layout (Draper et al. 2009).
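Returning to Figure 7, the conversion from a speaker sequence to trigram counts can be sketched as a sliding window. This is an illustrative reconstruction using the label sequence from the figure, not the thesis code.

```python
from collections import Counter

def speaker_trigrams(sequence):
    """Slide a window of three over a sequence of speaker labels."""
    return [tuple(sequence[i:i + 3]) for i in range(len(sequence) - 2)]

# Speaker sequence extracted in Figure 7 (L marks a lawyer turn).
seq = ["ROBE", "L", "ROBE", "L", "ROBE", "START-ORAL",
       "L", "SCAL", "L", "SCAL", "L"]
counts = Counter(speaker_trigrams(seq))
print(counts[("ROBE", "L", "ROBE")])  # prints 2, matching the figure
```

An 11-symbol sequence yields 9 trigrams, and repeated patterns such as ROBE L ROBE are counted once per occurrence, as in the figure.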
The outer ring of our diagrams (the petals in our terminology) is related to the polar plots discussed by Draper et al. (2009), while the inner ring is a pie chart. Because these charts are a novel application of radial layouts, we include the following technical description. For an explanation of how to interpret the charts, proceed to the Results section (Section 4.2).

For each justice (except Thomas, again because of his infrequency of speaking) we created charts for all trigrams ending with that justice (i.e. all trigrams represented in a chart must end with the same S_i+2, where S_i+2 is a justice). By concentrating only on those trigrams that end with the same justice, we can concentrate on turns that can be associated with "choice" on the part of that justice (i.e. the choice of that justice to speak after the speakers in the first and second positions in the trigram). We interpret this "choice" as the choice to interact with or pay attention to previous speakers. However, this is not necessarily the case; for example, these turns may arise if the justice is attempting to change the topic, and thus not paying attention to the previous speakers in the usual sense. Secondly, we chose to concentrate only on "typical" trigrams; because the vast majority of trigrams are of the form JUSTICE LAWYER JUSTICE or LAWYER JUSTICE LAWYER, all trigrams that did not have a lawyer in the second position were filtered out.

The center of each chart contains a pie graph representing the proportion of times the justice in the third position also spoke in the first position (i.e. S_i = S_i+2; "held the floor" after the lawyer's turn).[17] Each of the outer petals represents one of the other justices that spoke in the first position (i.e. all other S_i). The width of each outer petal represents the frequency of each turn sequence normalized by the number of times S_i spoke, relative to the other petals. Thus, if the justice in the center devotes equal attention to all other justices (e.g.
that justice follows up on the same proportion of the turns produced by each other justice), all petals will have equal width. Because this looks at the proportion of turns rather than the count, the petals would be of equal width even if the frequencies of the sequences they represent are different. Petal radius represents the proportion of the time that two justices voted together, where shorter petals indicate the justices have more similar voting records than justices with longer petals. The inner dotted ring indicates 100% matching votes, and the outer edge of the chart area indicates 100% mismatch. Each object in the chart (the petals and the pie graph) is colored on a gradient according to the proportion of cases in which that justice voted liberally or conservatively in the given category (i.e. that justice's exhibited ideology), where white (blue in color versions) is liberal and gray (red in color versions) is conservative. We use counts of votes rather than Martin-Quinn scores because of the high variability of the conditions chosen and because we want to represent the ideology within each condition. Note that because the range varies from condition to condition and because the range can often be quite narrow, the gradient is calculated within a condition; thus, a justice's color may vary from condition to condition. Finally, each petal is annotated with two values. The percent on the top, which is also in bold, is the width of the petal, while the percent on the bottom represents the proportion of times that n-gram occurred compared to all other petals. By representing turn-taking information in this way we hope to be able to capture broad patterns of the justices' turn-taking behavior.

[17] We take the idea of "holding the floor" beyond the typical interpretation of maintaining control of a turn, to all instances where a speaker continues to produce turns after a single interceding turn from another speaker.
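The two petal annotations just described reduce to simple proportions. A minimal sketch under our reading of the text, with invented counts (the trigram counts and turn totals below are hypothetical, not drawn from the corpus):

```python
# Hypothetical counts for one chart: how often the center justice followed
# each other justice (via a lawyer turn), and how many turns each of those
# justices produced overall. All numbers are invented for illustration.
follow_counts = {"KENN": 40, "SCAL": 50, "GINS": 30}
turns_spoken = {"KENN": 200, "SCAL": 500, "GINS": 250}

# Bottom annotation: share of follow-ups relative to all petals.
total_follows = sum(follow_counts.values())
absolute = {j: c / total_follows for j, c in follow_counts.items()}

# Top annotation (petal width): follow-ups normalized by how often the
# followed justice spoke, then rescaled so the petal widths sum to one.
rates = {j: c / turns_spoken[j] for j, c in follow_counts.items()}
total_rate = sum(rates.values())
width = {j: r / total_rate for j, r in rates.items()}

# A justice can have a large absolute share yet a small width if they
# simply produced many turns (compare SCAL with KENN here).
print(round(absolute["SCAL"], 3), round(width["SCAL"], 3))
```

This is the Scalia situation described for Figure 8 below: a high unnormalized proportion but a lower normalized one, because Scalia produces many turns overall.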
If we compare charts for different values within a condition, patterns may emerge that indicate a relationship between the values of that condition and a justice's behavior. For example, if we compare the turn-taking behavior of a justice when his or her vote is liberal to when the vote is conservative, and we note that a petal for a particular justice is short and narrow for liberal votes but long and wide for conservative votes, this could indicate that the justice in question has a greater tendency to follow up on the particular justice of that petal in conservative cases. Furthermore, when the petal is long and wide, we may hypothesize that many of those follow-ups in some way challenge the justice of the petal, since the length of the petal indicates the level of disagreement in the cases' outcomes.

4.2 Results
How to read the charts
Some of the patterns we discuss will be relevant either to wings of the Court or to justices from those wings. In these cases we will treat Kennedy, the swing justice, as irrelevant to these patterns. Additionally, we will identify speculative explanations for these patterns with italic text at the end of an observation.

Take, for example, Figure 8, "Stevens - Rose Diagram of All Cases". This chart contains all cases from our dataset. Because this chart is for Stevens, we find a pie chart in the center labeled Stevens, which indicates Stevens tends to "hold the floor", i.e. speaks again after an initial turn directed at the lawyer, ~75% of the time (signified by the area filled in for the pie chart). It also shows that his voting record is one of the most liberal for this set of cases at ~31% conservative votes (indicated by the color gradient).

Figure 8 Stevens - Rose Diagram of All Cases

As discussed above, the outer petals represent all turn sequences in the dataset of the pattern JUSTICE_1 LAWYER JUSTICE_2 (J_1 L J_2) where J_1 ≠ J_2, and in this case J_2 is Stevens.
Thus, the petal labeled Kennedy represents all turn sequences of the form Kennedy Lawyer Stevens. The labels for this petal indicate that Stevens follows Kennedy 17.9% of the time when Stevens is not "holding the floor" and that the normalized proportion of this sequence is 21.6%. For Scalia, the relationship between these values is reversed, with the normalized proportion much lower than the unnormalized proportion. This indicates that while Stevens follows up on Scalia more often than he does Kennedy, he does so on a smaller proportion of the turns produced by Scalia as compared to Kennedy. Finally, comparing the length of the Kennedy petal to the others, we see that Stevens votes with Kennedy less often than the liberal justices but more often than the conservative justices.

Looking at the outer petals we can make a number of generalizations, several of which are covered here in a top-down fashion:
• Stevens has a greater tendency to follow up on Kennedy, Scalia, Alito and Roberts (the justices he least often votes with) as a group than he does Ginsburg, Breyer and Souter (the justices he most often votes with).
• Holding Kennedy out as the swing vote, Stevens's interaction is much more evenly split between the conservative and liberal wings of the Court, with only slightly more follow-ups on justices he agrees with less often than ones he does agree with (40% vs. 38.3%). Thus, this indicates a somewhat disproportionate amount of attention given to Kennedy. This may indicate that Stevens more often treats Kennedy as a "potentially persuadable justice", spending more time trying to convince him than other justices.
• While the normalized proportion is fairly evenly spread out between the conservative justices in this chart, for the liberal justices, attention is skewed towards Ginsburg (18.3% towards Ginsburg vs. 9.9% and 10.1% towards Breyer and Souter). This may indicate regular cooperation between Stevens and Ginsburg.
•
Of all justices, Stevens is most likely to follow up on Kennedy, at 21.6%, followed by Ginsburg at 18.3%.
• Finally, Roberts and Scalia both have much higher absolute percents compared to the relative percents, indicating that Stevens is less likely to follow up on one of their turns despite a larger number of opportunities, and so a greater proportion of turns from these justices go ignored.
• The absolute percent is much lower than the scaled relative percent for Alito, indicating a stronger tendency for Stevens to follow up on Alito given the opportunity as compared to other justices; Alito's turns are less often ignored as compared to Roberts's and Scalia's. These last two observations together may indicate a tendency to argue with Alito more often than with other justices in the conservative wing.

Vote Split Condition (VOTE)
The VOTE variable in the Spaeth database indicates the distribution of the justices' votes (e.g. 5-4, 8-1, 9-0, etc.). Using this variable, we can test our intuitions about the sorts of patterns the charts will exhibit, because we have well-defined expectations for several features of the graph in this condition. Figure 9, "Kennedy - Rose Diagrams for 5-4 and 9-0 split cases", exhibits several patterns we would expect:
• 9-0 cases have maximal agreement between the justices; logically, if their decisions were unanimous then their votes always match.
• In 9-0 cases, justices always exhibit the same ideology. Their votes always match, thus their decisions have the same ideological direction.
• In 5-4 cases, Kennedy shows relatively high levels of disagreement with all justices, but slightly more agreement with conservative justices than with liberal justices. We expect this pattern given that Kennedy is a slightly conservative swing justice, often casting the deciding vote in narrowly decided cases.
•
In 5-4 cases, Kennedy exhibits an ideology in the center of the gradient while the other justices exhibit ideologies along the extremes of the gradient. This is what we would expect if Kennedy is the median justice and the other justices typically vote along their ideology in narrowly decided cases.
• Finally, in 5-4 cases, the petal width for Alito is very narrow, both compared to the other justices in 5-4 cases and compared to Alito's petal in 9-0 cases. Also, Alito has the shortest petal in 5-4 cases. This may indicate that Kennedy tends to avoid interaction with the justice whose viewpoint is closest to his in narrowly decided cases.

Figure 9 Kennedy - Rose Diagrams for 5-4 and 9-0 split cases

This pair of diagrams confirms our intuitions about the agreement and ideology patterns we expect to see when they are logically predictable. Additionally, the last bullet point demonstrates the sorts of patterns that we can find when comparing levels of interaction across values in a condition.

Direction Condition (DIR)
The DIR variable in the Spaeth database indicates the ideological direction of a case's outcome. The ideological direction of a decision is determined based on the parties involved in the case and the issue area of the case, according to the rules outlined in the Spaeth database documentation. Ideological direction is either liberal or conservative, except in rare circumstances when no appropriate ideological direction can be determined. Below we discuss three diagram pairs in the DIR condition. In all charts, conservative decisions are on the right and liberal decisions are on the left. Several observations can be made in Figure 10, "Alito - Rose Diagrams for the DIR Condition" (Alito is a conservative justice):
• When the eventual outcome of the case is conservative, Alito follows up on the liberal wing more frequently than when the outcome is liberal.
This suggests a greater level of interaction via the lawyer between Alito and the liberal wing of the Court in cases that are eventually decided conservatively.
• There is less interaction between Alito and the conservative justices when the outcome is liberal as opposed to conservative. It should be noted that this is not the logical converse of the previous observation, as the presence of a swing justice allows for changes in only one wing across a condition. These two observations may indicate a slight tendency to argue more with justices that Alito disagrees with in cases where the outcome is likely to be against Alito's ideology.
• These charts indicate an increase in interaction with Kennedy when the eventual outcome of the case is liberal. For example, it is reasonable to assume that in any given case, each justice (in this instance, Alito) will have a fairly accurate expectation regarding the eventual outcome of the case. So, if Alito suspects that the eventual outcome of the case will be liberal (and especially if the case is likely to be split), Alito is likely to seek the support of Kennedy as a swing vote, which may be indicated as a higher degree of interaction.

Figure 10 Alito - Rose Diagrams for the DIR Condition.

In the DIR condition for Ginsburg (Figure 11), we note the opposite basic patterns to those of Alito (Ginsburg is a liberal justice):
• In conservative cases we see a higher level of interaction with the liberal wing and a lower level of interaction with the conservative wing when compared to liberal cases.
• We also see more interaction with Kennedy in conservative cases than in liberal cases.

However, since Ginsburg and Alito are from opposing wings of the Court, these patterns can be used to form a single generalization.
Namely, when the eventual outcome of a case is in opposition to the justice's general ideology, there is increased interaction with that justice's own wing, and decreased interaction with the opposing wing, as compared to cases when the outcome is in line with the justice's ideology. This pattern is observed for 5 of the 7 applicable justices (Kennedy excluded for the reason above, and Thomas because he rarely speaks). Similarly, when a case's eventual outcome is against a justice's ideology, more interaction with the swing justice is observed than when the eventual outcome of the case is in line with the justice's ideology.

Figure 11 Ginsburg - Rose Diagrams for the DIR Condition.

In the above cases, Kennedy was treated as irrelevant to the patterns under discussion because he is the swing justice. Despite this, we can still make observations regarding Kennedy's interaction with the other justices. Figure 12 contains the DIR condition charts for Kennedy.
• Kennedy is more consistent than the previous justices we have discussed when looking at his interaction with the wings of the Court. He has only slightly higher interaction with the liberal justices in liberal cases and the conservative justices in conservative cases. We might expect this from a swing justice.
• For each value in the DIR condition, for Kennedy there is a decrease in the proportion of follow-ups to the most liberal justice in that condition. That is, Stevens is the most liberal justice in cases with a conservative outcome while Ginsburg is the most liberal justice when the outcome is liberal; we see that Kennedy interacts less with Stevens when the outcome is conservative (i.e. when he is the most liberal justice) and less with Ginsburg when the outcome is liberal (i.e. when she is the most liberal justice). This could indicate a reluctance to get involved with the most extreme (liberal) viewpoint during a case.

Figure 12 Kennedy - Rose Diagrams for the DIR Condition.
Justice Direction (JDIR)
Similar to the DIR condition, the JDIR condition has two primary values, liberal (L) and conservative (C); however, unlike DIR there is one JDIR value for each justice. So, ALTODIR (Alito's Direction) identifies the ideological direction of Alito's vote in a particular case. Note that no variable named JDIR appears in the Spaeth database, which instead contains one variable for each justice; we are simply using the name JDIR as shorthand for these variables. While other comparisons are possible, below we concentrate on charts comparing justices within their own JDIR condition. That is, for Alito we only present ALTODIR, for Breyer we only present BRYDIR, etc.

Figure 13 presents the two values for Alito in the ALTODIR condition. Note that because this is the ALTODIR condition, we expect that Alito will be on the extreme end of the ideology gradient in this case group (logically, if the value is conservative in the ALTODIR condition, 100% of the votes from Alito for that value will be conservative). We note several features in Figure 13 that may be interesting:
• First, when Alito's vote is liberal, there is a high level of agreement amongst the justices, signified by the relatively tight radius of the outer petals. This indicates that Alito typically votes liberally only when most of the Court does so.
• When Alito's vote is liberal, we see a decrease in turns following the conservative justices and a slight increase in vote disagreement between these justices as compared to when Alito's vote is conservative. This may indicate Alito has a tendency to follow up more often with people he agrees with.
• For individual justices, we see some differences in the liberal wing. Though there is little change for Ginsburg and Stevens, we see notable changes in the relative frequency when following Breyer (a decrease from conservative to liberal) and Souter (an increase from conservative to liberal).
•
We also note that the relative frequency of follow-ups on Kennedy shows a considerable increase from conservative to liberal. Since Alito's record is more moderate than the rest of the conservative wing, this could suggest that Alito has more to discuss with the swing justice in particular when their interpretations of a case are most closely aligned.

Figure 13 Alito - Rose Diagrams for the ALTODIR Condition.

Figure 14 contains the charts for Souter in the SOUTDIR condition. As in the DIR condition, it will be helpful here to look at things in terms of whether or not the vote matches the center justice's usual ideological direction, and whether other justices are from the same wing or the opposing wing (Souter is a liberal justice).
• Compared to Alito voting against his usual direction, we see a higher level of disagreement when Souter is voting against his direction. This indicates Souter's conservative votes may be less closely related to conservative outcomes from the Court.
• As before, we see a slight increase in the normalized proportion of turns following justices from the same wing as the justice in the center when the case is against his typical direction (i.e. conservative).
• We also see a slight increase in the number of turns directed at the opposing wing when the outcome is against his usual direction.
• There is a decrease from conservative to liberal for turns following Ginsburg but an increase for turns following Stevens. We also see a fairly large decrease from conservative to liberal for Roberts and fairly small increases for Alito and Scalia. These variations for individual justices likely suggest much more complex relationships between these justices.
• Finally, we also see a relatively small increase from C to L for Kennedy, indicating relatively even amounts of attention given to Kennedy for both outcomes. Perhaps this indicates that Souter doesn't use increased attention as a means of convincing another justice.
Figure 14 Souter - Rose Diagrams for the SOUTDIR Condition.

Unlike the two examples above, Kennedy's chart is fairly consistent with respect to the normalized proportions for each wing; however, we do still see small but potentially interesting differences between the two charts.

- When Kennedy's eventual vote is liberal, there is a slightly higher relative frequency of turns following liberal justices as compared to when his vote is conservative (the converse being true for conservative justices). This suggests that Kennedy devotes slightly more attention to whichever wing he is likely to agree with.
- It is also worth noting that for the conservative justices this difference primarily comes from a difference in the relative frequency of turns following Roberts, while for the liberal justices the difference is primarily distributed across Ginsburg, Breyer and Souter, with Stevens showing only a minimal change.

Figure 15 Kennedy - Rose Diagrams for the KENDIR Condition.

4.3 Discussion

The charts and observations above are a sampling of the sorts of general conversational patterns that can be observed for individual justices and the Court given outcome conditions that are of interest to legal scholars. For example, we saw a tendency of some justices from both wings to exhibit similar patterns to their respective opposing wings in both the DIR and JDIR conditions. This suggests that there are patterns of turn-taking that can be associated with case outcomes, positively addressing the second point of this thesis. Though we have only offered speculative explanations for these patterns, legal scholars should find that this sort of analysis could aid in the confirmation or discovery of patterns in the interactions of Supreme Court justices. Here we concentrated only on a particular subset of justices, outcome variables, and turn-taking patterns.
While the appendix contains all justices for the conditions discussed above and several more outcome variables, there is no reason that these charts need to be limited to these conditions. For example, it may be interesting to compare cases where a justice wrote a dissenting opinion to cases in which that justice did not. Or, one may wish to look at how patterns vary for certain case variables such as the lower court's direction, or combinations of variables such as unanimous conservative decisions. The rose diagrams are also a novel application of radial layouts that can be used as a new tool for legal researchers when exploring the behavior of the Supreme Court. Nor is this approach limited to this particular pattern of interaction (i.e., J1 L J2, where J2 is held constant in the chart). There are numerous avenues for future research. For example, L could be broken down into petitioner and respondent, or conservative party and liberal party. [18] If we are not particularly concerned with "choice" we may want to look at patterns that share a common J1, or simply patterns that share a common justice in any position. The primary limiting factor in this sort of analysis is ensuring that one has enough cases for a good sampling of patterns. This was the primary reason we used a pattern that includes an additional individual between the two justices. Shorter patterns that include two justices are fairly rare, and longer patterns are sparser. However, with a careful selection of cases and relaxation of conditions one may still find that some patterns of this form can be examined as well.

[18] Where "conservative party" would indicate that a decision in favor of this party is a conservative decision, and vice versa for "liberal party".

Chapter 5 Vote Prediction

This chapter describes our final set of experiments, which build upon the insights revealed by the rose diagrams in the previous chapter, examining vote prediction using turn sequences.
If we can use turn-taking to forecast case outcomes, we will have demonstrated the validity of the third main point of this thesis: that the association between turn-taking patterns and case outcomes is predictive. Before discussing the approach, experiments and results, we will first briefly discuss our findings regarding the "most questions asked" method discussed in Chapter 2.

5.1 Prior approaches

We will first discuss our attempts to replicate results for the "most questions asked" rule discussed by Roberts, Shullman and Wrightsman (Wrightsman 2008), as well as Johnson et al. (2009a). While these projects leave the term "question" undefined, two reasonable interpretations exist. We could take question literally as any interrogative statement, which in the transcripts is usually identified with a question mark at the end. This sidesteps some of the issues discussed in Wrightsman, as transcription typically includes only one question mark per complete question, with no markings at the end of interrupted questions. However, we can also broadly define "question" as all statements produced by a justice. Though not the typical interpretation of what a question is, this seems to match the typical treatment of turns produced by justices, both as indicated in transcripts prior to 2004, which label the majority of justice turns as "QUESTION", as well as in Wrightsman's example statements and Johnson's discussion of "attention given to a side". We explore both here. Lacking the training data and some of the features used by Johnson et al. (2009a), we use a simple rule-based approach. We simply identify all questions in a case, take separate counts for each side, and assign a "win" label to whichever side was asked the most questions. Following the lead of Johnson et al. (2009a), we can also apply both approaches to the difference in questions asked and to the difference in words directed at each side. By using a word-based approach we again reduce the concerns about the definition of a "question".
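As a concrete illustration, the rule-based approach just described might be sketched as follows. This is a minimal sketch, not code from this project: the turn format (speaker symbol, side being argued, text) and the function name are our own illustrative assumptions, and the interrogative interpretation is approximated by counting "?" characters.

```python
from collections import Counter

def predict_by_attention(turns, by="questions"):
    """Sketch of the 'most attention given' rule described in Section 5.1.

    turns: list of (speaker, side, text) tuples, where `side` is the party
    whose argument is in progress -- a hypothetical input format.
    by="questions" counts '?' marks in justice turns (the interrogative
    interpretation); by="words" counts whitespace-separated tokens.
    Following the description above, the side receiving the most attention
    is given the "win" label.
    """
    attention = Counter()
    for speaker, side, text in turns:
        if speaker != "L":  # only justice turns count toward "attention"
            if by == "questions":
                attention[side] += text.count("?")
            else:
                attention[side] += len(text.split())
    return max(attention, key=attention.get)
```

For example, with toy turns `[("SCAL", "petitioner", "Why is that? And how?"), ("GINS", "respondent", "Is that right?")]`, the rule labels the petitioner under both interpretations (two question marks and five words versus one and three).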
However, this does introduce other issues, such as the definition of a word (e.g., compounds, counting speech errors, contractions, etc.). To simplify matters, we take a word as anything separated by white space and word-external punctuation (where characters such as the apostrophe (') and hyphen (-) are word-internal punctuation). Table 5 summarizes the results from these experiments.

Approach                          Accuracy
Most Questions Asked (by turn)    56.8%
Most Questions Asked (by ?s)      56.8%
Most Words Used (by turn)         51.5%
Most Words Used (by ?s)           53.8%

Table 5 Comparison of "most attention given" approaches with varying interpretations of "question". "By turn" indicates that we count each turn as a "question". "By ?s" indicates that we counted ?s in the transcribed justices' speech, usually indicating an interrogative statement.

As is clear, with this particular set of cases, no benefit is gained from a "most attention given" approach. As in most time periods, the majority of cases in this period were reversed, creating a 65.6% most-frequent-outcome baseline that these approaches fail to meet. While interpreting "questions" as interrogatives outperforms a turn-based interpretation of questions on a "most words used" approach, no difference was found for the "most questions asked" approach. Moreover, the "most questions asked" approaches outperformed both "most words used" approaches. Still, one could argue that the continued discrepancy over the power of a "most questions asked" rule is a problem of sample size. In the case of the smaller manual studies, high accuracy may simply be attributed to a favorable sample selection. For the larger study, the distribution of questions compared to case outcomes provided by Johnson et al. (2009a) is unambiguous, and clearly demonstrates that at least in the extreme cases this rule does appear to be valid. Models trained on a larger sample will have a more representative distribution of these extreme cases. In fact, like Johnson et al. (2009a), if we assign labels based on the "most attention given" rule for extreme cases and use the majority class for the rest, we do get similar accuracy. Results provided in Table 6 are for cases in which the difference in the number of questions or words addressed to a side is more than 2 standard deviations from the mean.

Approach                          Cases  Accuracy
Most Questions Asked (by turn)    8      87.5%
Most Questions Asked (by ?s)      7      75.0%
Most Words Used (by turn)         6      83.3%
Most Words Used (by ?s)           6      60.0%

Table 6 Comparison of the "most attention given" rule for extreme cases (i.e., the difference in words or questions is > 2 s.d. from the mean). The "Cases" column indicates how many cases met this criterion.

Because "extreme cases" are simply those whose difference in attention (measured by word or turn counts) given to a side exceeds two standard deviations, it may be possible to identify these cases in advance by examining the distribution of prior cases and determining whether or not the difference in attention given for each new case is within or outside two standard deviations of the distribution of previous cases.

5.2 Forecasting votes

In our discussion of forecasting oral argument transcripts, attention must be given both to the sorts of features used and to the outcomes that we are forecasting. We focus on using features that are easily extracted automatically, with little to no human input. Instead of concentrating on the content of the oral arguments, we concentrate on the conversational dynamics of the justices and lawyers involved in a case, as a function of their turn-taking behavior. While the content of justices' and lawyers' turns is very likely informative about a case's outcome, several factors make it difficult to utilize content with automatic methods.
First, because the transcripts are composed mostly of spontaneous conversation, the performance of existing natural language processing techniques such as parsing and even POS tagging is considerably lower than in tasks where the input is written text or even prepared speeches. Second, while features explored in some manual forecasting approaches, such as "hostility" and "sympathy", are certainly present in the content, these features are not well defined and not easily identified using computational methods. Those features that are somewhat more easily identified, such as topic area, vary widely from case to case. This makes it difficult to find a relationship between these easily identified features and the case's outcome. Finally, as we have shown above, because simple turn-based "most questions asked" or "most words used" rules are limited to extreme cases, their recall (in this instance, the proportion of correct predictions to the number of cases) will be low despite high precision (in this instance, the proportion of correct predictions to the total number of predictions).

One important consideration when predicting case outcomes is deciding just what outcome one wants to predict. The most obvious choice, and the one most often chosen in previous prediction tasks, is whether a case will be affirmed or reversed. There are, however, other potentially relevant options to choose from. For example, justices are very rarely spoken of in terms of their tendency to affirm cases. Typically, when examining justices' voting records, one wants to speak of justices in terms of the direction of their ideology: either liberal or conservative. While the vast majority of cases are either affirmed or reversed, typically each of these decisions is liberal or conservative as well. If the most relevant dimension for discussing justices is the direction of their ideology, then it seems fair to at least consider prediction of case outcomes along this dimension as well. For these reasons, conservative vs. liberal was the primary outcome feature we concentrated on. However, as one would expect, conservative and liberal outcomes do not occur with equal probability, and so the baseline for such a condition is not 50%. We can nevertheless achieve a 50% baseline by splitting cases and then viewing outcomes as a win or lose variable for each side of the case. We explore this outcome in our third experiment.

5.3 Methods

Corpus Description

We use the same corpus as used for the rose charts, described in Section 3.1.

Turn Distribution

As with the sequence prediction task in Chapter 3, from each case we extracted speaker IDs and meta-symbols from the transcript. As before, litigants were reduced to a single symbol (reported here as L). To conserve space when reporting tables, justices are identified by the first four letters of the justice's last name (Table 7). From each sequence we then counted all turn 4-grams. Since the objective of this experiment is to leverage justice interaction as a means for predicting case outcomes, we don't want the n-grams to be too short. If the n-grams selected are too small, we risk losing information about the interaction between justices (as the typical sequence of speakers is Justice, L, Justice, L, ...). If the n-grams are too long, however, we begin to face sparseness problems, since the larger n gets, the more variability there is and thus the lower the counts will be. Thus 4-grams seemed to be the ideal selection.

Speaker                 Symbol  Count
Non-justice party       L       19840
Chief Justice Roberts   ROBE    3890
Justice Stevens         STEV    1964
Justice Scalia          SCAL    4277
Justice Kennedy         KENN    2196
Justice Souter          SOUT    2590
Justice Thomas          THOM    3
Justice Ginsburg        GINS    2379
Justice Breyer          BREY    2668
Justice Alito           ALIT    840

Table 7 Speakers and their corresponding symbols. The Count column identifies the frequency with which each symbol appears in the corpus.

There are 41,417 occurrences of 1,072 unique n-grams. Table 8 summarizes the 20 most frequent 4-grams in the corpus.
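The 4-gram counting step just described can be sketched in a few lines (a minimal sketch, assuming the input is the reduced speaker-symbol sequence for a single case):

```python
from collections import Counter

def turn_ngrams(speakers, n=4):
    """Count sliding-window n-grams over a case's speaker-turn sequence."""
    return Counter(tuple(speakers[i:i + n])
                   for i in range(len(speakers) - n + 1))

# Toy sequence showing the typical Justice, L, Justice, L alternation.
counts = turn_ngrams(["L", "SCAL", "L", "SCAL", "L", "ROBE", "L"])
# counts includes ('L', 'SCAL', 'L', 'SCAL') and ('SCAL', 'L', 'SCAL', 'L'),
# the kind of corresponding pair that shares a Justice-Lawyer-Justice trigram.
```

A sequence of length m yields m - n + 1 windows, so the toy sequence above produces four 4-grams in total.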
Because justices do not frequently speak in adjacent turns, after each justice's turn there is typically a lawyer's turn. Because of this, n-grams usually occur in corresponding pairs that have a Justice Lawyer Justice trigram in common but differ in whether the 4-gram starts or ends with a lawyer. We therefore report these pairs together. However, note that they do not always rank next to each other, and so the table is ordered by the rank of the most frequent 4-gram in the pair.

Corresponding n-grams            Counts        Ranks
L SCAL L SCAL / SCAL L SCAL L    2467 / 2456   1 / 2
L ROBE L ROBE / ROBE L ROBE L    1801 / 1651   3 / 8
L BREY L BREY / BREY L BREY L    1746 / 1726   4 / 6
L SOUT L SOUT / SOUT L SOUT L    1729 / 1705   5 / 7
STEV L STEV L / L STEV L STEV    1237 / 1220   9 / 10
KENN L KENN L / L KENN L KENN    1182 / 1158   11 / 12
GINS L GINS L / L GINS L GINS    1137 / 1122   13 / 14
L SCAL L ROBE / SCAL L ROBE L    418 / 337     15 / 18
ALIT L ALIT L / L ALIT L ALIT    397 / 387     16 / 17
L ROBE L SCAL / ROBE L SCAL L    331 / 328     19 / 20

Table 8 The 20 most frequent n-grams grouped by correspondence pair, ranked by the most frequent n-gram in the pair.

Note that the majority of these 4-grams involve a justice "holding the floor", with the only two instances of more than one justice at the bottom of the table. Despite the fact that the most common 4-grams follow this pattern, many less frequent n-grams represent three or four instances of a justice speaking (Table 9).

n-gram                 count
BREY BREY L BREY       18
SCAL BREY L BREY       18
SCAL L SCAL SOUT       16
SCAL L SCAL SCAL       16
SOUT L SCAL GINS       15
BREY SCAL BREY SCAL    5
ROBE SCAL ROBE SCAL    3
KENN GINS ALIT GINS    1

Table 9 Infrequent n-grams containing 3-4 instances of justice turns.

Note that because the conversational patterns of the Supreme Court are usually very consistent, rare patterns like those in Table 9 often indicate uniquely transcribed events; the majority of instances where the same justice has two adjacent turns in the transcript indicate laughter in the Court. When two justices' turns are adjacent to one another, this usually indicates that an interruption has occurred. Figure 16 contains examples of both laughter and interruptions from the corpus. In the first excerpt, there is laughter after Breyer's first turn, after which he continues to speak. [19] Thus the sequence is transcribed as BREY BREY L BREY. Also note that Mr. Sorrell's turn ends with a "--", indicating that his turn was unfinished. We interpret this as an interruption. However, because Mr. Sorrell is the attorney in this instance, we do not observe anything unusual in the sequence for this pair. In the second excerpt, the transcript indicates that Roberts was interrupted by Scalia, after which Roberts attempts to "hold the floor" by interrupting Scalia, but eventually gives way to a second interruption by Scalia. This sequence is then transcribed as ROBE SCAL ROBE SCAL.

[19] It is unclear from the transcripts whether this laughter should be attributed to Justice Breyer or someone else.

Figure 16 Examples of laughter and interruptions in the transcript.

Randall v. Sorrell (04-1528)
JUSTICE BREYER: No, no. It's $200. Coffee and donuts are expensive.
(Laughter.)
JUSTICE BREYER: Okay? Count it or not?
MR. SORRELL: We don't -- our coffee is not that expensive, but --
JUSTICE BREYER: Donuts and coffee. In other words, it counts as long as it's over $100.

Samson v. California (04-9728)
CHIEF JUSTICE ROBERTS: What about --
JUSTICE SCALIA: Is --
CHIEF JUSTICE ROBERTS: What about --
JUSTICE SCALIA: Is that right? I mean, even in prison, I -- what -- I'm not sure you could even do that if they were still in prison. Can you subject people in prison --

Data Preparation

Before proceeding with any sort of classification, several preprocessing steps were taken in some experiments in order to address sparseness issues as well as remove irrelevant and potentially distracting features:

- Reduce all non-justice parties to a single symbol. Since these are most often attorneys, we reduced them to the L symbol. This step was taken for all experiments.
- Eliminate all n-grams not ending with a justice's turn. This essentially reduced the presence of the feature pairs of the type discussed above.
- Remove all n-grams containing markup, including TIME, as well as the special symbols for the beginning and end of a case.
- Collapse all justices into one of three categories: liberal (Stevens, Souter, Ginsburg, and Breyer), conservative (Roberts, Scalia, Thomas, and Alito) and swing (Kennedy).

While not taken in all experiments, as it seemingly disregards quite a bit of information, this final step deserves some more attention. The motivation behind such an approach is that it greatly reduces sparseness in the data. Not only is the liberal/conservative ideology one that is more or less common knowledge, often observed both in scholarly literature and in the media, but it is also clearly indicated in each justice's voting record. Moreover, ideology is often considered one of the more relevant dimensions over which a case is decided, so it is extremely relevant to predicting case outcomes. Even when the outcome to be predicted is affirm/reverse or agree/disagree, the interaction of the liberal and conservative justices with the swing justice can be informative in predicting case outcomes. However, rather than capturing the interaction between individual justices, this is more accurately described as capturing the interaction between wings of the Court. Given the rose charts, we may hypothesize that this interaction between the wings is also a relevant point to examine, as patterns were observed in the way that members of each wing treated opposing wings. That is, patterns at the "wing level" should be relevant.

In addition to these data preparation options, we also calculated feature values in two ways. The first, and most straightforward, was to simply use the absolute counts of each n-gram. For the second approach we used relative feature scores.
For each n-gram, we divided its frequency by the count of all n-grams for that case. The denominator included all n-grams, i.e., even those that were removed from the feature set using the filters described above. While this means the feature values do not sum to one, it allows us to indirectly encode potentially useful information such as case length.

Baselines

In most studies predicting Supreme Court outcomes, little attention is given to baselines. Understandably, at first blush, when trying to predict an outcome like affirm or reverse, a 50/50 baseline seems applicable. There are only two outcomes in general (others are possible, but rare) and both seem to occur with a fair amount of regularity. However, when examining the history of the Court, one finds strong tendencies for certain outcomes to occur more often than others. Needless to say, the Supreme Court is not as simple as a fair coin toss. So, we need to consider the frequency with which each outcome occurs in each condition in order to establish a more reliable random baseline. For an affirm/reverse condition, we look back at the frequency with which the Court upheld the lower court's decision and the frequency with which the lower court was overturned. In doing so, we find that the Court has a tendency to reverse cases more frequently than it affirms them. Taking a sample of 1000 cases from the 1997 term to the 2007 term, the Court affirmed cases 34.4% of the time and reversed 65.6% of the time. Over shorter periods this tendency can shift drastically; for example, if we look at a 20-case "moving average" of affirm decisions chronologically over this time period (based on date of argument), we see that the average reaches as high as 100% and as low as 35%. Thus, a random baseline for this example is not 50/50. At first this may seem surprising; however, one must consider how cases are selected. Of the approximately 9000 cases submitted to the Court each year, only 80 or so are selected to be heard by the Court.
Naturally, then, the justices are picking those cases which they view as most important, and as it turns out there is a slight bias toward those cases which the Court will overturn. For a liberal/conservative baseline, the Court is a bit more balanced, at 54.2% conservative and 45.8% liberal for the Roberts Court (with Alito). This likely has more to do with the composition of the Court than anything else. In fact, one might expect to see a court with a conservative chief justice and a slightly conservative-leaning swing vote produce a greater proportion of conservatively decided cases. Despite these unbalanced baselines, it is possible to construct experiments that do have true 50/50 baselines. The experiment labeled The Court II is an example of this. By splitting the case into sides (i.e., all turns during the petitioner's argument are one side, all turns during the respondent's argument are another) and setting the outcome to win/lose, we ensure that there are an equal number of win instances in the data as there are lose instances (as for each case one side must win and the other must lose; again, except in rare circumstances).

5.4 Experiments

We discuss four experiments in this section: three dealing with classification of the Court as a whole (The Court I, The Court II and The Court III) and one dealing with the classification of Thomas's votes (Thomas).

The Court I: The first experiment conducted in this category attempted to predict whether the Court's ruling would be liberal or conservative. We found that for this sort of task, predicting the outcome of a case for the Court, classification was highly sensitive to sparseness, so we collapsed justices into Liberal, Conservative and Swing categories. We also employed the filter that reduces the presence of pairs. We use absolute rather than relative feature values.
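The justice-collapsing step used in these experiments amounts to a simple symbol mapping. A minimal sketch, using the symbols of Table 7 (the LIB/CON/SWING labels themselves are our own illustrative choice):

```python
# Wing membership as described in the Data Preparation section;
# the LIB/CON/SWING labels are illustrative, not the thesis's own symbols.
WING = {
    "STEV": "LIB", "SOUT": "LIB", "GINS": "LIB", "BREY": "LIB",
    "ROBE": "CON", "SCAL": "CON", "THOM": "CON", "ALIT": "CON",
    "KENN": "SWING",
}

def collapse_justices(seq):
    """Replace justice symbols with wing symbols; 'L' and markup pass through."""
    return [WING.get(symbol, symbol) for symbol in seq]

# collapse_justices(["L", "SCAL", "L", "KENN"]) -> ["L", "CON", "L", "SWING"]
```

Because several justices map to the same wing symbol, many distinct 4-grams merge after collapsing, which is exactly how the step reduces sparseness.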
Classification was conducted using the LIBSVM 2.86 implementation of support vector machines (SVMs) with default parameter settings, 5-fold cross validation and parameter tuning (Cortes and Vapnik 1995). [20]

The Court II: As a second experiment we tested the "in favor of side" condition. While somewhat more artificial than the other experiments, this approach does allow us to examine these features in a truly balanced context. We prepared the data by splitting each sequence by side, so each case was composed of two sequences: turns produced during the petitioner's argument and turns produced during the respondent's argument. Because the Court has a relatively high reversal baseline (meaning the Court usually votes in favor of the petitioner), we removed from the feature set all information about the side being spoken to, which is introduced in the form of meta-symbols. By splitting the data, we also magnify the sparseness problems from before, and so we continue to collapse justices into their ideologies. However, also because of the high level of sparseness, we did not remove n-gram pairs, as doing so often reduced the features in any given case too far. This experiment used relative rather than absolute feature values. Again, note that since in each case one party must win while the other loses, this ensures that there are an equal number of winners and losers in the dataset. Again we used the LIBSVM implementation of SVMs with default parameter settings and 5-fold cross validation with parameter tuning.

Unlike the liberal/conservative classification, the choice to collapse justices into liberal, conservative and swing categories for this condition might at first seem like an irrelevant dimension on which to reduce sparseness. However, there are some important points to keep in mind. While the Court for this corpus was balanced between liberal and conservative justices (4 of each), as a result of Thomas's general silence the number of speakers from each wing is unbalanced. Moreover, looking at the wings rather than individual justices, it may be the case that we are able to capture instances of the "three-way" conversation described by David Frederick, where the justices are conversing both with each other and with a particular lawyer (Biscupic 2006, Johnson et al. 2009a). To see why this may matter, consider the rose diagrams discussed in Chapter 4. Although we remove identity information of justices by collapsing the data, we are able to maintain the general effects that have to do with wings of the Court, and since Kennedy is the only swing justice, no identity information is lost for this justice. As a result, we may see cases where either Kennedy is showing high levels of agreement with a particular wing, or where the wings are jostling for support from Kennedy. [21] In either situation, this may be an important factor, as the swing vote will often be the deciding factor in a case.

[20] http://www.csie.ntu.edu.tw/~cjlin/libsvm/

The Court III: In addition to the SVM approaches, in these conditions we also attempted some rule-based classification conditions. This allows us to identify the n-grams that are most informative in classification, thus giving us a way to search for those exchanges between justices that may be particularly helpful in identifying the outcome of a case. This experiment used the WEKA 3.6.0 J48 implementation of decision trees. [22] We found that our original data preparation options did not perform well with decision trees; however, after experimenting with other data preparation options, we found that by only collapsing justices into their ideology some improvement over baseline was achieved.
[21] In order to test whether we were simply predicting Kennedy's votes in this situation, we tested classification of his votes, for or against a particular side of a case, with the same settings. The classifier achieved 58.3% accuracy, which suggests this was not the case.

[22] http://www.cs.waikato.ac.nz/ml/weka/

Thomas: Thomas's voting history indicates a relatively high baseline, at 69.5% conservative votes. This, of course, is unsurprising given that Thomas is often considered one of the most conservative justices currently on the Court. What is surprising is that despite this relatively high baseline and his tendency to almost never speak during oral arguments, we are able to use the approach described above to gain insight as to when Thomas will cast one of his relatively rare liberal votes. For the experiments with Thomas, we found that by not reducing justice IDs to their liberal/conservative classifications and by using only those n-grams with more than one justice, we did see a reasonable improvement in classification accuracy for Thomas. We used relative rather than absolute feature values. Classification was conducted using the WEKA 3.6.0 implementation of Decision Tables (Kohavi 1995).

Results

Figure 17 Classification results including prior approaches (The Court II only), baseline, and absolute accuracy. Error bars are the 90% confidence interval as calculated by the Clopper-Pearson method for inferring exact binomial confidence intervals.

The results of the experiments are detailed in Figure 17. Error bars are calculated as the 90% confidence interval as computed by the Clopper-Pearson method for inferring exact binomial confidence intervals (Clopper & Pearson, 1934). We compare our results to prior approaches for The Court II, and to the baselines described above for all experiments. In all cases, our approach outperforms both the prior approaches and the baseline. However, as indicated by the error bars, confidence intervals overlap in several instances.
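The Clopper-Pearson interval used for these error bars can be computed exactly from the binomial CDF. The sketch below inverts the CDF by bisection using only the standard library; a beta-quantile routine would give the same bounds. This is an illustration of the method, not code from this project.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.10):
    """Exact (Clopper-Pearson) two-sided CI for k successes in n trials."""
    def bisect(f, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection to ~2**-60 precision
            mid = (lo + hi) / 2
            if (f(mid) < 0) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound: the p at which P(X >= k | p) = alpha/2 (increasing in p).
    lower = 0.0 if k == 0 else bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, increasing=True)
    # Upper bound: the p at which P(X <= k | p) = alpha/2 (decreasing in p).
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2, increasing=False)
    return lower, upper
```

For instance, `clopper_pearson(7, 8)` gives the 90% interval around the 87.5% accuracy reported for the eight extreme "most questions asked" cases in Table 6; with so few cases, the interval is wide.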
Both The Court I and The Court II outperform the baseline at a 90% confidence level. We also see that The Court II outperforms the "most words used" approach on this dataset. This is an important finding because the "most words used" approach was found to be the most powerful approach in prior studies (Johnson et al. 2009a). Moreover, we see that these results are comparable to those of experiments that used an order of magnitude more data (Johnson et al. 2009a). For all experiments on the Court, we found that collapsing justices was a very useful preprocessing step. The greatest increase in accuracy was provided by SVMs, regardless of the condition. And of the two experiments that used SVMs, the greatest increase was over the split-case baseline of 50%. While decision trees do not provide the double-digit increases that SVMs do, they still provide some improvement over baseline, with the added benefit of producing decision trees that can be examined. The results for Thomas are perhaps the most surprising. Though the improvement is relatively small, not only are we dealing with a much higher baseline, but this suggests that the interaction of the justices who do talk during cases is correlated with the way Thomas will vote, even though he rarely participates in oral arguments. Because the decision tables are easily interpretable, we can also examine the specific n-grams that are most informative in classification. We are especially interested in n-grams that contain more than one justice, because these best highlight the interactions between individual justices. The decision tables returned four such 4-grams containing more than one justice. Figure 18 contains these sequences along with examples of these sequences from the corpus.

Figure 18 Informative sequences from the Thomas decision tables, with examples from transcripts.

BREY BREY L GINS
Ex. from Michael A. Watson v. United States (06-571)
JUSTICE BREYER: I don't want to put you in a whipsaw here.
(Laughter.)
JUSTICE BREYER: Sometimes policy seems relevant, too, to figure out what Congress wanted. But let me go back to the question I had, which is: do you want us to overturn Smith? Are you asking that? Because I could understand it more easily if you said, look, both sides of the transaction should be treated alike, but they should be both outside the word "use."
MR. KOCH: I do not believe it's necessary for this Court to overrule Smith in order to rule for the Petitioner here, because of -- because of the differences, first of all linguistically; and secondly because of the reliance on Bailey.
JUSTICE GINSBURG: And in answer to my question, you said you were not urging the overruling of Smith?

SOUT SCAL L SCAL
Ex. from Federal Election Comm'n v. Wisconsin Right to Life, Inc. (06-969)
JUSTICE SOUTER: And it is impossible to know what the words mean without knowing the context in which they are spoken.
JUSTICE SCALIA: When the Government put these exhibits, were those exhibits complete with context?
MR. BOPP: No. There was no --
JUSTICE SCALIA: I didn't think so. They just -- they just -- what the ads were.

SCAL L SCAL GINS
Ex. from Engquist v. Oregon Dept. of Agriculture (07-474)
JUSTICE SCALIA: That's certainly an equal protection. She could be fired at will and everybody else can be fired at will.
MS. METCALF: Agreed.
JUSTICE SCALIA: Why isn't that equal protection of the law?
JUSTICE GINSBURG: Except this wasn't -- this wasn't employment at will, right?

BREY ROBE L GINS
Ex. from Travelers Casualty & Surety Co. of America v. Pacific Gas & Elec. Co. (05-1429)
JUSTICE BREYER: And, and yet there are no briefs from them; there are no -- there is no article that I could find in Bankruptcy Journal.
CHIEF JUSTICE ROBERTS: Well, there may be no briefs from them because it isn't the question on which we granted cert, is it?
MR. BRUNSTAD: Chief Justice Roberts, that's correct. And our view is that the Court should deal only with the Fobian rule.
And the alternative argument which Respondent presents was never argued below, was not decided below, was not presented in the opposition to certiorari. It's been rejected by every single court of appeals --
JUSTICE GINSBURG: But it would be proper to remand for the Ninth Circuit to consider those other arguments?

Since the baselines for individual justices are so high, any improvement in classification accuracy must come from the ability to predict unusual behavior from that justice. This is just what we found in the case of Thomas. One can already predict the majority of Thomas's votes simply by assuming his vote will be conservative. In order to move beyond this simple baseline, one needs to be able to predict the liberal cases. By predicting these with high precision, we are able to boost performance when predicting outcomes for Thomas. Though such results may be subject to the danger of over-fitting, as additional case data is created it will be possible to test this approach further. Of course, as justices change, so too will the performance of this approach.

Discussion

These classification experiments built upon the observations in Chapter 4 that turn sequences are associated with case outcomes. The results indicate that there are patterns in justices' turn-taking behavior that are in fact predictive of case outcomes. Additionally, we show improvement on our dataset over the approaches previously shown to have the best performance in the most comprehensive prior study. Moreover, the accuracy is comparable to that of studies that used an order of magnitude more data than ours, while exploring a novel hypothesis about the predictability of Supreme Court outcomes and the features of a case that can be used to make predictions (Johnson et al. 2009a). The fact that any benefit at all is achieved using interaction features as simple as turn-taking is a novel finding that may surprise some researchers (Evans, M., personal correspondence, August 28, 2009).
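As a concrete illustration, turn-sequence features of the kind described above can be extracted with a simple sliding window. The sketch below is not the thesis's actual pipeline: the speaker codes follow the abbreviations used in Figure 18, but the sample sequence is hypothetical, and in practice the resulting n-gram counts would serve as feature values for the SVM and decision-table classifiers.

```python
from collections import Counter

def turn_ngrams(turns, n=4):
    """Slide a window of length n over one case's speaker-turn sequence.

    `turns` is the ordered list of speaker codes for one oral argument,
    with every lawyer turn collapsed to a single "L" token (mirroring
    the justice-collapsing preprocessing step discussed above).
    """
    return [tuple(turns[i:i + n]) for i in range(len(turns) - n + 1)]

# Hypothetical turn sequence for one case.
turns = ["BREY", "L", "BREY", "BREY", "L", "GINS", "L", "SCAL"]
counts = Counter(turn_ngrams(turns, n=4))
print(counts[("BREY", "BREY", "L", "GINS")])  # 1
```

A whole-dataset feature matrix would simply stack one such `Counter` per case, with one column per observed n-gram.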
Questions still remain as to why the features used are important. Without a doubt the content of justices' turns is informative with regard to a case's outcome, but what about the conversational nature of the exchanges represented by our features? Future research might ask what characteristics of these exchanges are informative. Perhaps it is general features, such as the tone of the exchange, or perhaps these n-grams isolate strategic exchanges in which justices in opposition to one another are looking to counter other justices' arguments and justices in agreement with one another are providing support. Interestingly, this approach has the potential to predict the behavior of both the Court as a whole and individual justices. This is an important finding, as it suggests that these approaches may not need to be restricted to natural courts. This work represents a methodologically novel approach, and thus creates a new tool for researchers looking to gain a greater understanding of the Supreme Court and its justices. As discussed below, as more data is created (thus reducing sparseness), numerous extensions to this approach present themselves, suggesting the possibility of richer, more powerful models of justice interaction and Court behavior.

Chapter 6 Conclusions

This work represents the first steps toward modeling the relationship between Supreme Court justices' interactions and actions. We have applied computational methods for pattern discovery to Supreme Court discourse in a novel way, and these methods may be applied more generally to legal discourse. While legal scholars and other Court followers may have intuitions about the social dynamics of the Court, these intuitions are most often limited to a few areas of expertise and a narrow range of examples. What this work offers is a global approach to pattern discovery in the social dynamics of the Supreme Court justices.
With these patterns, legal scholars are given a new avenue for research that can lead to a greater understanding of this country's highest court that would otherwise go unexplored. This work addressed three objectives: to show that a) predictable high-level patterns exist in the conversational dynamics of the Supreme Court, b) these patterns may be associated with other areas of interest to legal scholars, such as the voting patterns of the justices, and c) this association between linguistic patterns and judicial patterns may be utilized both to provide short-term insights (i.e., predicting the outcome of a particular case) and deeper insights about the behavior of the Supreme Court. Our results indicate that a, b, and c do hold. We have found that by combining features relating to turn content, discourse marker use, and personal reference we can gain information about who is speaking when, and that by increasing the history of these features we can further boost the reliability of these methods. The rose charts demonstrate that interesting patterns can be observed when we look at summaries of turn-taking behavior for various conditions. Our prediction approach performed significantly better than prior approaches on the same data, and comparably to approaches utilizing an order of magnitude more data (Johnson et al. 2009a). These results indicate that turn-taking patterns are in fact predictive of case outcomes. In addition to these positive results, we have also made a number of methodological contributions. While the analysis of Supreme Court discourse is not new, our approach of viewing the patterns of Supreme Court turn-taking as both predictable and predictive of case outcomes is a novel one, and we have offered several techniques to explore this hypothesis. We addressed only a narrow range of questions with these techniques, but expect that legal scholars will find a wide array of hypotheses to explore.
Additionally, our rose diagrams are a new application of radial plots that is helpful in visualizing the relationship between turn-taking sequences and actions (Draper et al. 2009).

6.1 Future Work and Unanswered Questions

Unfortunately, sparseness is a major limiting factor in combining content with turn sequences for the Supreme Court. However, as data is continually being created, these problems should be continually reduced. Moreover, though not explicitly identified in the transcripts prior to 2004, the identities of individual justices are not lost, as the audio recordings of these cases still exist. Perhaps by combining audio speaker-recognition techniques with our justice-identification approach, one could reconstruct speaker identities for these earlier cases (Yuan and Liberman 2008). Doing so would provide considerably more data for experimentation. If sparseness issues are appropriately addressed, one could incrementally increase the amount of information used in turn sequences. For example, with limited additional work, one could include further turn features such as interruptions, perceived humor (indicated in transcripts with a "laughter" marker), and question vs. statement. As indicated in Section 5.3, while not overtly marked, the first two of these features still managed to find their way into our dataset and were some of the most informative features in classifying Thomas. While overtly marking these features currently increases sparseness too far, adding more data reduces this problem, making the overt marking of these features viable; and given the results above, one would expect them to be helpful. As other researchers have found, the questioning pattern is likely indicative of case outcomes, at least in extreme cases. Thus, one might expect some benefit from incorporating questioning features into the turn sequence.
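The additional turn features proposed above could be tagged directly from transcript text. The following is a hypothetical sketch, not the system's implementation; it assumes the surface conventions visible in the Figure 18 excerpts, namely that a turn cut off with "--" signals an interruption, "(Laughter.)" marks perceived humor, and a trailing question mark distinguishes questions from statements.

```python
def turn_features(speaker, text):
    """Tag a single turn with the extra features suggested above.

    Assumed transcript conventions: an utterance ending in "--" was
    interrupted, "(Laughter.)" marks perceived humor, and a trailing
    "?" marks a question rather than a statement.
    """
    return {
        "speaker": speaker,
        "interrupted": text.rstrip().endswith("--"),
        "laughter": "(Laughter.)" in text,
        "question": text.rstrip().rstrip('"').endswith("?"),
    }

feats = turn_features("SCAL", "When the Government put these exhibits, "
                              "were those exhibits complete with context?")
print(feats["question"], feats["interrupted"])  # True False
```

Tags like these could then be appended to the speaker codes in the turn sequence, at the cost of the increased sparseness discussed above.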
Moreover, in many cases the existence of interruptions and laughter is indicative of higher-level features of a turn, such as hostility and tone of questioning. Though the reliability of identification of these features is currently untested, work in areas such as sentiment detection may be useful in attempting to identify them (Pang and Lee 2008). If successful, these too could be included in the turn sequence and would likely give further insight into the interaction of the justices. Another strong cue to the interaction of justices would be the discourse relations that hold between justices' turns. Again, while incorporating features for discourse relations into the turn sequence would inherently increase sparseness, if and when sparseness is addressed, including discourse markers in the turn sequence is a logical first step toward creating a richer feature set that includes information about discourse relations. Ultimately, one would ideally want to identify the underlying relations that hold between the turns in the sequence. Identifying the speaker, or the wing of the speaker, along with how the turn relates to the previous turn would clearly provide rich information about the interaction of justices and would likely be highly informative regarding case outcomes. Though sentiment analysis would likely make considerable contributions to the quality of Supreme Court forecasting, as suggested by Wrightsman (2008) and Johnson et al. (2009a), automatic detection of sentiment in a domain such as Supreme Court discourse is likely to be considerably harder than the already difficult typical sentiment analysis tasks. While overt sentiment may be expressed by word choice, in a formal setting such as the Supreme Court sentiment will often not be expressed overtly, thus requiring researchers to rely on methods for identifying covert sentiment (Evans et al. 2007, Greene and Resnik 2009).
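The "logical first step" described above, folding discourse markers into the turn sequence, might look like the following sketch. The marker subset and the token format (speaker code joined to the marker that opens the turn) are illustrative assumptions, not the system's actual representation.

```python
import re

# A small illustrative subset of the Appendix B marker list (hypothetical).
MARKERS = ("but", "so", "well", "because", "on the other hand")

def enriched_token(speaker, text):
    """Combine a speaker code with the discourse marker (if any) that
    opens the turn, yielding sequence tokens like "SCAL+but"."""
    lowered = text.lower()
    # Try longer markers first so "on the other hand" wins over shorter ones.
    for marker in sorted(MARKERS, key=len, reverse=True):
        if re.match(re.escape(marker) + r"\b", lowered):
            return f"{speaker}+{marker.replace(' ', '_')}"
    return speaker

sequence = [
    enriched_token("SCAL", "But the statute says otherwise."),
    enriched_token("L", "We disagree, Your Honor."),
]
print(sequence)  # ['SCAL+but', 'L']
```

N-grams over these enriched tokens would carry a coarse signal about how each turn relates to the previous one, at the cost of a larger, sparser vocabulary.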
This raises its own issues, as the expression of covert sentiment is likely to vary between cases as the issue area of cases changes. These factors make the task of automatic sentiment detection in this domain considerably different from typical areas of sentiment detection such as movie and product reviews. In Chapter 1 we discussed the potential broader implications of this research. That is, this work could be extended to other situations in which we are interested in the relationship between conversational behavior and non-linguistic actions. While we are confident that we could directly apply these approaches to other similar situations, e.g. lower courts or even contestant judging on reality shows, this opens up the question of just how far approaches similar to those covered here can be applied. Do individuals in conversational settings take on recognizable natural roles (e.g. leader, "devil's advocate", etc.) that are applicable across numerous situations? If so, would we be able to reduce reliance on speaker- and domain-specific training data, expanding the applicability of these approaches to a wider range of conversational settings, such as business negotiations and other meetings? And what might we learn about human interaction in general, and about the relationship between conversational interaction and real-world actions, from these sorts of approaches? By exploring the conversational dynamics of the U.S. Supreme Court and their relationship with the actions taken by the Court as a whole and by individual justices, this work begins to address these questions.

Appendix A Rose Charts
(Rose chart figures: All Cases; DIR Condition; JDIR Condition; Vote Split)

Appendix B Discourse Markers
Note: Some of these discourse markers include regular-expression syntax.
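Because some entries in the list carry inline regular-expression syntax (e.g. `okay|ok`, `there('s| is) no doubt`), matching them against transcript text requires compiling each entry rather than doing a literal string search. The following is a minimal sketch using three entries from the list; the whole-phrase word-boundary wrapping is an assumption about how the entries are meant to be applied.

```python
import re

# Three entries from the discourse-marker list; two carry inline regex syntax.
RAW_MARKERS = ["okay|ok", "there('s| is) no doubt", "for example"]

# Compile each entry as a whole-phrase, case-insensitive pattern.
PATTERNS = [re.compile(r"\b(?:%s)\b" % m, re.IGNORECASE) for m in RAW_MARKERS]

def find_markers(text):
    """Return the raw marker entries that occur anywhere in `text`."""
    return [raw for raw, pat in zip(RAW_MARKERS, PATTERNS) if pat.search(text)]

print(find_markers("There is no doubt, for example, that context matters."))
```

Entries without regex syntax pass through unchanged, so one compilation loop handles the whole list.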
above all absolutely accordingly actually add to this additionally admittedly after after all after that after this afterwards again again and again albeit all in all all right all the same all this time already alright also also because alternatively although altogether always assuming that analogously and and again and also and another and then another time anyhow anyway apart from apart from that arguably as as a consequence as a corollary as a hypothetical as a logical conclusion as a matter of fact as a result as a whole as against as an as briefly as as closely as as evidence as far as as for as i said as i say as i understand as if as it happened as it is as it turned out as long as as luck would have it as soon as as such as though as to as we shall as we will as well aside from assuming at a time at any rate at first at first sight at first view at last at least at most at once at some level at some point at that at that moment at that point at that time at the moment at the moment when at the outset at the same time at the time at this date at this moment at this point at this stage at which at which point back back to my original point because because of because of this before before long before that before then besides besides that better briefly but but also but then but then again by by all means by and by by and large by comparison by contrast by that time by the same by the same token by the time by the way by then certainly clearly come to think of it conceivably consequently considering considering that contrariwise conversely correspondingly decidedly definitely despite despite that despite the fact that despite this doubtless each time earlier either either case either event either way else elsewhere equally especially essentially even even after even before even if even so even then even though even when eventually ever since every time everywhere evidently except except after except before except if except when except in so far as except that excuse me failing that finally fine first first of all firstly following following this for for a start for example for fear that for instance for one for that for that matter for that reason for the reason that for the simple reason for this for this reason for me formerly fortunately frankly from all from everything from now on from then on from your answer further furthermore given given that granted that having said having said that hence here herein here's heretofore hitherto however however that may be hum i don't think i guess i mean i say i suppose i suspect i take it i think i thought i understand if if ever if in fact if indeed if not if only if so if such a in a different vein in a sense in actual fact in addition in all candor in all due respect in any case in any event in case in comparison in conclusion in consequence in contrast in doing in doing so in doing this in effect in essence in fact in fairness in general in just the same way it may be concluded that in my case in my opinion in my view in one instance in order to in other respects in other words in our judgment in our view in part in particular in place of in point of fact in practice in real world terms in response in retrospect in short in so doing in so many words in spite of in spite of that in such a in such an in sum in that in that case in that instance in that respect in that scenario in that statement in the beginning in the case of in the end in the event in the first place in the hope that in the meantime in the same way in theory in this case in this connection in this respect in this way in truth in turn in which in which case in your opinion in your view inasmuch as incidentally including incontestably incontroversially indeed indisputably indubitably initially insofar insofar as instantly instead instead of interestingly interestingly enough ironically it becomes it can be concluded that it follows it follows that it happens it is because it is clear it is conceivable it is conclusive it is correct it is for this reason it is only it (may|might) seem that it (may|might) appear that it turns out just just a pause just about just again just as just before just then kind of largely largely because last lastly later lest let us let us assume let us consider like likewise listen literally look luckily mainly mainly because meanwhile merely merely because mind you more accurately more importantly more precisely more specifically more to the point moreover most likely much as much later much sooner my point my position my question my response my solution my understanding naturally needless neither neither is it the case never again nevertheless next next moment next time no no doubt no matter no sooner than nonetheless nor normally not not at all not automatically not because not by itself not completely not directly not exactly not necessarily not only not quite not really not specifically not that notably notwithstanding notwithstanding that now now that obviously of course oh okay|ok on a different note on account of on another on balance on condition on condition that on its face on its own on one hand on one side on that on that point on that question on that very point on the bases on the basis on the contrary on the face of on the grounds on the grounds that on the one hand on the other on the other hand on the other side on this basis on this particular issue on top of it on top of that on top of this on which once once again once more only only after only because only before only if only when oops or or again or else ordinarily originally other than otherwise our focus our only point our point our position overall parenthetically particularly particularly when perhaps plainly possibly potentially practically precisely presently presumably presumably because previously probably provided provided that providing that put another way quite quite likely quite simply quite the contrary rather reasonably reciprocally regardless regardless of that returning to right rightly so say second secondly see seeing as seeing that seemingly significantly similarly simply simply because simultaneously since so so far so if so that some time soon speaking of specifically still still and all strictly speaking subsequently such as such that suddenly summarizing summing up suppose suppose that supposedly supposing that sure enough surely technically that that done that is that is all that is how that is to say that is why that reminds me that said that way the end the fact is the fact is that the first time the instant the issue here the key the key words the last time the later the logic is that the moment the more the more often the next time the one time the point the point being the point is the question the question is the thing is then then again theoretically there again there are a few things thereafter thereby therefore there('s| is) no doubt thereupon third thirdly this case this claim this court this means this time though thus thus far to add to be clear to be fair to them to be precise to be sure to begin with to clarify to close to comment to conclude to explain to follow-up to get back to go on to go to to illustrate to interrupt to make matters worse to me to my knowledge to note to open to put it to put it in context to put it this way to repeat to start with to stop to sum up to summarize to take an example to the best of my knowledge to the best of our knowledge to the degree that to the extent to the extent possible to the extent that to this end to the assumption too traditionally two two answers two points two primary reasons two reasons two responses two separate two things typically uh ultimately undeniably under the circumstances under these circumstances understand undoubtedly unfortunately unless unquestionably until until then up to now up to this very briefly very likely very quickly we agree we believe we believed we might say we think not we think that well what i mean to say what is more whatever when
whenever where whereas whereby whereupon wherever whether whether or not which which is why which means which reminds me whichever while while i have you who whoever with absolute certainty with all due respect with all respect with one addition with regard to with respect with respect to with that with this without yes yet you know you see false true

References

Ali v. Federal Bureau of Prisons. 06-9130 U. S. (2007).
Benesh, S. C. (2002). Becoming an Intelligent User of the Spaeth Supreme Court Databases. Southwestern Political Science Association Meeting. New Orleans, LA.
Biskupic, J. (2006). Justices make points by questioning lawyers. USA Today. (Oct. 5, 2006).
Brown, G. and Yule, G. (1983). Discourse Analysis. Cambridge: Cambridge University Press.
Clopper, C. J., and Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404-413.
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20.
Draper, G. M., Livnat, Y., and Riesenfeld, R. F. (2009). A Survey of Radial Methods for Information Visualization. IEEE Transactions on Visualization and Computer Graphics, 15(5), 759-776.
Duke Law. (2009). Supreme Court Associate Justice Antonin Scalia presides over Dean's Cup Moot Court Competition. Duke Law News and Events. http://www.law.duke.edu/news/story?id=2943&u=11.
Engquist v. Oregon Dept. of Agriculture. 07-474 U. S. (2008).
Evans, M., McIntosh, W., Lin, J., and Cates, C. (2007). Recounting the Courts? Applying Automated Content Analysis to Enhance Empirical Legal Research. Journal of Empirical Legal Studies, 4(4), 1007-1039.
Federal Election Comm'n v. Wisconsin Right to Life, Inc. 06-969 U. S. (2007).
Forbes-Riley, K. and Litman, D. (2004). Predicting Emotion in Spoken Dialogue from Multiple Knowledge Sources. In Proceedings of the Human Language Technology Conference: 4th Meeting of the North American Chapter of the Association for Computational Linguistics.
Galley, M., McKeown, K., Hirschberg, J., and Shriberg, E. (2004). Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (669-676).
Garside, R. (1987). The CLAWS Word-tagging System. In R. Garside, G. Leech and G. Sampson (eds.), The Computational Analysis of English: A Corpus-based Approach. London: Longman.
Greene, S. and Resnik, P. (2009). More Than Words: Syntactic Packaging and Implicit Sentiment. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Grosz, B. and Hirschberg, J. (1992). Some Intonational Characteristics of Discourse Structure. In Proceedings of the International Conference on Spoken Language Processing.
Grosz, B. and Sidner, C. L. (1986). Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3), 175-204.
Gurevych, I. and Strube, M. (2004). Semantic Similarity Applied to Spoken Dialogue Summarization. In Proceedings of the 20th International Conference on Computational Linguistics.
Halliday, M. A. K., and Hasan, R. (1976). Cohesion in English. London: Longman.
Hawes, T., Lin, J., and Resnik, P. (2009). Elements of a Computational Model for Multi-Party Discourse: The Turn-Taking Behavior of Supreme Court Justices. Journal of the American Society for Information Science and Technology, 60(8), 1607-1615.
Hutchby, I. and Wooffitt, R. (2008). Conversation Analysis. Cambridge: Polity Press.
Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A., and Wooters, C. (2003). The ICSI Meeting Corpus. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (364-367).
Johnson, T. R. (2001). Information, oral arguments, and Supreme Court decision making.
American Politics Research, 29(4), 331-351.
Johnson, T. R. (2004). Oral arguments and decision making on the United States Supreme Court. Albany, NY: State University of New York Press.
Johnson, T. R., Black, R., Goldman, J., and Treul, S. (2009). Inquiring Minds Want to Know: Do Justices Tip Their Hands with Questions at Oral Argument in the U.S. Supreme Court? Washington University Journal of Law & Policy, 29.
Johnson, T. R., Black, R., and Ringsmuth, E. (2009). Hear Me Roar: What Provokes Supreme Court Justices to Dissent from the Bench? Minnesota Law Review.
Johnson, T. R., Spriggs, J. F., and Wahlbeck, P. J. (2007). Supreme Court Oral Advocacy: Does it Affect the Justices' Decisions? Washington University Law Review, 85.
Johnson, T. R., Wahlbeck, P. J., and Spriggs, J. F., II. (2006). The influence of oral arguments on the U.S. Supreme Court. American Political Science Review, 100(1), 99-113.
Johnstone, B. (2007). Discourse Analysis. Malden: Blackwell Publishing.
Jovanovic, N., and Akker, R. op den. (2004). Towards automatic addressee identification in multi-party dialogues. In M. Strube and C. Sidner (Eds.), Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT/NAACL 2004 (89-92).
Kansas v. Marsh (Reargued). 04-1170 U. S. (2006).
Kohavi, R. (1995). The Power of Decision Tables. In 8th European Conference on Machine Learning (174-189).
Kurland, P. B. and Hutchinson, D. J. (1983). The business of the Supreme Court, O. T. 1982. The University of Chicago Law Review, 50(2), 628-651.
Lafferty, J. D., McCallum, A., and Pereira, F. C. N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In C. E. Brodley and A. P. Danyluk (Eds.), Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001) (282-289).
Laver, M., Benoit, K., and Garry, J.
(2003). Extracting policy positions from political texts using words as data. American Political Science Review, 97(2), 311-331.
MacWhinney, B., Bird, S., Cieri, C., and Martell, C. (2004). TalkBank: Building an open unified multimodal database of communicative interaction. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC).
Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge: MIT Press.
Marcu, D. (1997). The rhetorical parsing of unrestricted natural language texts. In P. R. Cohen and W. Wahlster (Eds.), Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL 1997) (96-103). Madrid, Spain: ACL.
Marcu, D. and Echihabi, A. (2002). An Unsupervised Approach to Recognizing Discourse Relations. In Proceedings of the ACL.
Martin, A. D. and Quinn, K. M. (2002). Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court, 1953-1999. Political Analysis, 10, 134-153.
Michael A. Watson v. United States. 06-571 U. S. (2007).
Morris, J. and Hirst, G. (1991). Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text. Computational Linguistics, 17(1), 21-48.
Mosteller, F. and Wallace, D. L. (1964). Inference and Disputed Authorship: The Federalist. Reading: Addison-Wesley.
Oates, S. (2001). A listing of discourse markers. Technical Report ITRI-01-26. Retrieved January 10, 2008, from University of Brighton, Information Technology Research Institute Web site: ftp://ftp.itri.bton.ac.uk/reports/ITRI-01-26.pdf.
Pang, B. and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Boston: Now Publishers Inc.
Purver, M., Körding, K., Griffiths, T., and Tenenbaum, J. (2006). Unsupervised Topic Modeling for Multi-Party Spoken Discourse. In Proceedings of COLING/ACL 2006 (17-24). Sydney, Australia: July 2006.
Randall v. Sorrell. 04-1528. U. S. (2004).
Rehnquist, W. H. (2002). The Supreme Court.
New York: Vintage.
Rohde, D. and Spaeth, H. (1976). Supreme Court Decision Making. San Francisco: Freeman.
Rombeck, T. (2002). Justice takes time for Q&A. Lawrence Journal-World.
Ruger, T. W., Kim, P., Martin, A. D., and Quinn, K. M. (2002). The Supreme Court Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court Decisionmaking. Columbia Law Review.
Ruger, T. W., Kim, P., Martin, A. D., and Quinn, K. M. (2004). Competing Approaches to Predicting Supreme Court Decision Making. Perspectives on Politics Symposium, 2(4).
Samson v. California. 04-9728 U. S. (2006).
Schegloff, E. A. (2007). Sequence Organization in Interaction: Volume 1: A Primer in Conversation Analysis. Cambridge: Cambridge University Press.
Schiffrin, D. (1987). Discourse markers. Cambridge: Cambridge University Press.
Schiffrin, D., Tannen, D., and Hamilton, H. E. (eds.) (2001). The Handbook of Discourse Analysis. Malden: Blackwell Publishers Inc.
Segal, J. A. and Spaeth, H. J. (2002). The Supreme Court and the Attitudinal Model Revisited. Cambridge: Cambridge University Press.
Settles, B. (2004). Biomedical named entity recognition using conditional random fields and rich feature sets. In N. Collier, P. Ruch, and A. Nazarenko (Eds.), Proceedings of the COLING 2004 International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP 2004) (107-110).
Sha, F. and Pereira, F. (2003). Shallow parsing with conditional random fields. In M. Hearst and M. Ostendorf (Eds.), Proceedings of the 2003 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics Annual Meeting (134-141). Edmonton, Alberta, Canada: ACL.
Shullman, S. L. (2004). The illusion of devil's advocacy: How the justices of the Supreme Court foreshadow their decisions during oral argument. The Journal of Appellate Practice and Process, 6, 271-293.
Small v. United States. 03-750 U. S. (2004).
Snyder v. Louisiana.
06-10119 U. S. (2007).
Spaeth, H. J. (2009). The Original U.S. Supreme Court Judicial Database. http://www.cas.sc.edu/poli/juri/sctdata.htm.
Stolcke, A., Coccaro, N., Bates, R., Taylor, P., Van Ess-Dykema, C., Ries, K., Shriberg, E., Jurafsky, D., Martin, R., and Meteer, M. (2000). Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech. Computational Linguistics, 26(3).
Sutton, C. and McCallum, A. (2006). An Introduction to Conditional Random Fields for Relational Learning. In L. Getoor and B. Taskar (Eds.), Introduction to Statistical Relational Learning.
Thomas, M., Pang, B., and Lee, L. (2006). Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In D. Jurafsky and E. Gaussier (Eds.), Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006) (327-335). Sydney, Australia: ACL.
Toutanova, K., Klein, D., Manning, C., and Singer, Y. (2003). Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003 (252-259).
Travelers Casualty & Surety Co. of America v. Pacific Gas & Elec. Co. 05-1429 U. S. (2007).
Wrightsman, L. S. (2008). Oral Arguments Before the Supreme Court. New York: Oxford University Press.
Yuan, J. and Liberman, M. (2008). Speaker Identification in the SCOTUS corpus. In Proceedings of Acoustics '08.