ABSTRACT

Title of Thesis: COMPUTATIONAL ANALYSIS OF THE CONVERSATIONAL DYNAMICS OF THE UNITED STATES SUPREME COURT

Timothy W. Hawes, Master of Arts, 2009

Thesis directed by: Professor Jimmy Lin, The iSchool, and Professor Philip Resnik, Department of Linguistics

The decisions of the United States Supreme Court have far-reaching implications in American life. Using transcripts of Supreme Court oral arguments, this work looks at the conversational dynamics of Supreme Court justices and links their conversational interaction with the decisions of the Court and individual justices. While several studies have looked at the relationship between oral arguments and case variables, to our knowledge, none have looked at the relationship between conversational dynamics and case outcomes. Working from this view, we show that the conversation of Supreme Court justices is both predictable and predictive. We aim to show that conversation during Supreme Court cases is patterned, that this patterned conversation is associated with case outcomes, and that this association can be used to make predictions about case outcomes. We present three sets of experiments to accomplish this. The first examines the order of speakers during oral arguments as a patterned sequence, showing that cohesive elements in the discourse, along with references to individuals, provide significant improvements over our "bag-of-words" baseline in identifying speakers in sequence within a transcript. The second graphically examines the association between speaker turn-taking and case outcomes. The results presented with this experiment point to interesting and complex relationships between conversational interaction and case variables, such as justices' votes. The third experiment shows that this relationship can be used in the prediction of case outcomes with accuracy ranging from 62.5% to 76.8% for varying conditions.
Finally, we offer recommendations for improved tools for legal researchers interested in the relationship between conversation during oral arguments and case outcomes, and suggestions for how these tools may be applied to more general problems.

COMPUTATIONAL ANALYSIS OF THE CONVERSATIONAL DYNAMICS OF THE UNITED STATES SUPREME COURT

by Timothy W. Hawes

Thesis submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Master of Arts 2009

Advisory Committee: Professor Jimmy Lin, Co-Chair; Professor Philip Resnik, Co-Chair; Professor Wayne McIntosh

© Copyright by Timothy W. Hawes 2009

Acknowledgments

I couldn't possibly list all the people I want to thank and all the things they have done for me. Please know, whether it is listed here or not, I am extremely grateful for everything my friends, family and acquaintances have done for me. I would like to thank: Dr. Jimmy Lin and Dr. Philip Resnik, my advisors on this project, for their continually invaluable support, feedback, encouragement and advice, not just on this project, but in general. Dr. Wayne McIntosh and Dr. Michael Evans, for their generosity with their time, opinions and ideas throughout the course of this project. It was a discussion with them that gave initial shape to the conversational view taken in this thesis. Dr. Amy Weinberg, my first official advisor in the Department of Linguistics, for her excellent guidance and understanding. Dr. Stephan Greene, for his time and ideas at the earliest stages of this work. The Department of Linguistics and all of its professors, for their support and guidance. All of my sources of funding over the past 3 years. My crack team of proof-readers: Kelly Schultz, Dan Knudsen, Mindy Watson, Mischa Bauermeister, Gordon Freeman, Indira Sriram and Brian Hawes. They noticed more typos than I'd care to admit and each provided excellent suggestions on how to improve my thesis.
All of my friends at the University of Maryland, and especially Johannes, Josh, and Greg, for their good humor, support, advice and feedback over the years; Asad for his last-minute help saving me hours of highway driving, and also his always enjoyable conversations; and the many more who should be thanked for everything from invaluable help and support to just being good friends. All of my friends who have since dispersed across the globe: Dan, Gordon, John, Tim, Mischa, Kevin, Kara and others. You have done more for me than I could ever recount. I thank Kelly, for her love and support over the years. And my family: Mom, Dad, Kendra, Tim, Gam, my aunts and uncles (especially Aunt Jane and Uncle Wayne), my cousins (especially John and Kyle) and Chris (who, while she isn't technically "family", should be listed here). I appreciate everything you all have done for me.

Table of Contents

Acknowledgments..........................................................................................................ii
Table of Contents...........................................................................................................iv
List of Tables.................................................................................................................vi
List of Figures...............................................................................................................vii
Chapter 1 Introduction.....................................................................................................1
Chapter 2 Background.....................................................................................................5
2.1 Oral Arguments/Supreme Court.............................................................................5
2.2 Discourse Analysis................................................................................................7
2.3 Conversation Analysis.........................................................................................10
2.3 Computational Conversational/Discourse Analysis..............................................12
2.4 Quantitative Oral Arguments Research................................................................13
2.5 Spaeth Supreme Court Database..........................................................................19
Chapter 3 Sequence Labeling........................................................................................20
3.1 Methods...............................................................................................................21
Data Preparation...................................................................................................21
Corpus Description................................................................................................22
Feature Extraction.................................................................................................22
Labeling................................................................................................................23
Features................................................................................................................25
3.2 Experiments.........................................................................................................30
Results...................................................................................................................30
Discussion.............................................................................................................34
Chapter 4 Visualizing Dynamics...................................................................................35
4.1 Methods...............................................................................................................36
Corpus Description................................................................................................36
Case Segmentation................................................................................................37
Labeling Description..............................................................................................38
The Rose Charts....................................................................................................39
4.2 Results.................................................................................................................41
How to Read the Charts.........................................................................................41
Vote Split Condition (VOTE)..................................................................................44
Direction Condition (DIR).....................................................................................46
Justice Direction (JDIR)........................................................................................50
4.3 Discussion...........................................................................................................54
Chapter 5 Vote Prediction..............................................................................................56
5.1 Prior Approaches..................................................................................................56
5.2 Forecasting Votes.................................................................................................59
5.3 Methods...............................................................................................................60
Corpus Description................................................................................................60
Turn Distribution...................................................................................................61
Data Preparation...................................................................................................64
Baselines...............................................................................................................66
5.4 Experiments.........................................................................................................67
Results...................................................................................................................70
Discussion.............................................................................................................74
Chapter 6 Conclusions...................................................................................................76
6.1 Future Work and Unanswered Questions..............................................................77
Appendix A Rose Charts...............................................................................................81
All Cases...................................................................................................................81
DIR Condition...........................................................................................................83
JDIR Condition.........................................................................................................86
Vote Split..................................................................................................................89
Appendix B Discourse Markers.....................................................................................97
References...................................................................................................................102

List of Tables

Table 1 Example conjunctive relation markers (Brown and Yule 1983; 191).................10
Table 2 Summary of previous studies. "Manual" indicates whether or not the study used manual methods of outcome forecasting (the alternative being automatic methods). "Cases"
indicates the number of cases tested in the study...............................................15
Table 3 Examples of non-content items from the transcript of the oral arguments from Ali v. Federal Bureau of Prisons (06-9130) with the special symbols used to identify these items in our experiments................................................................................................22
Table 4 Mean Martin-Quinn scores for the 2005-2007 terms. Note, negative scores indicate a liberal ideology and positive scores indicate a conservative ideology. The higher (lower) the number, the more conservative (liberal) the ideal point is...................36
Table 5 Comparison of "most attention given" approaches with varying interpretations of "question". "By turn" indicates that we count each turn as a "question". "By ?s" indicates we counted question marks in the transcribed justices' speech, usually indicating an interrogative statement...................................................................................................57
Table 6 Comparison of the "most attention given" rule for extreme cases (i.e. difference in words or questions is > 2 s.d. from the mean). The "Cases" column indicates how many cases met this criterion..................................................................................................58
Table 7 Speakers and their corresponding symbols. The count column identifies the frequency with which each symbol appears in the corpus..............................................61
Table 8 20 most frequent n-grams grouped by correspondence pair, ranked by most frequent n-gram in pair..................................................................................................62
Table 9 Infrequent n-grams containing 3-4 instances of justice turns.............................63

List of Figures

Figure 1 Empirical probability of each justice symbol in the corpus (Hawes et al. 2009)......................................................................................................................24
Figure 2 Diagram of a linear chain of labels, where X_i is a group of observed features and Y_i is a label............................................................................................................25
Figure 3 Example of features extracted from a transcript segment..................................29
Figure 4 1st-order CRF 10-fold cross-validation results. Annotations represent the relative improvement over the Unigram baseline for the Unigram + DM + Ref condition (Hawes et al. 2009)........................................................................................................31
Figure 5 2nd-order CRF 2-fold cross-validation results. Annotations represent the relative improvement over the Unigram baseline for the Unigram + DM + Ref condition (Hawes et al. 2009)........................................................................................................32
Figure 6 Overall accuracy of first- and second-order CRFs. Bars are annotated with the relative improvement over the Unigram baseline.
Error bars are the 95% confidence interval as calculated by the Clopper-Pearson method for inferring exact binomial confidence intervals.................................................33
Figure 7 Sequence of truncated turns, the sequence extracted from these turns and the resulting trigrams...........................................................................................................38
Figure 8 Stevens - Rose Diagram of All Cases...............................................................42
Figure 9 Kennedy - Rose Diagrams for 5-4 and 9-0 split cases......................................45
Figure 10 Alito - Rose Diagrams for the DIR Condition................................................47
Figure 11 Ginsburg - Rose Diagrams for the DIR Condition..........................................48
Figure 12 Kennedy - Rose Diagrams for the DIR Condition..........................................49
Figure 13 Alito - Rose Diagrams for the ALTODIR Condition......................................51
Figure 14 Souter - Rose Diagrams for the SOUTDIR Condition....................................53
Figure 15 Kennedy - Rose Diagrams for the KENDIR Condition..................................54
Figure 16 Examples of "Laughter" and interruptions in the transcript............................64
Figure 17 Classification results including prior approaches (Court I only), baseline, and absolute accuracy. Error bars are the 90% confidence interval as calculated by the Clopper-Pearson method for inferring exact binomial confidence intervals....................70
Figure 18 Informative sequences from Thomas decision trees with examples from transcripts......................................................................................................................73

Chapter 1 Introduction

The United States Supreme Court plays a significant role in the U.S.
Government; the decisions reached by Supreme Court justices have far-reaching implications for the entire American legal system. In this work, we aim to combine conversation analysis with computational techniques in novel approaches for the analysis of the behavior of the U.S. Supreme Court, in terms of both the justices individually and the Court as a whole.

Considerable amounts of work have been done applying computational techniques to the political domain. For example, Mosteller and Wallace (1964) utilized models based on function word counts to identify the authorship of The Federalist Papers. Laver et al. (2003) used party manifestos and legislative speeches to identify the ideological positions of political parties in Britain, Ireland and Germany. More directly related to this work is that of Thomas et al. (2006), who examined the content of congressional floor debates and the relationships between congresspersons to determine whether individuals were in support of or opposition to the legislation under discussion. Also, Evans et al. (2007) classified the ideological position of third-party briefs from the briefs' content. We leave further discussion of related work to Chapter 2.

This thesis explores justice turn-taking during United States Supreme Court oral arguments and its relationship to other aspects of justice behavior. For our purposes, we will treat each speech segment in the argument transcripts with a single speaker identifier as one turn. [Footnote 1: Due to the Courtroom reporter's handling of factors such as interruption and overlapping speech, this definition of turn is somewhat different from that used in conversation analysis, where turns are "turns at talk" composed of units that are grammatically and phonetically realized and "constitute a recognizable action in context" (Schegloff 2007; 3-4). Despite this difference, there will still be significant overlap between what we are defining as a turn and what a conversation analyst would define as a turn.] Thus, the oral arguments are organized into a series of turns produced by the justices and the attorneys before the Court.

The first experiments we discuss look at the prediction of the turn-taking behavior of justices by exploring the task of labeling turns with their speakers when this information is unavailable in an oral arguments transcript. Chapter 4 is a broad-scale analysis of the turn-taking patterns of justices in various conditions, looking at patterns of when justices typically follow up on other justices' lines of questioning. Chapter 5 discusses a group of experiments that looks at the turn-taking behavior of justices as a predictor of case outcomes.

This work will be immediately relevant to researchers exploring the behavior of the United States Supreme Court. This view of the conversational dynamics between the justices as both predictable and predictive is one that has received little attention in the literature. By applying computational models to this approach, this work will provide new tools that may be able to open up novel avenues of research for legal scholars. Moreover, this work should also have broader implications. While we have concentrated on applying existing computational tools to a new approach to understanding the Supreme Court, the methods we develop here will be applicable to similar settings where one may wish to link conversational actions to other actions with a real-world impact. If this is the case, then these methods will help to provide a deeper understanding of other social institutions and human conversational interaction in general. While the narrow focus of this work is to produce methods for classification and labeling of the oral arguments of the U.S. Supreme Court, this research was conducted with the broader goal of creating novel approaches for judicial scholars to use in examining the dynamics of the Supreme Court.
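To make the notion of a turn concrete, the sketch below groups transcript lines into (speaker, text) turns and then collects speaker trigrams of the kind used in the later sequence experiments. The transcript format, speaker labels, and regular expression here are hypothetical illustrations, not the thesis's actual preprocessing; the real Supreme Court transcripts differ in their conventions.

```python
import re
from collections import Counter

# Hypothetical convention: a turn begins with an all-caps speaker label
# followed by a colon, e.g. "JUSTICE X: ...". Continuation lines without
# a label belong to the current turn.
TURN_RE = re.compile(r"^([A-Z][A-Z .']+):\s*(.*)$")

def extract_turns(lines):
    """Group transcript lines into (speaker, text) turns."""
    turns = []
    for line in lines:
        m = TURN_RE.match(line.strip())
        if m:
            # A new speaker label starts a new turn.
            turns.append((m.group(1), m.group(2)))
        elif turns and line.strip():
            # A continuation line extends the current turn's text.
            speaker, text = turns[-1]
            turns[-1] = (speaker, text + ' ' + line.strip())
    return turns

def speaker_trigrams(turns):
    """Slide a window of three over the speaker sequence."""
    speakers = [s for s, _ in turns]
    return Counter(zip(speakers, speakers[1:], speakers[2:]))

# Invented example exchange (not from a real transcript).
transcript = [
    "MR. SMITH: The statute plainly covers this conduct.",
    "JUSTICE X: But where do you draw the line?",
    "MR. SMITH: At the point of seizure,",
    "your Honor.",
    "JUSTICE Y: So any seizure at all?",
]
turns = extract_turns(transcript)
print([s for s, _ in turns])
# → ['MR. SMITH', 'JUSTICE X', 'MR. SMITH', 'JUSTICE Y']
```

Under this definition, the two consecutive "MR. SMITH" lines merge into one turn only when a new speaker label is absent; each labeled segment counts as its own turn, mirroring how the reporter's speaker identifiers delimit turns in the transcripts.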
Our primary objective is to gain a clearer understanding of the role of the conversational dynamics of Supreme Court justices. We aim to show that: a) predictable high-level patterns exist in the conversational dynamics of the Supreme Court, b) these patterns may be associated with other areas of interest to legal scholars, such as voting patterns of the justices, and c) this association between linguistic patterns and judicial patterns may be utilized to provide both short-term insights (i.e. predicting the outcome of a particular case) and deeper insights about the behavior of the Supreme Court.

In the process of pursuing these objectives we have decided to minimize the need for specialized knowledge and training for feature identification. In order to do this, we minimize theoretical commitments, thus reducing the need for an extensive background in any particular theory of discourse. Moreover, we want to reduce reliance on features that can only be encoded with human judgment and expertise, by favoring features that can be automatically recognized. By restricting ourselves to such conditions we hope to maximize the applicability and reproducibility of our methods, as the reliance on human judgment has hampered both of these qualities in some previous work. Despite this, we expect that higher-level information from more sophisticated approaches, such as sentiment analysis, would only add to the value and power of these basic approaches.

Producing any positive result for this work is a contribution to the overall understanding of the Court. While small studies using human judgments have produced relatively large positive results, larger studies using automatic methods still achieve relatively small improvements (see Section 2.4). In one case, these automatic methods achieve comparable results to our own work with an order of magnitude more data. Also, when tested on our dataset, these methods achieve considerably lower results.
Just as these larger studies have contributed to the understanding of the relationship between one aspect of oral arguments and case outcomes, positive results in this work should contribute to the understanding of the relationship between conversational interaction and case outcomes. Moreover, given the relative simplicity of our feature sets, the fact that we are able to gain some predictive power at all from these features may be a surprising result for legal scholars (Evans, M., personal correspondence, August 28, 2009).

Thesis Organization

The remainder of this thesis is organized as follows:
• Chapter 2 discusses background on oral arguments, discourse and conversation analysis, computational approaches to discourse and conversation analysis, quantitative research on oral arguments, and the Supreme Court case database used in two of our experiments.
• Chapter 3, Chapter 4 and Chapter 5 cover our three experiment groups, dealing with turn sequence labeling, "rose diagrams" of turn-taking and case outcomes, and case outcome prediction, respectively.
• The final chapter offers conclusions from this work and suggests some future research and unanswered questions.

Chapter 2 Background

This chapter contains three main parts. The first part covers the domain knowledge regarding the area of study contained in this thesis, namely, oral arguments and the Supreme Court. The second introduces the linguistic area of study we utilize in this thesis, specifically, conversation and discourse analysis. The third part is an overview of computational studies in discourse analysis as well as a review of both computational and manual studies of the Supreme Court. We include one final section to introduce our source of Supreme Court case data (not including oral argument transcripts).

2.1 Oral Arguments/Supreme Court

As one of the last, and only public, stages a case goes through before the Supreme Court, the importance of oral arguments is often questioned.
At this stage, all briefs have been submitted by each side of a case and by amici curiae, and the justices have had time to study the details of the case. It is believed that by this time, justices have had sufficient opportunity to make up their minds regarding a case, and so it is often suggested oral arguments play little if any role in justices' decision-making process (Rohde & Spaeth 1976; Kurland & Hutchinson 1983; Segal & Spaeth 2002). Kurland and Hutchinson (1983) argue, "There are a few cases in which oral argument serves as a means of discovery by the Justices. But there is no reason why this discovery could not be conducted better by interrogatories than by oral deposition." This view is not just held by academics either: some justices have also expressed these views. Justice Thomas once said, "99 per cent of the time justices have made up their mind when they go to the bench. Also, there are so many questions you have to elbow your way in" (Rombeck 2002; 5B). Even for those justices who do view oral arguments as important, it would seem that they do not believe oral arguments typically lead a justice to change his or her mind. On the topic of whether oral argument matters, Justice Rehnquist wrote, "I think it does make a difference," though only in "a significant minority of cases": "The change is seldom a full one-hundred-and-eighty-degree swing, and I find that it is most likely to occur in cases involving areas of law with which I am least familiar" (Rehnquist 2002). In a 2009 interview, Justice Scalia (who admits that he once believed oral arguments were a "dog and pony show" (Johnson 2004)) said, "A lot of people are under the impression that [oral advocacy] is a dog and pony show. The judges have read the briefs, they come in with their minds made up, and this is just a performance for the benefit of your client. If that's the impression you have, you are just wrong. I have never met a judge who doesn't think that oral argument is important" (Duke Law 2009).
However, similar to Rehnquist, he suggested that only in cases where he has not already made up his mind do oral arguments play a role in his decision making. While the view that oral arguments are unimportant is commonly held, some scholars have also argued against it, suggesting that justices do in fact utilize information gained during oral arguments to make decisions (Johnson 2001, Johnson 2004, Shullman 2004, Johnson et al. 2006). Johnson (2001; 2) points out that up to oral arguments, the majority of information the justices have seen is that which "other actors want them to see and consider", and that justices use oral arguments as an opportunity to get at what they want to "see and consider" in order to make a decision in the case. However, even in these studies, the strongest conclusion made is that, in typical cases, oral arguments at best are used to refine a justice's opinion, thus having an important impact on the details of a case's outcome but not necessarily on the case's overall outcome. Johnston et al. (2009a) note David Frederick's observation that oral arguments are composed of conversations between a lawyer, a justice and another "potentially persuadable justice". While the above description of oral arguments should indicate that the existence of "potentially persuadable justices" may be in question, it seems natural to presume that even if justices cannot be persuaded during oral arguments, other justices will still attempt to do so.

2.2 Discourse Analysis

Discourse analysis is a fairly broad subfield of linguistics. Schiffrin et al. (2001; 1) note that discourse analysis is often not strictly defined but usually refers to one of three domains of study: "(1) anything beyond the sentence, (2) language use, and (3) a broader range of social practice that includes nonlinguistic and nonspecific instances of language." Given this broad definition of discourse analysis, it is clear that there is an open view of what exactly is meant by "discourse".
Typically, however, the term is used to indicate a language-based communication forming a "unified whole" (referred to as a text in the discourse analysis literature), and such communications can take on a variety of forms, including written, spoken or signed (Halliday and Hasan 1976, Johnstone 2007). With regard to the domains of study discourse analysis may involve, aspects of this work could fall under each of these categories; while our first experiment looks at (potentially) extra-sentential linguistic units, overall this work is looking at language use in a particular social setting, the Supreme Court, and the relationship between that language use and the overall behavior of the Supreme Court. As for our particular version of discourse, we are dealing with transcribed spontaneous speech, which inherently incorporates both written and spoken language.

Regardless of the form of communication under consideration, three of the key aspects of discourse an analyst is often concerned with are texture, cohesion, and coherence. Texture, the defining characteristic of a text, is identified by Halliday and Hasan (1976; 2) as "the property of being a text…this [texture] is what distinguishes it [a text] from something that is not a text". Take (1) for example.

(1) A: Does the store carry galvanized wire? B: Yeah, they do.

This simple exchange can be said to have texture, because it can stand alone as (or at least be a part of) a unified conversation. Contributing to the texture of (1) is the use of reference (anaphora; they refers to the store) and substitution (do stands in for carry galvanized wire) in B. Taken together, these lend cohesion to the text, creating texture. Cohesion refers to the relations that exist within a text between separate units in that text and the idea that "the INTERPRETATION of some element in the discourse is dependent on that of another" (Halliday and Hasan 1976; 4). In the example above, in order to interpret B correctly we need A.
Cohesion can take on a number of forms, falling under the headings of grammatical cohesion and lexical cohesion. Grammatical cohesion refers to the use of grammatical tools to create cohesive relations in a text, including reference and substitution as in the example above, as well as ellipsis (omission of clauses; e.g. Who stole the book? - John stole the book) and conjunction (linking of clauses; e.g. John went to the bank. Later he went to the movies). [2] We will discuss conjunction more thoroughly later in this section. Lexical cohesion includes repetition of the same word, or semantically related words such as holonyms (tree-forest), hypernyms (hat-clothing), semantically "close" terms (banana-apple), etc. (Halliday and Hasan 1976, Brown and Yule 1983).

While cohesion deals with overt relations in a text, coherence deals with relations that must be interpreted by an individual listening to or reading a text. Coherence relations are the underlying relations that hold between segments of text (Brown and Yule 1983). Returning to (1) above, while B is a cohesive response to A, we need to appeal to coherence in order to describe it as an appropriate response to A, as cohesion is no guarantee of coherence. For example, suppose we changed B in (1) as we have done in (2). While B is cohesive with A in (2), they still refers to the store, it is no longer a coherent answer to A.

(2) A: Does the store carry galvanized wire? B: They are open on Sundays.

Thus, coherence too is a necessary aspect in building an interpretable discourse. For this work, we make the assumption that the texts we are dealing with, as spontaneous conversations between multiple individuals, are in fact coherent discourses, at least for the parties involved. And, while it is not necessarily the case across all sorts of text and all relations within a text, we are making the assumption that the majority of cohesive relations existing in the text are representative of underlying coherent relations.
The connection between conjunctions and the coherence relations they signal plays a role in Chapter 3. While the collection of potential conjunctive elements in English is extensive, Brown and Yule (1983) offer several examples, as summarized in Table 1. [Footnote 2: Note that the usage of some terms, such as anaphora and ellipsis, is somewhat different in discourse analysis than in generative linguistics.]

Type         Examples
Additive     and, or, furthermore, similarly, in addition
Adversative  but, however, on the other hand, nevertheless
Causal       so, consequently, for this reason, it follows from this
Temporal     then, after that, an hour later, finally, at last

Table 1 Example conjunctive relation markers (Brown and Yule 1983; 191).

It is important to note that, because of the role of cohesion in the interpretation of discourse, these elements do not always identify the relations they are paired with in Table 1, nor are explicit elements required to mark these sorts of relations (Brown and Yule 1983). Nevertheless, overt markers of such relations are abundant in many forms of discourse, and do tend to exhibit some regularity in the relations they identify (as indicated by Table 1), even if the relationship is at times variable.

2.3 Conversation Analysis

Because this work deals with transcripts of oral arguments, it is most closely related to conversation analysis, which may be viewed as a branch of discourse analysis. Hutchby and Wooffitt (2008; 13) write that the "aim" of conversation analysis (CA, in their terms) "is to focus on the production and interpretation of talk-in-interaction as an orderly accomplishment that is oriented by the participants themselves". CA seeks to uncover the organization of talk "from the perspective of how the participants display for one another their understanding of 'what is going on'".
Because of this view, there is a focus on conversation as a sequence of "turns at talk", with each subsequent speaker turn in a conversation indicating the speaker's understanding of the preceding conversation (Hutchby and Wooffitt 2008). [Footnote 3: However, conversation analysis comes with its own tools, methods and procedures for recording and analyzing conversation that we do not make use of. Despite this, many of the topics of interest to the conversation analyst are relevant to this discussion.] In the present work we are particularly interested in this sequence of turns, how predictable that sequence is in a setting like the Supreme Court, and the relationship between this sequence and other actions taken by the Court.

The previous discussion of cohesion and coherence can be tied into conversation analysis through a particular aspect of conversational sequence organization known as adjacency pairs. Adjacency pairs include two turns that are usually, but not necessarily, adjacent in conversation, where the first turn "initiates some exchange" and the second turn is "responsive" to the first. These are treated as pairs because not all types of initiations can be followed by all sorts of responses. So while Question/Answer (e.g. (1)) and Apology/Acceptance (e.g. (3)) are typical adjacency pairs, Question/Acceptance and Apology/Answer are not (Schegloff 2007; 13-14).

(3) A: Sorry I broke your mug.
    B: That's ok.

Regardless of the pair, recognizing a pair as a member of a particular type requires a coherent interpretation of that pair. However, responses to the first part of a pair may include, or be entirely composed of, elements that are cohesive with the previous turn (4).

(4) A: When are we going to the movies?
    B: Later.

Oftentimes, as in the example given, these cohesive elements are conjunctive, linking the first turn to the second with relations related to those in Table 1.
For example, if the initiating turn is a statement, a possible response may be to disagree with the statement. In this case, the response may begin with an "adversative" element (5).

(5) A: Let's go to the movies.
    B: But I don't want to.

As stated before, this relationship between cohesive elements and coherence relations offers insight into the discussion in Chapter 3.

2.4 Computational Conversational/Discourse Analysis

Though considerable work has been done in the domain of computational discourse analysis, interest in multi-party discourse (involving more than two parties) is relatively new; work has instead favored single- and two-party discourse. Broadly speaking, much of the computational linguistics research that explores language at the document level has focused on single-party discourse, since texts typically represent a single-party discourse. The following is a sampling of representative papers for single-, two-, and multi-party discourse. We concentrate on a variety of the more popular areas of research in discourse, including coherence relation identification and topic segmentation and identification.

For single-party discourse (including text and monologue), Mann and Thompson's (1988) Rhetorical Structure Theory (RST) has been used as a framework for identifying coherence relations in texts from a single author (Marcu, 1997; Corston-Oliver, 1998). Marcu and Echihabi (2002) developed an approach to automatically identify discourse relations that hold between sentences and within sentence parts from a very large corpus of unannotated sentences drawn from textual resources. Grosz and Hirschberg (1992) used a Classification and Regression Tree analysis to identify discourse segments (building on the theory of discourse discussed in Grosz and Sidner (1986)) in Associated Press articles read aloud by news broadcasters. Morris and Hirst (1991) explored "lexical chains" (spans of related words in a discourse; in this case, text) as a means for modeling lexical cohesion.
In the area of two-party dialog, Stolcke et al. (2000) modeled "dialogue acts" in telephone conversations for automatic labeling. [Footnote 4: Dialogue acts are often one part of an adjacency pair, e.g. "STATEMENT, QUESTION, AGREEMENT, DISAGREEMENT, and APOLOGY" (Stolcke et al. 2000).] Forbes-Riley and Litman (2004) used acoustic and non-acoustic cues in spoken dialogs to predict the emotional state of students in one-on-one interaction with tutors via AdaBoost with decision trees. Gurevych and Strube (2004) used (manually disambiguated) noun senses from WordNet to summarize the content of telephone-based conversations. Finally, Williams and Young (2007) developed an approach for managing spoken human-machine dialogue.

Much of the existing research on conversation involving three or more parties has been conducted using the International Computer Science Institute (ICSI) meeting corpus (Janin et al. 2003), though other corpora are available (e.g. TalkBank, which includes U.S. Supreme Court oral arguments as a subset of its documents (MacWhinney et al. 2007)). Galley et al. (2003) use a lexical cohesion approach to create an unsupervised method of topic segmentation in multi-party ICSI meetings, while Purver et al. (2006) offer an unsupervised method for topic segmentation and identification using Bayesian inference. Galley et al. (2004) used lexical, contextual and durational cues to identify agreement and disagreement between speaker turns in ICSI meetings.

2.5 Quantitative Oral Arguments Research

To date, there have been several studies dealing with Supreme Court oral arguments. Johnson et al. (2009b) examine factors that may be involved in determining why and when justices will give a dissent from the bench, including the number of questions asked by the Court during oral arguments. This study found a small effect in the relationship between dissents from the bench and case activity as measured by the number of questions asked during oral arguments.
In work related to our first experiments, Yuan and Liberman (2008) conducted speaker identification experiments using audio transcripts of oral arguments from 78 cases from the 2001 term. [Footnote 5: Audio transcripts were accompanied by written transcripts, speaker identifications and manual word-alignment from the OYEZ project (http://www.oyez.org/) (Yuan and Liberman 2008).] For the 800 "clean" test samples used, 98% speaker identification accuracy was achieved by training 8 justice-specific speech recognition models, applying each model to a test utterance, and using the model with the highest score to identify the justice.

We will now discuss several studies aimed at forecasting case outcomes, which are summarized in Table 2. Wrightsman (2008) details several attempts to use manual quantitative and qualitative analysis to predict votes. The first of these examples recounts New York Times Supreme Court reporter Linda Greenhouse's prediction of case outcomes based solely on oral arguments, using her experience as a courtroom reporter. Of the 27 articles she prepared based on oral arguments, 17 contained predictions, 12 of which were correct (and one was held out because the case was dismissed). The second example is an analysis of 28 cases from the 1980 and 2003 terms by John Roberts. By determining which side was asked the most questions, he was able to determine the winner in 24 of the 28 cases studied. The third is a study by law student Sarah Shullman, who attended 10 argument sessions and recorded information about each question asked, including the content, the speaker, the level of "hostility", and the tone of the speaker's voice. After analyzing 7 cases, Shullman also settled on a "most questions asked" rule that predicted the winner in 6 of the 7 cases analyzed and in the 3 held-out cases. However, as Wrightsman (2008; 133) notes, "determining what constitutes a 'question' is not so simple". For example, Wrightsman (2008; 136) writes, "interaction
between advocates and justices do not follow in a discrete manner; two justices may begin to speak at the same time, a justice may interrupt an advocate, and justices may make elongated statements that may contain several questions." From an even more basic standpoint, it is not clear whether or not researchers limit questions to interrogative statements. Without explicitly identifying how questions are to be counted, the replicability of these sorts of experiments will be inherently shaky.

Study           Cases   Accuracy      Method                             Manual
Greenhouse      16      75.0%         Experience                         yes
Roberts         28      85.7%         Most Questions Asked               yes
Shullman        10      90.0%         Most Questions Asked               yes
Wrightsman      24      42%           Most Questions Asked               yes
Ruger et al.    68      75%           Case metadata                      no
Johnson et al.  ~2000   66.2%/67.5%   Most Questions Asked / Words Used  no

Table 2 Summary of previous studies. "Manual" indicates whether or not the study used manual methods of outcome forecasting (the alternative being automatic methods). "Cases" indicates the number of cases tested in the study.

The final study discussed in Wrightsman (2008) was conducted by Wrightsman and a student. It examined 24 cases from the October 2004 term, 12 of which were identified as "very ideological" and 12 of which were identified as "definitely not-ideological". For each of these cases they determined whether each justice's "overall pattern of questions" was "unsympathetic" to a particular side in the case, as well as the number of questions asked of each side. While no definition of "unsympathetic questioning" is provided, they do provide an example of an unsympathetic statement from Small v. United States: in arguing for the side of Small, Justice O'Connor said, "Congress thinks about the United States, our country, and if it means to say something will take place in other places in the world, it says so clearly". While they do not report absolute accuracy values for the "unsympathetic"
questioning approach, they do point out that 87% of the unsympathetic comments were directed at the losing side in the ideological cases and 69% of the unsympathetic comments were directed at the losing side in the non-ideological cases. [Footnote 6: Though presumably not the case, their method of reporting leaves open the extreme possibility that only two cases contained unsympathetic questioning, and for those two cases 87% and 69% of the unsympathetic questions were directed at the losing side. Of course, if this possibility is open, less extreme scenarios about the distribution of the questions are possible. In any case, this does not give a clear picture of the accuracy provided by this approach.] Perhaps more importantly, they report that the "more questions asked" rule employed by Shullman and Roberts led to 42% accuracy. In an attempt to rectify this discrepancy for the "most questions asked" rule, results remained mixed, though a potential pattern emerged; namely, this rule seems to be most useful in ideological cases and least useful in non-ideological cases.

While there has been extensive quantitative study of Supreme Court forecasting, computational work has been rather limited, with only two studies (Ruger et al. 2002, 2004 and Johnson et al. 2009a). Ruger et al. (2002, 2004) utilized classification trees built from 6 metadata features for 8 years' worth of Supreme Court cases under Rehnquist (658 cases). The metadata used include: (1) the circuit of origin for the case; (2) the issue area of the case, coded from the petitioner's brief using Spaeth's protocol; (3) the type of petitioner (e.g., the United States, an injured person, an employer); (4) the type of respondent; (5) the ideological direction of the lower court ruling, also coded from the petitioner's brief using Spaeth's protocol; and (6) whether or not the petitioner argued the constitutionality of a law or practice (Ruger et al. 2004).
The authors argued that each of these features could be identified by a non-expert, and indeed all but the 6th feature can be found in the Spaeth database (Spaeth 2009). They used the classification trees to predict cases for the 2002 term prior to each case's decision (68 cases). Finally, results from their classification trees were compared to those of legal experts, including "71 academics and 12 appellate attorneys", each of whom had "written and taught about, practiced before, and/or clerked at the Supreme Court". The model performed with an absolute accuracy of 75%, while the experts performed at only 58.8% (with results for 10.3% of cases "inconclusive"). Not reported for this timeframe is the proportion of cases decided in favor of the petitioner or respondent. However, based on the term they report using and the cases they held out, it appears that the Court reversed 69.1% of cases during this period. Note that there is generally a reversal bias, but that this varies over time.

A more recent and much more comprehensive study was conducted by Johnson et al. (2009a). This study examines all cases from 1979 to 1995 ("over 2000 hours"), testing the "most questions asked" hypothesis. Two logistic regression models are created in this study, the first utilizing the difference in the number of questions asked of each side, and the second utilizing the difference in the number of words used to discuss the case for each side. In addition to these two main features, features are included in each model to control for potentially confounding factors. These include a "measure of the ideology of the median justice on the Court", the direction of the lower court's decision, a variable to code the interaction of these two previous variables, two variables to code whether the Solicitor General participated as amicus curiae on behalf of the petitioner or the respondent, and two variables indicating whether amicus briefs were submitted on behalf of the petitioner
and/or on behalf of the respondent. (The 69.1% reversal rate noted above is somewhat higher than the typical rate of reversal, which is closer to 64%-66%.) While each of the "questions used" and "words used" variables was the least informative variable in its model, they report small but noticeable effects for these two models, with 66.2% accuracy for the question difference model and 67.5% accuracy for the word difference model. While the results show relatively low accuracy, given that the Court's tendency to reverse cases is around 64%, they do provide information to suggest that in extreme cases (>2 standard deviations from the mean difference in questions asked) the probability of a case being affirmed ranges between 18% and 39%. They report similar correlations with the distribution of the difference in words used for each side. Thus, these results do suggest that, despite the conflicting results presented by Wrightsman (2008), there is in fact some relevance to the "most questions asked" hypothesis (and, more generally, a "more attention given" hypothesis). However, as is discussed in Chapter 5, we find that for our own data set the "most questions asked" rule is not predictive across the corpus, though, as suggested by Johnson et al. (2009a), it does provide some benefit in the extreme cases.

Though not explicitly a forecasting study, the work of Johnson et al. (2006, 2007) is also closely related to this work. They used Justice Blackmun's records of the quality of arguments by individuals before the Court to examine the relationship between the quality of oral arguments and case outcomes. In addition to Justice Blackmun's records, they attempted to determine whether any other factors, such as attorney background and justice and attorney policy preferences, had an impact on the quality of arguments presented to the Court.
Their findings suggest that when the quality of one side's oral arguments is significantly better than the other's, the case is more likely to go to the side with the higher quality arguments, and that an attorney's background may be helpful in determining the quality of the arguments they will present. This advantage is as high as a 77.9% chance of reversal when the petitioner's arguments are "manifestly better" than the respondent's, and as low as a 34.9% chance of reversal in the converse situation.

2.6 Spaeth Supreme Court Database

Much of the work in this thesis utilized the Spaeth Supreme Court Database (Spaeth 2009; henceforth the Spaeth database). The Spaeth database is a comprehensive listing of Supreme Court cases and accompanying variables dealing with the "background" of the case (e.g. the origin of the case, the parties involved in the case, the issue area), "chronological variables" including important dates of the case, the identity of the chief justice and the natural court, "substantive variables" such as the issue area of the case and the direction of the decision, "outcome variables" including the winner of the case, and "voting and opinion variables" identifying the votes and opinions issued in the case. Cases can often involve multiple legal provisions or issues. In these instances, multiple listings are provided for each case. These listings separate variables that would otherwise be conflated. As suggested in Benesh (2002), we concentrate on the "case citation" listing, as we "[want] to study decisions in the aggregate and [want] to count each decision only once."

Chapter 3 Sequence Labeling

The work contained in this chapter aims to address our first objective: to demonstrate that conversational patterns exist in Supreme Court oral arguments. This is accomplished by constructing a sequence labeling task that identifies speakers from turn content.
Given a sequence labeling task, if speakers can be identified from the content of the turns, and if increasing the turn history in a model for sequence labeling improves performance, it indicates that patterns exist in the turn-taking behavior of Supreme Court justices. [Footnote 8: This work was originally published in Hawes et al. (2009). Figures in the following Sections are from this paper. Other discussion will either closely coincide with or match the content of this paper. Discussion is expanded and details are included to highlight the relevance of this work to this thesis.]

In a typical labeling task the objective is to identify present, but unobservable, information (hidden variables) from observable information (observed variables). An example of a common sequence labeling task is part-of-speech (POS) tagging. In POS tagging, the objective is to identify the parts of speech (e.g. noun, adjective, preposition, determiner, conjunction, etc.) for the words in a sentence. Framed as a sequence labeling problem, the hidden variables are the POS of each word and, in the simplest case, the observed variables are the words. Because the same words in different sequences may have different POS, one usually wants to make use not only of the words themselves, but of sequential information as well, such as the order of words or the sequence of the predicted POSs. Because of this, POS tagging is often approached with graph-based statistical models that can easily make use both of the features in a sequence (i.e. words) and the sequence itself (e.g. DeRose 1988, Lafferty et al. 2001, Toutanova et al. 2003).

Similar to POS tagging, we can construct a task where the observable information is a sequence of turns, and the hidden variables are the identities of the speaker for each turn. Supreme Court transcripts prior to 2004 offer an immediately relevant example, as justices were not uniquely identified in these cases.
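One way to see what "patterned" turn-taking means in practice is to count label bigrams in a speaker sequence: under patterned turn-taking, some transitions occur far more often than others, while random turn-taking yields roughly uniform counts. A minimal sketch (the two-symbol sequence below is invented for illustration and is not drawn from the corpus):

```python
from collections import Counter

def transition_counts(speakers):
    """Count adjacent speaker-label pairs (label bigrams).

    Skewed counts indicate patterned turn-taking; roughly uniform
    counts would suggest random turn-taking.
    """
    return Counter(zip(speakers, speakers[1:]))

# Invented toy sequence: a justice (J) and a lawyer (L) alternating.
counts = transition_counts(["J", "L", "J", "L", "J"])
```

A first-order model exploits exactly this kind of one-step history; a second-order model extends the idea to label trigrams.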
3.1 Methods

Data Preparation

Though the cases used for each experiment set vary, all experiments share a common data preparation approach. Transcripts of oral arguments are posted in PDF format the same day a case is argued. Transcription is conducted by the courtroom reporter, Alderson Reporting Company. While details of the transcription process are not given, the character and infrequency of errors would indicate that transcripts are created manually. [Footnote 9: For example, typos in speaker IDs (i.e. non-content text), such as JUSTICE KENNY instead of JUSTICE KENNEDY, or JUDGE ALITO instead of JUSTICE ALITO.] For each segment of speech by a single speaker, transcripts contain the speaker's name (i.e. speaker ID) and the content of the speech segment. For all experiments, each segment is treated as one speaker turn, and thus the transcript is treated as an approximation of the turn sequence during the entire case. [Footnote 10: Of course, this sequence can only be an approximation; there is no duration information, only coarse overlap information, and other discourse information, such as fillers (i.e. um), is often disregarded.] Finally, transcripts contain several non-content items, including opening and closing time stamps and headers for the oral and rebuttal arguments of each litigant (Table 3).

Symbol          Examples
TIME            (11:08 a.m.); (Whereupon, at 12:08 p.m., the case in the above-entitled matter was submitted.)
START-ORAL      ORAL ARGUMENT OF JEAN-CLAUDE ANDRE ON BEHALF OF THE PETITIONER; ORAL ARGUMENT OF KANNON SHANMUGAM ON BEHALF OF THE RESPONDENTS
START-REBUTTAL  REBUTTAL ARGUMENT OF JEAN-CLAUDE ANDRE ON BEHALF OF THE PETITIONER

Table 3 Examples of non-content items from the transcript of the oral arguments from Ali v. Federal Bureau of Prisons (06-9130), with the special symbols used to identify these items in our experiments.

All transcript PDFs were converted to XML format using an off-the-shelf utility, followed by custom-built automatic cleanup to remove extraneous formatting.
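The speaker ID/content structure of each segment can be illustrated with a simple line-based parser. This is only a hypothetical sketch over plain text (the actual pipeline worked from the PDF-to-XML conversions described above), and the speaker-ID pattern shown is an assumed simplification:

```python
import re

# Assumed turn-opening pattern: an all-caps speaker ID followed by a colon.
TURN_START = re.compile(r"^((?:CHIEF )?JUSTICE [A-Z]+|M[RS]\. [A-Z]+): ?(.*)$")

def extract_turns(lines):
    """Return (speaker ID, content) pairs, one per speaker segment.

    Lines that do not open a new turn are treated as continuations
    of the current speaker's segment.
    """
    turns = []
    for line in lines:
        m = TURN_START.match(line)
        if m:
            turns.append([m.group(1), m.group(2)])
        elif turns:
            turns[-1][1] += " " + line.strip()
    return [tuple(t) for t in turns]
```

Each resulting pair corresponds to one speaker turn in the approximated turn sequence.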
Cleanup code and cleaned transcripts will be made available at http://www.umiacs.umd.edu/~twhawes/oralarguments/index.html.

Corpus Description

At the beginning of this study the Court's 2007 term had not yet completed, and prior to the 2004 term justices did not have unique speaker IDs. Thus we limited the corpus to the 2004-2006 terms. For the sake of consistency, we also filtered out cases that followed an atypical format, for example, those cases that included arguments from amici curiae. [Footnote 11: Filtered-out cases include: 02-1472, 04-1067, 04-473b (Garcetti v. Ceballos (Reargued)), 04-94, 05-1342, 05-1575, 05-204, 05-705, 05-746, 05-922, 06-484, 06-5247, 06-5306, 06-593, 105 Orig. (Kansas v. Colorado) and 128 Orig. (Alaska v. United States).]

Feature extraction

From the XML-formatted cases we extracted the case content, including speaker IDs, speaker turn content and non-content items in the transcript. Turns were extracted as speaker ID/content pairs. From the content of each turn, we extracted features as shown in the Features Section (cf. Figure 3).

Labeling

We extract from each unit x_i a set of features, and our models predict the labels y_i for a sequence, yielding {(x_1, y_1), ..., (x_n, y_n)}. The labels y_i comprise a set of 15 symbols: 11 for the justices (one for each), one to represent the lawyers (whether on behalf of the petitioner or the respondent), plus one special symbol for time stamps and two additional special symbols to encode the section headings (i.e. START-ORAL and START-REBUTTAL). Figure 1 shows the frequency with which each of the justices spoke across all cases in the corpus. Not included are the non-justice parties from each side, who produce 47.4% of all turns. Also not included are the special symbols, which comprise 2.2% of the symbols in the corpus.
While the Court is only composed of 9 justices at any given time, we report 11 in Figure 1 due to changes in Court membership, including Roberts' replacement of Rehnquist and Alito's replacement of O'Connor. Because these justices do not span the entire corpus, their empirical probability should be lower than those justices' true tendency to speak during oral arguments (this, in turn, has an impact on our experimental results).

Figure 1 Empirical probability of each justice symbol in the corpus (Hawes et al. 2009).

Because we are predicting sequential labels from a collection of features, conditional random fields (CRFs; Lafferty, McCallum, & Pereira, 2001) are a straightforward choice for this task. CRFs utilize undirected graphs to model the conditional probability of an unobserved sequence of labels (Y) given some observable sequence of features (X). CRFs are preferable to Hidden Markov Models (HMMs) in many sequence-labeling tasks because they relax the stringent conditional independence assumptions made by generative models. CRFs have been empirically shown to work well for a variety of text processing tasks, including POS tagging (Lafferty et al. 2001), shallow parsing (Sha & Pereira, 2003), and named-entity recognition in the biomedical domain (Settles, 2004). Although the underlying structure of a CRF can take a variety of forms, a linear chain of labels (Figure 2) is often assumed for sequence-labeling tasks because it allows for efficient inference and decoding using the forward-backward and Viterbi algorithms (Sutton and McCallum 2006). Figure 2 corresponds to a first-order CRF, which determines probabilities using features at the current label along with the previous label; similarly, a second-order CRF corresponds to a model that determines probabilities using features at the current label along with the previous two labels. For this work we used the MALLET implementation of CRFs (http://mallet.cs.umass.edu).
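The Viterbi decoding step mentioned above can be sketched in a few lines. The toy implementation below (with invented scores standing in for what a trained model would supply; MALLET handles training and decoding internally) recovers the best label sequence for a linear chain:

```python
def viterbi(emit, trans, labels):
    """Best label sequence for a linear-chain model.

    emit[t][y]: score for label y at position t (from observed features).
    trans[(p, y)]: score for moving from label p to label y.
    """
    best = [dict(emit[0])]          # best[t][y]: best path score ending in y
    back = []                       # back-pointers for recovering the path
    for t in range(1, len(emit)):
        cur, ptr = {}, {}
        for y in labels:
            p = max(labels, key=lambda p: best[-1][p] + trans[(p, y)])
            cur[y] = best[-1][p] + trans[(p, y)] + emit[t][y]
            ptr[y] = p
        best.append(cur)
        back.append(ptr)
    y = max(labels, key=lambda l: best[-1][l])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]

# Invented toy scores: a justice (J) and a lawyer (L), with a small
# bonus for alternating speakers.
labels = ["J", "L"]
emit = [{"J": 2.0, "L": 0.0}, {"J": 0.0, "L": 2.0}, {"J": 2.0, "L": 0.0}]
trans = {("J", "J"): 0.0, ("J", "L"): 0.5, ("L", "J"): 0.5, ("L", "L"): 0.0}
decoded = viterbi(emit, trans, labels)
```

A second-order model would score transitions over the previous two labels instead of one, at the cost of a larger transition table.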
Figure 2 Diagram of a linear chain of labels, where X_i is a group of observed features and Y_i is a label.

Features

The following is a discussion of the features used for this task. Note that an additional, contentless feature (T) was also used for every turn in order to ensure that all turns had at least one feature in the sequence.

Unigrams

Unique tokens, separated by white space and punctuation, were extracted from each turn, ignoring stop-words. One feature for each token used in a particular turn was included in the feature set for that turn, indicating the presence of that token. By including unigrams in our feature set, we are essentially creating a "bag-of-words" language model. Because this is among the simplest possible approaches for this task, we treat unigrams as our baseline feature set.

Discourse Markers (DM)

All interpretable discourse is composed of discourse relations, which serve to connect each unit of discourse. Correct interpretation of these relations is necessary in order to correctly interpret a discourse. Because we can safely assume that oral arguments are an interpretable discourse (at least for all parties involved), we can infer the presence of these coherence relations, not only between an individual speaker's utterances but between the utterances of separate speakers. Instead of attempting to identify all of these relations automatically, however, we instead rely on discourse markers, which have traditionally been viewed as overt cues for underlying discourse relations (cf. conjunctive cohesive elements, Section 2.2). Both semantically and syntactically optional, discourse markers are typically viewed as pragmatic units used to link clauses in a discourse (Schiffrin 1987). As overt cues of discourse relations, discourse markers are a prime example of conjunctive cohesive elements of a discourse.
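Extracting such markers from the start of a turn can be done with simple greedy matching against a fixed list. A minimal sketch (the six-entry list here is a hypothetical stand-in for illustration only, not the marker list used in this work):

```python
# Hypothetical stand-in for a much larger discourse-marker list.
MARKERS = sorted(["well", "but", "no", "so", "okay", "on the other hand"],
                 key=len, reverse=True)   # try longer markers first

def turn_initial_markers(turn):
    """Greedily peel discourse markers off the start of a turn."""
    found = []
    text = turn.lower().lstrip()
    matched = True
    while matched:
        matched = False
        for marker in MARKERS:
            rest = text[len(marker):]
            # Require a word boundary so "no" does not match "notably".
            if text.startswith(marker) and (not rest or not rest[0].isalnum()):
                found.append(marker)
                text = rest.lstrip(" ,.")
                matched = True
                break
    return found
```

Matching longest-first lets multi-word markers like "on the other hand" win over their single-word prefixes.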
For this task, we compiled a list of approximately 700 potential discourse markers identified through manual examination of the corpus and in the literature (Marcu, 1997; Oates, 2001). [Footnote 12: Manual examination of the corpus may be seen as viewing test data prior to testing. The author readily admits this list would ideally have been compiled from out-of-sample documents. However, note that the task is to examine the impact of discourse markers, not to identify discourse markers. Because all potential discourse markers were included using this method, we view this as parallel to annotations in the test data for a task that requires such information.] Finally, we make the simplifying assumption that any turn-initial string that matches a member of this list is a discourse marker, a condition met in approximately 50% of turns. If multiple adjacent discourse markers appear at the beginning of the string, all were included. Consider an example from Kansas v. Marsh (Reargued) (2006): "JUSTICE BREYER: Okay, well, what do you say to --", from which we extract two discourse markers (okay and well). Because the discourse marker list is composed of both single- and multi-word discourse markers, and because the majority of single-word discourse markers are also stop-words, there is very little overlap between the Unigram feature set and the DM feature set.

Personal Reference (Ref)

Finally, we included a feature set for references to individuals. This feature set included features of the following types: justices' names; honorifics (i.e. "Your Honor"); second-person pronouns; a single feature for any justice mentioned; and a single feature for every non-justice name. Instances of these features were identified using simple pattern matching, which we found to be sufficient for most instances of address due to the formal nature of Supreme Court discourse. Thus, this works well as a basic model of direct address, closely related to that discussed in Jovanovic and Akker (2004).
However, one should note that as a consequence of using simple pattern matching and no additional or more sophisticated approaches, all instances of reference are included regardless of the referent. While a subset of these references are direct references to an individual who either spoke or will speak in adjacent turns, the direct address feature set also includes references to individuals present but not currently participating in the discourse, and to individuals who are not participating in the discourse at all. While each of these different classes of "individual mention" makes a distinct contribution, each contribution made is potentially useful in modeling the conversational dynamics of the Court. [Footnote 13: The second-to-last feature was included to account for highly variable mentions of justices who were not serving on the Supreme Court during the case. A single feature was used in this final case because of the high variability across cases of non-justice names. Note, however, that the majority of these latter namings within a case typically refer to the party currently presenting oral arguments or other individuals involved in the case.] Because references are typically made to someone who recently spoke or will speak (because they have been addressed), for each turn we include the reference features from the immediately adjacent turns but not the current turn. Approximately 40% of turns contained at least one instance of personal reference. Finally, as with discourse markers, because unigrams are filtered for stop-words and contain only single tokens, there was little overlap between the direct address features and the unigram features. Figure 3 provides an example of the features extracted from a sequence of turns.

Figure 3 Example of features extracted from a transcript segment. Turns from S. D. Warren Co. v. Maine Bd. of Environmental Protection (04-1527).

JUSTICE SOUTER: -- "reinforcing," and maybe it's "changing." I mean, you're characterizing it one way.
We start with a different canon of meaning, and that is that we look to the words around which, in connection with which, the word is used. In here, it's being used without certain modifiers or descriptive conditions. In other cases, it is being used with them. And that's a good reason to think that probably the word is intended to mean something different in those situations. MR. KAYATTA: Well, I would -- I would hesitate, Justice Souter, to go from taking a specific word, like "discharge," and, therefore, saying that it meant something that is both more general and much more easily set. JUSTICE SOUTER: No, but your argument, I thought, was simply this, that it uses "discharge" in, you know, X number -- I forget how many you had -- and it's perfectly clear that in most of those instances it requires an addition; and, therefore, it should be construed as requiring it here. My point was that in a great many of those instances, the statute is not merely using the word in isolation; it's using it in connection with a couple of other words, like "discharge a pollutant." And it, therefore, number one, makes sense to construe "discharge of a pollutant" differently from "discharge." That's the -- that's the only point. 
Features
Souter 1: Unigrams: cases, word, start, changing, connection, words, modifiers, meaning, reinforcing, reason, situations, intended, characterizing, good, canon, descriptive, conditions; Discourse Markers: (none); Direct Address: you
Kayatta 1: Unigrams: meant, discharge, word, set, justice, souter, easily, taking, specific, general, hesitate; Discourse Markers: well; Direct Address: Justice_Souter, JUSTICE
Souter 2: Unigrams: argument, simply, requires, sense, discharge, construe, clear, thought, construed, point, number, great, word, connection, requiring, forget, words, couple, addition, differently, perfectly, statute, instances, isolation, pollutant, makes; Discourse Markers: no, but; Direct Address: your, you

3.2 Experiments
For our experiments we utilized four combinations of features:
• Unigrams (Unigrams)
• Unigrams plus Discourse Marker Features (Unigrams + DM)
• Unigrams plus Personal Reference Features (Unigrams + Ref)
• Unigrams plus Discourse Markers plus Personal Reference (Unigrams + DM + Ref)
With these features we conducted sequence prediction using both first- and second-order CRFs. All experiments were evaluated using k-fold cross-validation, a common evaluation technique wherein data is segmented into some number k of non-overlapping subsets of instances, or folds, where k is less than or equal to the number of individual instances in the data set. For each subset s_i of the k subsets, a model is trained on the other k-1 subsets and then evaluated using s_i as a test set. Finally, results from each iteration of testing are combined, typically through averaging (as in our experiments). We used 10-fold cross-validation to evaluate our first-order models and 2-fold cross-validation to evaluate our second-order models.[14]

Results
Results are reported as the F-score for sequence prediction. F-score is the harmonic mean of precision and recall.
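The evaluation loop just described can be sketched in a few lines. This is an illustrative reconstruction with stand-in training and scoring functions, not the CRF pipeline used in the thesis; the fold construction and equally weighted F-score follow the definitions in the text.

```python
# Sketch of k-fold cross-validation with averaged scores (illustrative).
def k_fold_indices(n, k):
    """Split range(n) into k contiguous, non-overlapping folds."""
    size, rem = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rem else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def cross_validate(data, k, train_fn, eval_fn):
    """Train on k-1 folds, evaluate on the held-out fold, average results."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for held_out in folds:
        train = [data[i] for i in range(len(data)) if i not in held_out]
        test = [data[i] for i in held_out]
        model = train_fn(train)
        scores.append(eval_fn(model, test))
    return sum(scores) / len(scores)

def f_score(precision, recall):
    """Equally weighted harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f_score(0.5, 1.0)` is 2/3: the harmonic mean penalizes the lower of the two values more than an arithmetic mean would.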
We used an equally weighted F-score as the simplest combined measure of precision and recall. Figure 4 shows the 10-fold cross-validation results using first-order CRFs. We report only those justices who regularly spoke in cases during their time on the bench, and no other symbols.[15] Each justice category has been annotated with the relative improvement from Unigrams to the Unigrams + DM + Ref condition.

[14] The choice to use 2-fold cross-validation for second-order models was based on the significantly longer training time for this order of CRF as compared to first-order CRFs.
[15] Thus we do not report section headers, the TIME symbol, the L symbol, or Thomas (who spoke too infrequently to model).

Figure 4 First-order CRF 10-fold cross-validation results. Annotations represent the relative improvement over the Unigram baseline for the Unigram + DM + Ref condition (Hawes et al. 2009).

For the Unigrams + DM and Unigrams + Ref conditions we see relative improvement over Unigrams for all justices; however, there is variability across justices as to which of the two provides the greatest relative improvement. The use of both personal reference and discourse markers, in addition to unigrams, provides greater relative improvement than all other conditions for each justice. Figure 5 shows the 2-fold cross-validation results for second-order CRFs. As with the first-order graphs, justice categories have been annotated with the relative improvement from the Unigram condition to the Unigram + DM + Ref condition. For all justices but Alito and Rehnquist we see a relative improvement in all conditions as well as a similar pattern across conditions within justices. The decrease in performance for Alito and Rehnquist is to be expected given that these two justices cover the smallest portions of the corpus compared to all other justices who speak regularly.
Because of this, sequences with their symbols appear infrequently across the corpus, and so will either be less evenly distributed throughout cross-validation folds or contain less training data per fold. The overall increase in F-score for all other justices (as compared to Figure 4) in all conditions indicates that increasing speaker history is, as expected, beneficial in modeling justice turn-taking behavior. It would appear that the second-order CRF allows us to capture both complex interactions between justices as well as individual justices' tendency to continue speaking to a lawyer without interruption from other justices.

Figure 5 Second-order CRF 2-fold cross-validation results. Annotations represent the relative improvement over the Unigram baseline for the Unigram + DM + Ref condition (Hawes et al. 2009).

Figure 6 contains the overall accuracy for both first- and second-order CRFs in each condition, where accuracy is simply the proportion of correct predictions to the total number of predictions. Each bar has been annotated with its relative improvement over unigrams for its respective model order. Error bars were calculated as the 95% confidence interval as computed by the Clopper-Pearson method for inferring exact binomial confidence intervals (Clopper & Pearson, 1934). The confidence intervals indicate that for both first- and second-order models, the inclusion of discourse markers or personal reference features provides a significant improvement over unigrams alone, though these two conditions are not significantly different from each other. However, the inclusion of both feature sets does provide a significant improvement over both of these conditions for both first- and second-order models.

Figure 6 Overall accuracy of first- and second-order CRFs. Bars are annotated with the relative improvement over the Unigram baseline. Error bars are the 95% confidence interval as calculated by the Clopper-Pearson method for inferring exact binomial confidence intervals.
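The Clopper-Pearson interval used for the error bars inverts the binomial tail probabilities: the lower bound is the p at which observing k or more successes has probability α/2, and the upper bound is the p at which observing k or fewer has probability α/2. A minimal pure-Python sketch (not the code used to produce the thesis figures), solving each bound by bisection:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05, iters=60):
    """Exact (Clopper-Pearson) binomial CI via bisection on the tail sums."""
    # Lower bound: p such that P(X >= k | n, p) = alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            # P(X >= k) grows with p; too small a tail means p must grow.
            if 1 - binom_cdf(k - 1, n, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    # Upper bound: p such that P(X <= k | n, p) = alpha / 2.
    if k == n:
        upper = 1.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            # P(X <= k) shrinks as p grows; too large a tail means p must grow.
            if binom_cdf(k, n, mid) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper
```

For instance, 5 correct predictions out of 10 gives the familiar exact interval of roughly (0.187, 0.813), noticeably wider than a normal approximation would suggest at this sample size.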
Discussion
Interestingly, these results show that the inclusion of features such as discourse markers and instances of personal reference does add information that helps in identifying who was speaking when in a discourse. While the results are considerably lower than the acoustic approach to speaker identification of Yuan and Liberman (2008), it should be noted that while our tasks are related, they are also distinct. Their work focuses on the use of acoustic differences in individuals' speech and how this can be applied to speaker identification in acoustically complex environments. In contrast, our work aims to understand the turn-taking patterns of justices in the Supreme Court through the relationship between turn content and turn organization, and we use speaker identification as a task to gauge our progress towards this goal. These results provide significant improvement over a unigram baseline model, and we see significant improvement from first-order models to second-order models. This indicates the existence of high-level patterns in justice turn organization during Supreme Court oral arguments.

Though we are looking for positive results with our work, we are also looking for tools to help legal scholars. How, then, might these results or this work in general be used as such? The fact that we have identified predictable patterns in turn-taking may be of interest to legal scholars. Though they may have had such an intuition about the Court (perhaps noting that there is a pecking order amongst the justices, with the chief justice at the top, followed by the other justices organized by seniority), these results make this fact explicit. Additionally, the work presented here is a novel approach for understanding the Supreme Court. By utilizing these methods, legal scholars will have new tools for addressing questions about the Supreme Court, and a variety of new questions.
Chapter 4 Visualizing Dynamics
This chapter addresses the second goal of this thesis: to demonstrate that the patterns indicated in the previous chapter can be associated with case outcomes. To accomplish this, we explore the relationship between turn-taking patterns during oral arguments and case outcomes via a multi-dimensional charting technique. We created charts for sets of cases belonging to a variety of outcomes and case conditions, and examine the relationship between justices' voting records and their turn-taking behavior in these conditions. By comparing these charts we create a picture of the relationship between the voting and conversational behavior of justices.

In this chapter, as well as the next, we deal with justices' ideology. This is often discussed throughout the media and often held as common knowledge. However, there have been a number of studies quantitatively examining the ideology of justices. For example, Martin-Quinn scores estimate the "ideal point" (i.e. a point on an attitudinal scale, in this case ideology) for each justice (Martin and Quinn 2002). Martin-Quinn scores are regularly published at http://mqscores.wustl.edu/measures.php. On the Martin-Quinn scale, negative numbers indicate a liberal ideology while positive numbers indicate a conservative ideology. Table 4 summarizes the mean Martin-Quinn score for the justices for the three years covered in our selection of cases.

Justice     Martin-Quinn score
Thomas       4.37
Scalia       2.75
Alito        1.63
Roberts      1.6
Kennedy      0.41
Breyer      -1.41
Souter      -1.51
Ginsburg    -1.54
Stevens     -2.4

Table 4 Mean Martin-Quinn scores for the 2005-2007 terms. Note, negative scores indicate a liberal ideology and positive scores indicate a conservative ideology. The higher (lower) the number, the more conservative (liberal) the ideal point is.

4.1 Methods
Corpus description
While the source and format of documents for this corpus is the same as that in Chapter 3, we selected a different timeframe.
For this work, transcripts corresponding to cases from the February 2006 argument session (2005 Term) through the April 2008 argument session (2007 Term) were collected. This selection of cases represents a "natural court": a period of time during which the same 9 justices were in office with no changes in court membership. These justices include Chief Justice Roberts, Justice Stevens, Justice Souter, Justice Ginsburg, Justice Kennedy, Justice Thomas, Justice Alito, Justice Scalia and Justice Breyer. By using a natural court, we avoid potentially erroneous factors introduced by changes in court membership. Additionally, it increases our chances of avoiding the case where significantly less data is available for an individual justice due to factors external to that justice's behavior. While it would have been preferable to use more data, there is no longer natural court after the 2004 term, and before then individual justices were not uniquely identified in argument transcripts. Of the 179 cases argued during this period, 11 were held out due to inconsistencies in the database used for labeling each case.[16]

Case Segmentation
Cases were segmented into sequences of speaker labels. Each sequence was then divided into "speaker trigrams". Those familiar with the traditional view of trigrams will recognize our interpretation of speaker trigrams: a speaker trigram is S_i S_i+1 S_i+2, where S_i is the speaker of the i-th turn in the sequence (Manning and Schütze 1999). Figure 7 contains some example turns from the corpus (truncated for brevity), along with the sequence extracted from these turns and the resulting trigrams. We then obtained the count for each trigram across all cases and for all cases in each one of several conditions from the Spaeth database (e.g. direction of case decision, direction of Alito's votes, vote split, etc.).

[16] Held-out cases include: 04-607, 05-204, 05-259, 06-1265, 06-166, 06-618, 06-7517, 07-290, 07-30, 07-77 and 06-134 (New Jersey v.
Delaware).

Figure 7 Sequence of truncated turns, the sequence extracted from these turns, and the resulting trigrams.

Labeling description
Labels were created using the Spaeth database. We experimented with variables along several dimensions, including the direction of individual justices' and the Court's decisions in cases (liberal/conservative) and the Court's vote split (5-4, 9-0, 8-1, etc.). While we discuss only a sampling of charts in this chapter, all charts with greater than 10 cases for each variable value are included in Appendix A. In the sections that follow we will cover the Vote Split (VOTE) variable, which contains the distribution of votes for a case; the Direction (DIR) variable, which contains the ideological direction of the case outcome; and the Justice Direction variables (JDIR), which contain the ideological direction of each justice's vote in a particular case.

From Snyder v. Louisiana (06-10119)
CHIEF JUSTICE ROBERTS: Even though -- even though your theory…
MR. BRIGHT: Oh, no.
CHIEF JUSTICE ROBERTS: -- that this jury did not return a…
MR. BRIGHT: No. Let me -- let me make this quite…
CHIEF JUSTICE ROBERTS: Thank you, Mr. Bright. Mr. Boudreaux?
ORAL ARGUMENT OF TERRY M. BOUDREAUX ON BEHALF OF THE RESPONDENT
MR. BOUDREAUX: Mr. Chief Justice, and may it please…
JUSTICE SCALIA: As to life imprisonment or as to the…
MR. BOUDREAUX: As to life imprisonment, Your Honor…
JUSTICE SCALIA: Where is this? I -- 364? Show me --
MR. BOUDREAUX: Beginning at 364 of the joint appendix…
Extracted Sequence: ROBE L ROBE L ROBE START-ORAL L SCAL L SCAL L
Trigrams: ROBE L ROBE, L ROBE L, ROBE L ROBE, L ROBE START-ORAL, ROBE START-ORAL L, START-ORAL L SCAL, L SCAL L, SCAL L SCAL, L SCAL L

The Rose Charts
Though radial plots have been explored extensively, the use of radial plots for the visualization of sequential patterns and associated variables is a novel application of this layout (Draper et al. 2009).
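Returning to Figure 7, the conversion from a speaker sequence to trigram counts can be sketched as a sliding window. This is an illustrative reconstruction using the label sequence from the figure, not the thesis code.

```python
from collections import Counter

def speaker_trigrams(sequence):
    """Slide a window of three over a sequence of speaker labels."""
    return [tuple(sequence[i:i + 3]) for i in range(len(sequence) - 2)]

# Speaker sequence extracted in Figure 7 (L marks a lawyer turn).
seq = ["ROBE", "L", "ROBE", "L", "ROBE", "START-ORAL",
       "L", "SCAL", "L", "SCAL", "L"]
counts = Counter(speaker_trigrams(seq))
print(counts[("ROBE", "L", "ROBE")])  # prints 2, matching the figure
```

An 11-symbol sequence yields 9 trigrams, and repeated patterns such as ROBE L ROBE are counted once per occurrence, as in the figure.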
The outer ring of our diagrams (the petals in our terminology) is related to the polar plots discussed by Draper et al. (2009), while the inner ring is a pie chart. Because these charts are a novel application of radial layouts, we include the following technical description. For an explanation of how to interpret the charts, proceed to the Results section (Section 4.2).

For each justice (except Thomas, again because of his infrequency of speaking) we created charts for all trigrams ending with that justice (i.e. all trigrams represented in a chart must end with the same S_i+2, where S_i+2 is a justice). By concentrating only on those trigrams that end with the same justice, we can concentrate on turns that can be associated with "choice" on the part of that justice (i.e. the choice of that justice to speak after the speakers in the first and second positions in the trigram). We interpret this "choice" as the choice to interact with or pay attention to previous speakers. However, this is not necessarily the case; for example, these turns may arise if the justice is attempting to change the topic, and thus not paying attention to the previous speakers in the usual sense. Secondly, we chose to concentrate only on "typical" trigrams; because the vast majority of trigrams are of the form JUSTICE LAWYER JUSTICE or LAWYER JUSTICE LAWYER, all trigrams that did not have a lawyer in the second position were filtered out.

The center of each chart contains a pie graph representing the proportion of times the justice in the third position also spoke in the first position (i.e. S_i = S_i+2; "held the floor" after the lawyer's turn).[17] Each of the outer petals represents one of the other justices that spoke in the first position (i.e. all other S_i). The width of each outer petal represents the frequency of each turn sequence normalized by the number of times S_i spoke, relative to the other petals. Thus, if the justice in the center devotes equal attention to all other justices (e.g.
that justice follows up on the same proportion of the turns produced by each other justice), all petals will have equal width. Because this looks at the proportion of turns rather than the count, the petals would be of equal width even if the frequencies of the sequences they represent are different. Petal radius represents the proportion of the time that two justices voted together, where shorter petals indicate the justices have more similar voting records than justices with longer petals. The inner dotted ring indicates 100% matching votes, and the outer edge of the chart area indicates 100% mismatch. Each object in the chart (the petals and the pie graph) is colored on a gradient according to the proportion of cases in which that justice voted liberally or conservatively in the given category (i.e. that justice's exhibited ideology), where white (blue in color versions) is liberal and gray (red in color versions) is conservative. We use counts of votes rather than Martin-Quinn scores because of the high variability of the conditions chosen and because we want to represent the ideology within each condition. Note that because the range varies from condition to condition and because the range can often be quite narrow, the gradient is calculated within a condition; thus, a justice's color may vary from condition to condition. Finally, each petal is annotated with two values. The percent on the top, which is also in bold, is the width of the petal, while the percent on the bottom represents the proportion of times that n-gram occurred compared to all other petals. By representing turn-taking information in this way we hope to be able to capture broad patterns of the justices' turn-taking behavior.

[17] We take the idea of "holding the floor" beyond the typical interpretation of maintaining control of a turn, to all instances where a speaker continues to produce turns after a single interceding turn from another speaker.
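The two petal annotations just described reduce to simple proportions. A minimal sketch under our reading of the text, with invented counts (the trigram counts and turn totals below are hypothetical, not drawn from the corpus):

```python
# Hypothetical counts for one chart: how often the center justice followed
# each other justice (via a lawyer turn), and how many turns each of those
# justices produced overall. All numbers are invented for illustration.
follow_counts = {"KENN": 40, "SCAL": 50, "GINS": 30}
turns_spoken = {"KENN": 200, "SCAL": 500, "GINS": 250}

# Bottom annotation: share of follow-ups relative to all petals.
total_follows = sum(follow_counts.values())
absolute = {j: c / total_follows for j, c in follow_counts.items()}

# Top annotation (petal width): follow-ups normalized by how often the
# followed justice spoke, then rescaled so the petal widths sum to one.
rates = {j: c / turns_spoken[j] for j, c in follow_counts.items()}
total_rate = sum(rates.values())
width = {j: r / total_rate for j, r in rates.items()}

# A justice can have a large absolute share yet a small width if they
# simply produced many turns (compare SCAL with KENN here).
print(round(absolute["SCAL"], 3), round(width["SCAL"], 3))
```

This is the Scalia situation described for Figure 8 below: a high unnormalized proportion but a lower normalized one, because Scalia produces many turns overall.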
If we compare charts for different values within a condition, patterns may emerge that indicate a relationship between the values of that condition and a justice's behavior. For example, if we compare the turn-taking behavior of a justice when his or her vote is liberal to when the vote is conservative, and we note that a petal for a particular justice is short and narrow for liberal votes but long and wide for conservative votes, this could indicate that the justice in question has a greater tendency to follow up on the particular justice of that petal in conservative cases. Furthermore, when the petal is long and wide, we may hypothesize that many of those follow-ups in some way challenge the justice of the petal, since the length of the petal indicates the level of disagreement in the cases' outcomes.

4.2 Results
How to read the charts
Some of the patterns we discuss will be relevant either to wings of the Court or to justices from those wings. In these cases we will treat Kennedy, the swing justice, as irrelevant to these patterns. Additionally, we will identify speculative explanations for these patterns with italic text at the end of an observation.

Take, for example, Figure 8, "Stevens - Rose Diagram of All Cases". This chart contains all cases from our dataset. Because this chart is for Stevens, we find a pie chart in the center labeled Stevens, which indicates Stevens tends to "hold the floor", i.e. speaks again after an initial turn directed at the lawyer, ~75% of the time (signified by the area filled in for the pie chart). It also shows that his voting record is one of the most liberal for this set of cases at ~31% conservative votes (indicated by the color gradient).

Figure 8 Stevens - Rose Diagram of All Cases

As discussed above, the outer petals represent all turn sequences in the dataset of the pattern JUSTICE_1 LAWYER JUSTICE_2 (J_1 L J_2) where J_1 ≠ J_2, and in this case J_2 is Stevens.
Thus, the petal labeled Kennedy represents all turn sequences of the form Kennedy Lawyer Stevens. The labels for this petal indicate that Stevens follows Kennedy 17.9% of the time when Stevens is not "holding the floor" and that the normalized proportion of this sequence is 21.6%. For Scalia, the relationship between these values is reversed, with the normalized proportion much lower than the unnormalized proportion. This indicates that while Stevens follows up on Scalia more often than he does Kennedy, he does so on a smaller proportion of the turns produced by Scalia as compared to Kennedy. Finally, comparing the length of the Kennedy petal to the others, we see that Stevens votes with Kennedy less often than the liberal justices but more often than the conservative justices.

Looking at the outer petals we can make a number of generalizations, several of which are covered here in a top-down fashion:
• Stevens has a greater tendency to follow up on Kennedy, Scalia, Alito and Roberts (the justices he least often votes with) as a group than he does Ginsburg, Breyer and Souter (the justices he most often votes with).
• Holding Kennedy out as the swing vote, Stevens's interaction is much more evenly split between the conservative and liberal wings of the Court, with only slightly more follow-ups on justices he agrees with less often than ones he does agree with (40% vs. 38.3%). Thus, this indicates a somewhat disproportionate amount of attention given to Kennedy. This may indicate that Stevens more often treats Kennedy as a "potentially persuadable justice", spending more time trying to convince him than other justices.
• While the normalized proportion is fairly evenly spread out between the conservative justices in this chart, for the liberal justices, attention is skewed towards Ginsburg (18.3% towards Ginsburg vs. 9.9% and 10.1% towards Breyer and Souter). This may indicate regular cooperation between Stevens and Ginsburg.
•
Of all justices, Stevens is most likely to follow up on Kennedy, at 21.6%, followed by Ginsburg at 18.3%.
• Finally, Roberts and Scalia both have much higher absolute percents compared to the relative percents, indicating that Stevens is less likely to follow up on one of their turns despite a larger number of opportunities, and so a greater proportion of turns from these justices go ignored.
• The absolute percent is much lower than the scaled relative percent for Alito, indicating a stronger tendency for Stevens to follow up on Alito given the opportunity as compared to other justices; Alito's turns are less often ignored as compared to Roberts's and Scalia's. These last two observations together may indicate a tendency to argue with Alito more often than with other justices in the conservative wing.

Vote Split Condition (VOTE)
The VOTE variable in the Spaeth database indicates the distribution of the justices' votes (e.g. 5-4, 8-1, 9-0, etc.). Using this variable, we can test our intuitions about the sorts of patterns the charts will exhibit, because we have well-defined expectations for several features of the graph in this condition. Figure 9, "Kennedy - Rose Diagrams for 5-4 and 9-0 split cases", exhibits several patterns we would expect:
• 9-0 cases have maximal agreement between the justices; logically, if their decisions were unanimous then their votes always match.
• In 9-0 cases, justices always exhibit the same ideology. Their votes always match, thus their decisions have the same ideological direction.
• In 5-4 cases, Kennedy shows relatively high levels of disagreement with all justices, but slightly more agreement with conservative justices than with liberal justices. We expect this pattern given that Kennedy is a slightly conservative swing justice, often casting the deciding vote in narrowly decided cases.
•
In 5-4 cases, Kennedy exhibits an ideology in the center of the gradient while the other justices exhibit ideologies along the extremes of the gradient. This is what we would expect if Kennedy is the median justice and the other justices typically vote along their ideology in narrowly decided cases.
• Finally, in 5-4 cases, the petal width for Alito is very narrow, both compared to the other justices in 5-4 cases and compared to Alito's petal in 9-0 cases. Also, Alito has the shortest petal in 5-4 cases. This may indicate that Kennedy tends to avoid interaction with the justice whose viewpoint is closest to his in narrowly decided cases.

Figure 9 Kennedy - Rose Diagrams for 5-4 and 9-0 split cases

This pair of diagrams confirms our intuitions about the agreement and ideology patterns we expect to see when they are logically predictable. Additionally, the last bullet point demonstrates the sorts of patterns that we can find when comparing levels of interaction across values in a condition.

Direction Condition (DIR)
The DIR variable in the Spaeth database indicates the ideological direction of a case's outcome. The ideological direction of a decision is determined based on the parties involved in the case and the issue area of the case, according to the rules outlined in the Spaeth database documentation. Ideological direction is either liberal or conservative, except in rare circumstances when no appropriate ideological direction can be determined. Below we discuss three diagram pairs in the DIR condition. In all charts, conservative decisions are on the right and liberal decisions are on the left. Several observations can be made in Figure 10, "Alito - Rose Diagrams for the DIR Condition" (Alito is a conservative justice):
• When the eventual outcome of the case is conservative, Alito follows up on the liberal wing more frequently than when the outcome is liberal.
This suggests a greater level of interaction via the lawyer between Alito and the liberal wing of the Court in cases that are eventually decided conservatively.
• There is less interaction between Alito and the conservative justices when the outcome is liberal as opposed to conservative. It should be noted that this is not the logical converse of the previous observation, as the presence of a swing justice allows for changes in only one wing across a condition. These two observations may indicate a slight tendency to argue more with justices that Alito disagrees with in cases where the outcome is likely to be against Alito's ideology.
• These charts indicate an increase in interaction with Kennedy when the eventual outcome of the case is liberal. For example, it is reasonable to assume that in any given case, each justice (in this instance, Alito) will have a fairly accurate expectation regarding the eventual outcome of the case. So, if Alito suspects that the eventual outcome of the case will be liberal (and especially if the case is likely to be split), Alito is likely to seek the support of Kennedy as a swing vote, which may be indicated as a higher degree of interaction.

Figure 10 Alito - Rose Diagrams for the DIR Condition.

In the DIR condition for Ginsburg (Figure 11), we note the opposite basic patterns to those of Alito (Ginsburg is a liberal justice):
• In conservative cases we see a higher level of interaction with the liberal wing and a lower level of interaction with the conservative wing when compared to liberal cases.
• We also see more interaction with Kennedy in conservative cases than in liberal cases.

However, since Ginsburg and Alito are from opposing wings of the Court, these patterns can be used to form a single generalization.
Namely, when the eventual outcome of a case is in opposition to the justice's general ideology, there is increased interaction with that justice's own wing, and decreased interaction with the opposing wing, as compared to cases when the outcome is in line with the justice's ideology. This pattern is observed for 5 of the 7 applicable justices (Kennedy excluded for the reason above, and Thomas because he rarely speaks). Similarly, when a case's eventual outcome is against a justice's ideology, more interaction with the swing justice is observed than when the eventual outcome of the case is in line with the justice's ideology.

Figure 11 Ginsburg - Rose Diagrams for the DIR Condition.

In the above cases, Kennedy was treated as irrelevant to the patterns under discussion because he is the swing justice. Despite this, we can still make observations regarding Kennedy's interaction with the other justices. Figure 12 contains the DIR condition charts for Kennedy.
• Kennedy is more consistent than the previous justices we have discussed when looking at his interaction with the wings of the Court. He has only slightly higher interaction with the liberal justices in liberal cases and the conservative justices in conservative cases. We might expect this from a swing justice.
• For each value in the DIR condition, for Kennedy there is a decrease in the proportion of follow-ups to the most liberal justice in that condition. That is, Stevens is the most liberal justice in cases with a conservative outcome while Ginsburg is the most liberal justice when the outcome is liberal; we see that Kennedy interacts less with Stevens when the outcome is conservative (i.e. when he is the most liberal justice) and less with Ginsburg when the outcome is liberal (i.e. when she is the most liberal justice). This could indicate a reluctance to get involved with the most extreme (liberal) viewpoint during a case.

Figure 12 Kennedy - Rose Diagrams for the DIR Condition.
Justice Direction (JDIR)
Similar to the DIR condition, the JDIR condition has two primary values, liberal (L) and conservative (C); however, unlike DIR there is one JDIR value for each justice. So, ALTODIR (Alito's Direction) identifies the ideological direction of Alito's vote in a particular case. Note that no variable named JDIR appears in the Spaeth database, which instead contains one variable for each justice; we are simply using the name JDIR as shorthand for these variables. While other comparisons are possible, below we concentrate on charts comparing justices within their own JDIR condition. That is, for Alito we only present ALTODIR, for Breyer we only present BRYDIR, etc.

Figure 13 presents the two values for Alito in the ALTODIR condition. Note that because this is the ALTODIR condition, we expect that Alito will be on the extreme end of the ideology gradient in this case group (logically, if the value is conservative in the ALTODIR condition, 100% of the votes from Alito for that value will be conservative). We note several features in Figure 13 that may be interesting:
• First, when Alito's vote is liberal, there is a high level of agreement amongst the justices, signified by the relatively tight radius of the outer petals. This indicates that Alito typically votes liberally only when most of the Court does so.
• When Alito's vote is liberal, we see a decrease in turns following the conservative justices and a slight increase in vote disagreement between these justices as compared to when Alito's vote is conservative. This may indicate Alito has a tendency to follow up more often with people he agrees with.
• For individual justices, we see some differences in the liberal wing. Though there is little change for Ginsburg and Stevens, we see notable changes in the relative frequency when following Breyer (a decrease from conservative to liberal) and Souter (an increase from conservative to liberal).
•
We also note that the relative frequency of follow-ups on Kennedy shows a considerable increase from conservative to liberal. Since Alito's record is more moderate than the rest of the conservative wing, this could suggest that Alito has more to discuss with the swing justice in particular when their interpretations of a case are most closely aligned.

Figure 13 Alito - Rose Diagrams for the ALTODIR Condition.

Figure 14 contains the charts for Souter in the SOUTDIR condition. As in the DIR condition, it will be helpful here to look at things in terms of whether or not the vote matches the center justice's usual ideological direction, and whether other justices are from the same wing or the opposing wing (Souter is a liberal justice).
• Compared to Alito voting against his usual direction, we see a higher level of disagreement when Souter is voting against his direction. This indicates Souter's conservative votes may be less closely related to conservative outcomes from the Court.
• As before, we see a slight increase in the normalized proportion of turns following justices from the same wing as the justice in the center when the case is against his typical direction (i.e. conservative).
• We also see a slight increase in the number of turns directed at the opposing wing when the outcome is against his usual direction.
• There is a decrease from conservative to liberal for turns following Ginsburg but an increase for turns following Stevens. We also see a fairly large decrease from conservative to liberal for Roberts and fairly small increases for Alito and Scalia. These variations for individual justices likely suggest much more complex relationships between these justices.
• Finally, we also see a relatively small increase from C to L for Kennedy, indicating relatively even amounts of attention given to Kennedy for both outcomes. Perhaps this indicates that Souter doesn't use increased attention as a means of convincing another justice.
Figure 14 Souter - Rose Diagrams for the SOUTDIR Condition.

Unlike the two examples above, Kennedy's chart is fairly consistent with respect to the normalized proportions for each wing; however, we do still see small but potentially interesting differences between the two charts.

- When Kennedy's eventual vote is liberal, there is a slightly higher relative frequency of turns following liberal justices as compared to when his vote is conservative (the converse being true for conservative justices). This suggests that Kennedy devotes slightly more attention to whichever wing he is likely to agree with.
- It is also worth noting that for the conservative justices this difference primarily comes from a difference in the relative frequency of turns following Roberts, while for the liberal justices the difference is primarily distributed across Ginsburg, Breyer and Souter, with Stevens showing only a minimal change.

Figure 15 Kennedy - Rose Diagrams for the KENDIR Condition.

4.3 Discussion

The charts and observations above are a sampling of the sorts of general conversational patterns that can be observed for individual justices and the Court given outcome conditions that are of interest to legal scholars. For example, we saw a tendency of some justices from both wings to exhibit similar patterns to their respective opposing wings in both the DIR and JDIR conditions. This suggests that there are patterns of turn-taking that can be associated with case outcomes, positively addressing the second point of this thesis. Though we have only offered speculative explanations for these patterns, legal scholars should find that this sort of analysis could aid in the confirmation or discovery of patterns in the interactions of Supreme Court justices. Here we concentrated only on a particular subset of justices, outcome variables, and turn-taking patterns.
While the appendix contains all justices for the conditions discussed above and several more outcome variables, there is no reason that these charts need to be limited to these conditions. For example, it may be interesting to compare cases where a justice wrote a dissenting opinion to cases in which that justice did not. Or, one may wish to look at how patterns vary for certain case variables such as the lower court's direction, or combinations of variables such as unanimous conservative decisions. The rose diagrams are also a novel application of radial layouts that can be used as a new tool for legal researchers when exploring the behavior of the Supreme Court. Nor is this approach limited to this particular pattern of interaction (i.e., J1 L J2, where J2 is held constant in the chart). There are numerous avenues for future research. For example, L could be broken down into petitioner and respondent, or conservative party and liberal party. [18] If we are not particularly concerned with "choice" we may want to look at patterns that share a common J1, or simply patterns that share a common justice in any position. The primary limiting factor in this sort of analysis is ensuring that one has enough cases for a good sampling of patterns. This was the primary reason we used a pattern that includes an additional individual between the two justices. Shorter patterns that include two justices are fairly rare, and longer patterns are sparser. However, with a careful selection of cases and relaxation of conditions one may still find that some patterns of this form can be examined as well.

[18] Where "conservative party" would indicate that a decision in favor of this party is a conservative decision, and vice versa for "liberal party".

Chapter 5 Vote Prediction

This chapter describes our final set of experiments, which build upon the insights revealed by the rose diagrams in the previous chapter, examining vote prediction using turn sequences.
If we can use turn-taking to forecast case outcomes, we will have demonstrated the validity of the third main point of this thesis: that the association between turn-taking patterns and case outcomes is predictive. Before discussing the approach, experiments and results, we will first briefly discuss our findings regarding the "most questions asked" method discussed in Chapter 2.

5.1 Prior approaches

We will first discuss our attempts to replicate results for the "most questions asked" rule discussed by Roberts, Shullman and Wrightsman (Wrightsman 2008), as well as Johnson et al. (2009a). While these projects leave the term "question" undefined, two reasonable interpretations exist. We could take question literally as any interrogative statement, which in the transcripts is usually identified with a question mark at the end. This sidesteps some of the issues discussed in Wrightsman, as transcription typically includes only one question mark per complete question, with no markings at the end of interrupted questions. However, we can also broadly define "question" as all statements produced by a justice. Though not the typical interpretation of what a question is, this seems to match the typical treatment of turns produced by justices, both as indicated in transcripts prior to 2004, which label the majority of justice turns as "QUESTION", as well as in Wrightsman's example statements and Johnson's discussion of "attention given to a side". We explore both here. Lacking the training data and some of the features used by Johnson et al. (2009a), we use a simple rule-based approach. We simply identify all questions in a case, take separate counts for each side, and assign a "win" label to whichever side was asked the most questions. Following the lead of Johnson et al. (2009a), we can also apply both approaches to the difference in questions asked and to the difference in words directed at each side. By using a word-based approach we again reduce the concerns about the definition of a "question".
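As a concrete illustration, the rule-based approach just described might be sketched as follows. This is a minimal sketch, not code from this project: the turn format (speaker symbol, side being argued, text) and the function name are our own illustrative assumptions, and the interrogative interpretation is approximated by counting "?" characters.

```python
from collections import Counter

def predict_by_attention(turns, by="questions"):
    """Sketch of the 'most attention given' rule described in Section 5.1.

    turns: list of (speaker, side, text) tuples, where `side` is the party
    whose argument is in progress -- a hypothetical input format.
    by="questions" counts '?' marks in justice turns (the interrogative
    interpretation); by="words" counts whitespace-separated tokens.
    Following the description above, the side receiving the most attention
    is given the "win" label.
    """
    attention = Counter()
    for speaker, side, text in turns:
        if speaker != "L":  # only justice turns count toward "attention"
            if by == "questions":
                attention[side] += text.count("?")
            else:
                attention[side] += len(text.split())
    return max(attention, key=attention.get)
```

For example, with toy turns `[("SCAL", "petitioner", "Why is that? And how?"), ("GINS", "respondent", "Is that right?")]`, the rule labels the petitioner under both interpretations (two question marks and five words versus one and three).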
However, this does introduce other issues, such as the definition of a word (e.g., compounds, counting speech errors, contractions, etc.). To simplify matters, we take a word as anything separated by white space and word-external punctuation (where characters such as the apostrophe (') and hyphen (-) are word-internal punctuation). Table 5 summarizes the results from these experiments.

Approach                          Accuracy
Most Questions Asked (by turn)    56.8%
Most Questions Asked (by ?s)      56.8%
Most Words Used (by turn)         51.5%
Most Words Used (by ?s)           53.8%

Table 5 Comparison of "most attention given" approaches with varying interpretations of "question". "By turn" indicates that we count each turn as a "question". "By ?s" indicates that we counted ?s in the transcribed justices' speech, usually indicating an interrogative statement.

As is clear, with this particular set of cases, no benefit is gained from a "most attention given" approach. As in most time periods, the majority of cases in this period were reversed, creating a 65.6% most-frequent-outcome baseline that these approaches fail to meet. While interpreting "questions" as interrogatives outperforms a turn-based interpretation of questions on a "most words used" approach, no difference was found for the "most questions asked" approach. Moreover, the "most questions asked" approaches outperformed both "most words used" approaches. Still, one could argue that the continued discrepancy over the power of a "most questions asked" rule is a problem of sample size. In the case of the smaller manual studies, high accuracy may simply be attributed to a favorable sample selection. For the larger study, the distribution of questions compared to case outcomes provided by Johnson et al. (2009a) is unambiguous, and clearly demonstrates that at least in the extreme cases this rule does appear to be valid. Models trained on a larger sample will have a more representative distribution of these extreme cases. In fact, like Johnson et al. (2009a), if we assign labels based on the "most attention given" rule for extreme cases and use the majority class for the rest, we do get similar accuracy. Results provided in Table 6 are for cases in which the difference in the number of questions or words addressed to a side is more than 2 standard deviations from the mean.

Approach                          Cases  Accuracy
Most Questions Asked (by turn)    8      87.5%
Most Questions Asked (by ?s)      7      75.0%
Most Words Used (by turn)         6      83.3%
Most Words Used (by ?s)           6      60.0%

Table 6 Comparison of the "most attention given" rule for extreme cases (i.e., the difference in words or questions is > 2 s.d. from the mean). The "Cases" column indicates how many cases met this criterion.

Because "extreme cases" are simply those whose difference in attention (measured by word or turn counts) given to a side exceeds two standard deviations, it may be possible to identify these cases in advance by examining the distribution of prior cases and determining whether or not the difference in attention given for each new case is within or outside two standard deviations of the distribution of previous cases.

5.2 Forecasting votes

In our discussion of forecasting oral argument transcripts, attention must be given both to the sorts of features used and to the outcomes that we are forecasting. We focus on using features that are easily extracted automatically, with little to no human input. Instead of concentrating on the content of the oral arguments, we concentrate on the conversational dynamics of the justices and lawyers involved in a case, as a function of their turn-taking behavior. While the content of justices' and lawyers' turns is very likely informative about a case's outcome, several factors make it difficult to utilize content with automatic methods.
First, because the transcripts are composed mostly of spontaneous conversation, the performance of existing natural language processing techniques such as parsing and even POS tagging is considerably lower than in tasks where the input is written text or even prepared speeches. Second, while features explored in some manual forecasting approaches, such as "hostility" and "sympathy", are certainly present in the content, these features are not well defined and not easily identified using computational methods. Those features that are somewhat more easily identified, such as topic area, vary widely from case to case. This makes it difficult to find a relationship between these easily identified features and the case's outcome. Finally, as we have shown above, because simple turn-based "most questions asked" or "most words used" rules are limited to extreme cases, their recall (in this instance, the proportion of correct predictions to the number of cases) will be low despite high precision (in this instance, the proportion of correct predictions to the total number of predictions).

One important consideration when predicting case outcomes is deciding just what outcome one wants to predict. The most obvious choice, and the one most often chosen in previous prediction tasks, is whether a case will be affirmed or reversed. There are, however, other potentially relevant options to choose from. For example, justices are very rarely spoken of in terms of their tendency to affirm cases. Typically, when examining justices' voting records, one wants to speak of justices in terms of the direction of their ideology: either liberal or conservative. While the vast majority of cases are either affirmed or reversed, typically each of these decisions is liberal or conservative as well. If the most relevant dimension for discussing justices is the direction of their ideology, then it seems fair to at least consider prediction of case outcomes along this dimension as well. For these reasons, conservative vs. liberal was the primary outcome feature we concentrated on. However, as one would expect, conservative and liberal outcomes do not occur with equal probability, and so the baseline for such a condition is not 50%. We can nevertheless achieve a 50% baseline by splitting cases and then viewing outcomes as a win or lose variable for each side of the case. We explore this outcome in our third experiment.

5.3 Methods

Corpus Description

We use the same corpus as used for the rose charts, described in Section 3.1.

Turn Distribution

As with the sequence prediction task in Chapter 3, from each case we extracted speaker IDs and meta-symbols from the transcript. As before, litigants were reduced to a single symbol (reported here as L). To conserve space when reporting tables, justices are identified by the first four letters of the justice's last name (Table 7). From each sequence we then counted all turn 4-grams. Since the objective of this experiment is to leverage justice interaction as a means for predicting case outcomes, we don't want the n-grams to be too short. If the n-grams selected are too small, we risk losing information about the interaction between justices (as the typical sequence of speakers is Justice, L, Justice, L, ...). If the n-grams are too long, however, we begin to face sparseness problems, since the larger n gets, the more variability there is and thus the lower the counts will be. Thus 4-grams seemed to be the ideal selection.

Speaker                 Symbol  Count
Non-justice party       L       19840
Chief Justice Roberts   ROBE    3890
Justice Stevens         STEV    1964
Justice Scalia          SCAL    4277
Justice Kennedy         KENN    2196
Justice Souter          SOUT    2590
Justice Thomas          THOM    3
Justice Ginsburg        GINS    2379
Justice Breyer          BREY    2668
Justice Alito           ALIT    840

Table 7 Speakers and their corresponding symbols. The Count column identifies the frequency with which each symbol appears in the corpus.

There are 41,417 occurrences of 1,072 unique n-grams. Table 8 summarizes the 20 most frequent 4-grams in the corpus.
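The 4-gram counting step just described can be sketched in a few lines (a minimal sketch, assuming the input is the reduced speaker-symbol sequence for a single case):

```python
from collections import Counter

def turn_ngrams(speakers, n=4):
    """Count sliding-window n-grams over a case's speaker-turn sequence."""
    return Counter(tuple(speakers[i:i + n])
                   for i in range(len(speakers) - n + 1))

# Toy sequence showing the typical Justice, L, Justice, L alternation.
counts = turn_ngrams(["L", "SCAL", "L", "SCAL", "L", "ROBE", "L"])
# counts includes ('L', 'SCAL', 'L', 'SCAL') and ('SCAL', 'L', 'SCAL', 'L'),
# the kind of corresponding pair that shares a Justice-Lawyer-Justice trigram.
```

A sequence of length m yields m - n + 1 windows, so the toy sequence above produces four 4-grams in total.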
Because justices do not frequently speak in adjacent turns, after each justice's turn there is typically a lawyer's turn. Because of this, n-grams usually occur in corresponding pairs that have a Justice Lawyer Justice trigram in common but differ in whether the 4-gram starts or ends with a lawyer. We therefore report these pairs together. However, note that they do not always rank next to each other, and so the table is ordered by the rank of the most frequent 4-gram in the pair.

Corresponding n-grams            Counts        Ranks
L SCAL L SCAL / SCAL L SCAL L    2467 / 2456   1 / 2
L ROBE L ROBE / ROBE L ROBE L    1801 / 1651   3 / 8
L BREY L BREY / BREY L BREY L    1746 / 1726   4 / 6
L SOUT L SOUT / SOUT L SOUT L    1729 / 1705   5 / 7
STEV L STEV L / L STEV L STEV    1237 / 1220   9 / 10
KENN L KENN L / L KENN L KENN    1182 / 1158   11 / 12
GINS L GINS L / L GINS L GINS    1137 / 1122   13 / 14
L SCAL L ROBE / SCAL L ROBE L    418 / 337     15 / 18
ALIT L ALIT L / L ALIT L ALIT    397 / 387     16 / 17
L ROBE L SCAL / ROBE L SCAL L    331 / 328     19 / 20

Table 8 The 20 most frequent n-grams grouped by correspondence pair, ranked by the most frequent n-gram in the pair.

Note that the majority of these 4-grams involve a justice "holding the floor", with the only two instances of more than one justice at the bottom of the table. Despite the fact that the most common 4-grams follow this pattern, many less frequent n-grams represent three or four instances of a justice speaking (Table 9).

n-gram                 count
BREY BREY L BREY       18
SCAL BREY L BREY       18
SCAL L SCAL SOUT       16
SCAL L SCAL SCAL       16
SOUT L SCAL GINS       15
BREY SCAL BREY SCAL    5
ROBE SCAL ROBE SCAL    3
KENN GINS ALIT GINS    1

Table 9 Infrequent n-grams containing 3-4 instances of justice turns.

Note that because the conversational patterns of the Supreme Court are usually very consistent, rare patterns like those in Table 9 often indicate uniquely transcribed events; the majority of instances where the same justice has two adjacent turns in the transcript indicate laughter in the Court. When two justices' turns are adjacent to one another, this usually indicates that an interruption has occurred. Figure 16 contains examples of both laughter and interruptions from the corpus. In the first excerpt, there is laughter after Breyer's first turn, after which he continues to speak. [19] Thus the sequence is transcribed as BREY BREY L BREY. Also note that Mr. Sorrell's turn ends with a "--", indicating that his turn was unfinished. We interpret this as an interruption. However, because Mr. Sorrell is the attorney in this instance, we do not observe anything unusual in the sequence for this pair. In the second excerpt, the transcript indicates that Roberts was interrupted by Scalia, after which Roberts attempts to "hold the floor" by interrupting Scalia, but eventually gives way to a second interruption by Scalia. This sequence is then transcribed as ROBE SCAL ROBE SCAL.

[19] It is unclear from the transcripts whether this laughter should be attributed to Justice Breyer or someone else.

Figure 16 Examples of laughter and interruptions in the transcript.

Randall v. Sorrell (04-1528)
JUSTICE BREYER: No, no. It's $200. Coffee and donuts are expensive.
(Laughter.)
JUSTICE BREYER: Okay? Count it or not?
MR. SORRELL: We don't -- our coffee is not that expensive, but --
JUSTICE BREYER: Donuts and coffee. In other words, it counts as long as it's over $100.

Samson v. California (04-9728)
CHIEF JUSTICE ROBERTS: What about --
JUSTICE SCALIA: Is --
CHIEF JUSTICE ROBERTS: What about --
JUSTICE SCALIA: Is that right? I mean, even in prison, I -- what -- I'm not sure you could even do that if they were still in prison. Can you subject people in prison --

Data Preparation

Before proceeding with any sort of classification, several preprocessing steps were taken in some experiments in order to address sparseness issues as well as remove irrelevant and potentially distracting features:

- Reduce all non-justice parties to a single symbol. Since these are most often attorneys, we reduced them to the L symbol. This step was taken for all experiments.
- Eliminate all n-grams not ending with a justice's turn. This essentially reduced the presence of the feature pairs of the type discussed above.
- Remove all n-grams containing markup, including TIME, as well as the special symbols for the beginning and end of a case.
- Collapse all justices into one of three categories: liberal (Stevens, Souter, Ginsburg, and Breyer), conservative (Roberts, Scalia, Thomas, and Alito) and swing (Kennedy).

While not taken in all experiments, as it seemingly disregards quite a bit of information, this final step deserves some more attention. The motivation behind such an approach is that it greatly reduces sparseness in the data. Not only is the liberal/conservative ideology one that is more or less common knowledge, often observed both in scholarly literature and in the media, but it is also clearly indicated in each justice's voting record. Moreover, ideology is often considered one of the more relevant dimensions over which a case is decided, so it is extremely relevant to predicting case outcomes. Even when the outcome to be predicted is affirm/reverse or agree/disagree, the interaction of the liberal and conservative justices with the swing justice can be informative in predicting case outcomes. However, rather than capturing the interaction between individual justices, this is more accurately described as capturing the interaction between wings of the Court. Given the rose charts, we may hypothesize that this interaction between the wings is also a relevant point to examine, as patterns were observed in the way that members of each wing treated opposing wings. That is, patterns at the "wing level" should be relevant.

In addition to these data preparation options, we also calculated feature values in two ways. The first, and most straightforward, was to simply use the absolute counts of each n-gram. For the second approach we used relative feature scores.
For each n-gram, we divided its frequency by the count of all n-grams for that case. The denominator included all n-grams, i.e., even those that were removed from the feature set using the filters described above. While this means the feature values do not sum to one, it allows us to indirectly encode potentially useful information such as case length.

Baselines

In most studies predicting Supreme Court outcomes, little attention is given to baselines. Understandably, at first blush, when trying to predict an outcome like affirm or reverse, a 50/50 baseline seems applicable. There are only two outcomes in general (others are possible, but rare) and both seem to occur with a fair amount of regularity. However, when examining the history of the Court, one finds strong tendencies for certain outcomes to occur more often than others. Needless to say, the Supreme Court is not as simple as a fair coin toss. So, we need to consider the frequency with which each outcome occurs in each condition in order to establish a more reliable random baseline. For an affirm/reverse condition, we look back at the frequency with which the Court upheld the lower court's decision and the frequency with which the lower court was overturned. In doing so, we find that the Court has a tendency to reverse cases more frequently than it affirms them. Taking a sample of 1000 cases from the 1997 term to the 2007 term, the Court affirmed cases 34.4% of the time and reversed 65.6% of the time. Over shorter periods this tendency can shift drastically; for example, if we look at a 20-case "moving average" of affirm decisions chronologically over this time period (based on date of argument), we see that the average reaches as high as 100% and as low as 35%. Thus, a random baseline for this example is not 50/50. At first this may seem surprising; however, one must consider how cases are selected. Of the approximately 9000 cases submitted to the Court each year, only 80 or so are selected to be heard by the Court.
Naturally, then, the justices are picking those cases which they view as most important, and as it turns out there is a slight bias toward those cases which the Court will overturn. For a liberal/conservative baseline, the Court is a bit more balanced, at 54.2% conservative and 45.8% liberal for the Roberts Court (with Alito). This likely has more to do with the composition of the Court than anything else. In fact, one might expect to see a court with a conservative chief justice and a slightly conservative-leaning swing vote produce a greater proportion of conservatively decided cases. Despite these unbalanced baselines, it is possible to construct experiments that do have true 50/50 baselines. The experiment labeled The Court II is an example of this. By splitting the case into sides (i.e., all turns during the petitioner's argument are one side, all turns during the respondent's argument are another) and setting the outcome to win/lose, we ensure that there are an equal number of win instances in the data as there are lose instances (as for each case one side must win and the other must lose; again, except in rare circumstances).

5.4 Experiments

We discuss four experiments in this section: three dealing with classification of the Court as a whole (The Court I, The Court II and The Court III) and one dealing with the classification of Thomas's votes (Thomas).

The Court I: The first experiment conducted in this category attempted to predict whether the Court's ruling would be liberal or conservative. We found that for this sort of task, predicting the outcome of a case for the Court, classification was highly sensitive to sparseness, so we collapsed justices into Liberal, Conservative and Swing categories. We also employed the filter that reduces the presence of pairs. We use absolute rather than relative feature values.
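The justice-collapsing step used in these experiments amounts to a simple symbol mapping. A minimal sketch, using the symbols of Table 7 (the LIB/CON/SWING labels themselves are our own illustrative choice):

```python
# Wing membership as described in the Data Preparation section;
# the LIB/CON/SWING labels are illustrative, not the thesis's own symbols.
WING = {
    "STEV": "LIB", "SOUT": "LIB", "GINS": "LIB", "BREY": "LIB",
    "ROBE": "CON", "SCAL": "CON", "THOM": "CON", "ALIT": "CON",
    "KENN": "SWING",
}

def collapse_justices(seq):
    """Replace justice symbols with wing symbols; 'L' and markup pass through."""
    return [WING.get(symbol, symbol) for symbol in seq]

# collapse_justices(["L", "SCAL", "L", "KENN"]) -> ["L", "CON", "L", "SWING"]
```

Because several justices map to the same wing symbol, many distinct 4-grams merge after collapsing, which is exactly how the step reduces sparseness.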
Classification was conducted using the LIBSVM 2.86 implementation of support vector machines (SVMs) with default parameter settings, 5-fold cross validation and parameter tuning (Cortes and Vapnik 1995). [20]

The Court II: As a second experiment we tested the "in favor of side" condition. While somewhat more artificial than the other experiments, this approach does allow us to examine these features in a truly balanced context. We prepared the data by splitting each sequence by side, so each case was composed of two sequences: turns produced during the petitioner's argument and turns produced during the respondent's argument. Because the Court has a relatively high reversal baseline (meaning the Court usually votes in favor of the petitioner), we removed from the feature set all information about the side being spoken to, which is introduced in the form of meta-symbols. By splitting the data, we also magnify the sparseness problems from before, and so we continue to collapse justices into their ideologies. However, also because of the high level of sparseness, we did not remove n-gram pairs, as doing so often reduced the features in any given case too far. This experiment used relative rather than absolute feature values. Again, note that since in each case one party must win while the other loses, this ensures that there are an equal number of winners and losers in the dataset. Again we used the LIBSVM implementation of SVMs with default parameter settings and 5-fold cross validation with parameter tuning.

Unlike the liberal/conservative classification, the choice to collapse justices into liberal, conservative and swing categories for this condition might at first seem like an irrelevant dimension on which to reduce sparseness. However, there are some important points to keep in mind. While the Court for this corpus was balanced between liberal and conservative justices (4 of each), as a result of Thomas's general silence the number of speakers from each wing is unbalanced. Moreover, looking at the wings rather than individual justices, it may be the case that we are able to capture instances of the "three-way" conversation described by David Frederick, where the justices are conversing both with each other and with a particular lawyer (Biscupic 2006, Johnson et al. 2009a). To see why this may matter, consider the rose diagrams discussed in Chapter 4. Although we remove identity information of justices by collapsing the data, we are able to maintain the general effects that have to do with wings of the Court, and since Kennedy is the only swing justice, no identity information is lost for this justice. As a result, we may see cases where either Kennedy is showing high levels of agreement with a particular wing, or where the wings are jostling for support from Kennedy. [21] In either situation, this may be an important factor, as the swing vote will often be the deciding factor in a case.

[20] http://www.csie.ntu.edu.tw/~cjlin/libsvm/

The Court III: In addition to the SVM approaches, in these conditions we also attempted some rule-based classification conditions. This allows us to identify the n-grams that are most informative in classification, thus giving us a way to search for those exchanges between justices that may be particularly helpful in identifying the outcome of a case. This experiment used the WEKA 3.6.0 J48 implementation of decision trees. [22] We found that our original data preparation options did not perform well with decision trees; however, after experimenting with other data preparation options, we found that by only collapsing justices into their ideology some improvement over baseline was achieved.
[21] In order to test whether we were simply predicting Kennedy's votes in this situation, we tested classification of his votes, for or against a particular side of a case, with the same settings. The classifier achieved 58.3% accuracy, which suggests this was not the case.

[22] http://www.cs.waikato.ac.nz/ml/weka/

Thomas: Thomas's voting history indicates a relatively high baseline, at 69.5% conservative votes. This, of course, is unsurprising given that Thomas is often considered one of the most conservative justices currently on the Court. What is surprising is that despite this relatively high baseline and his tendency to almost never speak during oral arguments, we are able to use the approach described above to gain insight as to when Thomas will cast one of his relatively rare liberal votes. For the experiments with Thomas, we found that by not reducing justice IDs to their liberal/conservative classifications and by using only those n-grams with more than one justice, we did see a reasonable improvement in classification accuracy for Thomas. We used relative rather than absolute feature values. Classification was conducted using the WEKA 3.6.0 implementation of Decision Tables (Kohavi 1995).

Results

Figure 17 Classification results including prior approaches (The Court II only), baseline, and absolute accuracy. Error bars are the 90% confidence interval as calculated by the Clopper-Pearson method for inferring exact binomial confidence intervals.

The results of the experiments are detailed in Figure 17. Error bars are calculated as the 90% confidence interval as computed by the Clopper-Pearson method for inferring exact binomial confidence intervals (Clopper & Pearson, 1934). We compare our results to prior approaches for The Court II, and to the baselines described above for all experiments. In all cases, our approach outperforms both the prior approaches and the baseline. However, as indicated by the error bars, confidence intervals overlap in several instances.
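The Clopper-Pearson interval used for these error bars can be computed exactly from the binomial CDF. The sketch below inverts the CDF by bisection using only the standard library; a beta-quantile routine would give the same bounds. This is an illustration of the method, not code from this project.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.10):
    """Exact (Clopper-Pearson) two-sided CI for k successes in n trials."""
    def bisect(f, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection to ~2**-60 precision
            mid = (lo + hi) / 2
            if (f(mid) < 0) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound: the p at which P(X >= k | p) = alpha/2 (increasing in p).
    lower = 0.0 if k == 0 else bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, increasing=True)
    # Upper bound: the p at which P(X <= k | p) = alpha/2 (decreasing in p).
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2, increasing=False)
    return lower, upper
```

For instance, `clopper_pearson(7, 8)` gives the 90% interval around the 87.5% accuracy reported for the eight extreme "most questions asked" cases in Table 6; with so few cases, the interval is wide.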
Both The Court I and The Court II outperform the baseline at a 90% confidence level. We also see that The Court II outperforms the "most words used" approach on this dataset. This is an important finding because the "most words used" approach was found to be the most powerful approach in prior studies (Johnson et al. 2009a). Moreover, we see that these results are comparable to those of experiments that used an order of magnitude more data (Johnson et al. 2009a). For all experiments on the Court, we found that collapsing justices was a very useful preprocessing step. The greatest increase in accuracy was provided by SVMs, regardless of the condition. And of the two experiments that used SVMs, the greatest increase was over the split-case baseline of 50%. While decision trees do not provide the double-digit increases that SVMs do, they still provide some improvement over baseline, with the added benefit of producing decision trees that can be examined. The results for Thomas are perhaps the most surprising. Though the improvement is relatively small, not only are we dealing with a much higher baseline, but this suggests that the interaction of the justices who do talk during cases is correlated with the way Thomas will vote, even though he rarely participates in oral arguments. Because the decision tables are easily interpretable, we can also examine the specific n-grams that are most informative in classification. We are especially interested in n-grams that contain more than one justice, because these best highlight the interactions between individual justices. The decision tables returned four such 4-grams containing more than one justice. Figure 18 contains these sequences along with examples of these sequences from the corpus.

Figure 18 Informative sequences from the Thomas decision tables, with examples from transcripts.

BREY BREY L GINS
Ex. from Michael A. Watson v. United States (06-571)
JUSTICE BREYER: I don't want to put you in a whipsaw here.
(Laughter.)
JUSTICE BREYER: Sometimes policy seems relevant, too, to figure out what Congress wanted. But let me go back to the question I had, which is: do you want us to overturn Smith? Are you asking that? Because I could understand it more easily if you said, look, both sides of the transaction should be treated alike, but they should be both outside the word "use."
MR. KOCH: I do not believe it's necessary for this Court to overrule Smith in order to rule for the Petitioner here, because of -- because of the differences, first of all linguistically; and secondly because of the reliance on Bailey.
JUSTICE GINSBURG: And in answer to my question, you said you were not urging the overruling of Smith?

SOUT SCAL L SCAL
Ex. from Federal Election Comm'n v. Wisconsin Right to Life, Inc. (06-969)
JUSTICE SOUTER: And it is impossible to know what the words mean without knowing the context in which they are spoken.
JUSTICE SCALIA: When the Government put these exhibits, were those exhibits complete with context?
MR. BOPP: No. There was no --
JUSTICE SCALIA: I didn't think so. They just -- they just -- what the ads were.

SCAL L SCAL GINS
Ex. from Engquist v. Oregon Dept. of Agriculture (07-474)
JUSTICE SCALIA: That's certainly an equal protection. She could be fired at will and everybody else can be fired at will.
MS. METCALF: Agreed.
JUSTICE SCALIA: Why isn't that equal protection of the law?
JUSTICE GINSBURG: Except this wasn't -- this wasn't employment at will, right?

BREY ROBE L GINS
Ex. from Travelers Casualty & Surety Co. of America v. Pacific Gas & Elec. Co. (05-1429)
JUSTICE BREYER: And, and yet there are no briefs from them; there are no -- there is no article that I could find in Bankruptcy Journal.
CHIEF JUSTICE ROBERTS: Well, there may be no briefs from them because it isn't the question on which we granted cert, is it?
MR. BRUNSTAD: Chief Justice Roberts, that's correct. And our view is that the Court should deal only with the Fobian rule.
And the alternative argument which Respondent presents was never argued below, was not decided below, was not presented in the opposition to certiorari. It's been rejected by every single court of appeals --
JUSTICE GINSBURG: But it would be proper to remand for the Ninth Circuit to consider those other arguments?

Since the baselines for individual justices are so high, any improvement in classification accuracy must come from the ability to predict unusual behavior from that justice. This is just what we found in the case of Thomas. One can already predict the majority of Thomas's votes simply by assuming his vote will be conservative. In order to move beyond this simple baseline, one needs to be able to predict the liberal cases. By predicting these with high precision, we are able to boost performance when predicting outcomes for Thomas. Though such results may be subject to the danger of over-fitting, as additional case data is created it will be possible to test this approach further. Of course, as justices change, so too will the performance of this approach.

Discussion

These classification experiments built upon the observations in Chapter 4 that turn sequences are associated with case outcomes. The results indicate that there are patterns in justices' turn-taking behavior that are in fact predictive of case outcomes. Additionally, we show improvement on our dataset over the approaches previously shown to have the best performance in the most comprehensive prior study. Moreover, the accuracy is comparable to that of studies that used an order of magnitude more data than ours, while exploring a novel hypothesis about the predictability of Supreme Court outcomes and the features of a case that can be used to make predictions (Johnson et al. 2009a). The fact that any benefit at all is achieved using interaction features as simple as turn-taking is a novel finding that may surprise some researchers (Evans, M., personal correspondence, August 28, 2009).
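As a concrete illustration, turn-sequence features of the kind described above can be extracted with a simple sliding window. The sketch below is not the thesis's actual pipeline: the speaker codes follow the abbreviations used in Figure 18, but the sample sequence is hypothetical, and in practice the resulting n-gram counts would serve as feature values for the SVM and decision-table classifiers.

```python
from collections import Counter

def turn_ngrams(turns, n=4):
    """Slide a window of length n over one case's speaker-turn sequence.

    `turns` is the ordered list of speaker codes for one oral argument,
    with every lawyer turn collapsed to a single "L" token (mirroring
    the justice-collapsing preprocessing step discussed above).
    """
    return [tuple(turns[i:i + n]) for i in range(len(turns) - n + 1)]

# Hypothetical turn sequence for one case.
turns = ["BREY", "L", "BREY", "BREY", "L", "GINS", "L", "SCAL"]
counts = Counter(turn_ngrams(turns, n=4))
print(counts[("BREY", "BREY", "L", "GINS")])  # 1
```

A whole-dataset feature matrix would simply stack one such `Counter` per case, with one column per observed n-gram.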
Questions still remain as to why the features used are important. Without a doubt the content of justices' turns is informative with regard to a case's outcome, but what about the conversational nature of the exchanges represented by our features? Future research might ask what characteristics of these exchanges are informative. Perhaps it is general features, such as the tone of the exchange, or perhaps these n-grams isolate strategic exchanges in which justices in opposition to one another are looking to counter other justices' arguments and justices in agreement with one another are providing support. Interestingly, this approach has the potential to predict the behavior of both the Court as a whole and individual justices. This is an important finding, as it suggests that these approaches may not need to be restricted to natural courts. This work represents a methodologically novel approach, and thus creates a new tool for researchers looking to gain a greater understanding of the Supreme Court and its justices. As discussed below, as more data is created (thus reducing sparseness), numerous extensions to this approach present themselves, suggesting the possibility of richer, more powerful models of justice interaction and Court behavior.

Chapter 6 Conclusions

This work represents the first steps toward modeling the relationship between Supreme Court justices' interactions and actions. We have applied computational methods for pattern discovery to Supreme Court discourse in a novel way, and these methods may be applied more generally to legal discourse. While legal scholars and other Court followers may have intuitions about the social dynamics of the Court, these intuitions are most often limited to a few areas of expertise and a narrow range of examples. What this work offers is a global approach to pattern discovery in the social dynamics of the Supreme Court justices.
With these patterns, legal scholars are given a new avenue for research that can lead to a greater understanding of this country's highest court that would otherwise go unexplored. This work addressed three objectives: to show that a) predictable high-level patterns exist in the conversational dynamics of the Supreme Court, b) these patterns may be associated with other areas of interest to legal scholars, such as the voting patterns of the justices, and c) this association between linguistic patterns and judicial patterns may be utilized both to provide short-term insights (i.e., predicting the outcome of a particular case) and deeper insights about the behavior of the Supreme Court. Our results indicate that a, b, and c do hold. We have found that by combining features relating to turn content, discourse marker use, and personal reference we can gain information about who is speaking when, and that by increasing the history of these features we can further boost the reliability of these methods. The rose charts demonstrate that interesting patterns can be observed when we look at summaries of turn-taking behavior for various conditions. Our prediction approach performed significantly better than prior approaches on the same data, and comparably to approaches utilizing an order of magnitude more data (Johnson et al. 2009a). These results indicate that turn-taking patterns are in fact predictive of case outcomes. In addition to these positive results, we have also made a number of methodological contributions. While the analysis of Supreme Court discourse is not new, our approach of viewing the patterns of Supreme Court turn-taking as both predictable and predictive of case outcomes is a novel one, and we have offered several techniques to explore this hypothesis. We addressed only a narrow range of questions with these techniques, but expect that legal scholars will find a wide array of hypotheses to explore.
Additionally, our rose diagrams are a new application of radial plots that is helpful in visualizing the relationship between turn-taking sequences and actions (Draper et al. 2009).

6.1 Future Work and Unanswered Questions

Unfortunately, sparseness is a major limiting factor in combining content with turn sequences for the Supreme Court. However, as data is continually being created, these problems should be continually reduced. Moreover, though not explicitly identified in the transcripts prior to 2004, the identities of individual justices are not lost, as the audio recordings of these cases still exist. Perhaps by combining audio speaker-recognition techniques with our justice-identification approach, one could reconstruct speaker identities for these earlier cases (Yuan and Liberman 2008). Doing so would provide considerably more data for experimentation. If sparseness issues are appropriately addressed, one could incrementally increase the amount of information used in turn sequences. For example, with limited additional work, one could include further turn features such as interruptions, perceived humor (indicated in transcripts with a "laughter" marker), and question vs. statement. As indicated in Section 5.3, while not overtly marked, the first two of these features still managed to find their way into our dataset and were some of the most informative features in classifying Thomas. While overtly marking these features currently increases sparseness too far, adding more data reduces this problem, making the overt marking of these features viable; and given the results above, one would expect them to be helpful. As other researchers have found, the questioning pattern is likely indicative of case outcomes, at least in extreme cases. Thus, one might expect some benefit from incorporating questioning features into the turn sequence.
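The additional turn features proposed above could be tagged directly from transcript text. The following is a hypothetical sketch, not the system's implementation; it assumes the surface conventions visible in the Figure 18 excerpts, namely that a turn cut off with "--" signals an interruption, "(Laughter.)" marks perceived humor, and a trailing question mark distinguishes questions from statements.

```python
def turn_features(speaker, text):
    """Tag a single turn with the extra features suggested above.

    Assumed transcript conventions: an utterance ending in "--" was
    interrupted, "(Laughter.)" marks perceived humor, and a trailing
    "?" marks a question rather than a statement.
    """
    return {
        "speaker": speaker,
        "interrupted": text.rstrip().endswith("--"),
        "laughter": "(Laughter.)" in text,
        "question": text.rstrip().rstrip('"').endswith("?"),
    }

feats = turn_features("SCAL", "When the Government put these exhibits, "
                              "were those exhibits complete with context?")
print(feats["question"], feats["interrupted"])  # True False
```

Tags like these could then be appended to the speaker codes in the turn sequence, at the cost of the increased sparseness discussed above.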
Moreover, in many cases the existence of interruptions and laughter is indicative of higher-level features of a turn, such as hostility and tone of questioning. Though the reliability of identification of these features is currently untested, work in areas such as sentiment detection may be useful in attempting to identify them (Pang and Lee 2008). If successful, these too could be included in the turn sequence and would likely give further insight into the interaction of the justices. Another strong cue to the interaction of justices would be the discourse relations that hold between justices' turns. Again, while incorporating features for discourse relations into the turn sequence would inherently increase sparseness, if and when sparseness is addressed, including discourse markers in the turn sequence is a logical first step toward creating a richer feature set that includes information about discourse relations. Ultimately, one would ideally want to identify the underlying relations that hold between the turns in the sequence. Identifying the speaker, or the wing of the speaker, along with how the turn relates to the previous turn would clearly provide rich information about the interaction of justices and would likely be highly informative regarding case outcomes. Though sentiment analysis would likely make considerable contributions to the quality of Supreme Court forecasting, as suggested by Wrightsman (2008) and Johnson et al. (2009a), automatic detection of sentiment in a domain such as Supreme Court discourse is likely to be considerably harder than the already difficult typical sentiment analysis tasks. While overt sentiment may be expressed by word choice, in a formal setting such as the Supreme Court sentiment will often not be expressed overtly, thus requiring researchers to rely on methods for identifying covert sentiment (Evans et al. 2007, Greene and Resnik 2009).
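The "logical first step" described above, folding discourse markers into the turn sequence, might look like the following sketch. The marker subset and the token format (speaker code joined to the marker that opens the turn) are illustrative assumptions, not the system's actual representation.

```python
import re

# A small illustrative subset of the Appendix B marker list (hypothetical).
MARKERS = ("but", "so", "well", "because", "on the other hand")

def enriched_token(speaker, text):
    """Combine a speaker code with the discourse marker (if any) that
    opens the turn, yielding sequence tokens like "SCAL+but"."""
    lowered = text.lower()
    # Try longer markers first so "on the other hand" wins over shorter ones.
    for marker in sorted(MARKERS, key=len, reverse=True):
        if re.match(re.escape(marker) + r"\b", lowered):
            return f"{speaker}+{marker.replace(' ', '_')}"
    return speaker

sequence = [
    enriched_token("SCAL", "But the statute says otherwise."),
    enriched_token("L", "We disagree, Your Honor."),
]
print(sequence)  # ['SCAL+but', 'L']
```

N-grams over these enriched tokens would carry a coarse signal about how each turn relates to the previous one, at the cost of a larger, sparser vocabulary.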
This raises its own issues, as the expression of covert sentiment is likely to vary between cases as the issue area of cases changes. These factors make the task of automatic sentiment detection in this domain considerably different from typical areas of sentiment detection such as movie and product reviews. In Chapter 1 we discussed the potential broader implications of this research. That is, this work could be extended to other situations in which we are interested in the relationship between conversational behavior and non-linguistic actions. While we are confident that we could directly apply these approaches to other similar situations, e.g. lower courts or even contestant judging on reality shows, this opens up the question of just how far approaches similar to those covered here can be applied. Do individuals in conversational settings take on recognizable natural roles (e.g. leader, "devil's advocate", etc.) that are applicable across numerous situations? If so, would we be able to reduce reliance on speaker- and domain-specific training data, expanding the applicability of these approaches to a wider range of conversational settings, such as business negotiations and other meetings? And what might we learn about human interaction in general, and about the relationship between conversational interaction and real-world actions, from these sorts of approaches? By exploring the conversational dynamics of the U.S. Supreme Court and their relationship with the actions taken by the Court as a whole and by individual justices, this work begins to address these questions.

Appendix A Rose Charts
(Rose chart figures: All Cases; DIR Condition; JDIR Condition; Vote Split)

Appendix B Discourse Markers
Note: Some of these discourse markers include regular-expression syntax.
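Because some entries in the list carry inline regular-expression syntax (e.g. `okay|ok`, `there('s| is) no doubt`), matching them against transcript text requires compiling each entry rather than doing a literal string search. The following is a minimal sketch using three entries from the list; the whole-phrase word-boundary wrapping is an assumption about how the entries are meant to be applied.

```python
import re

# Three entries from the discourse-marker list; two carry inline regex syntax.
RAW_MARKERS = ["okay|ok", "there('s| is) no doubt", "for example"]

# Compile each entry as a whole-phrase, case-insensitive pattern.
PATTERNS = [re.compile(r"\b(?:%s)\b" % m, re.IGNORECASE) for m in RAW_MARKERS]

def find_markers(text):
    """Return the raw marker entries that occur anywhere in `text`."""
    return [raw for raw, pat in zip(RAW_MARKERS, PATTERNS) if pat.search(text)]

print(find_markers("There is no doubt, for example, that context matters."))
```

Entries without regex syntax pass through unchanged, so one compilation loop handles the whole list.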
above all absolutely accordingly actually add to this additionally admittedly after after all after that after this afterwards again again and again albeit all in all all right all the same all this time already alright also also because alternatively although altogether always assuming that analogously and and again and also and another and then another time anyhow anyway apart from apart from that arguably as as a consequence as a corollary as a hypothetical as a logical conclusion as a matter of fact as a result as a whole as against as an as briefly as as closely as as evidence as far as as for as i said as i say as i understand as if as it happened as it is as it turned out as long as as luck would have it as soon as as such as though as to as we shall as we will as well aside from assuming at a time at any rate at first at first sight at first view at last at least at most at once at some level at some point at that at that moment at that point at that time at the moment at the moment when at the outset at the same time at the time at this date at this moment at this point at this stage at which at which point back back to my original point because because of because of this before before long before that before then besides besides that better briefly but but also but then but then again by by all means by and by by and large by comparison by contrast by that time by the same by the same token by the time by the way by then certainly clearly come to think of it conceivably consequently considering considering that contrariwise conversely correspondingly decidedly definitely despite despite that despite the fact that despite this doubtless each time earlier either either case either event either way else elsewhere equally especially essentially even even after even before even if even so even then even though even when eventually ever since every time everywhere evidently except except after except before except if except when except in so far as except that excuse me failing that finally fine first first of all firstly following following this for for a start for example for fear that for instance for one for that for that matter for that reason for the reason that for the simple reason for this for this reason for me formerly fortunately frankly from all from everything from now on from then on from your answer further furthermore given given that granted that having said having said that hence here herein here's heretofore hitherto however however that may be hum i don't think i guess i mean i say i suppose i suspect i take it i think i thought i understand if if ever if in fact if indeed if not if only if so if such a in a different vein in a sense in actual fact in addition in all candor in all due respect in any case in any event in case in comparison in conclusion in consequence in contrast in doing in doing so in doing this in effect in essence in fact in fairness in general in just the same way it may be concluded that in my case in my opinion in my view in one instance in order to in other respects in other words in our judgment in our view in part in particular in place of in point of fact in practice in real world terms in response in retrospect in short in so doing in so many words in spite of in spite of that in such a in such an in sum in that in that case in that instance in that respect in that scenario in that statement in the beginning in the case of in the end in the event in the first place in the hope that in the meantime in the same way in theory in this case in this connection in this respect in this way in truth in turn in which in which case in your opinion in your view inasmuch as incidentally including incontestably incontroversially indeed indisputably indubitably initially insofar insofar as instantly instead instead of interestingly interestingly enough ironically it becomes it can be concluded that it follows it follows that it happens it is because it is clear it is conceivable it is conclusive it is correct it is for this reason it is only it (may|might) seem that it (may|might) appear that it turns out just just a pause just about just again just as just before just then kind of largely largely because last lastly later lest let us let us assume let us consider like likewise listen literally look luckily mainly mainly because meanwhile merely merely because mind you more accurately more importantly more precisely more specifically more to the point moreover most likely much as much later much sooner my point my position my question my response my solution my understanding naturally needless neither neither is it the case never again nevertheless next next moment next time no no doubt no matter no sooner than nonetheless nor normally not not at all not automatically not because not by itself not completely not directly not exactly not necessarily not only not quite not really not specifically not that notably notwithstanding notwithstanding that now now that obviously of course oh okay|ok on a different note on account of on another on balance on condition on condition that on its face on its own on one hand on one side on that on that point on that question on that very point on the bases on the basis on the contrary on the face of on the grounds on the grounds that on the one hand on the other on the other hand on the other side on this basis on this particular issue on top of it on top of that on top of this on which once once again once more only only after only because only before only if only when oops or or again or else ordinarily originally other than otherwise our focus our only point our point our position overall parenthetically particularly particularly when perhaps plainly possibly potentially practically precisely presently presumably presumably because previously probably provided provided that providing that put another way quite quite likely quite simply quite the contrary rather reasonably reciprocally regardless regardless of that returning to right rightly so say second secondly see seeing as seeing that seemingly significantly similarly simply simply because simultaneously since so so far so if so that some time soon speaking of specifically still still and all strictly speaking subsequently such as such that suddenly summarizing summing up suppose suppose that supposedly supposing that sure enough surely technically that that done that is that is all that is how that is to say that is why that reminds me that said that way the end the fact is the fact is that the first time the instant the issue here the key the key words the last time the later the logic is that the moment the more the more often the next time the one time the point the point being the point is the question the question is the thing is then then again theoretically there again there are a few things thereafter thereby therefore there('s| is) no doubt thereupon third thirdly this case this claim this court this means this time though thus thus far to add to be clear to be fair to them to be precise to be sure to begin with to clarify to close to comment to conclude to explain to follow-up to get back to go on to go to to illustrate to interrupt to make matters worse to me to my knowledge to note to open to put it to put it in context to put it this way to repeat to start with to stop to sum up to summarize to take an example to the best of my knowledge to the best of our knowledge to the degree that to the extent to the extent possible to the extent that to this end to the assumption too traditionally two two answers two points two primary reasons two reasons two responses two separate two things typically uh ultimately undeniably under the circumstances under these circumstances understand undoubtedly unfortunately unless unquestionably until until then up to now up to this very briefly very likely very quickly we agree we believe we believed we might say we think not we think that well what i mean to say what is more whatever when
whenever where whereas whereby whereupon wherever whether whether or not which which is why which means which reminds me whichever while while i have you who whoever with absolute certainty with all due respect with all respect with one addition with regard to with respect with respect to with that with this without yes yet you know you see false true

References

Ali v. Federal Bureau of Prisons. 06-9130 U. S. (2007).
Benesh, S. C. (2002). Becoming an Intelligent User of the Spaeth Supreme Court Databases. Southwestern Political Science Association Meeting. New Orleans, LA.
Biskupic, J. (2006). Justices make points by questioning lawyers. USA Today. (Oct. 5, 2006).
Brown, G. and Yule, G. (1983). Discourse Analysis. Cambridge: Cambridge University Press.
Clopper, C. J., and Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404-413.
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20.
Draper, G. M., Livnat, Y., and Riesenfeld, R. F. (2009). A Survey of Radial Methods for Information Visualization. IEEE Transactions on Visualization and Computer Graphics, 15(5), 759-776.
Duke Law. (2009). Supreme Court Associate Justice Antonin Scalia presides over Dean's Cup Moot Court Competition. Duke Law News and Events. http://www.law.duke.edu/news/story?id=2943&u=11.
Engquist v. Oregon Dept. of Agriculture. 07-474 U. S. (2008).
Evans, M., McIntosh, W., Lin, J., and Cates, C. (2007). Recounting the Courts? Applying Automated Content Analysis to Enhance Empirical Legal Research. Journal of Empirical Legal Studies, 4(4), 1007-1039.
Federal Election Comm'n v. Wisconsin Right to Life, Inc. 06-969 U. S. (2007).
Forbes-Riley, K. and Litman, D. (2004). Predicting Emotion in Spoken Dialogue from Multiple Knowledge Sources. In Proceedings of the Human Language Technology Conference: 4th Meeting of the North American Chapter of the Association for Computational Linguistics.
Galley, M., McKeown, K., Hirschberg, J., and Shriberg, E. (2004). Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (669-676).
Garside, R. (1987). The CLAWS Word-tagging System. In R. Garside, G. Leech and G. Sampson (eds.), The Computational Analysis of English: A Corpus-based Approach. London: Longman.
Greene, S. and Resnik, P. (2009). More Than Words: Syntactic Packaging and Implicit Sentiment. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Grosz, B. and Hirschberg, J. (1992). Some Intonational Characteristics of Discourse Structure. In Proceedings of the International Conference on Spoken Language Processing.
Grosz, B. and Sidner, C. L. (1986). Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3), 175-204.
Gurevych, I. and Strube, M. (2004). Semantic Similarity Applied to Spoken Dialogue Summarization. In Proceedings of the 20th International Conference on Computational Linguistics.
Halliday, M. A. K., and Hasan, R. (1976). Cohesion in English. London: Longman.
Hawes, T., Lin, J., and Resnik, P. (2009). Elements of a Computational Model for Multi-Party Discourse: The Turn-Taking Behavior of Supreme Court Justices. Journal of the American Society for Information Science and Technology, 60(8), 1607-1615.
Hutchby, I. and Wooffitt, R. (2008). Conversation Analysis. Cambridge: Polity Press.
Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A., and Wooters, C. (2003). The ICSI Meeting Corpus. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (364-367).
Johnson, T. R. (2001). Information, oral arguments, and Supreme Court decision making.
American Politics Research, 29(4), 331-351.
Johnson, T. R. (2004). Oral arguments and decision making on the United States Supreme Court. Albany, NY: State University of New York Press.
Johnson, T. R., Black, R., Goldman, J., and Treul, S. (2009). Inquiring Minds Want to Know: Do Justices Tip Their Hands with Questions at Oral Argument in the U.S. Supreme Court? Washington University Journal of Law & Policy, 29.
Johnson, T. R., Black, R., and Ringsmuth, E. (2009). Hear Me Roar: What Provokes Supreme Court Justices to Dissent from the Bench? Minnesota Law Review.
Johnson, T. R., Spriggs, J. F., and Wahlbeck, P. J. (2007). Supreme Court Oral Advocacy: Does it Affect the Justices' Decisions? Washington University Law Review, 85.
Johnson, T. R., Wahlbeck, P. J., and Spriggs, J. F., II. (2006). The influence of oral arguments on the U.S. Supreme Court. American Political Science Review, 100(1), 99-113.
Johnstone, B. (2007). Discourse Analysis. Malden: Blackwell Publishing.
Jovanovic, N., and Akker, R. op den. (2004). Towards automatic addressee identification in multi-party dialogues. In M. Strube and C. Sidner (Eds.), Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT/NAACL 2004 (89-92).
Kansas v. Marsh (Reargued). 04-1170 U. S. (2006).
Kohavi, R. (1995). The Power of Decision Tables. In 8th European Conference on Machine Learning (174-189).
Kurland, P. B. and Hutchinson, D. J. (1983). The business of the Supreme Court, O. T. 1982. The University of Chicago Law Review, 50(2), 628-651.
Lafferty, J. D., McCallum, A., and Pereira, F. C. N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In C. E. Brodley and A. P. Danyluk (Eds.), Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001) (282-289).
Laver, M., Benoit, K., and Garry, J.
(2003). Extracting policy positions from political texts using words as data. American Political Science Review, 97(2), 311-331.
MacWhinney, B., Bird, S., Cieri, C., and Martell, C. (2004). TalkBank: Building an open unified multimodal database of communicative interaction. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC).
Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge: MIT Press.
Marcu, D. (1997). The rhetorical parsing of unrestricted natural language texts. In P. R. Cohen and W. Wahlster (Eds.), Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL 1997) (96-103). Madrid, Spain: ACL.
Marcu, D. and Echihabi, A. (2002). An Unsupervised Approach to Recognizing Discourse Relations. In Proceedings of the ACL.
Martin, A. D. and Quinn, K. M. (2002). Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court, 1953-1999. Political Analysis, 10, 134-153.
Michael A. Watson v. United States. 06-571 U. S. (2007).
Morris, J. and Hirst, G. (1991). Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text. Computational Linguistics, 17(1), 21-48.
Mosteller, F. and Wallace, D. L. (1964). Inference and Disputed Authorship: The Federalist. Reading: Addison-Wesley.
Oates, S. (2001). A listing of discourse markers. Technical Report ITRI-01-26. Retrieved January 10, 2008, from University of Brighton, Information Technology Research Institute Web site: ftp://ftp.itri.bton.ac.uk/reports/ITRI-01-26.pdf.
Pang, B. and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Boston: Now Publishers Inc.
Purver, M., Körding, K., Griffiths, T., and Tenenbaum, J. (2006). Unsupervised Topic Modeling for Multi-Party Spoken Discourse. In Proceedings of COLING/ACL 2006 (17-24). Sydney, Australia: July 2006.
Randall v. Sorrell. 04-1528. U. S. (2004).
Rehnquist, W. H. (2002). The Supreme Court.
New York: Vintage.
Rohde, D. and Spaeth, H. (1976). Supreme Court Decision Making. San Francisco: Freeman.
Rombeck, T. (2002). Justice takes time for Q&A. Lawrence Journal-World.
Ruger, T. W., Kim, P., Martin, A. D., and Quinn, K. M. (2002). The Supreme Court Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court Decisionmaking. Columbia Law Review.
Ruger, T. W., Kim, P., Martin, A. D., and Quinn, K. M. (2004). Competing Approaches to Predicting Supreme Court Decision Making. Perspectives on Politics Symposium, 2(4).
Samson v. California. 04-9728 U. S. (2006).
Schegloff, E. A. (2007). Sequence Organization in Interaction: Volume 1: A Primer in Conversation Analysis. Cambridge: Cambridge University Press.
Schiffrin, D. (1987). Discourse markers. Cambridge: Cambridge University Press.
Schiffrin, D., Tannen, D., and Hamilton, H. E. (eds.) (2001). The Handbook of Discourse Analysis. Malden: Blackwell Publishers Inc.
Segal, J. A. and Spaeth, H. J. (2002). The Supreme Court and the Attitudinal Model Revisited. Cambridge: Cambridge University Press.
Settles, B. (2004). Biomedical named entity recognition using conditional random fields and rich feature sets. In N. Collier, P. Ruch, and A. Nazarenko (Eds.), Proceedings of the COLING 2004 International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP 2004) (107-110).
Sha, F. and Pereira, F. (2003). Shallow parsing with conditional random fields. In M. Hearst and M. Ostendorf (Eds.), Proceedings of the 2003 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics Annual Meeting (134-141). Edmonton, Alberta, Canada: ACL.
Shullman, S. L. (2004). The illusion of devil's advocacy: How the justices of the Supreme Court foreshadow their decisions during oral argument. The Journal of Appellate Practice and Process, 6, 271-293.
Small v. United States. 03-750 U. S. (2004).
Snyder v. Louisiana.
06-10119 U. S. (2007).
Spaeth, H. J. (2009). The Original U.S. Supreme Court Judicial Database. http://www.cas.sc.edu/poli/juri/sctdata.htm.
Stolcke, A., Coccaro, N., Bates, R., Taylor, P., Van Ess-Dykema, C., Ries, K., Shriberg, E., Jurafsky, D., Martin, R., and Meteer, M. (2000). Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech. Computational Linguistics, 26(3).
Sutton, C. and McCallum, A. (2006). An Introduction to Conditional Random Fields for Relational Learning. In L. Getoor and B. Taskar (Eds.), Introduction to Statistical Relational Learning.
Thomas, M., Pang, B., and Lee, L. (2006). Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In D. Jurafsky and E. Gaussier (Eds.), Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006) (327-335). Sydney, Australia: ACL.
Toutanova, K., Klein, D., Manning, C., and Singer, Y. (2003). Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003 (252-259).
Travelers Casualty & Surety Co. of America v. Pacific Gas & Elec. Co. 05-1429 U. S. (2007).
Wrightsman, L. S. (2008). Oral Arguments Before the Supreme Court. New York: Oxford University Press.
Yuan, J. and Liberman, M. (2008). Speaker Identification in the SCOTUS corpus. In Proceedings of Acoustics '08.