ABSTRACT

Title of Document: WINDOWS INTO SENSORY INTEGRATION AND RATES IN LANGUAGE PROCESSING: INSIGHTS FROM SIGNED AND SPOKEN LANGUAGES
So-One K. Hwang, Doctor of Philosophy, 2011
Directed By: Professor William Idsardi, Department of Linguistics

This dissertation explores the hypothesis that language processing proceeds in "windows" that correspond to representational units, where sensory signals are integrated according to time-scales that correspond to the rate of the input. To investigate universal mechanisms, a comparison of signed and spoken languages is necessary. Underlying the seemingly effortless process of language comprehension is the perceiver's knowledge about the rate at which linguistic form and meaning unfold in time and the ability to adapt to variations in the input. The vast body of work in this area has focused on speech perception, where the goal is to determine how linguistic information is recovered from acoustic signals. Testing some of these theories in the visual processing of American Sign Language (ASL) provides a unique opportunity to better understand how sign languages are processed and which aspects of speech perception models are in fact about language perception across modalities.

The first part of the dissertation presents three psychophysical experiments investigating temporal integration windows in sign language perception by testing the intelligibility of locally time-reversed sentences. The findings demonstrate the contribution of modality to the time-scales of these windows, where signing is successively integrated over longer durations (~250–300 ms) than speech (~50–60 ms), while also pointing to modality-independent mechanisms, where integration occurs in durations that correspond to the size of linguistic units. The second part of the dissertation focuses on production rates in sentences taken from natural conversations in English, Korean, and ASL. Data on word, sign, morpheme, and syllable rates suggest that while the rate of words and signs can vary from language to language, the relationship between the rate of syllables and morphemes is relatively consistent among these typologically diverse languages. The results on rates in ASL also complement the findings of the perception experiments by confirming that the time-scales at which phonological units fluctuate in production match the temporal integration windows in perception. These results are consistent with the hypothesis that there are modality-independent time pressures on language processing, and discussions provide a synthesis of converging findings from other domains of research and propose ideas for future investigations.

WINDOWS INTO SENSORY INTEGRATION AND RATES IN LANGUAGE PROCESSING: INSIGHTS FROM SIGNED AND SPOKEN LANGUAGES

By So-One K. Hwang

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2011

Advisory Committee:
Professor William Idsardi, Chair
Associate Professor Gaurav Mathur
Professor David Poeppel
Assistant Professor Naomi Feldman
Professor Robert DeKeyser

© Copyright by So-One K. Hwang 2011

Dedication

~ For Shue-Yearn, One, and Tyler ~

Acknowledgements

I am fortunate to have had so many opportunities to learn and be inspired throughout my life, and here, I would like to express gratitude to all those who enriched my graduate school experience. I would like to thank Bill Idsardi for being my advisor.
He has provided me with an excellent foundation for learning about important themes and methodologies in linguistics and speech perception. He always seeks explanations that work at the right level of analysis in the field of cognitive science, with a sharp eye for testable hypotheses. Because of his open-mindedness and encouragement, I have been able to build many collaborations and pursue research projects I can be passionate about.

I would like to thank David Poeppel for first introducing me to the importance of temporal processing in language and cognition. He has an extraordinary ability to keep cool while still conveying enthusiasm (or skepticism) for ideas, and I have learned a lot from the way he mentors students and works with colleagues. I would like to thank Gaurav Mathur for first introducing me to sign language research, which has truly been an eye-opening experience for me. He has a passion for cross-linguistic research and a genuine enthusiasm for collaboration, and I am so grateful for his guidance and support. I would also like to thank Robert DeKeyser and Naomi Feldman, whose participation in my committee and valuable feedback also helped shape this work.

This work was made possible by the funding of the University of Maryland's NSF IGERT program (#DGE-0801465), the NSF Science of Learning Center on Visual Language and Visual Learning at Gallaudet University (#SBE-0541953), and an NSF Doctoral Dissertation Improvement Grant (#BCS-1025530). In addition to my department, I would like to thank these programs, and in particular IGERT's Colin Phillips and VL2's Tom Allen and Diane Clark for all their support and for providing me with wonderful training opportunities.

I would like to thank Clifton Langdon for his significant contributions to this work and for his friendship. Collaborating with him has been extremely fun and productive, and I am also grateful that he has helped me learn to sign. I would also like to thank Connie Pucci for her work and dedication to these projects, and I really admire her energy, time management, and leadership skills. I am also grateful to Verónica Figueroa, whose own dissertation work had an important influence on me, and who shared her interests in research – and cooking.

There are many people who helped implement this project at various stages. I would like to thank Dave Kleinschmidt, Yakov Kronrod, Vladimir Kronrod, Mirko Santoro, Anika Stephen, and Cecily Whitworth. I would like to give special recognition to Nora Oppenheim, Ji-yun Han, and Lesa Young for their contribution to the analysis of corpus data. I would like to thank Ceil Lucas for sharing with me her ASL corpus (funded by NSF grants #SBR-9310116 and #SBR-9709522). I am also grateful to Karen Emmorey for her interest in this work and our many discussions on modality and working memory. I would also like to thank all the participants of the experiments for their contribution to this work.

I am fortunate to have worked with many wonderful people in other areas of research. I would like to thank Ariane Rhone for her friendship – she has taught me so much, and our adventures together included co-teaching and two defenses. I appreciated the opportunity to collaborate with Derek Monner, Karen Vatz, Giovanna Morini, and Robert DeKeyser and to learn a lot about bilingualism, working memory, and computational modeling along the way. I would also like to thank Phil Monahan for providing me with my first training in MEG and working with me on speech perception experiments.
I am grateful to everyone at Maryland and at VL2 for their support, encouragement, and friendships. At Maryland, I would especially like to thank Eri Takahashi, Ellen Lau, Mathias Scharinger, Bridget Samuels, Shannon Barrios, Wing-Yee Chow, Sunyoung Lee, Julian Jenkins, Pedro Alcocer, and Diogo Almeida. At VL2, I would especially like to thank Gabrielle Jones, Lynn Hou, Peter Crume, and Shilpa Hanumantha, with whom I served on the Student Leadership Team this past year. It has been a privilege to meet and work with many others who have touched my life during graduate school.

Finally, I am so thankful for my family. Thank you, thank you, thank you.

Table of Contents

Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Overview
1.2 Why sign language?
1.3 Temporal integration windows
1.4 Neural correlates of temporal integration windows
1.5 Oscillation of sub-lexical units in language
1.6 Rates of processing in language
1.7 Outline of the dissertation
2 Temporal integration windows in sign language
2.1 Introduction
2.2 Cognitive restoration of locally time-reversed sentences
2.3 Flexibility of perceptual parameters to rates
2.4 Perspectives from development and bilingualism
2.5 Experiment 1 – Effect of modality on temporal integration windows: evidence from local-reversals of ASL sentences
2.6 Experiment 2 – Effect of modality-independent mechanisms on temporal integration windows: evidence from compression and local-reversals of ASL sentences
2.7 Experiment 3 – Effect of developmental factors on temporal processing: evidence from late-learners of ASL
2.8 Conclusion
3 Temporal Dynamics in Natural Production
3.1 Introduction
3.2 Bellugi & Fischer (1972) revisited: Beyond the rate of signs
3.3 Perspectives from information theory
3.4 Words, signs, morphemes, and syllables
3.5 Rates in spoken languages: English and Korean
3.6 Rates in sign language: ASL revisited
3.7 Conclusion
4 Conclusion
4.1 Overview
4.2 More than meets the eye
4.3 Hierarchical coupling in sign language processing?
4.4 Innate sensitivity to rhythms in language
4.5 Channel capacity for sign language
4.6 Availability of two communication channels?
4.7 Rates in production and time-course of recognition
4.8 General conclusions
Bibliography

List of Tables

Table 1. Adapted from Krentz & Corina (2008), this table lists some of the qualitative differences between pantomime and ASL.
Table 2. Examples are adapted from Bellugi & Fischer (1972). These pairs of sentences demonstrate differences between English and ASL constructions.
Table 3. Adapted from Brentari (2002), who describes the typological distribution of canonical word shapes. These assumptions are reexamined throughout the current discussion in Chapter 3 because they require an examination of syllable and morpheme rates and the ratio of these rates for languages.

List of Figures

Figure 1. Reproduced from Petitto, Solowka, Sergio, Levy, & Ostry (2004), this figure shows the distribution of the frequencies (in Hz) of the manual movements among sign-exposed and speech-exposed babies. Sign-exposed babies had movements that were at two different frequencies, where manual babbling in the signing space was marked by a slower rhythm (~1 Hz) than ordinary gestures outside the signing space (~2.5 Hz), whereas speech-exposed babies had movements at a higher frequency (~3 Hz).
Figure 2. Reproduced from Fischer, Delhorne, & Reed (1999), these figures show the intelligibility of stimuli as a function of playback rates for 14 participants. Error bars represent plus or minus one standard deviation of the mean. With sentences, a sharp drop in intelligibility is found at compressions by a factor of 3.
Figure 3.
Reproduced from Ghitza & Greenberg (2009), this graph shows the percent error in an intelligibility experiment, where sentences were compressed by a factor of 3 and silences were inserted periodically or aperiodically. Error bars represent the standard deviation of the mean.
Figure 4. Reproduced from Greenberg & Arai (2001), this figure demonstrates how locally-reversed speech stimuli are created. Here, each 80 ms segment is played backwards, but the original order of the segments is maintained.
Figure 5. Reproduced from Saberi & Perrott (1999), this figure shows subjective intelligibility ratings by 7 participants on a single sentence that was repeated for all conditions.
Figure 6. Reproduced from Greenberg & Arai (2001), this figure demonstrates 1) the spectrogram of locally reversed sentences, 2) the intelligibility curve as a function of reversal sizes, and 3) the complex modulation spectrum of the sentences. Intelligibility results are from 27 participants tested on 40 sentences. Intelligibility of sentences falls drastically between 40 and 50 ms reversals, falling to 50% at 60 ms reversals, and reaches ~0% by 100 ms reversals.
Figure 7. Reproduced from Miller & Licklider (1950), this figure demonstrates the intelligibility of English sentences as a function of frequency of interruption and speech-time fraction (where the durations of interruptions were dependent on the frequency of the interruptions and speech-time fractions and were spaced regularly).
Figure 8. Reproduced from Green & Miller (1985), this figure demonstrates that the perceptual boundary, reflected by the percentage of voiced responses for the [bi]-[pi] continuum, varies depending on durations.
Figure 9. Reproduced from Figueroa (2009), this figure shows the intelligibility of English sentences as a function of compression and reversal size.
Figure 10. Reproduced from Stilp, Kiefte, Alexander, & Kluender (2010), this graph shows intelligibility curves of English sentences as a function of the size of local-reversals (segment durations in ms) and speech rates (in syllables per second: slow = 2.5, medium = 5.0, fast = 10).
Figure 11. Reproduced from Tweney, Heiman & Hoemann (1977), this figure shows the intelligibility of ASL and English sentences as a function of temporal disruption frequency and signing/speech-time fractions. These results demonstrate that sign language is more resistant to temporal disruptions than speech.
Figure 12. Demonstration of how locally time-reversed stimuli were created for sentences of ASL. This specific example shows reversals 133 ms in duration (reversals by 4 frames).
Figure 13. Results from Experiment 1 from 14 participants, demonstrating the intelligibility curve of ASL sentences as a function of reversal size, which implicates ~300 ms temporal integration windows. 50% intelligibility of even the most degraded stimuli is attributed to spatial encoding in sign language.
Error bars represent plus or minus one standard error of the mean.
Figure 14. Reproduced from Liddell (2000), this illustration represents the sign for GIVE, where the direction of movement can mean I-GIVE-YOU but the reverse would result in the opposite meaning YOU-GIVE-ME.
Figure 15. Results from Experiments 1 and 2 (14 participants in each experiment), demonstrating the intelligibility curve of ASL sentences as a function of reversal size and compression by a factor of 2, where temporal integration windows are proportional to the input rate (indicated by a sharp drop in intelligibility at ~267 ms reversals at the normal rate and ~133 ms reversals at the 2x rate). These results suggest that temporal integration windows in sign language are determined by the rate and durations of linguistic units. Error bars represent plus or minus one standard error of the mean.
Figure 16. Results from Experiments 1 and 3, demonstrating the effects of age-of-acquisition in processing time-distorted stimuli. Note: n=14 in Experiment 1 and n=8 in Experiment 3. Late learners demonstrate greater sensitivity to time distortions in the input, but performance among the early and late learners plateaus at similar distortion scales. Error bars represent plus or minus one standard error of the mean.
Figure 17. Examples of signs used by Wilson (2001), with images from www.aslpro.com (top) and www.signingsavvy.com (bottom). The top row shows images taken from a video recording of BRIDGE, a two-contact sign that involves hopping motion from the wrist to the elbow. The bottom row shows images from a video recording of CREDIT-CARD, a one-contact sign that involves sliding motion from the palm and outward across the hand.
Figure 18. Reproduced from Schroeder, Lakatos, Kajikawa, Partan, & Puce (2008), this figure illustrates the hierarchical coupling of neural oscillations.
Figure 19. Reproduced from Brentari, Poizner, & Kegl (1995) (and Brentari (1998)), this figure demonstrates sign-internal and sign-external transitions in an ASL sentence. The above sentence is WORD BLOW-BY-EYES MISS SORRY ("The word went by too quickly. I missed it, sorry").
Figure 20. Reproduced from Bosworth, Dobkins, & Wright (2010), this figure demonstrates the 2D movement trace for an elicited sentence containing the sign KNOW.
Figure 21. Reproduced from Hale (2001), this figure demonstrates how entropy (or "surprisal") fluctuates over the course of a sentence.
Figure 22. Adapted from Mathur & Rathmann (2011), this figure demonstrates an example of numeral incorporation in ASL.
Figure 23. Reproduced from Mathur & Rathmann (2011), this figure demonstrates the grammatical form for TEN DAY and the ungrammatical form TEN+DAY that would result with numeral incorporation. The latter is believed not to be possible due to phonological constraints against complex movement.
Figure 24.
Estimated probability density functions for the length in seconds of sentences in two corpora of English: TIMIT (prompted) and CALLFRIEND (conversational).
Figure 25. Estimated probability density functions for word rates (words per second) of sentences in two corpora of English: TIMIT (prompted) and CALLFRIEND (conversational).
Figure 26. Estimated probability density functions for syllable rates (syllables per second) of sentences in two corpora of English: TIMIT (prompted) and CALLFRIEND (conversational).
Figure 27. Estimated probability density functions for morpheme rates (morphemes per second) of sentences in two corpora of English: TIMIT (prompted) and CALLFRIEND (conversational).
Figure 28. Estimated probability density functions for length in seconds of sentences from conversational data in English and Korean.
Figure 29. Estimated probability density functions for word rates (words per second) of sentences from conversational data in English (a more analytic language) and Korean (a more synthetic language).
Figure 30. Estimated probability density functions for syllable rates (syllables per second) of sentences from conversational data in English and Korean.
Figure 31. Estimated probability density functions for morpheme rates (morphemes per second) of sentences from conversational data in English and Korean.
Figure 32. Estimated probability density functions for length in seconds of sentences from conversational data in English, Korean, and ASL.
Figure 33. Estimated probability density functions for word/sign rates (words or signs per second) of sentences from conversational data in English, Korean, and ASL. This comparison of word and sign rates replicates the findings from Bellugi & Fischer (1972) for English and ASL. A comparison with Korean demonstrates that word rates depend on grammatical properties of the language.
Figure 34. Estimated probability density functions for syllable rates (syllables per second) of sentences from conversational data in English, Korean, and ASL. Syllable rates in ASL may be the basis for the temporal integration window of ~250–300 ms found in Experiment 1 in Chapter 2.
Figure 35. Estimated probability density functions for morpheme rates (morphemes per second) of sentences from conversational data in English, Korean, and ASL. This figure demonstrates that English and Korean, two spoken languages with distinct grammars, have the same morpheme rate (~6 per second), in contrast with the morpheme rate in ASL (~3 per second).
Figure 36. The comparison of morpheme:syllable ratios in English, Korean, and ASL suggests that globally, morphemes and syllables are processed at approximately the same rate. However, the results from ASL are different from spoken languages in that the ratios reveal a trimodal distribution. This
This may be attributed to properties unique to sign languages, such as productive use of reduplication (resulting in ratios lower than 1:1) and productive use of spatial modulations (resulting in ratios higher than 1:1), in addition to simple signs..... .......................................................................................................... 150 Figure 37. Reproduced from Padden & Perlmutter (1987), where reduplicating circular movement turns the adjective QUIET to mean ?characteristically quiet?, or taciturn............................................................................................ 156 Figure 38. Reproduced from Aronoff, Meir, & Sandler (2005), demonstrating a complex ASL classifier construction: ?A person walks forward, (dragging) a dog squirming behind.? .................................................................................. 157 Figure 39. Reproduced from Boyes-Braem (1999), demonstrating the difference between early and learners of Swiss German Sign Language in their lateral torso movements while signing. ..................................................................... 173 Figure 40. Reproduced from Jantunen (2010), demonstrating the acceleration peaks in the biomechanics of both hands while signing, annotated for traditional sign boundaries and transitions between signs. .............................. 184 1 1 Introduction 1.1 Overview The goal of this dissertation is to contribute to a better understanding of the universal temporal processing constraints in the perception and production of language and how they are manifested in particular sensori-motor channels. This endeavor requires a cross-linguistic, and crucially, a cross-modal, study of the temporal dynamics in language processing. Building upon a large body of previous work on spoken languages (Poeppel, 2003; Poeppel, Idsardi, & van Wassenhove, 2008; van Wassenhove, Grant, & Poeppel, 2007; Viemeister & Wakefield, 1991; Yabe, Tervaniemi, Sinkkonen, Huotilainen, Ilmoniemi, & N??t?nen, 1998; Ahissar, Nagarajan, Ahissar, Protopapas, Mahncke, & Merzenich, 2001; Saberi & Perrott, 1999; Greenberg & Arai, 2001; Figueroa, 2009; Stilp, Kiefte, Alexander, & Kluender, 2010), this dissertation is the first to investigate temporal integration windows and processing rates in American Sign Language (ASL). A key aim of speech perception research is to determine how an acoustic signal is mapped onto meaningful linguistic representations. However, the existence of sign languages demonstrates that visual signals can also be transformed into rich grammatical meaning. Thus, to more broadly understand how linguistic information is extracted from a sensory signal, common mechanisms in spoken and signed languages must be identified. This dissertation?s focus on temporal aspects of language processing is inspired by two perspectives in language research. Psychophysical investigations of speech reveal an intimate connection between the temporal properties of the acoustic 2 stimulus and corresponding behavioral and neural responses. The biomechanics of the articulators in the vocal tract create a dynamic acoustic signal with rapidly changing spectro-temporal information, which is transmitted through the air and then through the auditory pathway. 
Work in theoretical linguistics and experimental neuroscience has suggested that the information in this signal is layered into levels that correspond to units of linguistic representation (phonemes and syllables) and that neural processes underlying speech perception also occur at multiple time scales (Poeppel, Idsardi, & van Wassenhove, 2008).

One possibility is that all the time properties we observe in speech come from the particular properties of the oral articulators, coordination with breathing, and the auditory pathway, but sign language research suggests otherwise. When comparing the rate of production in English and ASL, where the semantic content of the narratives was roughly matched, studies found that the propositional rate in these two languages is the same (Bellugi & Fischer, 1972). Although differences in rates emerged when looking at the level of words and signs, where on average twice as many words are produced per second (~4–5 words per second) as signs (~2 signs per second), the overall result implicates a modality-independent basis for the rates found in language. The fact that words and signs are not always equivalent linguistic units and the emergence of simultaneous morphology and spatial grammar in sign languages (Bellugi & Fischer, 1972; Senghas & Coppola, 2001; Aronoff, Meir, Padden, & Sandler, 2004; Mathur & Rathmann, 2011) have been identified as the key underlying factors in these findings. Klima and Bellugi (1979: 194) write, "It is possible that the tendency toward compacting linguistic information in signs may be a response to temporal pressure on language production." When the propositional rate is studied in signed English, which does not employ these strategies and notably is not a natural human language, it is found to be half that of ASL (Klima & Bellugi, 1979). How language design allows for the interaction of core linguistic processes with two completely different sensori-motor systems, as well as other cognitive domains, remains a remarkable puzzle.

One of the goals of this dissertation is to show that slower temporal dynamics in sign production result in larger temporal integration windows in perception. I present evidence that these temporal integration windows arise not just from mechanisms inherent to visual processing but from sensitivity to the durations of linguistic units in sign language. The methodology employed here is testing the intelligibility of locally time-reversed sentences as a function of reversal size. Although the time-scales of temporal integration windows in sign language differ from those found in speech, the results point to universal patterns (integration according to the durations of representational units). A corpus-based study of the rate of production in conversations taken from English, Korean, and ASL also provides greater insight into the relationships between time, form, and meaning in natural data. This research fits within the broader aim of disentangling properties that are inherent to the core processes underlying language from those that are driven by modality.

1.2 Why sign language?

A valid model for how humans process language requires coverage of typologically diverse languages, crucially including those that use different sensory channels and motor systems for communication.
Previous studies on temporal integration windows in language were limited to speech (Poeppel, 2003; Viemeister & Wakefield, 1991; Greenberg & Arai, 2001; Luo & Poeppel, 2007), making it difficult to determine whether the processes involved are specific to audition or reflect more general mechanisms for analyzing linguistic input. One of the great discoveries of modern linguistic research has been the realization that sign languages are true languages with all of the fundamental properties shown by spoken languages (Stokoe, 1960; Klima & Bellugi, 1978; Emmorey, 2002). A cross-modal approach to language research has been used productively to understand universal grammatical properties, the functional organization of language processing areas in the brain, and the developmental patterns seen in language acquisition.

All languages have multiple levels of representation, including phonology, morphology, and syntax, with rules for how units in these domains combine (Sandler & Lillo-Martin, 2006). Beneath the level of signs, sublexical phonological units combine in systematic and rule-constrained ways. Signs can vary in their degree of meaning complexity due to morphological processes. In addition to having structural constituents, sign languages also show sensitivity to island constraints (Padden, 1988; Lillo-Martin, 1991; Ross, 1967).

Lesion and neuroimaging studies show that the same cortical areas support core language functions for both speakers and signers (Hickok, Bellugi & Klima 1998; Emmorey, Mehta & Grabowski 2007; Petitto, Zatorre, Gauna, Nikelski, Dostle, & Evans, 2000). Previously, speculations about the left-hemisphere dominance in spoken languages pointed to a specialization for processing rapidly changing temporal information (Tallal, Miller, & Fitch, 1993). Moreover, the topographic location of Broca's area near speech-production areas of the motor cortex and Wernicke's area near speech-perception areas of the auditory cortex raised the possibility that both areas mainly support the function of spoken languages. However, neuroimaging studies among deaf signers show overlapping activation in these areas (Emmorey, Mehta & Grabowski 2007; Petitto, Zatorre, Gauna, Nikelski, Dostle, & Evans, 2000), and damage to those areas results in similar aphasic profiles (Hickok, Bellugi & Klima 1998).

Although sign languages use manual articulators, the linguistic status of their movements is distinct from gesture (see Table 1 for a comparison of linguistic and non-linguistic gestures). Evidence for dissociations of sign language and non-linguistic gesture has been found in lesion cases, where a patient's production and comprehension of non-linguistic gestures remained intact while performance on sign language was impaired (Corina, Poizner, Bellugi, Feinberg, Dowd, & O'Grady-Batch, 1992). In development, at the age of 6 months, even hearing infants with no previous exposure to signing treat videos of signing and pantomime movements differently, which has been suggested as evidence that children are born with a
index finger and thumbing touching, middle, ring and pinky finger open) Location Numerous on- and off-body locations (e.g. above head, below waist, behind body) Adherence to limits of defined signing space (head to waist, directly in front of signer) Movement More movement types More undefined movement types Frequent repetitions of movements Fewer movement types More defined movement types Limited number of repetitions Eyes and brows Eyes follow actions that model performs; eyes are not independent of actions More eye contact with camera; actions are independent of eyes Facial expression Expressivity based on actions performed (e.g. frustration at trying to fix hair, satisfaction in finishing a task) More rapid changes in mouth More varied movement in mouth Table 1. Adapted from Krentz & Corina (2008), this table lists some of the qualitative differences between pantomime and ASL. Parallels seen in the developmental course of young children, from babbling to putting words together, has contributed evidence for biological maturation of a modality-independent language faculty. In the same study described above (Krentz & Corina, 2008), at 10 months of age, children no longer treated signing and pantomime movements differently. This is convergent with studies on infant speech perception, where preference for native input is sharpened and sensitivity to distinctions in non- native languages shows significant declines around 10 months (Werker, Gilbert, Humphrey, & Tees, 1981). More broadly, similar linguistic milestones are observed among deaf and hearing children (Newport & Meier, 1985; Lillo-Martin, 1999). Word learning at around the first year of life (Bonvillian & Folven, 1993) is preceded 7 by babbling, which in of itself has numerous stages (de Boysson-Bardies, 1993; Meier & Willerman, 1995; Masataka, 2003). Early forms of manual babbling are attested among all infants, but only sign-exposed children develop complex handshape and movement patterns (Petitto & Marentette, 1991). In addition, the class of utterances produced in babbling is predictive of the phonological features of first words in both modalities (Oller, Wieman, Dole, & Ross, 1976; Cheek, Cormier, Repp, & Meier, 2001). Other parallels include patterns in vocabulary growth, increases in grammatical complexity, and even similarities in errors during the acquisition process, which includes phonological errors (Conlin, Mirus, Mauk, Meier, 2000; Meier, 2006; Masataka, 2003), overgeneralization (Meier, 1987), and pronoun errors (Petitto 1987; Jackson, 1989; Meier & Newport, 1990). However, this holds true only in cases where children are receiving signing input since birth. Because >95% of deaf individuals are born to hearing parents (Mitchell & Karchmer, 2004), age of exposure and acquisition of a sign language greatly vary, along with levels of ultimate attainment. Comparison of early and late learners of ASL, as well as comparisons of late learners for whom ASL is either an L1 or L2, provide valuable insights on the impact of critical periods for language development. Similar to the distinctness observations for L1 and L2 acquisition among spoken languages, later acquisition of sign languages is marked by different profiles in perception and production compared to native learners (Kantor 1978; Newport, 1990; Mayberry & Eichen, 1991; Morford & Mayberry, 2000). 
In the case of spoken languages, differences in L2 performance are confounded with L1 entrenchment, where deficits in L2 may be attributable to interference effects from L1 (MacWhinney, 2006). Signers who are exposed to English as an L1 in early childhood and then later acquire ASL as an L2 outperform signers who receive little to no input in early childhood until exposure to ASL as an L1 in late childhood. This suggests that entrenchment cannot be the main factor behind age effects in acquisition. A growing body of sign language research (Newport, 1990; Mayberry, 1993; Mayberry, del Giudice, & Lieberman, 2010; Wilbur, 2000; inter alia) continues to highlight the importance of early language exposure, whether spoken or signed, for full language development.

Studying sign language users has been relevant to research on bilingualism more broadly. Signers fit many profiles in bilingualism. As mentioned above, depending on the onset of hearing loss, degree of hearing loss, and type of early education, a spoken language is the first language for many deaf individuals. For hearing individuals who are born to deaf parents, often referred to as "CODAs" (children of deaf adults), sign language is their first language, with acquisition of a spoken language from mainstream society. Finally, deaf individuals who are born to deaf parents and grow up in signing environments both at home and school still have considerable experience using English in the United States through reading and writing.

Growing evidence from bilinguals who use two spoken languages shows that both languages are active even when using only one of those languages (Marian & Spivey, 2003; Kroll, Bobb, & Wodniecka, 2006). One possibility is that these co-activation effects are dependent on shared modality, but recent work on the bilingual activation of English and ASL among deaf signers suggests otherwise (Morford, Wilkinson, Villwock, Piñar, & Kroll, 2011). Here, the reaction time of deaf signers making judgments about the semantic relatedness of a given pair of English words was slowed down or speeded up based on the phonological similarity of the equivalent ASL signs, which were not presented at any point during the experiment. Such patterns were not found among a group of sign-naïve participants.

Cases of bimodal bilingualism with hearing participants who are fluent in a spoken and a signed language provide unique opportunities to study how bilingualism is manifested when two articulatory channels are available. Unlike with two spoken languages, a spoken and a signed language can be produced simultaneously, leading to common cases of code-blending (Pyers & Emmorey, 2008; Casey & Emmorey, 2009). Moreover, testing cognitive control in bimodal bilinguals allows for a better understanding of the "bilingual advantage" that is reported for unimodal bilinguals (Bialystok, 2001), specifically whether better cognitive control is caused by switching between any two languages or whether it requires switching within one modality (Emmorey, Luk, Pyers, & Bialystok, 2008). Emmorey et al. (2008) found that bimodal bilinguals did not perform differently from monolinguals. Despite the availability of two different channels, it is important to note that simultaneous production of both English and ASL is extremely difficult because of the large difference in their grammars and other processing constraints, leading performance in both languages to suffer (Wilbur & Petersen, 1998).
The degree to which both languages are activated among bimodal bilinguals, and the extent to which cognitive control is exercised by bimodal bilinguals while sticking to one modality or code-blending, remain unclear. Among deaf adults who are bilingual in ASL and written English, better performance on higher-order attention tasks is found among those with high proficiency in both languages (Kushalnagar, Hannay, & Hernandez, 2010).

Sign language research has also contributed insights into the interaction of sensory experience and language modality with other aspects of cognition. Cortical reorganization following auditory deprivation results in differences in visual attention, where greater attention is allocated to peripheral areas (Bavelier, Dye, & Hauser, 2006). This is not true for bilingual hearing signers, who show the same profile of devoting greater attentional resources to central fields of vision as non-signing hearing individuals (Bavelier, Dye, & Hauser, 2006). However, experience with a sign language does transfer to differences in visual processing in some cases, in particular with tasks that involve mental imagery and rotation. When compared to non-signers, both deaf and hearing signers were found to have enhanced abilities in mental rotation tasks and in generating complex images (Emmorey, Klima, & Hickok, 1998; Emmorey & Kosslyn, 1996; Emmorey, Kosslyn, & Bellugi, 1993). Early signing exposure has also been associated with enhanced visuo-spatial working memory, as tested by Corsi block experiments (Milner, 1971), where the participant has to identify the sequence of locations that were indicated by the experimenter (Wilson, Bettger, Niculae, & Klima, 1997; Parasnis, Samar, Bettger, & Sathe, 1996). Only deaf children with early exposure to sign language had higher spans than hearing children in this spatial processing task.

Much of past research has shown how spoken and signed languages are similar, but a newer challenge has been to also make sense of their differences. Meier, Cormier, and Quinto-Pozos (2002) provide a useful overview that focuses on these issues. A critical aspect of determining modality effects in language processing is understanding the physiological properties of the sensori-motor channels. However, it is also important to consider the differences in the diachronic history of these two types of languages, where most sign languages are relatively young and are frequently reinvented by their users, most of whom are not exposed to sign language from birth. While acknowledging the status of sign languages as full-fledged, natural human languages, recognizing and understanding the differences between spoken and sign languages can lead to better targeted strategies to improve learning and education throughout development for each population.

1.3 Temporal integration windows

The experience of many perceptual phenomena, including language processing, feels seamless and continuous, but from a neurophysiological and computational perspective, sensory inputs are analyzed in chunks, or time windows, that lead to discrete units and combinatorial mechanisms. Integration windows exist at more than one time-scale, and the process that occurs at each level may differ (Viemeister & Wakefield, 1991). In perceptual terms, temporal integration windows are considered to be time durations for the summation of the input. Under certain characterizations, the information about dynamics that occur at smaller time-scales may be lost, leading to limits in temporal resolution. Physiologically, the lower limits of temporal encoding may be tied to the duration of action potential spikes and the subsequent refractory period, leading to upper limits on sampling rates, and to the properties of individual sensory pathways. At the level of a neuron, the duration of all the processes that contribute to the spiking output is characterized as the integration window (Theunissen & Miller, 1995). Psychophysically, the temporal resolution of the auditory system is ~2 ms and that of the visual system is ~20 ms. These values refer to the smallest time gaps that can be detected between a sequence of inputs in each respective domain (Green 1971; Kohlrausch, Püschel, & Alphei, 1992; Chase & Jenner, 1993).

As Viemeister and Wakefield (1991) emphasize, however, it is important to distinguish the phenomenon from the process underlying integration. In their model of auditory processing, integration windows that occur at larger time-scales (~200 ms) arise from different mechanisms than those that set the temporal resolution of sensory processing (2–3 ms). The larger time window is mediated by short-term memory, where samplings of the processed input are stored and remain available for comparisons and further computations. This "multiple looks" model is one way to account for the phenomena of smaller as well as larger temporal integration windows in auditory perception. A model for integration at multiple time scales is critical to understanding many aspects of cognition, including sensory processing, multi-sensory integration, and sensori-motor coordination (Pöppel, 1997).

Representations in language are organized into a hierarchy that includes units in phonology, morphology, and syntax. In on-line processing, these units unfold at different time scales. Determining how acoustic information is mapped onto the building blocks for the representation of words is the focus of speech perception research. One of these challenges is identifying the spectro-temporal characteristics of particular features or phonemes (Ladefoged, 2005). More broadly, temporal properties of speech contain correlated information about linguistic features (Rosen, 1992). In particular, fine-structure information in smaller windows (20–40 ms) is critical to the identification of place-of-articulation features of stop segments, and the amplitude-modulated envelopes in larger windows (150–300 ms) are critical for the perception of syllables (Poeppel, 2003). Temporal integration windows roughly 150 ms in duration are reported in a study that tested facilitation of vowel identification via two types of primes that ranged in duration from 25 to 500 ms (Wallace & Blumstein, 2009). Priming at short durations (25–500 ms) was only found when nonspeech tone complexes (which were matched to the formant frequencies of the vowels) were used. These facilitation effects were strongest up to prime durations of ~150 ms. Overall, facilitation effects were much more robust with vowel primes, where speeded reaction times peaked with prime durations in the range of 100–150 ms. These results are compatible with the multi-time scale model for processing speech proposed by Poeppel (2003).

In the domain of audio-visual processing of speech, temporal integration windows also constrain how information from different sensory sources can be combined (van Wassenhove, Grant, & Poeppel, 2007). The integration of audio-visual information is evident in cases like the "McGurk effect," where the percept of [ta] is neither the information in the acoustic [pa] nor the visual signal [ka] (McGurk & MacDonald, 1976). Studies have investigated to what degree this phenomenon requires temporal alignment (Dixon & Spitz, 1980; Massaro, Cohen, & Smeele, 1996; McGrath & Summerfield, 1985; Pandey, Kunov, & Abel, 1986; Munhall, Gribble, Sacco, & Ward, 1996; van Wassenhove, Grant, & Poeppel, 2007). Among those for whom the fused percept is possible at all, approximately 200 ms of audio lag can be tolerated for the fused percepts to be recognized in these audio-visual speech experiments.

The study of temporal integration windows covers a broad range of processing in cognition, where the temporal resolution of sensory encoding is only one component. Experimental materials also appear to influence the size of these windows in ways that cannot be attributed solely to the sensory process. In contrast to these findings from audio-visual speech, a study that used non-speech stimuli (white noise and LED light) found smaller windows (~100 ms) within which a simultaneous percept was possible with asynchronous input (Zampini, Guest, Shore, & Spence, 2005). A possible explanation for the longer duration in the integration of speech stimuli is associated with the average syllable duration across languages (van Wassenhove, Grant, & Poeppel, 2007; Arai & Greenberg, 1997; Whalen & Liberman, 2000).

The multi-time scale model for speech perception (Poeppel, Idsardi, & van Wassenhove, 2008) is inspired in part by theories of multiple spatial resolutions in vision, where information is analyzed both locally and globally, at various spatial frequencies, to recover information (Merigan & Maunsell, 1993). In vision, high and low spatial frequency information is processed by different populations of neurons in different parts of the visual cortex (Singer & Gray, 1995). The fact that a coherent percept can result from more than one anatomical and functional organization is called the binding problem. Singer and Gray propose that binding is achieved by the time synchronization of neural activity, which is often associated with oscillatory firing patterns. Poeppel et al. (2008:1076) write, "Whereas in the visual case the image can be fractionated into different spatial scales, in the auditory case both frequency and time can be thought of as dimensions along which one could fractionate the signal." Nevertheless, no such approach has been used to determine temporal integration windows in the visual processing of sign language.

Although this discussion has focused on the role of time windows as durations for integration, their existence may also entail a process of discretization, which remains poorly understood (Van Rullen & Koch, 2003). In vision, the perception of continuous motion is understood to be only "apparent," meaning that viewers construct the experience by piecing together a sequence of discrete images (Wertheimer 1912; Korte 1915). When reading, the discrete, saccadic movements of the eyes track blocks of linguistic constituents that are not marked by any special spacing or punctuation (Rayner, 1998). Discretization of the input goes well beyond sensory processing and is guided by knowledge about the representational nature of the signal. To distinguish between words like bear and pear, listeners are categorically tuned to the timing difference between the onset of laryngeal voicing and the onset of the stop release burst.
Even when presented with tokens that are varied continuously along this parameter, listeners process these sounds categorically (Lisker & Abramson, 1964; Lisker, 1975; Klatt, 1975). When attending to a continuous speech stream, listeners automatically extract a sequence of segments, syllables, words, and phrases, which is easy to take for granted until one is trying to communicate in a foreign language. Similarly, when non-signers view sentences of ASL, accurately discerning the boundaries between the signs can be difficult, a process that is automatic for signers (Brentari, 2006). Thus, although many perceptual experiences appear holistic and continuous, research from a wide range of domains reveals underlying mechanisms that are discontinuous. Although temporal integration windows are not proposed here to account for all of these phenomena, they are listed to demonstrate some of the challenges in understanding both the continuous and discrete aspects of perception. Van Rullen and Koch (2003) suggest that studies on the temporal dynamics of neural oscillations at different frequency bands may result in a more unified explanation of integration and discreteness in perception. With this idea in mind, the following section summarizes work investigating the neural correlates of temporal integration windows in speech. A discussion of studies using locally time-reversed stimuli to study temporal integration windows, which is the primary methodology used in the perceptual experiments of this dissertation, will be provided in depth in Chapter 2.

1.4 Neural correlates of temporal integration windows

In speech perception, two particular time-scales, those that correspond to segments (~50 ms durations) and syllables (~200 ms durations), are especially interesting because of the convergence of findings about the time-scales of acoustic fluctuations in speech (Rosen, 1992) and neural oscillations (Poeppel, 2003). Marked by high temporal resolution, electrophysiology is currently the best available methodology for investigating the potential neural correlates of these temporally dynamic processes in speech. Hypotheses about these neural correlates are based on the acoustic properties of speech, behavioral studies on listeners' responses to sounds in which the fine structure and temporal envelopes are manipulated (Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995; Zeng, Nie, Stickney, Kong, Vongphoe, Bhargave, Wei, & Cao, 2001), electrophysiological studies on temporal integration windows in auditory processing (Yabe, Tervaniemi, Sinkkonen, Huotilainen, Ilmoniemi, & Näätänen, 1998), and electrophysiological studies on neural oscillations for integration and binding in cognition (Buzsáki & Draguhn, 2004; Engel, Fries, & Singer, 2001; Pöppel, 1997).

Language comprehension requires successful integration of the sensory signal. One study (Ahissar, Nagarajan, Ahissar, Protopapas, Mahncke, & Merzenich, 2001) builds upon previous findings that the intelligibility of sentences decreases as a function of the rate of compression (Foulke & Sticht, 1969). Speech compression algorithms that preserve the spectral and pitch content were applied (Portnoff, 1981). In this study, magnetoencephalography (MEG) was used to measure the degree to which cortical signals followed the speech signal modulation. It was found that the frequency of the evoked cortical signals and the temporal envelopes of the stimuli only matched at lower compression ratios (where sentences remained somewhat intelligible).
These conditions also showed phase-locking between the speech envelope and the MEG signal recorded from the auditory cortex, reflecting entrainment of the neural activity to speech signals and sensitivity to the temporal characteristics of speech. Similar findings are reported by Luo and Poeppel (2007), where the intelligibility of sentences was modulated by creating "auditory chimaeras," in which the envelope of one sound is combined with the fine structure of another sound (Smith, Delgutte, & Oxenham, 2002). Also using MEG, Luo and Poeppel report cortical activity that is correlated with speech intelligibility, more specifically, in the phase patterns of endogenous brain rhythms in the theta band (4–8 Hz), where period durations correspond to the average size of syllables in speech (~200 ms). These findings suggest that an important aspect of processing speech is continuous segmentation and integration of the input in ~200 ms temporal windows (Poeppel, Idsardi, & van Wassenhove, 2008).

Temporal integration windows at such time-scales converge with results from other electrophysiological studies on auditory processing (Yabe, Tervaniemi, Sinkkonen, Huotilainen, Ilmoniemi, & Näätänen, 1998). Using MEG, Yabe et al. (1998) varied stimulus-onset asynchronies (SOAs) of pure tones and tested the elicitation of the magnetic counterpart to the mismatch negativity (MMN) in electroencephalography (EEG). The MMN is taken to be an index of change detection, whether caused by a change in stimuli or by an omission, which is thought to be implemented via a comparison to a neural memory trace of a repetitive sound (Cowan 1995; Näätänen, 1992). Shorter SOAs improve the chances that a stimulus is in the same temporal window of integration as the previous item. Results showed that MMNs were only elicited at SOAs shorter than 175 ms, supporting a temporal integration window of 150–175 ms in auditory processing. This time window and model for MMN effects are compatible with the "multiple looks" theory of Viemeister and Wakefield (1991), where integration windows at these larger time-scales are mediated by short-term memory.
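The envelope-tracking logic behind these MEG findings can be made concrete with a small numerical sketch. The Python code below is only an illustration under simplifying assumptions (synthetic placeholder signals, an arbitrary sampling rate, and a 4–8 Hz theta band); it is not the analysis pipeline used by Ahissar et al. (2001) or Luo and Poeppel (2007). It extracts a wideband amplitude envelope from a "speech" signal, band-passes both the envelope and a simulated cortical trace in the theta range, and quantifies their phase consistency with a phase-locking value.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 1000.0                      # sampling rate in Hz (assumed for the example)
t = np.arange(0, 10, 1 / fs)     # 10 s of signal

# Synthetic stand-in for speech: noise modulated at a syllable-like rate (~5 Hz).
syllable_rate = 5.0
envelope_true = 0.5 * (1 + np.sin(2 * np.pi * syllable_rate * t))
speech = envelope_true * np.random.randn(t.size)

# Synthetic stand-in for a cortical signal that partially follows the envelope.
cortical = envelope_true + 0.8 * np.random.randn(t.size)

def theta_phase(x, fs, lo=4.0, hi=8.0, order=4):
    """Instantaneous phase of x after zero-phase band-pass filtering in the theta range."""
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, x)))

# Wideband amplitude envelope of the "speech" signal via the Hilbert transform.
speech_envelope = np.abs(hilbert(speech))

# Phase-locking value between the theta-band phases of the envelope and the cortical trace:
# 1 = perfectly consistent phase relation, 0 = no consistent relation.
phase_difference = theta_phase(speech_envelope, fs) - theta_phase(cortical, fs)
plv = np.abs(np.mean(np.exp(1j * phase_difference)))
print(f"theta-band phase-locking value: {plv:.2f}")

In this toy setup the phase-locking value is high because the simulated cortical trace shares the 5 Hz modulation of the speech envelope; removing that shared modulation, as heavy compression appears to do in the studies above, would drive the value toward zero.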
(2007) demonstrate that spontaneous EEG rhythms in the gamma range (30-50 Hz), where periods correspond to the duration of phonemes, and theta range (4-7 Hz), where periods correspond to the duration of syllables, show similar hemispheric asymmetries in the auditory cortex (convergent with the model proposed by Poeppel (2003)), and suggest that 20 these endogenous oscillations serve as important precursors that support the function of speech processing. To what degree this multi-time resolution model of speech processing, subserved by gamma and theta band activity, can be applied to visual processing of language remains unknown. Nevertheless, similar temporal integration windows (see Holcombe (2009) for a discussion of multi-time resolution model of vision) and neural activity in these frequency bands have also been implicated in visual cognition. The phase of EEG oscillations in the theta and alpha range is reported to be closely tied to a viewer?s ability to detect flashes of light, suggesting that visual detection thresholds fluctuate at these frequencies (Busch, Bubois, & Van Rullen, 2009). A study using an MMN paradigm with visual stimuli found temporal windows of 150- 170 ms in duration (Czigler, Winkler, Pat?, V?rnagy, Weisz, & Bal?zs, 2006) like the auditory studies. Drawing upon broader literature on the temporal organization of information (Warren, 1999; Yost & Popper, 1993), Poeppel (2003) points to the prevalence of windows that are ~50 ms and ~ 200 ms in duration across many sensory systems. The implication of gamma-band oscillation during attentional selection of sensory information and theta (and alpha) range oscillation in top-down effects in processing (Engel, Fries, & Singer, 2001) potentially make periods of theses frequencies privileged time-scales that are relevant to all language processing, and even more broadly to core cognitive functions, not just speech. 21 1.5 Oscillation of sub-lexical units in language As described above, research on brain rhythms reveals new insights on the basis for oscillatory patterns in speech processing. However, the most prominent theory on the periodic basis of language production focuses on the motor frame of the mandible (MacNeilage, 1998; MacNeilage & Davis, 2001). When its closing and opening are coupled with vocalization, consonant-vowel syllables emerge. In early stages of language acquisition, babbling is marked by its rhythmic qualities. When the oscillatory property of language is attributed to biomechanics that are unique to the jaw, it is predicted that other forms of language production, such as sign language, should not have similar cyclic characteristics. Those who are familiar with sign language are sensitive to rhythmic qualities in signing, but perhaps the discovery of manual babbling among children exposed to sign language has provided the most powerful counterevidence to claims that rhythms in language come from the biomechanics of the jaw (Petitto & Marentette, 1991; Meier & Willerman, 1995). Manual babbling is marked by repetitive qualities, involvement of possible sign language handshapes, syllabic organization, and production without reference, and is distinct from ordinary gestures. Such movements have also been attested among deaf and hearing babies born to deaf parents and exposed to signing (Petitto, Holowka, Sergio, & Ostry, 2001). 
When the temporal dynamics of these movements were studied using opto-electronic position- tracking equipment, manual babbling was marked by a slower rhythm (~ 1 Hz) than ordinary gestures (~2.5 Hz) and the manual movements of babies who were not 22 exposed to sign language input (~ 3 Hz) (Petitto, Solowka, Sergio, Levy, & Ostry, 2004) (see Figure 1). Moreover, low-frequency manual babbling was restricted to a smaller linguistic signing space. Because this babbling is quantitatively and qualitatively different from the manual gestures of hearing babies who were not exposed to sign language, it cannot be simply attributed to general motor development. Figure 1. Reproduced from Petitto, Solowka, Sergio, Levy, & Ostry (2004), this figure shows the distribution of the frequencies (in Hz) of the manual movements among sign-exposed and speech-exposed babies. Sign-exposed babies had movements that were at two different frequencies, where manual babbling in the signing space was marked by a slower rhythm (~1 Hz) than ordinary gestures outside the signing space (~2.5 Hz), whereas speech-exposed babies had movements at a higher frequency (~3 Hz). Although numerous studies have studied the qualitative aspects of vocal babbling (de Boysson-Bardies, 1999; Locke, 1983; Oller & Eilers, 1988; Jusczyk, 23 1997; Elbers, 1982; Vihman, 1996), few have investigated its frequency characteristics (Dolata, Davis, & MacNeilage, 2008; Levitt & Wang, 1991). In the analysis of infants learning French and English, an average syllable duration of ~300 ms (~ 3.4 Hz) is reported by Levitt and Wang (1991), and similar values were found by Dolata et al. (2008). When compared to the rates found in adults, where syllables are ~200 ms in duration (~ 5 Hz) (Arai & Greenberg, 1997), these frequencies for vocal babbling in infants is notably lower. While these differences suggest developmental constraints in the maturation of rhythm and repetitive production, studies on infant-directed language also reveal the possible role of input. Masataka (1992) found that infant-directed signs (?motherese?) among users of Japanese Sign Language were produced at a mean rate of 1.3 per second, which approximately matches the manual babbling rate reported by Petitto et al. (2004). Nevertheless, since Masataka?s report of 1.5 signs per second in adult-directed production is considerably slower than what is reported in American Sign Language (Bellugi & Fischer, 1972), it is difficult to draw conclusions about the reliability of this connection. Although sign language production does not have one main oscillator like the mandible in speech, rhythmic patterns underlie many biological systems (Ghez & Krauker, 2002; Fen?lon, Casasnovas, Simmers, & Meyrand, 1998), not just speech. Thus, it should be no surprise that these findings from babbling provide evidence for the role of oscillatory patterns in linguistic processing regardless of modality. Although modality seems to have a key effect on the frequency of these oscillations, rhythmic production is a universal precursor to language development (and perhaps 24 development more broadly), and it may also play an integral role in perceptual processes as well. If frequencies in babbling, where manual babbling is slower than vocal babbling, have any parallels for processing at maturity, it is possible that sign language processing involves larger temporal integration windows than in speech. 
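As an illustration of how such movement frequencies can be quantified, the sketch below estimates the dominant oscillation frequency of a one-dimensional hand-position trace. It is a hypothetical example in Python (assuming numpy and scipy) with made-up traces and sampling rate, not Petitto et al.'s actual kinematic analysis.

```python
import numpy as np
from scipy.signal import welch

def dominant_movement_hz(positions, fs):
    """Estimate the dominant oscillation frequency (Hz) of a 1-D hand-position trace.
    `positions`: vertical (or radial) hand positions; `fs`: samples per second."""
    detrended = positions - np.mean(positions)
    f, pxx = welch(detrended, fs=fs, nperseg=min(len(detrended), 8 * fs))
    band = (f > 0.2) & (f < 10.0)          # plausible range for hand movements
    return f[band][np.argmax(pxx[band])]

# Hypothetical traces (sampling rate and waveforms are illustrative, not real data):
fs = 100  # Hz
t = np.arange(0, 20, 1 / fs)
babble_like = np.sin(2 * np.pi * 1.0 * t) + 0.1 * np.random.randn(len(t))   # ~1 Hz
gesture_like = np.sin(2 * np.pi * 2.5 * t) + 0.1 * np.random.randn(len(t))  # ~2.5 Hz
print(dominant_movement_hz(babble_like, fs))   # ~1.0
print(dominant_movement_hz(gesture_like, fs))  # ~2.5
```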
However, because these windows are posited to occur at multiple time-scales, it is also possible that the size of these windows converge at longer time-scales. 1.6 Rates of processing in language Bellugi & Fischer (1972) examine the rates of natural production in English and ASL and demonstrate that the rates converge at the overall propositional (or sentential) level although they are different at the word and sign level. This comparison was made by studying three bilingual CODAs who narrate the same story in both languages. To ensure that the rates in ASL were not attributable to the hearing status of the signers, a rate analysis was also conducted on three deaf native signers, which resulted in similar findings (Klima & Bellugi, 1979). Making sense of the differences in rates for words in English and signs in ASL despite the similarities in the global rates requires an understanding of sign language grammar, which employs a signing space, the obligatory use of which cannot be found in any spoken language. Contrary to common myth, ASL is not a manual form of spoken English. Signing systems that are artificially created to help teach deaf individuals learn English, referred to as Manually Coded English, cannot be learned naturally, and when taught, often get reduced to forms that more closely resemble ASL (Supalla, 25 1991). The fact that production in such signing systems is twice as slow as ASL (Klima & Bellugi, 1979) suggests that natural sign languages follow critical time pressures, and that only the grammar of natural sign languages are compatible with these constraints at the manual-visual interface. Natural language processing seems to require a specific range of rates for informational flow. As discussed in the previous sections, a prominent theory for the basis of rates in syllables focuses on the motor constraints of the mandible (MacNeilage, 1998), which converge with findings implicating endogenous theta band oscillations in speech perception (Giraud, Kleinschmidt, Poeppel, Lund, Frackowiak, & Laufs, 2007; Luo & Poeppel, 2007). Since syllables provide a frame for speech content, including units of meanings, it is logically possible that syllable rate determines the overall rate of information transfer. However, the evidence for similarity in rates across modalities suggests that temporal constraints in language processing go beyond bottlenecks at particular motor interfaces. Sign languages are phonologically encoded through handshape, location, movement, orientation, and non-manual features, with the dominant hand as the primary articulator. Because the articulators are physically larger in signing than in speech and must overcome greater inertia, it is theoretically possible that signing could be slower than speech. In contrast, the availability of two hands and a potentially richer set of phonological features may also permit signing to result in overall faster rates than speech. Given two very different motor systems, it is puzzling that rates in signing and speech converge the way they do. 26 Perceptual studies on rate-compressed sentences, where artificially accelerated input remains intelligible (to a certain limit) (Foulke & Sticht, 1969; Foulke, 1971; Ahissar, Nagarajan, Ahissar, Protopapas, Mahncke, & Merzenich, 2001; Fischer, Delhorne, & Reed, 1999) demonstrate that the production system does impose a bottleneck in processing to some degree. Artificially accelerated inputs remain intelligible at rates that are above what the motor system is capable of producing. 
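For readers who wish to generate rate-compressed stimuli of the kind used in these studies, the following sketch uses an off-the-shelf phase-vocoder time stretch as a stand-in for the specific compression algorithms cited above. It assumes the librosa and soundfile packages and placeholder file paths; it is not the procedure used in the original experiments.

```python
import librosa
import soundfile as sf

def compress_speech(in_wav, out_wav, factor):
    """Time-compress a recording by `factor` (e.g., 2 or 3) while preserving
    spectral/pitch content, using librosa's phase-vocoder time stretch.
    This stands in for, but is not identical to, the compression algorithms
    used in the studies discussed above."""
    y, sr = librosa.load(in_wav, sr=None)
    y_fast = librosa.effects.time_stretch(y, rate=factor)  # rate > 1 shortens duration
    sf.write(out_wav, y_fast, sr)

# e.g., compress_speech("sentence.wav", "sentence_x3.wav", 3.0)  # placeholder paths
```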
Although the similarity in rates in English and ASL suggests that this bottleneck is not particular to an individual motor system, parallel bottlenecks are also found in perception across the two modalities. In studies of speech, the intelligibility of sentences declines dramatically as a function of the compression rate, where intelligibility is measured as the fraction of correct words in a sentence that the participants are able to produce back. Although compression by a factor of 2 remains almost perfectly intelligible and shows only a slight dip in performance, compression by a factor of 3 results in a steeper decline (to 50% intelligibility) (Ahissar, Nagarajan, Ahissar, Protopapas, Mahncke, & Merzenich, 2001; Ghitza & Greenberg, 2009), and further compression quickly reaches zero levels of intelligibility. In the first study to investigate the effect of compression in sign language perception, Heiman and Tweney (1981) used compression only by a factor of 2. Using sentences in narratives, they measured intelligibility by performance on a comprehension test rather than by requiring the participants to produce the sentences back and measuring accuracy. They find that comprehension decreased ~20% as a result of the compression. To test whether this should be attributed to signal degradation or to an overload of short-term memory, they also tested stimuli in which black, blank films were inserted between "semantically unitary statements" in the compressed versions so that the length of the narratives was equivalent to the noncompressed versions. With these blank film insertions, comprehension scores were similar to (and slightly lower than) the compressed condition. This result does not converge with similar experiments in speech that were conducted later by others, where silences at prosodic boundaries improve the intelligibility of compressed speech (Wingfield, Lombardi, & Sokol, 1984; Ghitza & Greenberg, 2009). It is important to recognize that silences in speech are not equivalent to blank films in ASL, where the repetition of a still image of a signer may have been a fairer comparison. Nevertheless, Heiman and Tweney (1981) conclude that memory constraints do not underlie poorer comprehension under compression. When testing the intelligibility of single signs, comparing uncompressed controls to signs compressed by a factor of 2, they find decreases in performance similar to those found with sentences. Interpreting the results as a whole, Heiman and Tweney (1981:12) conclude that "decrements in comprehension may be due to cumulative decrements in intelligibility." Fischer, Delhorne, and Reed (1999) test the intelligibility of ASL sentences and single signs as a function of several compression rates. Intelligibility was measured by taking the percent accuracy of correctly identified signs for each condition. According to their summary, similar patterns in speech and sign, where compression by a factor of 3 results in a steep decline in the intelligibility of sentences (see Figure 2), suggest that even at the perceptual interface, time constraints for processing are modality-independent at the sentence level. At the level of words and signs, however, different patterns emerged (see Figure 2). Figure 2. Reproduced from Fischer, Delhorne, & Reed (1999), these figures show the intelligibility of stimuli (sentences and single signs) as a function of playback rates for 14 participants. Error bars represent plus or minus one standard deviation of the mean.
With sentences, a sharp drop in intelligibility is found at compression by a factor of 3. In the findings from speech, the intelligibility of single (monosyllabic) words is more sensitive to compression (Beasley, Schwimmer, & Rintelmann, 1972), which is attributed to the benefit of having sentential context (Miller, Heise, & Lichten, 1951). In contrast, Fischer et al. (1999) find that individual signs are more resistant to compression than sentences, where the same compression factors result in higher intelligibility scores. This result is attributed to the fact that signs in isolation take longer to produce than when produced inside a sentence. An analysis of the number of video frames revealed that signs in isolation took twice the number of frames. The additional frames included those that show a sign in the "final hold" (Liddell, 1984), which involves holding a hand configuration in place or repeating a movement. They report that the extra frames contain enough information to reveal the identity of the sign. Although a follow-up experiment that controls for these factors would be useful, where single signs are extracted from sentences rather than produced in isolation, the overall pattern seems to be that tokens that take longer to produce are more resistant to compression. Similar findings are found in speech when compression is achieved by time sampling. Vowels are longer in duration than consonants, and they are also more resistant to compression than consonants (Kurtzrock, 1957). Overall, words with a higher number of phonemes are more resistant to compression than shorter words (Henry, 1966). Ghitza and Greenberg (2009) test the intelligibility of sentences through a combination of compression and insertions of silence. Unlike Heiman and Tweney (1981), who selectively inserted blank "silent" films at phrasal boundaries, these silences were inserted either periodically (silences of fixed duration) or aperiodically (silences with randomized duration within a range). Figure 3. Reproduced from Ghitza & Greenberg (2009), this graph shows the percent error in an intelligibility experiment, where sentences were compressed by a factor of 3 and silences were inserted periodically or aperiodically. Error bars represent the standard deviation of the mean. As shown in Figure 3, with time compression by a factor of 3 and no insertion of silence, the intelligibility of the sentences is 50% (error rate ~50%). As silences are inserted periodically, intelligibility improves significantly, peaking at 80% (error rate ~20%) when 80 ms silences are inserted between 40 ms chunks of speech material. This alternation creates sentence durations that match the original uncompressed sentences. These results suggest that perhaps only 20 percentage points of the 50% drop in intelligibility of sentences compressed by a factor of 3 should be attributed to sensory loss of information. Ghitza and Greenberg use these findings to argue for the importance of endogenous rhythms, specifically in the theta frequency range, for speech decoding. As Foulke and Sticht (1969:60) emphasize, there is a point at which "a factor in addition to signal degradation begins to determine the loss of comprehension" in sentences. In their review, there are cases where performance on the identification of words is lower than overall comprehension of sentences, and where it is also higher. The point at which processing full sentences becomes compromised by fast rates in English seems to be 275 words per minute.
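The compression-plus-silence manipulation of Ghitza and Greenberg (2009) described above can be sketched directly: after compressing a sentence by a factor of 3, fixed-duration silences are interleaved between short chunks of the signal so that the total duration returns to roughly that of the original. The following Python/numpy sketch uses the 40 ms speech / 80 ms silence values mentioned above; it is an illustration, not their implementation.

```python
import numpy as np

def insert_periodic_silence(signal, fs, chunk_ms=40.0, silence_ms=80.0):
    """Interleave `silence_ms` of silence after every `chunk_ms` of signal.
    With a 3x-compressed input, 40 ms chunks + 80 ms gaps restore roughly the
    original sentence duration, as in the manipulation described above."""
    chunk = int(fs * chunk_ms / 1000)
    gap = np.zeros(int(fs * silence_ms / 1000))
    pieces = []
    for start in range(0, len(signal), chunk):
        pieces.append(signal[start:start + chunk])
        pieces.append(gap)
    return np.concatenate(pieces)

# Sanity check on durations with a dummy 1 s "compressed" signal at 16 kHz:
fs = 16000
compressed = np.random.randn(fs)           # stands in for a 3x-compressed sentence
restored = insert_periodic_silence(compressed, fs)
print(len(restored) / fs)                  # 3.0 s, i.e., back to the original length
```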
In addition to identifying words, sentence processing requires temporarily storing and performing operations with those words. Beyond the issues of recovering an input from a degraded signal, there seems to be a channel capacity for processing the flow of linguistic information. The results from ASL in production (Bellugi & Fischer, 1972; Klima & Bellugi, 1979) and perception (Fischer, Delhorne, & Reed, 1999) provide support for a model in which this channel capacity is modality-independent. In addition to better understanding channel capacities, determining the time-windows of integrating information that flows through a channel and determining if these windows are dependent on a sensory modality will lead to an improved model of language processing. 1.7 Outline of the dissertation Thus far, I have provided background information that motivates the studies that follow. Studying languages that use different sensori-motor systems are essential for understanding core language properties. An important aspect of language processing is its temporal dynamics in on-line perception and production. 32 Chapter 2 focuses on temporal integration windows in the visual processing of ASL. As outlined in sections 1.3 and 1.4, evidence from temporal integration windows come from a wide range of methodologies and domains. Experiments 1, 2, and 3, tests the intelligibility of ASL sentences as a function of the size of local- reversals, a methodology that is motivated by previous work in speech perception (Saberi & Perrott, 1999; Greenberg & Arai, 2001; Figueroa, 2009; Stilp, Kiefte, Alexander, & Kluender, 2010). Through these experiments, it will be shown that time-scales of temporal integration arise from 1) modality, 2) size of linguistic units, and 3) developmental factors. Chapter 3 focuses on the rates of natural production in English, ASL, and Korean, where rates of words, signs, morphemes, and syllables are reported. Previous studies of English and ASL have attributed differences in word and sign rates to both modality and grammar. By also including an analysis of Korean, which has relevant grammatical properties similar to ASL, a better understanding of the role of modality and grammar in language rates is possible. As will be demonstrated in Chapter 2, an understanding of rates in natural production is a fundamental part of building models for temporal integration windows in language. Chapter 4 concludes with a synthesis of all the results and a discussion of implications for future research. 33 2 Temporal integration windows in sign language 2.1 Introduction Linguistic structures are processed in time, whether listening to acoustic speech or viewing the visual input of sign language. The goal of the three experiments presented in this chapter is to investigate factors that affect the temporal integration windows in language perception. Temporal integration windows are chunks of times during which information is collected and integrated and refer to durations among many phenomena, from the level of the neuron, where output spikes are dependent on the sum of activities (Theunissen & Miller, 1995), to the psychophysical level, where sensory stimuli can be detected and compared to previous inputs (Viemeister & Wakefield, 1991; N??t?nen, 1992), to the level of higher cognition, where information is consolidated in memory (Wilson & McNaughton, 1994; Buzs?ki & Draguhn, 2004; Furman, Dorfman, Hasson, Davachi, & Dudai, 2007). 
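As a toy illustration of integration at the neuronal level mentioned above, where output spikes depend on input summed over a limited span of time, the sketch below implements a leaky accumulator whose time constant acts as an integration window. All parameters are hypothetical and chosen only to make the contrast visible; this is not a model proposed in the literature reviewed here.

```python
import numpy as np

def leaky_integrator(inputs, dt_ms=1.0, tau_ms=50.0, threshold=1.0):
    """Toy leaky accumulator: inputs arriving within roughly one time constant
    (`tau_ms`) summate and can reach threshold; the same inputs spread over a
    longer span leak away first. Purely illustrative parameters."""
    v, spikes = 0.0, []
    decay = np.exp(-dt_ms / tau_ms)
    for x in inputs:
        v = v * decay + x
        if v >= threshold:
            spikes.append(True)
            v = 0.0
        else:
            spikes.append(False)
    return spikes

# Two brief inputs 10 ms apart summate; the same inputs 200 ms apart do not.
close = np.zeros(300); close[[100, 110]] = 0.6
far = np.zeros(300); far[[50, 250]] = 0.6
print(any(leaky_integrator(close)))  # True  (within the integration window)
print(any(leaky_integrator(far)))    # False (outside the integration window)
```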
In studies of speech perception, temporal integration windows refer to durations of time over which the sensory signal are mapped to units of linguistic representations ? such as phonemes and syllables (Poeppel, Idsardi, & van Wassenhove, 2008). In audio-visual studies, it has also been used to describe the durations for multi-sensory integration (van Wassenhove, Grant, Poeppel, 2007). Beyond levels of pure sensory processing, some differences between studies that use speech-like and non-speech-like stimuli, where longer lags are tolerated with speech- like stimuli, suggest that the nature of linguistic information in the input also influences the duration of these windows. 34 The importance of temporal direction in processing can be demonstrated by the simple scenario of playing backwards a spoken sentence that is about 2 seconds in duration ? it is utterly unintelligible. In contrast, a sentence that is locally reversed in 20 ms increments is perfectly intelligible (Saberi & Perrott, 1999; Greenberg & Arai, 2001). The mechanisms underlying the mapping of the acoustic signal to meaningful linguistic representations cannot handle distortions over longer time-scales like 2 seconds. Speech is somewhat robust to a variety of adverse conditions, such as noise (Sumby & Pollack, 1954), compression (Foulke & Sticht, 1969), and interruptions (Miller & Licklider, 1950). In these cases, portions of the signal is either masked or deleted, but in the case of backwards speech, all of the input is intact. The unintelligibility of backwards speech is probably a result of several factors, from at the sensory level to higher levels of linguistic processing. For example, simply reversing the order of words in a string (global reversal with local integrity) results in an ungrammatical sentence (sentence ungrammatical an in results string a in words of order the reversing simply), and one that can be extremely hard to understand or repeat back. Nevertheless, it would still be possible to pick out a few words from the reversed sentence, which feels like a random list of words. At another level, a sentence can be difficult to understand because no words can be recognized from an acoustic stream (for example, lacitammargnu ecnetnes for ungrammatical sentence). These examples demonstrate the importance of temporal direction in language processing. Backwards speech is the most drastic form of temporal order distortion (Saberi & Perrott, 1999). Gradually increasing the degree of this distortion can lead 35 to a better understanding of the temporal constraints for the construction of linguistic representations from the sensory input. The use of local-reversals has already provided insights to mechanisms for processing the acoustic signal through time and the duration of temporal integration windows in speech. However, without a comparison to languages that use a different modality, it is impossible to determine whether such mechanisms are specific to auditory processing or more generalizable to all sensory processing in language. Experiment 1 is designed to determine temporal integration windows in the visual processing of language through locally time- reversed sentences. It serves as the basis for two follow-up studies, where the duration of temporal integration windows are tested as a function of rate in Experiment 2 and as a function of age-of-acquisition in Experiment 3. The next three sections motivate each of these experiments in turn. 
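The two text-level reversals illustrated above can be generated mechanically, as in the short Python sketch below, which reproduces the word-order reversal and the within-word letter reversal used as examples in this section.

```python
def reverse_word_order(sentence):
    """Global reversal with local integrity: words keep their form, order is reversed."""
    return " ".join(reversed(sentence.split()))

def reverse_within_words(sentence):
    """Local reversal: each word's letters are reversed, word order is preserved."""
    return " ".join(word[::-1] for word in sentence.split())

s = "simply reversing the order of words in a string results in an ungrammatical sentence"
print(reverse_word_order(s))
# -> "sentence ungrammatical an in results string a in words of order the reversing simply"
print(reverse_within_words("ungrammatical sentence"))
# -> "lacitammargnu ecnetnes"
```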
2.2 Cognitive restoration of locally time-reversed sentences As mentioned above, backwards, or globally reversed, speech is utterly unintelligible, but locally reversed speech demonstrates a different phenomenon (Saberi & Perrott, 1999). The creation of locally-reversed stimuli is like rotating the orientation of slats in Venetian blinds. First, a sound waveform is subdivided into intervals of fixed duration, then each interval is reversed, such that only the temporal order of the input within the interval is altered, not the global order of the entire 36 speech stream (see Figure 4). In these studies, the intelligibility of the sentences drops as a function of the size of the reversals. Figure 4. Reproduced from Greenberg & Arai (2001), this figure demonstrates how locally-reversed speech stimuli are created. Here, each 80 ms segment is played backwards, but the original order of the segments is maintained. In the first study of this kind, Saberi and Perrott (1999) find that intelligibility of a single sentence, as measured by subjective reports, drops to 50% at 130 ms reversals, falling to 0% around 200 ms (Figure 5). Thus, the ability to cognitively restore locally-reversed sentences up to ~100 ms was presented as further evidence for temporal integration windows of these durations. They note that reversals of short durations are less likely to disrupt the temporal envelopes of speech, which have been proposed to be important cues to intelligibility (Greenberg & Arai, 1998). 37 Figure 5. Reproduced from Saberi & Perrott (1999), this figure shows subjective intelligibility ratings by 7 participants on a single sentence that was repeated for all conditions. However, Saberi and Perrot?s (1999) conclusion that ?a detailed auditory analysis of the short-term acoustic spectrum is not essential to the speech code? is oversimplified. It is clear from examples like wolf and flow that the direction of information within a temporal envelope provide critical cues for correct word recognition. As emphasized in the multi-time resolution model of speech perception (Poeppel, 2003; Poeppel, Idsardi, van Wassenhove, 2008), windows that are short and long in duration are important for information processing at different hierarchies. Whether or not Saberi and Perrot?s (1999) result support integration at shorter or longer time scales is hard to determine without having data on the stimulus that they used, such as information on the average size of segments and syllables in the sentence. Notably, they tested only a single sentence, which was repeated for all the conditions. As they report, repetition of the sentence, even with larger reversal sizes, improves its intelligibility, reflecting ?cognitive recalibration? and learning. Greenberg and Arai?s (2001) replication of the study produced different results, where intelligibility fell quite sharply at smaller time scales, reaching 50% at 38 60 ms (Figure 6). They argue that information about syllable segmentation as well as fine structure with phonetic details are important for speech perception. In their study, intelligibility was measured quantitatively by scoring the number of words in a sentence that were identified correctly. Figure 6. Reproduced from Greenberg & Arai (2001), this figure demonstrates 1) the spectrogram of locally reversed sentences, 2) the intelligibility curve as a function of reversal sizes, and 3) the complex modulation spectrum of the sentences. Intelligibility results are from 27 participants tested on 40 sentences. 
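A minimal sketch of the chunk-and-reverse manipulation shown in Figure 4 is given below: the waveform is divided into intervals of fixed duration and each interval is reversed in place, leaving the order of the intervals intact. This is an illustration in Python/numpy, not the stimulus-generation code used in the studies cited; how the original studies handled a final partial interval is not specified here and is treated as an assumption.

```python
import numpy as np

def locally_reverse(signal, fs, window_ms):
    """Divide `signal` into consecutive `window_ms` intervals and reverse each one,
    keeping the order of the intervals intact (the 'Venetian blind' manipulation).
    The final partial interval is also reversed here, which is an assumption."""
    n = int(fs * window_ms / 1000)
    out = signal.copy()
    for start in range(0, len(signal), n):
        out[start:start + n] = signal[start:start + n][::-1]
    return out

# A window spanning the whole signal reduces to fully backwards speech;
# small windows (e.g., 20 ms) leave a sentence largely intelligible.
dummy = np.arange(8)                                  # stand-in samples to show the reordering
print(locally_reverse(dummy, fs=1000, window_ms=4))   # [3 2 1 0 7 6 5 4]
```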
Intelligibility of sentences falls drastically between 40 and 50 ms reversals, falling to 50% at 60 ms reversals, and reaches ~0% by 100 ms reversals. 40 sentences from the TIMIT corpus were chosen for their low semantic predictability and diversity of speakers. Because this minimizes the influence of context effects in guessing words or adjusting to the acoustics of a single speaker, the study targets sensory processing in sentence comprehension. Moreover, an analysis was conducted on all the stimuli to better understand spectro-temporal consequences of local reversals. Greenberg and Arai (2001) tested the relationship between the size of the reversals and the amplitude component of the modulation spectrum alone, as well as the complex modulation spectrum, which is calculated by taking information about amplitude and phase components of the modulation spectrum at various 39 frequency bands. The inclusion of phase information, which was referenced with respect to the phase of the control condition, was critical to finding a correlation between the modulation spectrum and intelligibility. This should not be too surprising given that with reversals in duration of syllables, the temporal envelope of the speech stream is almost entirely preserved, yet sentences are utterly unintelligible at reversals of 100 ms, which is smaller than the duration of average syllables, and stay unintelligible with larger reversals. As the size of the reversals are increased, the phases between the original and reversed stimuli become increasingly dispersed. Greenberg and Arai attribute the difficulty in processing sentences with local reversals to the distortion of information that is critical for identifying phonetic information. Greenberg, Hollenback, and Ellis (1996) find that the median duration for most segments is 60-100 ms (shorter for stops and longer for diphthongs) in natural speech (Switchboard corpus). In a different study using sentences from the TIMIT corpus, Arai and Greenberg (1998) report that the mean duration of a phonetic segment is 72 ms. The sharp decline of speech intelligibility at local reversals >50 ms and falling to 50% at 60 ms, suggests that acoustic signals must be integrated in short time windows to recover phonetic information (Greenberg & Arai, 2001). Reversals in shorter durations preserve not only the relative order of the phonetic segments within words but may also capture the fine-structures within a phonetic segment for recognition in word contexts. Experiments on the perception of interrupted speech have shown that speech can be almost entirely intelligible under certain conditions where 50% of the speech 40 material is deleted (Miller & Licklider, 1950) (see Figure 7). Intelligibility of sentences varies as a non-monotonic function of the frequency of the interruptions/deletions. Interruptions at low frequencies occur at longer durations per interruptions, causing certain words to be either entirely captured or missed. At higher frequencies, parts of words get interrupted, and intelligibility is relatively high in cases where some information about every phoneme in the word is preserved. They write, ?It appears that one glimpse per phoneme is sufficient [for intelligibility]? (Miller & Licklider, 1950: 168). Taken together with these findings, results from the local-reversal of speech suggests that ?looks? to each phoneme and the order of these looks are important for intelligibility, but that the temporal direction within each look is more flexible. Figure 7. 
Reproduced from Miller & Licklider (1950), this figure demonstrates the intelligibility of English sentences as a function of frequency of interruption and speech-time fraction (where the duration of interruptions were dependent on the frequency of the interruptions and speech-time fractions and were spaced regularly). 41 Beyond sensory integration, it is likely that additional factors play an important role in findings from the local reversal of speech. The intelligibility curve is likely to shift rightward (more intelligible with more severe distortions) with the semantic predictability of the sentences (Miller & Isard, 1963). Moreover, the curve is likely to shift leftward (less intelligible with less severe distortions) if a word-list was used rather than sentences (Miller, 1951). As Ghitza and Greenberg (2009) conclude based on evidence where the periodic insertion of silences in time- compressed sentences significantly improved intelligibility, ?Intelligibility is not simply a matter of decoding the spectro-temporal pattern.? Listening comprehension and word intelligibility can be dissociated in compression studies (Foulke & Sticht, 1969; French & Steinberg, 1947), and similar patterns may emerge in the case of local-reversals, although this has yet to be tested. In summary, the ability to detect, integrate, and decode rapid acoustic signals in the speech stream is essential for auditory speech perception. One of the implications from the intelligibility of locally time-reversed speech is that the acoustic signals are not integrated continuously but in a discrete manner. Although the ability to cognitively restore time-reversed stimuli is limited, listeners? tolerance for local reversals is still remarkable. By manipulating stimuli so that it is chunked in larger sizes, Saberi and Perrott (1999) and Greenberg and Arai (2001) determine perceptual and cognitive limitations for integrating the signal. Findings from Greenberg and Arai (2001) suggests that one important time-window for integrating the speech signal lies somewhere below ~60 ms. Reversals that go beyond these 42 perceptual integration windows cannot be cognitively restored and no linguistic representations can be recovered. 2.3 Flexibility of perceptual parameters to rates Speech perception requires flexible mechanisms that can accommodate a wide variety of conditions, created by individual speakers (age, gender, accents/dialect, emotional state, speaking rate, etc.) and environments (noise). Within the speech stream of one individual, one can find phonetic segments that vary in duration (stop consonants are shorter than fricatives, for example), and a given segment may have different phonetic realizations depending on context (whether occurring in word initial or final positions, or in stressed or unstressed syllables, for example). On average, phonemes and syllables are produced at relatively consistent rates (Greenberg, Hollenback, & Ellis, 1996; Arai & Greenberg, 1998), but each unit has its own range of variability. A pattern that underlies rate uniformity is that shorter syllables are flanked by longer syllables, and vice versa (Greenberg, 1999). Since speech is not perfectly periodic, the temporal integration process in speech must be flexible. The ability to adjust perceptual parameters to a variety of contexts is referred to as perceptual normalization. 
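The rate-uniformity pattern just described can be examined directly from syllable annotations, as in the sketch below, which computes a mean syllabic rate and the correlation between adjacent syllable durations; a negative correlation is consistent with shorter syllables being flanked by longer ones. The durations are hypothetical, not corpus data.

```python
import numpy as np

def rate_and_alternation(syllable_durations_ms):
    """Mean syllabic rate (syllables/s) and the correlation between adjacent
    syllable durations for one utterance."""
    d = np.asarray(syllable_durations_ms, dtype=float)
    rate = 1000.0 / d.mean()
    adjacent_corr = np.corrcoef(d[:-1], d[1:])[0, 1]
    return rate, adjacent_corr

# Hypothetical durations (ms) for a single utterance:
durations = [120, 260, 140, 280, 150, 240, 130, 290]
rate, corr = rate_and_alternation(durations)
print(round(rate, 1), round(corr, 2))  # ~5 syllables/s, strongly negative correlation
```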
One area where the effect of speaking rate has been tested (by varying the duration of syllables) is in categorical perception, where perceptual boundaries are found within a continuously varying parameter. For example, a key acoustic cue to distinguish between [b] and [w] is in the duration of 43 the formant transition at stimulus onset, which is longer for [w]. The transition durations at which the perceptual response changes from [b] to [w] has been shown to shift depending on whether the onset was produced within a long or short syllable (Miller & Liberman, 1979). Another example comes from voiced and voiceless stop onsets, where the timing difference between the onset of laryngeal voicing and the onset of the stop release burst (Voice Onset Time, VOT) is a prominent cue for distinguishing between them (Lisker & Abramson, 1964; Lisker, 1975; Klatt, 1975). Moreover, there is an interaction between VOT and place of articulation, where the boundary between voiced and voiceless stops shifts towards longer VOTs as the closure for the stop is made further back in the mouth. The categorical boundary between a particular set of voiced and voiceless stop onsets is not absolute, however. The boundaries shift towards longer VOTs when the syllable is lengthened either by acoustic or visual cues (Summerfield, 1981; Green & Miller, 1985) (Figure 8). Figure 8. Reproduced from Green & Miller (1985), this figure demonstrates that perceptual boundary, reflected by the percentage of voiced responses for [bi]-[pi] continuum, varies depending on durations. 44 Greenberg and Arai?s (2001) finding that intelligibility falls sharply at reversals >50 ms, reaching 50% at ~ 60 ms reversals, and the measurement of phonetic segment durations with similar materials in a separate study (where phonemes were 72 ms long on average) (Arai & Greenberg, 1998) suggests a link between temporal integration windows and the duration of linguistic units in speech, as has been proposed by Poeppel (2003). Under this hypothesis, temporal integration windows, as revealed through local reversals, should be variable depending on the distribution of phonetic segments and overall rates of speech. Moreover, it has the potential to be generalized as a fundamental language processing mechanism, applying also to sign languages. With the same technique, the intelligibility of signed sentences may also be dependent on the size of locally reversed video segments and rates, albeit at different time-scales. However, another possibility is that temporal integration windows of such time-scales are important to general auditory processing and somewhat independent of the linguistic nature of the acoustic signal. Yet a third possibility is that ~ 60 ms integration windows are important for all language processing, regardless of modality, and not linked to the auditory channel. The discrepancy between the findings in Saberi and Perrot?s (1999) and Greenberg and Arai?s (2001) studies already suggests that ~ 60 ms is not a perceptual primitive but a result of a combination of factors. To investigate the relationship between temporal integration windows and duration of linguistic units, Figueroa (2009) (also Figueroa, Howard, Idsardi, & Poeppel, 2009) used a novel combination of compression and local-reversals (see Figure 9 for results). Stimuli consisted of TIMIT sentences, which were presented in 45 10 conditions, where the reversal sizes ranged from 0 to 100 ms, lengthened in 10 ms increments. 
At the normal rate of speech, a sharp drop in intelligibility to 50% was found around 60-70 ms reversals, replicating the results of Greenberg and Arai (2001). In addition, intelligibility was tested on conditions where the sentences were either compressed by a factor of 2 or dilated by a factor of 1.5. With the faster rate, intelligibility fell to 50% around 30-40 ms reversals, revealing time windows that are half the durations found in normal speech. In the case of the dilated condition, performance on the intelligibility task was close to ceiling even at 70 ms reversals, falling to 50% around 80-90 ms. Figure 9. Reproduced from Figueroa (2009), this figure shows the intelligibility of English sentences as a function of compression and reversal size. Similar findings are reported in a study using synthetic speech, even though the sentences selected from the HINT corpus (Hearing In Noise Test, Nilsson, Soli, & 46 Sullivan, 1994) were reported to be semantically more predictable and easier than TIMIT sentences (Stilp, Kiefte, Alexander, & Kluender, 2010). The sentences were synthesized to produce 2.5, 5.0 or 10.0 syllables per second, where the average duration of sentences was 2.6, 1.4, or 0.8 s, respectively. Five reversal conditions were used (0, 20, 40, 80, and 160 ms reversals). Across the three rate conditions (slow, medium, and fast), intelligibility reached relative minimum levels when the reversals were roughly the durations of one syllable, suggesting that tolerance for temporal distortions do not arise through absolute perceptual limits for durations but are proportionally relative to the amount of distortion. A visual inspection of the graph shown in Figure 10 indicates that intelligibility falls to 50% at ~30 ms at the slow rate, ~60 ms at the medium rate, and ~120 ms at the fast rate. Figure 10. Reproduced from Stilp, Kiefte, Alexander, & Kluender (2010), this graph shows intelligibility curves of English sentences as a function of the size of local-reversals (segment durations in ms) and speech rates (in syllables per second: slow = 2.5, medium = 5.0, fast = 10). 47 Although these findings demonstrate the flexibility of perceptual processes, to adjust to rate changes as well as directional distortions, they may also reveal limitations. For example, Figueroa (2009), who used 10 conditions of reversal sizes, notes that when doing equivalent time window comparisons, the intelligibility curve does not shift directly in proportion to the rate in the dilated sentences, where performances fall slightly sooner (that is, at smaller reversals than predicted). This is attributed to the possibility that the perceptual system has inherent properties that constrain its ability to adjust to distortions beyond a certain range. This limit (80-90 ms) for temporal integration may be reached even before the stimuli properties are predicted to induce processing difficulty. In the same vein, if speech processing relies upon temporal integration windows that adjust to all speech rates, it is predicted that a wider range of temporal integration windows can be found in this specific processing task. Testing the intelligibility of locally reversed sentences that are further compressed and dilated in gradual steps may reveal that the curves do not shift beyond certain points. 2.4 Perspectives from development and bilingualism An important part of development is learning the same perceptual tuning process that is present in adults. 
Some of the abilities underlying perceptual normalization are present at early ages. Using the continuum between [b] and [w], Eimas and Miller (1980) found that infants as early as at the age of 2 ? 4 months perceive these sounds categorically. Moreover, they report that perceptual boundaries 48 shift in relation to the duration of the syllable, as found among adults (Miller & Liberman, 1979). Although infants are born with high sensitivity to potentially phonemic contrasts found among the world?s languages, their capacity to discriminate non- native contrasts declines by the age of 10 months (Werker & Tees, 1984). However, this sensitivity is not lost forever, as demonstrated by the native acquisition of a second language through exposure during sensitive periods (Werker & Tees, 2005). An examination of the conditions under which sensitivity to non-native phonemic contrasts can be recovered reveals the important role interpersonal interaction during exposure (Kuhl, Tsao, & Liu, 2003). It is reported that 9-month old American infants do not show the ability to discriminate the alveolo-palatal affricate and fricative of Mandarin Chinese. These infants who were exposed to Mandarin Chinese either through audio-only or audio-visual recordings through experimental sessions did not show sensitivity to differences among these sounds, whereas those who were exposed to the language input through interpersonal interaction showed a recovery of these phonemic contrasts. Although children develop sensitivity to native contrasts at early stages, the maturation of perceptual abilities through a variety of speech conditions takes longer periods of time through physiological changes. Studies have shown that children have poorer temporal resolution than adults (Abel, 1972; Wrightman, Allen, Dolan Kistler, Jamieson, 1989), reaching adult performance levels around age 8 (Davis & McCroskey, 1980). The ability to discriminate small changes in frequency may not reach adult-level acuity until about 6 years of age (Olsho, Schoon, Sakai, Turpin, & 49 Sperduto, 1982; Jensen, Neff, & callaghan, 1987). Children have more difficulty than adults in perceiving speech in noise (Mills, 1975; Elliott, 1979; Nittrouer & Boothroyd, 1990; Fallon, Trehub, & Schneider, 2000). This is attributed to age- related differences in auditory sensitivity, where children have higher auditory thresholds than adults in especially low frequency ranges, and these thresholds may not reach adult levels until around the age of 10 (Elliott & Katz, 1980). Fallon et al. (2000) report that children even at 11 years of age require higher signal-to-noise ratios than young adults to perform comparably in word identification tasks where sentences are embedded in multitalker babble. For both children and adults, the amount of exposure to a language plays an important role in speech processing. Familiar words generally require less acoustic information for identification (Rosenwieg & Postman, 1957). Part of children?s poorer performance with word detection in noise is associated with their limited language experience (Elliott, 1979; Nittrouer & Boothroyd, 1990). Similarly, non- native listeners are more adversely affected by noise than native listeners (Buus, Florentine, Scharf, & Can?vet, 1986; Mayo, Florentine, & Buus, 1997). In tasks where natives and non-natives may perform similarly in quiet conditions, differences emerge with the introduction of noise. Mayo et al. 
(1997) investigate the effect of age-of-acquisition on English L2 performance in different degrees of noise. The task for participants, whose first language was Spanish, was to identify the target word at the end of an English sentence, where sentences were either high or low in predictability. Those who learned English before the age of 6 were more resilient to noise (in other words, had higher accuracy with similar noise-levels) than those who 50 learned English after the age of 14. In addition, late learners did not show a difference in performance between sentences with low and high predictability, unlike the early learners. Because the subjects were all similar in age, late learners overall had shorter duration of exposure than the early learners. Nevertheless, statistical analysis taking into account exposure duration demonstrated that it is not as strong of a predictor as age-of-onset in these results. When exposure duration was matched with the early learners, late learners still showed a significantly poorer performance. All together, these findings suggest that factors underlying difficulty in speech processing among children and non-native adults differ. While children undergo developmental changes in their auditory processing abilities and can benefit from increasing exposure to their language, late learners seem limited by constraints that are more permanent. Another example of the difference between the developmental constraints underlying children and adult late-learners comes from the differences in performances to conversational and clear speech. Clear speech, which is marked by enunciation and corresponding acoustic-phonetic markers, benefits adults with normal hearing, impaired hearing, as well as children (Picheny, Durlach, & Braida, 1985; Bradlow, Kraus, & Hayes, 2003). However, a comparison of performance while processing conversational and clear speech among non-native adult listeners did not show the same degree of benefit from clear speech (Bradlow & Bent, 2002). This suggests that benefits from clear speech are derived from rich experience with a language, and it appears that this experience must be gained in early ages of development. 51 The difference between early and late bilinguals in the study conducted by Mayo et al. (1997) demonstrates the role of input during sensitive periods for developing the flexibility to adjust to a wider range of perceptual environments. However, the difference between monolinguals and early bilinguals in the same study also point to the effect of bilingualism itself on perceptual capacity. Although the performance among monolinguals and early bilinguals (as well as late bilinguals) were very similar in quiet conditions, monolinguals performed better than both groups in noisy conditions. Similar patterns are replicated in another study that tested bilinguals whose first language is Italian (Meador, Flege, & Mackay, 2000). Because accent and proficiency of English was not assessed in Mayo et al.?s (1997) study, and because speakers even in the early bilingual group were reported to have a noticeable foreign accent in Meador et al?s (2000) study, Rogers et al. (2006) focus their comparison on monolinguals and early bilinguals, whose abilities in English and Spanish were assessed through questionnaires, interviews, and recordings (Rogers, Lister, Febo, Besing, & Abrams, 2006). 
Only monolingual and bilingual participants that were rated by monolingual speech-language pathology trainees as having little or no regional or foreign accent in English were included for the full study. Among the bilinguals, language assessments suggested that some degree of language attrition in Spanish was present among more than half the participants. To extend the findings of previous work, Rogers et al. included noise conditions with reverberations. Reverberations refer to the persistence of a sound, which is common in enclosed spaces. Noise and reverberations often occur simultaneously, and their combination is more detrimental for speech perception than the sum of the individual components 52 (Nabelek, 1988). Comparing the consequences of these distortions, which are present in typical environments, among monolinguals and bilinguals contributes to a better understanding of the factors underlying perceptual adaptability. Roger et al. (2006) replicate previous pattern of results, where both monolinguals and early bilinguals have perfect performance in a word recognition task in quiet conditions, and moreover, the performance of bilinguals decreases more dramatically to the noisy conditions. The findings suggest that in addition to age-of acquisition factors, acoustic degradations more adversely affect bilinguals than monolinguals. One potential explanation for these results is that bilinguals have a larger number of target phonemes with two languages, forcing them to have more fine-tuned perceptual abilities that are consequently less robust in noise. Another possibility is that language processing for bilinguals requires more cognitive resources to suppress the other language. Because the baseline performance of bilinguals is thought to be already attentionally demanding, less resource may be available to them in adverse conditions. While constant experience with more than one language has been shown to have beneficial effects in the domain-general aspects cognitive control (Bialystok, 2001), these findings suggest that reduced perceptual adaptability to noise is a cost. Although this summary has focused on studies where differences among native and non-native participants only emerge in adverse listening conditions, age of acquisition has consequences for many areas of language processing, including phonology (Oyama, 1976; Flege, MacKay, Meador, 1999) and morphosyntax (Klein & Dittmar, 1979; Johnson & Newport, 1989; Beck, 1998; DeKeyser, 2000; 53 DeKeyser, Ravid, Alfi-Shabtay, 2005). Evidence from sign language research demonstrates that age-effects in language acquisition are not specific to auditory perception or vocal production (Newport, 1990; Mayberry, 1993). Late-learners of sign language show ?accents? in their production (Cicourel & Boese, 1972; Kantor, 1978; Mirus, Rathmann, & Meier, 2001; Rosen, 2004; Chen Pichler, 2006; Boyes- Braem, 1999). Some features that reveal non-nativeness include handshapes, facial expressions, rhythm, and movements. Accents also exist for those who are native signers in one sign language and learning another (Budding, Hoopers, Mueller, & Scarcello, 1995). In perception, native signers and late learners differ in judgments about what phonological aspects of signs are most salient (Corina & Hildebrandt, 2002). An eye- tracking study has also shown that native signers and beginning signers fixate on different locations in the signing space (Emmorey, Thompson, & Colvin, 2009). 
In the perception of handshapes, which sometimes show categorical perception among native signers (Emmorey, McCullough, & Brentari, 2003; Baker, Idsardi, Golinkoff, & Petitto, 2005), late learners show different profiles (Best, Mathur, Miranda, & Lillo-Martin, 2010). In particular, performance of deaf late-learners reflect more attention to fine-grained phonetic properties of signs than deaf native signers and hearing late signers (Best, Mathur, Miranda, & Lillo-Martin, 2010). This converges with previous studies that suggest that deaf late-learners experience ?phonological bottlenecks? that have consequences for many other aspects of processing (Mayberry & Fischer, 1989; Mayberry, 2007). Late-learners take longer to identify ASL signs in gating tasks, requiring more phonetic or phonological information than native signers 54 (Emmorey & Corina, 1990). When testing sentence recall among signers who first acquired ASL at ages ranging from birth to 13, later ages of acquisition were linked to lower performance, due to increasing phonological errors and inefficient sign recognition (Mayberry & Fischer, 1989), even when length of signing experience was comparable (Mayberry & Eichen, 1991). Furthermore, phonological errors were correlated with poorer comprehension. In a probe recognition task, where the participant has to accurately respond whether or not a target sign was present in a sentence, late signers were slower to reject phonologically similar substitutes (Emmorey, Corina, and Bellugi, 1995). In contrast, native signers were only affected by semantic substitutes. Later ages of acquisition result in difficulty with grammatical aspects as well. In tasks testing morphological processing, late learners show variable use of morphology (where obligatory morphemes are omitted) as well as inappropriate use of whole-word signs that reflected incorrect representation of the morphological structure (Newport, 1990). Grammatical judgment accuracy of sentences decreases with delays in exposure to a first language, (Mayberry, 2003; Boudreault & Mayberry, 2006; Mayberry & Lock, 2003). In cases where late learners have comparable levels of performance on an off-line grammaticality judgment tasks, differences emerged in on-line tasks (Emmorey, Bellugi, Friederici, & Horn, 1995; Emmorey, 1995). When sign language skills were tested in a sentence shadowing task, where participants simultaneously watch and produced sign language narratives, better performance among native signers also reflected better comprehension (Mayberry & 55 Fischer, 1989). In this study, performance in good and poor viewing conditions were also compared, where the poor conditions were created by adding visual noise of randomized black and white dots, which looked like video ?snow.? Although this reduced shadowing accuracy overall, the effect was similar for both native and non- native signers. As mentioned previously in spoken language studies, late bilinguals were more adversely affected by auditory noise than monolinguals or early bilinguals (Buus, Florentine, Scharf, & Can?vet, 1986; Mayo, Florentine, & Buus, 1997). Although more studies on the effect of visual disruptions in sign language processing is necessary, if these patterns persist, it would suggest that some aspects of perceptual adaptability to adverse conditions are modality dependent. No studies have investigated the effect of late language acquisition on temporal integration windows in processing. 
Understanding the effects of late learning is particularly important in sign language processing because >95% of deaf individuals are born to hearing parents and do not receive exposure to language from birth (Mitchell & Karchmer, 2004). Because the experiments presented here examine temporal integration windows through locally time-reversed sentences, understanding the consequences of processing imperfect input among native and non-native users of a language are relevant factors to consider. 56 2.5 Experiment 1 ? Effect of modality on temporal integration windows: evidence from local-reversals of ASL sentences Building upon studies on the cognitive restoration of locally time-reversed speech (Saberi & Perrott, 1999; Greenberg & Arai, 2001), the aim of Experiment 1 is to better understand the universal constraints for temporal direction and integration by testing the visual processing of language. The ability to detect and integrate rapid acoustic signals in the speech stream is essential for auditory speech perception. The intelligibility of temporally distorted sentences suggests that one important time- window for integrating the speech signal lies somewhere below ~ 50 ms (Greenberg & Arai, 2001). Sentences with local-reversals within this window can be cognitively restored successfully, whereas sentences with reversals in sizes that go beyond these perceptual integration windows cannot. What happens in a language that is processed visually? Anecdotal accounts suggest that globally reversed ASL is somewhat more intelligible than backwards English, but not entirely so. To what degree the spatial encoding and the temporal properties of ASL affect time window of integrating linguistic information in the visual modality remains unknown. If previous findings from the perception of locally-reversed speech can be extended to the processing of ASL, this would suggest that 50 ms time-windows are modality-independent. Given that <50 ms time windows were attributed to the analysis of fine-structures in speech (Greenberg & Arai, 2001), this prediction is unlikely. For ASL, time-windows at longer time-scales would point to differences in processing due to the sensory channel of communication. Finding a different 57 psychophysical profile for a sign language would point to unique temporal integration windows for the visual interface for language perception, but it does not necessarily rule out the possibility that these windows are dependent on the linguistic nature of the signal. Communication through a different sensori-motor system affects linguistic properties of the language. For example, although the vast majority of spoken languages employ sequential morphology to increase the complexity of words, all sign languages use simultaneous or non-concatenative forms by using spatial modulation and non-manual markers for grammatical inflection. This means that for spoken languages, the addition of each new unit of meaning often takes longer to produce, but this is not necessarily the case in sign languages. What could be the cause of such differences? Simultaneous strategies may be favored in sign languages not only because spatial encoding allows for layered units of information but also because of the time pressures for natural language processing. 
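A back-of-the-envelope calculation, using made-up numbers rather than the reported rates, illustrates how a halved sign rate can nevertheless yield a comparable rate of information transfer if each sign carries more morphological material; the actual rates are quantified in the next paragraph and, with corpus data, in Chapter 3.

```python
# Illustrative numbers only (not the rates reported by Bellugi & Fischer):
words_per_second = 4.0        # hypothetical English rate
signs_per_second = 2.0        # hypothetical ASL rate (half the word rate)
morphemes_per_word = 1.3      # mostly sequential morphology (hypothetical)
morphemes_per_sign = 2.6      # simultaneous/spatial morphology (hypothetical)

english_info_rate = words_per_second * morphemes_per_word
asl_info_rate = signs_per_second * morphemes_per_sign
print(english_info_rate, asl_info_rate)  # 5.2 vs 5.2 morphemes per second
```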
Although using ASL involves manual articulators that are massive compared to the organs of the vocal tract in speech, the global rate of naturally produced utterances, as measured by the number of propositions in a given amount of time, is similar to spoken English (Bellugi & Fischer, 1972; Klima & Bellugi, 1979; Grosjean, 1979). When comparing smaller units such as words and signs, however, the rate of signs is half the rate of words in the equivalent measure of time. Similar to English, ASL constructs linguistic structures incrementally over time. One hypothesis is that each step contains more linguistic information and takes longer to produce than an incremental step in English. Although these larger steps take longer to produce, because information is encoded simultaneously in ASL, overall they may contain amounts of linguistic information similar to what is presented sequentially in the same amount of time in speech. Artificially created systems that force sequential grammar on a signing system, such as various forms of Manually Coded English, may not be learned naturally like ASL because, even in highly skilled usage, communication is too slow. These findings reveal an intimate connection between modality and linguistic properties. Moreover, since phonological features of signs such as handshape and location are spatially encoded, is ASL more resistant to the temporal distortion of time-compression? Findings similar to speech indicate that there is a modality-independent upper limit to language processing (Fischer, Delhorne, & Reed, 1999; Foulke & Sticht, 1969), suggesting that the channel capacity for language is not unique to a particular sensory system. In addition to providing a way to learn about the time scales for processing successive input, local-reversals also test a more general ability to tolerate distorted signals. Intuitively, spatial encoding of phonological features is predicted to make sign languages more robust to temporal distortions, but according to Fischer et al.'s (1999) conclusions from time-compression studies, this is not always the case. However, the greater resistance of ASL, as compared to English, to disruptions in the signal created by repetitive temporal interruptions (see Figure 11) has indeed been attributed to properties of the visual modality (Tweney, Heiman & Hoemann, 1977).

Figure 11. Reproduced from Tweney, Heiman & Hoemann (1977), this figure shows the intelligibility of ASL and English sentences as a function of temporal disruption frequency and signing/speech-time fractions. These results demonstrate that sign language is more resistant to temporal disruptions than speech.

Thus, although spatial encoding of phonological features in ASL may result in higher accuracies overall for local-reversals, its effect on the size of the time-windows is less clear. Based on the hypothesis that there is an intimate connection between modality and the rate at which linguistic units are incrementally presented over time, it is predicted that ASL sentences are integrated over larger time-windows than English.

Materials

Forty sentences of ASL were constructed by sign language linguists. These sentences were designed to be natural sentences of ASL, using a diverse range of phonological parameters, but low in semantic predictability to force the participants to pay close attention to the input rather than guess the sentences from identifying a few key signs.
These criteria are similar to the ones reported for the sentences of English in the TIMIT corpus. Although the ideas for the semantic content of the sentences were taken from TIMIT sentences, the sentences constructed here were not direct translations but rather followed ASL grammar. Two examples of stimulus sentences translated into English are, "The girl tends to visit the frog on Wednesdays" and "Heavy snow and strong winds make it hard to see the outline of the mountains." The mean duration of these sentences was 3.53 s (standard deviation = 0.94 s). Since there is currently no large ASL corpus that contains frequency measures for signs (although see Morford and MacFarlane (2003), the first to describe frequency characteristics of ASL), effort was taken to incorporate signs of varying lexical frequency based on intuition. These sentences also incorporated non-manual features that are appropriate to standard use of ASL. Based on the estimate that fingerspelled words make up as much as 7%–10% of the overall vocabulary in everyday signing (Padden, 1991), a few fingerspelled words were incorporated into these sentences as well. These words were either lexicalized signs (e.g., B-N-K for "bank") or common proper names (e.g., N-A-N-C-Y for "Nancy"). A female native signer modeled 40 experimental sentences and 8 practice sentences facing the camera. All sentences started and ended with a "neutral" position, which was defined here as a relaxed position with hands crossed below the waist. The best tokens of each sentence produced by the model were then trimmed to 768 x 576 pixel frames (29.97 fps, or 33 ms per frame). Frame sequences in uncompressed AVI files were locally reversed at increments of 4, 8, 12, 16, 20, 24, and 28 frames (133–934 ms), with a control condition without any reversal manipulation (0 ms) (see Figure 12 for an example). This resulted in a total of 320 sentences so that 40 sentences could be randomly assigned to 8 different conditions, with 5 examples per condition, for each participant. All videos were processed with the Cinepak codec for stimulus presentation.

Figure 12. Demonstration of how locally time-reversed stimuli were created for sentences of ASL. This specific example shows reversals 133 ms in duration (reversals by 4 frames).

Procedure

Fourteen deaf participants (11 female, mean age 23) who all grew up using ASL from before one year of age participated in this study on the Gallaudet University campus. Participants were compensated $15/hour for the study. All instructions for the experiment, as well as informed consent for participation in the experiment and video-recording of responses, were given to the participants both in ASL and written English, following the protocols of the Institutional Review Boards at Gallaudet University and the University of Maryland, College Park. Participants were told that they would be viewing videos of sentences of ASL that had been modified to various degrees. They were instructed to sign to the camera whatever they could understand, trying to sign back the original sentence as closely as possible. Unlike previous experiments on spoken English, where responses were type-written by the participants, here the responses were signed and video-recorded. Participants were allowed to view a sentence up to four times before continuing to the next sentence, moving at their own pace (following Greenberg and Arai (2001)). After the experiment, these responses were coded and analyzed for accuracy in comparison to the original sentences.
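The core frame manipulation described above (and the 2x compression used later in Experiment 2) can be sketched in a few lines of Python. This is a minimal illustration under assumed conditions, not the actual stimulus-preparation pipeline; the function names and the representation of a video as a list of frames are choices made here only for exposition.

```python
# Minimal sketch of the stimulus manipulations (not the original pipeline):
# locally time-reverse a sequence of video frames in fixed-size chunks,
# with an optional 2x compression created by deleting every other frame.

def compress_2x(frames):
    """Keep every other frame, halving playback duration (as in Experiment 2)."""
    return frames[::2]

def locally_reverse(frames, chunk_size):
    """Reverse frame order within successive non-overlapping chunks.

    chunk_size is in frames; at 29.97 fps, 4 frames is ~133 ms,
    8 frames ~267 ms, ..., 28 frames ~934 ms.
    """
    out = []
    for start in range(0, len(frames), chunk_size):
        out.extend(reversed(frames[start:start + chunk_size]))
    return out

if __name__ == "__main__":
    # Toy example with integers standing in for video frames.
    original = list(range(12))
    print(locally_reverse(original, 4))
    # -> [3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8]
    print(locally_reverse(compress_2x(original), 4))
    # -> [6, 4, 2, 0, 10, 8]
```

In the actual materials, the reversed frame sequences were written back out as video files and encoded with the Cinepak codec, as described above.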
Accuracy of the responses was calculated by determining how many of the manual signs, facial features for grammatical inflection, and spatial modulations were correctly reproduced by the participants. Signs were scored as correct only when they matched all target parameters (handshape, location, and movement) and were produced in the original order. Data were analyzed to determine the intelligibility of the sentences as a function of reversal size.

Results

Results for Experiment 1 are presented in Figure 13, where intelligibility is plotted as a function of reversal size. A one-way, repeated measures ANOVA (using R 2.8.1, R Development Core Team (2005)) revealed that intelligibility varies with the duration of the reversals (F7,104 = 22.924, p < 0.001). A remarkable difference from the patterns seen in the local reversal of speech is that intelligibility in ASL levels off at ~50% for even the most degraded stimuli. Tukey's Honestly Significant Difference (HSD) pair-wise post-hoc tests were conducted to check for differences in accuracy among the reversal conditions. In ASL, reversals 133 ms in duration resulted in only a small decrease in accuracy that was not significantly different from the control condition. A significant decrease in intelligibility is found between reversals 133 ms and 267 ms in duration. Intelligibility continues to fall until reversals of 533 ms, and no differences were found among larger reversals (533–933 ms).

Figure 13. Results from Experiment 1 from 14 participants, demonstrating the intelligibility curve of ASL sentences as a function of reversal size, which implicates ~300 ms temporal integration windows. The 50% intelligibility of even the most degraded stimuli is attributed to spatial encoding in sign language. Error bars represent plus or minus one standard error of the mean.

Discussion

In speech studies, temporal integration windows have been approximated by looking for the reversal size at which intelligibility falls to 50%, or ~60 ms for normal speech rates, where reversal sizes were increased in 10 ms increments (Greenberg & Arai, 2001). However, it is not possible to use this criterion for these results from a sign language because even the most degraded sentences remain approximately 50% intelligible. Moreover, in this study, reversal sizes were increased in much larger increments (133 ms). Perhaps one way to compare the results across the modalities is to determine the reversal size at the half-way point between the highest and lowest accuracy scores, approximately 60 ms in speech and 300 ms in ASL. Alternatively, the first experimental condition at which intelligibility falls sharply can be compared across the modalities, 50 ms in speech and 267 ms in ASL. These estimates are not presented as absolute values but as approximations of the time scales at which temporal integration occurs. These results support the hypothesis that ASL sentences are integrated over longer time windows than English. The important differences across the modalities are as follows. In speech, reversal durations of 100+ ms lead to intelligibility that is close to 0% (Greenberg & Arai, 2001). In sign language, reversals must be 500+ ms in duration to reach the lowest accuracy scores. Even the most degraded sentences remain approximately 50% intelligible. Although 100+ ms reversals result in unintelligibility in speech, 133 ms reversals do not result in performance that is significantly different from the control condition (no reversals) in sign language.
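The half-way-point comparison described in the Discussion above amounts to finding where the intelligibility curve crosses the midpoint between its highest and lowest values, interpolating between tested reversal sizes. A small sketch of that calculation is given below; the intelligibility values in it are hypothetical placeholders chosen only to mimic the qualitative shape of the ASL curve, not the experimental data.

```python
import numpy as np

def halfway_crossover(reversal_ms, intelligibility):
    """Reversal duration (ms) at which intelligibility first crosses the
    midpoint between its maximum and minimum, by linear interpolation."""
    reversal_ms = np.asarray(reversal_ms, dtype=float)
    intelligibility = np.asarray(intelligibility, dtype=float)
    midpoint = (intelligibility.max() + intelligibility.min()) / 2.0
    for i in range(1, len(reversal_ms)):
        y0, y1 = intelligibility[i - 1], intelligibility[i]
        if y0 >= midpoint >= y1:  # crossing on a falling segment
            frac = (y0 - midpoint) / (y0 - y1)
            return reversal_ms[i - 1] + frac * (reversal_ms[i] - reversal_ms[i - 1])
    return None

# Hypothetical curve with the qualitative shape reported for ASL: near-ceiling
# at 0 and 133 ms, a sharp drop by 267 ms, and a ~50% floor beyond 533 ms.
reversals = [0, 133, 267, 400, 533, 667, 800, 933]
accuracy  = [0.90, 0.85, 0.70, 0.60, 0.48, 0.47, 0.46, 0.47]
print(halfway_crossover(reversals, accuracy))  # falls between 267 and 400 ms for these toy values
```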
These ASL results should be contrasted with speech, where reversals do not result in a drastic reduction of intelligibility as long as they are 40 ms or less. At 50 ms reversals of speech, intelligibility falls sharply, reaching 50% by 60 ms. In sign language, 267 ms reversals result in a significant decline in intelligibility. In speech, accuracy in performance ranges from 100% at the control condition to 0% for the most degraded stimuli. Noticeably, in this experiment with ASL, percent accuracy without local-reversals was 90%. This is similar to the findings of Fischer et al. (1999), who tested the intelligibility of time-compressed ASL sentences, where percent accuracy was also around 90% for sentences played at the control/normal rate. Also, in a sentence-shadowing task (Mayberry & Fischer, 1989), the upper bound on the percentage of ASL sentences that native signers signed back without any error in good viewing conditions was 76% in one experiment and 88% in another. The percentage of native signers who made no mistakes shadowing ASL sentences was 91%. One reason why accuracy scores were ~90% even for temporally intact sentences could be task differences between this study and the speech experiments (Greenberg & Arai, 2001). In speech, participants were asked to type their responses, which suggests that they had a workspace where they could reference and revise their answers while listening to the sentences. If hearing participants are asked to speak back the sentences in the same way that sentences were signed back in this experiment, it is possible that accuracy could be less than 100%. If, however, better controlling how responses are given and collected still results in significant differences at ceiling levels of performance, it may suggest that modality contributes to differences in working memory for sentence processing. Fischer et al. offer some speculations on why, in their compression study, accuracy was overall higher for speech than for ASL even with normal-rate sentences. They suggest the possibility that the sentences they used were not completely natural in ASL since they were translations of English sentences. Moreover, they attribute the unnaturalness of some of their sentences to the fact that isolated sentences with verb agreement and topicalization are awkward in ASL because they are discourse-dependent (Lillo-Martin, 1991). If the discourse-dependence of a language has a significant effect on ceiling performance in sentence intelligibility tasks, it is predicted that other spoken languages that are described as being more discourse-dependent than English (such as Chinese, Korean, and Japanese; Lillo-Martin, 1991) will show patterns similar to ASL. A review of the errors that occurred in the control condition in Experiment 1 indicates that participants did not sign certain pronouns. Because ASL is a pro-drop language, such omissions do not often change the meaning of sentences in conversations, but these were scored as errors in the experiment. Other errors included changes in sign order, selection of different variant handshapes, or selection of different signs that did not affect the meaning of the sentences. For example, one participant substituted WITH for HAVE in a control condition. The mean intelligibility of the conditions where performance levels off at its lowest point (533 ms, 667 ms, 800 ms, and 933 ms reversals) is 47%, which is notably higher than the 0% found in speech.
This is most likely attributable to the spatial encoding of phonological features in ASL. Information on handshape, location, and orientation of the signs, along with facial features, can be extracted from a single frame of a sign. Brentari (2002) compares signs without movement information to words without vowels, where consonants are more informative. Nevertheless, all signs have movement, and signs can exist as minimal pairs just through the difference of movement. Moreover, native signers consider movement to be the most salient component of a sign (Corina & Hildebrandt, 2000). 67 Reversing the temporal order of the movement sequence can result in complete differences in meaning. Because of the spatial modulation in the verb GIVE, ?I- GIVE-YOU? played backwards results in ?YOU-GIVE-ME? (see Figure 14). Figure 14. Reproduced from Liddell (2000), this illustration represents the sign for GIVE, where the direction of movement can mean I-GIVE- YOU but the reverse would result in the opposite meaning YOU- GIVE-ME. Some movements also involve changes in handshapes. For example, TAKE played backwards results in DROP, and SEND played backwards results in SHUT- UP (Brentari, 1998). Another example of the importance of temporal order is in fingerspelling. However, there are many cases where movements are not as sensitive to temporal direction. Trilled movements have also been called local movement or oscillations (Liddell, 1990). For example, it is possible to accurately identify the sign for tree, which involves trilled radial-ulnar movement, whether it is played forwards or backwards. Finally, there are many signs that would not mean anything if they were played backwards, such as CHINA. When examining the errors that participants make, some signs seem more sensitive or robust to the local time-reversals. In one sentence, REDUCE was 68 replaced by the opposite sign INCREASE. Even though the verb ASK involves both a change in movement and hand-orientation, one participant substituted ASK-YOU for ASK-ME even though ASK-ME played backwards has the wrong hand- orientation. A similar example is the substitution of the one-handed sign TELL-ME for the two-handed sign ANNOUNCE. These cases suggest that even in cases where the backwards version of a sign does not perfectly match up to a real sign, the saliency of the movement helps the viewer perceive approximates, which is consistent with Corina & Hildebrandts?s (2002) results. Another participant interpreted the sign for PULL-APART, as in peeling an orange, as representing the shape of the orange, where the hands come together rather than apart. For many other signs, the decrease in accuracy scores was simply a result of omissions rather than wrong substitutions. Although fingerspelled words were recognized as fingerspellings, accuracy was low for these signs. The trilled signs DIRTY (wriggling movement) and ORANGE (closing movement) were always identified correctly. Further analysis that is planned for the future includes studying the error types by determining what percent of errors were due to misses (no guesses) or mis- guesses. None of the previous studies on locally-reversed speech (Saberi & Perrott, 1999; Greenberg & Arai, 2001; Figueroa, 2009; Stilp, Kiefte, Alexander, & Kluender, 2010) report error types, but a comparison across the English and ASL may indicate modality-effects for error types. 
Because of the kinds of errors listed above, where playing a sign backwards can result in a close approximation to another sign, one prediction is that ASL will have a higher proportion of errors that are due to mis-guesses than English, which is predicted to have a higher proportion of misses than ASL. The fact that even the most degraded sentences in ASL are still ~50% intelligible reveals a modality effect in temporal integration, presumably due to the spatial encoding of phonological features in sign language. This result is similar to the findings of Fischer et al. (1999), who tested the intelligibility of ASL sentences that were compressed at different rates. Although they do not discuss this aspect of their data, their figures show that even in the most accelerated condition (compression by a factor of 6), sentences are ~20% intelligible and individual signs are ~40% intelligible. Compared to the findings from English by Miller and Licklider (1950), Tweney et al. (1977) also find that ASL is more resistant to temporal disruptions. Even at the most disruptive combination of interruption frequency and speech-time fraction, ASL was found to be ~35% intelligible (versus 5% in speech). Like the findings from speech, intelligibility of a sequence of words is affected by whether they are presented as a random list or as grammatical sentences. In both cases, ASL is more resistant to temporal disruptions than English. Tweney et al. (1977: 255) speculate "whether [resistance to disruption] derives from the linguistic structure of the signs or from the redundancy that would be possessed by any dynamic visual display." Findings from the local-reversal of ASL and an examination of errors suggest that both factors are involved in the results. While ASL sentences are more resistant to local-reversals overall, the phonological/grammatical or prosodic features of some signs make them more robust than others. Nevertheless, the sharp decline in intelligibility at 267 ms reversals suggests that reversals of such durations impose a significantly greater processing difficulty than conditions with reversals of shorter durations. In speech, acoustic signals in the speech stream fluctuate over smaller time-scales, and 50 ms reversals cause a significant decrease in intelligibility. This value is most likely attributable to the duration of fine-structures in speech, particularly consonants. Linguistic units in ASL have been reported to fluctuate over longer time-scales (Bellugi & Fischer, 1972; Wilbur & Nolen, 1986). Temporal integration windows on the time-scale of 250–300 ms may correspond to the average duration of syllables in ASL (Wilbur & Nolen, 1986). A comparison of English and ASL in the cognitive restoration of locally-reversed sentences suggests that temporal processing locks to different linguistic units across modalities (segments in speech and syllables in ASL). Nevertheless, although the values at which intelligibility falls drastically may differ between a signed and a spoken language, they both implicate temporal integration windows that are dependent on the duration of linguistic units.

2.6 Experiment 2 – Effect of modality-independent mechanisms on temporal integration windows: evidence from compression and local-reversals of ASL sentences

In Experiment 1, intelligibility as a function of reversal size implicates temporal integration windows that are ~250–300 ms in duration.
Experiment 2 is designed to test whether these results are linked to the size of linguistic units in ASL or should be attributed to more general visual processing mechanisms. One way to test this is to manipulate the rate of the sentences, where compression by a factor of 2 reduces the average duration of the articulations by half. Two studies in speech have used the combination of compression and local-reversals and demonstrated that the point at which intelligibility falls drastically is dependent on speech rate (Figueroa, 2009; Stilp, Kiefte, Alexander, & Kluender, 2010). Whereas intelligibility falls to ~50% at ~60 ms reversals at normal rates of speech, it falls by the same amount at ~30 ms reversals at 2x rates, proportional to the reduction in sentence durations. These findings from speech, as well as the findings from Experiment 1, suggest that although the durations of temporal integration may vary within and across languages, the mechanism of integrating the sensory input according to windows that track the fluctuation of linguistic units is universal.

Materials

The same 40 sentences of ASL from Experiment 1 were used again. The sentences were compressed by a factor of 2 by deleting every other frame from the original videos. The resulting videos were then locally reversed at increments of 4–28 frames (133–934 ms), with a control condition without any reversal manipulation (0 ms). This resulted in a total of 320 sentences so that 40 compressed sentences could be randomly assigned to 8 different conditions, with 5 examples per condition, for each participant. All videos were processed with the Cinepak codec for stimulus presentation.

Procedure

Fourteen deaf participants (10 female, mean age 21) who all started learning ASL before one year of age were recruited for this study on the Gallaudet University campus. The procedure was identical to that of Experiment 1, except that the videos of compressed sentences were presented. Participants were compensated $15/hour for the study.

Results

Results for Experiment 2 are given below, superimposed with the results from Experiment 1, where intelligibility is plotted as a function of reversal size (Figure 15). A two-way repeated measures ANOVA (using R 2.8.1, R Development Core Team (2005)) indicated that intelligibility varies significantly by rate (F1,208 = 23.33, p < 0.001) and by reversal size (F7,208 = 55.74, p < 0.001), and that there is an interaction between rate and reversal size (F7,208 = 2.41, p < 0.05), most likely due to similar floor effects of ~50% in both rate conditions. A one-way repeated measures ANOVA for the compressed sentences showed that intelligibility varies with the duration of the reversals (F7,104 = 39.34, p < 0.001). Tukey's Honestly Significant Difference (HSD) pair-wise post-hoc tests were conducted to check for differences in accuracy among all the reversal conditions for the compressed sentences. A sharp decrease in intelligibility is found between the control condition and 133 ms reversals (p < 0.001), and additionally between 133 ms and 267 ms reversals (p < 0.001). No differences were found among larger reversals (400–933 ms). Tukey's HSD post-hoc tests comparing normal and 2x rates at each condition indicate that there was no difference in the conditions without any reversals, a significant difference at 133 ms reversals (p < 0.05), marginally significant differences at 267 ms and 400 ms reversals (p ~ 0.10), and no significant difference among larger reversals (533–933 ms).
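A rough Python equivalent of the analysis workflow for the compressed-sentence data is sketched below for readers who prefer it to R. The column names are assumptions made for illustration, and the standard Tukey HSD shown here ignores the within-subject structure, so it only approximates the post-hoc tests reported above.

```python
# Sketch of the compressed-sentence analysis (assumed column names; the
# dissertation's analyses were run in R, not with this code).
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def analyze_compressed(df: pd.DataFrame) -> None:
    """df: one row per participant x reversal condition, with columns
    'participant', 'reversal' (ms), and 'intelligibility' (proportion correct),
    already averaged over the five sentences within each condition."""
    # One-way repeated-measures ANOVA: does intelligibility vary with reversal size?
    print(AnovaRM(df, depvar="intelligibility", subject="participant",
                  within=["reversal"]).fit())

    # Pairwise comparisons among reversal conditions. This standard Tukey HSD
    # treats observations as independent, so it is only an approximation of the
    # repeated-measures post-hoc tests reported in the text.
    print(pairwise_tukeyhsd(df["intelligibility"], df["reversal"]))
```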
Figure 15. Results from Experiments 1 and 2 (14 participants in each experiment), demonstrating the intelligibility curve of ASL sentences as a function of reversal size and compression by a factor of 2, where temporal integration windows are proportional to the input rate (indicated by a sharp drop in intelligibility at ~267 ms reversals at the normal rate and ~133 ms reversals at the 2x rate). These results suggest that temporal integration windows in sign language are determined by the rate and durations of linguistic units. Error bars represent plus or minus one standard error of the mean.

Discussion

These findings support the assumption that local time-reversals of the sensory input provide insight into the temporal integration windows of linguistic units. The results from Experiment 2 suggest that ASL sentences are integrated over time windows that scale with the duration of linguistic units. The key difference between Experiment 2, where sentences were presented at double the normal rate, and Experiment 1 is that intelligibility falls drastically earlier, at 133 ms (compared to 267 ms). Moreover, intelligibility of compressed sentences plateaus sooner, at 400 ms (compared to 533 ms for normal-rate sentences). These results are similar to the findings from speech (Figueroa, 2009; Stilp, Kiefte, Alexander, & Kluender, 2010), where the implication of ~60 ms temporal integration windows from local reversals of sentences played at normal rates does not seem to be inherent to auditory processing, since the windows decrease in duration as the linguistic units also decrease in duration with compression. In the same way, the results from Experiment 1 (~250–300 ms temporal integration windows) are indicative of both visual and linguistic aspects of integration. Experiment 2 provides evidence that a universal, modality-independent mechanism in sensory processing for language is to integrate the input over time-scales that track the fluctuation of representational units. In speech, findings from local-reversals provide evidence for temporal integration windows that correspond to the sizes of phonemes. The temporal order of phonemes is crucial for word identification. Based on previous reports of sign and syllable rates (Bellugi & Fischer, 1972; Wilbur & Nolen, 1986), it is possible that ~250–300 ms temporal integration windows correspond to the average duration of syllables in ASL. Chapter 3 explores in greater depth the average rate of signs, morphemes, and syllables among native signers of ASL. However, an analysis of signing rate for the 40 sentences used in these experiments indicated an average period of ~500 ms per grammatical component (manual sign, facial features for grammatical inflection, and spatial inflections), and longer for individual signs. In Experiment 1, intelligibility dropped sharply at 267 ms reversals. Wilbur and Nolen (1986) report that syllables are on average 250 ms in duration. Although segment analogies have been made for signs (Liddell, 1984), and reversing the temporal order of these segments can result in a change in meaning, these results suggest that sign languages can tolerate changes in temporal direction to a greater degree than spoken languages, suggesting that the nature of how segments are encoded in signing and in speech is quite different (Brentari, 1998; Wilbur & Allen, 1991).
In addition to simultaneous encodings, repetitions that are possible in sign without producing lexical differences (Channon, 2002) make sign language much more robust to reversals in temporal direction than speech. 2.7 Experiment 3 ? Effect of developmental factors on temporal processing: evidence from late-learners of ASL An important aspect of language acquisition is to recognize ? and match in both perception and production ? the temporal dynamics of the target language. Here, preliminary results are presented with deaf late L2-learners of ASL who self- 76 report English as their first language. Acquiring a language later in life often has consequences for properly learning the temporal parameters that distinguish different representations and processing a rapid sequence of input in on-line processing. The goal of Experiment 3 is to explore the effect of developmental factors on temporal processing by comparing the native signers of Experiment 1 with late-learners of ASL. Studying this group will also lead to a better understanding of whether the mechanism of temporally integrating the sensory signal according to the size of linguistic units in the language is universally present among all users of a language or only the native users. If this aspect of language processing is universally present even among late-signers, it is predicted that intelligibility of sentences would fall drastically at 267 ms reversals and level off at 533 ms. In contrast, if the phonological bottlenecks that late learners experience is due to having temporal integration windows that do not match the time scale at which the sensory information generated, then it expected that intelligibility will fall drastically at 133 ms reversals and level off around 400 ms. Another logical possibility is that intelligibility will fall drastically and level off at larger reversals, but together with previous work on spoken languages, where late learners are more sensitive to distortions in the input (Rogers, Lister, Febo, Besing, & Abrams, 2006), and pilot trials testing the experimental materials on late signers before Experiment 1 was conducted, it is predicted that late-learners are less tolerant to distortions than native signers. This is in contrast to the opposite prediction from the assumption that late learners are more tolerant of variability since they have less robust representations. 77 Processing locally time-reversed sentences requires perceptual flexibility. Perceptual flexibility may come from more efficient processing, where efficient processing is dependent on the coupling of perceptual processes to the signal that has to be analyzed. Previous work on bilinguals of spoken languages suggests that the ability to adapt to noisy listening conditions is weaker than monolinguals, and that late bilinguals face the greatest degree of difficulty in their second language (Mayo, Florentine, & Buus, 1997). Local-reversals create disruption that is akin to noise. Moreover, since studies on late learners of ASL have shown that they may use finer- grained phonological processing (Best, Mathur, Miranda, & Lillo-Martin, 2010), resulting in ?phonological bottlenecks? that affect other aspects of processing (Mayberry & Fischer, 1989; Mayberry, 2007), it is possible that such results can be attributed to temporal integration windows in time-scales that are smaller than those that are characteristic of native signers. 
The longer temporal integration windows that more precisely match the duration of linguistic units may require language exposure at early stages of development. Experiments on non-linguistic stimuli (using flashes of light) have shown that visual processing can operate in smaller time- scales around ~150 ms (Busch, Bubois, & Van Rullen, 2009; Perrett, Rolls, & Caan, 1982). However, it is also possible that the nature of signing rates present in the input may still require late-learners to integrate over appropriate durations that map to representational units. Materials The same materials from Experiment 1 were used in Experiment 3. 78 Procedure 8 deaf participants (7 female, 31 mean age) who all started learning ASL as an L2 after age 10 were recruited for this study at Gallaudet University campus. Based on the finding that L2 learners of ASL have distinct profiles from late L1 learners (Mayberry, 1993), it was decided that investigations on late learners should progress in at least two stages. This preliminary experiment tests the group that is assumed to have stronger language skills, so that age of acquisition is not confounded with additional factors associated with late L1 acquisition, although language skill assessment pre-tests were not included in the experiments. Moreover, because visual processing among deaf and hearing are known to be different due to different sensory experiences (Bavelier, Dye, & Hauser, 2006), it was decided that hearing late learners of ASL should not be included at this early stage of the project. Here, only participants who reported profound hearing loss before the age of two years old and listed English as an L1 were included in the study. Participants were also required to have at least 5 years of ASL signing experience to be eligible. The average number of years of signing experience was 15. The procedure for the experiments was identical to Experiment 1. Participants were compensated $15/hour for the study. Results Results from Experiment 3 are shown in Figure 16, presented together with the results from Experiment 1, where intelligibility is plotted as a function of reversal size. Because of the difference in sample sizes thus far (although plans are made to continue the study to full sample size to match Experiment 1), differences were not 79 tested statistically. However, the overall patterns indicate that performance among late-signers is lower than early-signers by ~10%. Late-signers also show a drastic decrease in performance at 133 ms reversals. Moreover, their performance levels off at two different time scales, first at 267-400 ms reversals, which is followed by another sharp decrease at 533 ms, after which intelligibility scores do not change significantly. Figure 16. Results from Experiment 1 and 3, demonstrating the effects of age-of-acquisition in processing time-distorted stimuli. Note: n=14 in Experiment 1 and n=8 in Experiment 3. Late learners demonstrate greater sensitivity to time distortions in the input, but performance among the early and late learners plateau at similar distortion scales. Error bars represent plus or minus one standard error of the mean. 80 Discussion These findings point to differences in temporal processing due to acquiring a language later in life. 
Experiment 3 tested two hypotheses: 1) temporal integration windows for late signers are the same as native signers if the size of linguistic units determines the duration of these windows, and 2) temporal integration windows for late signers are shorter in duration than early signers because the development of longer windows requires early exposure to signing. The current resuls partially support both hypotheses. The performance of early- and late-signers are similar in that intelligibility plateaus at similar reversal durations (533 ? 933 ms). In contrast, Experiment 2 showed that shorter temporal integration windows result in earlier reaches to lowest levels of performance. However, performance among late-learners fell drastically at 133 ms reversals, similar to the findings from Experiment 2. Based on the shape of the intelligibility curve, it is likely that this was not simply due to an overall lower performance due to late acquisition. Although the task is generally more difficult for late-signers, they are also more sensitive to temporal distortions. Processing difficulty at smaller reversals implicate shorter temporal integration windows. Reversals that go beyond these windows become much more difficult to integrate and map to linguistic representations. Nevertheless, knowledge about the fluctuation and duration of linguistic units from signing experience may help late- signers recover information from reversals that exceed these windows. More data is needed before conclusions can be made about whether late L2- learners processing normal rate sentences pattern more like early-learners processing normal rate sentences or compressed sentences, or whether they have their own 81 unique profiles as a group, as they seem thus far. Moreover, to better understand what processing of locally reversed stimuli tell us about supporting language skills and other cognitive factors, such as working memory, it would be worthwhile to conduct the experiments with language and cognitive assessments in the future, not only among late learners but all participants in experiments studying the cognitive restoration of time-distorted stimuli. 2.8 Conclusion The three experiments presented in this chapter are the first to investigate the impact of modality on temporal integration windows in language processing. Locally time-reversing the sensory signals in a sentence provides insights on the mechanisms for recovering linguistic representations. Stimuli with reversal durations that go beyond certain limits cannot be integrated properly and cause problems for comprehension. Experiment 1 demonstrates two effects that are driven by modality in language processing. Temporal integration windows are much longer in duration in the visual processing of language (approximately 250 ? 300 ms) than in speech (approximately 50 ? 60 ms). Moreover, spatial encoding in a visual language makes ASL much more resistant to temporal distortions, resulting in ~ 50% intelligibility for even the most degraded sentences (compared to 0% in speech). In speech, distorting the temporal direction of the input reveals smaller temporal integration that scale with the duration of segments. In sign language, locally-reversing sentences tap into the 82 processing and integration of larger syllabic units (Wilbur & Nolen, 1986). 
These differences have implications about the impact of modality on the temporal organization of representational units in languages, in addition to structural hierarchies (Brentari, 1998), and when temporal processing converges. Despite these differences, spoken and signed languages seem to share the characteristic of having temporal integration windows that scale with the size of representational units in the languages. This modality-independent property was confirmed by the results of Experiment 2, where the duration of temporal integration windows were reduced to ~133 ms, in proportion to compression rate. Results from late-signers in Experiment 3 offer a unique developmental perspective to temporal integration windows that has never been studied before, even in speech research. The findings suggest that longer temporal integration windows found in ASL processing is partially dependent on early exposure to a language where phonological units fluctuate at those time-scales. Early exposure to the target language gives earlier signers the advantage of being less sensitive to temporal distortions, perhaps due to longer integration windows. Moreover, having temporal integration windows that better match the size of linguistic units in the language may lead to more efficient processing. Although late-signers show indications of being sensitive to the duration of linguistic units, they also seem to be integrating the visual input at shorter time-scales. Testing late signers on compressed and locally-reversed sentences in the future will lead to a better understanding of how sensitive they are to time-scales of the input. 83 Like other studies (Mayberry, 1993; Mayberry & Fischer, 1989), Experiment 3 shows that late L2 signers have overall lower levels of performance on sentence repetition tasks. Many factors seem to contribute to this effect, including difficulty in grammatical processing, phonological processing, and working memory. Experiment 3 also indicates that late L2 signers are more sensitive to distortions in the input than early signers. A comparison between late L2 deaf signers and late L1 deaf signers has yet to be tested. Performing the intelligibility tasks, where a distorted sentence must be repeated back, is assumed to require strong language skills from early language exposure (Mayberry & Fischer, 1989). Based on previous studies that compare these two types of late learners, where late learners of a first language perform more poorly on language and cognitive assessment compared to their early learning counterparts, (Mayberry, 2003; Boudreault & Mayberry, 2006; Mayberry & Lock, 2003), it is predicted that late L1 deaf signers will overall have lower accuracy with normal and locally-reversed sentences. If resilience to disruptions in the input is dependent on robustness of representations associated with early L1 acquisition, it is also predicted that late L1 signers will show a sharper drop in performance accuracy with local- reversals. Based on the findings from bilinguals of spoken languages, it is possible that the present results from late L2 signers is due to late-learning or having ASL as another language. Another potential explanation is that having English as an L1 ? a spoken language with much shorter temporal integration of segments ? impacts temporal integration of ASL. A way to tease apart these explanations is to test 84 bilingual CODAs, for whom ASL is often the L1 and English an L2. 
If CODAs also show more sensitivity to noise in the input caused by temporal distortions than deaf native signers, it would suggest that bilingualism reduces perceptual flexibility across modalities. Such a finding would support the hypothesis that greater sensitivity to noise or other distortions in the input among bilinguals is due to the use of greater cognitive resources for language processing associated with suppressing the unused language. On the other hand, if greater sensitivity to input distortions is due to sharing of phonological space by two languages, such effects should not be found across modalities, where phonological spaces do not overlap to such degrees. However, results from CODAs that show greater sensitivity to shorter reversals would be confounded with experience with spoken English, experience with which might bias shorter integration windows. Thus, it would be valuable to test late-signers for whom ASL is an L1. If this group is equally sensitive to temporal distortions as late-signers for whom ASL is an L2, then late-learning may be the best explanation for the results found in Experiment 3. Finally, it would be valuable to test late-learners of a spoken language (for example, English L2 bilinguals) to better understand the effects of age- of-acquisition and bilingualism on the integration of sensory signals for language processing. Comparing unimodal and bimodal bilinguals would be particularly interesting because time-scales in languages within a modality are much more similar than two languages in different modalities. The difference in time-scales of temporal integrations windows in sign language (~ 250 ? 300 ms) and speech (~ 50 ? 60 ms) is attributed to locking to units 85 of different time-scales. However, another reason that distortions over larger time- scales are tolerated in ASL is that visually represented information may be encoded with greater temporal flexibility than auditory/speech-based representations. In working memory experiments, signers have been found to have spans in ranges that are shorter than hearing individuals (Wilson, Bettger, Niculae, & Klima, 1997; Boutla, Supalla, Newport, & Bavelier, 2004). Nevertheless, native signers are reported to have an equally easy time recalling a list of items forwards or backwards (Wilson, Bettger, Niculae, & Klima, 1997). Although Boutla et al. (2004) report that hearing bilingual signers have shorter spans on sign language tasks compared to speech tasks, order flexibility is not mentioned. In a non-linguistic task, Kimura et al. (2010) provide evidence that sequential regularities are automatically encoded in the visual system. Thus, the flexibility for order found among signers may be attributed more specifically to visual working memory or the prosodic features of ASL rather than visual encoding. A promising area of future research is to study the relationship among temporal integration windows, sensory encoding, and working memory, for both linguistic and non-linguistic functions. Viewing locally-reversed videos and cognitively restoring the movements that encode the original linguistic message requires discrimination between movements actually produced by the signer and apparent motion created by discontinuous video frames. 
In studies that tested viewer?s ability to re-create Chinese pseudocharacters from dynamic point light displays, those who had sign language experience (deaf and hearing signers) were able to determine the underlying discrete stroke movement patterns better than non-signers, although neither group was familiar with Chinese 86 (Klima, Tzeng, Fok, Bellugi, Corina, & Bettger, 1995; Bettger, 1992). One question that arises is how much experience with signing results in this enhanced ability to analyze movement, and how late signers compare with native signers in these skills. Results from Experiment 2 suggest that ~ 250 ? 300 ms windows in Experiment 1 are not absolute values in the visual processing of language but rather are relative to the size of linguistic units in the sentences at different rates of signing. Temporal integration windows in sign language perception may also converge with findings from reading. The average duration of eye fixations in reading is 200-300 ms (Rayner, 1998). One can also consider the possibility that language processing takes advantage of these time-scales that exist also for non-linguistic visual processing. In a study that tested viewer?s ability to detect flashes of light, visual detection thresholds corresponded to the phase of EEG oscillations in the theta (4-8 Hz) and alpha (8-12 Hz) range (Busch, Bubois, & VanRullen, 2009). Experiments on non-human primates have shown that some neurons do not respond to complex visual stimuli until 100-150 ms after stimulus onset (Perrett, Rolls, & Caan, 1982). One way to examine whether time-scales ~ 250 ? 300 ms in duration are privileged windows for other aspects of visual processing is to test the local reversals of non- linguistic gestural movements. Anecdotal accounts of viewing videos while rewinding suggests that non-linguistic gestures are overall more tolerant of temporal reversals than ASL sentences but testing this more systematically through different degrees of temporal reversals may also reveal a sharp decline in intelligibility at similar time-scales. 87 In addition to learning how integration windows are linked to the sensory channel and the rate at which information is transmitted in language, a better understanding of the neural mechanisms for language processing may provide insights into its temporal dynamics. Oscillatory neuronal activity are known to be the underlying basis for temporal integration in a wide variety of domains (Buzsaki & Draguhn 2004), including speech perception (Poeppel, 2003). Specifically, activity in the frequency of gamma (~40 Hz) and theta (~5 Hz) bands has been proposed to correspond to integration of segments and syllables of a speech stream, which are on average 50-80 ms and 150-300 ms in duration, respectively (Boemio, Fromm, Braun, & Poeppel, 2005; Luo & Poeppel 2007). How such a model of a multi-time resolution process and neural oscillations, and in particular these frequency bands for integration and comprehension, extends to sign languages has never been investigated. In addition to entraining to the physical characteristics of the sensory signal, endogenous brain rhythms also subserve other neurocognitive processes. For example, synchronization of neuronal firing with gamma-frequency oscillations is associated with feature binding and attention (Singer & Gray 1995; Fries, Nikolic, & Singer, 2007; Schroeder & Lakatos 2008). 
Among others, activity in the theta band has been implicated in working memory tasks and also attention (Jensen & Lisman 2005; Deiber, Missonnier, Bertrand, Gold, Fazio-Costa, Iba?ez, & Giannakopoulos, 2007). What remains unclear, however, is the nature of the relationship between the neuronal oscillations that subserve language-independent functions and those that entrain to the sensory input in language processing. A better understanding of the 88 temporal dynamics of sign languages and delineating the similarities and differences with spoken languages will be an asset in these investigations. 89 3 Temporal Dynamics in Natural Production 3.1 Introduction Understanding the time properties of perceptual processes in language requires information about the temporal dynamics in natural production. In this chapter, I provide a review of previous studies that have examined the rate at which information unfolds in spoken and signed languages. Then I provide new insights from comparisons of word, sign, morpheme, and syllable rates in English, Korean, and ASL, where data is taken from corpora of natural conversations. In Chapter 2, I demonstrated that perceptual mechanisms for analyzing the sensory signal depend on rate in sign language as well as speech. The goal of this chapter is to replicate and extend the findings from Bellugi and Fischer (1972) by investigating the relationship between linguistic primitives, the time durations over which they are phonologically instantiated, and the grammatical properties of particular languages. Whether on the perception or production end of communication, one must track how linguistic information unfolds over time. Mismatches between these two interfaces have the potential to overwhelm working memory capacities and create serious information bottlenecks. In this section, I explain how the research program advanced by Poeppel, Idsardi, and van Wassenhove (2008) ? with computational, algorithmic, and implementational levels for speech perception ? may also be extended to a visual-gestural language. Interestingly, this way of approaching speech perception research with three different levels of analyses is inspired in part by Marr?s (1982) model for visual perception, making its application especially relevant 90 here. In the following sections, I also provide a background on what is currently known about the dynamics of sign language production and representations of linguistic units. Theories of language processing require models which specify several details: what are the representational units that enter into language-specific computations, how those representational units are recovered from or transformed into sensory signals, and what are the neural bases for these processes. The consideration of lexical and phonological representations are sometimes lacking in other approaches to speech perception that focus on auditory neuroscience. Poeppel et al. (2008: 1017) write: Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. 
In the same way, sign language perception consists of the generation of discrete representations, which compose to form lexical representations, from continuously varying visual light waves. Thus, the assumption adopted here is that processes underlying sign language perception require recognition of perceptual objects that enter into linguistic computations and are constrained by the format that is used for sign representations and processing. Understanding sign language perception undoubtedly also requires principles from visual neuroscience, including knowledge about shape perception, change detection, and the processing of biological, human 91 motion. In the same way, psycholinguistic studies of speech draw upon auditory theories of spectro-temporal processing, pitch extraction, and object recognition. However, assuming the existence of linguistic representations, there are two possible models in how sensory signals and language-specific representations interrelate. Poeppel et al. (2008: 1074) write: If one is disinclined to invoke linguistically motivated representations early in the processing stream, then one owes a statement of linking hypotheses that connect the different formats (unless one does not, categorically, believe in any internal abstract representations for language processing). Alternatively, perhaps the representations of speech that are motivated by linguistic considerations are in fact active in the analysis process itself and therefore active throughout the subroutines that make up the speech perception process. I also assume that the linguistic nature of the information encoded in visual- gestural signals must play a critical role in sensory processing. The effect of linguistic knowledge on visual processing can be found when comparing signers and sign-na?ve individuals. Using ASL, Emmorey et al. (2003) report that images of phonemically contrastive handshapes that are varied continuously are perceived categorically by signers, marked by non-linear identification and peak in discrimination around the categorical boundary. No categorical effects were found among hearing non-signers, who showed non-linear identification patterns but not peaks in discrimination around the categorical boundary. Baker et al. (2005) extend these results to other handshapes in ASL, finding again that only ASL signers exhibited linguistic categorical perception and supporting the hypothesis that these effects are based on linguistic categorization rather than purely perceptual categorization (Baker, Idsardi, Golinkoff, & Petitto, 2005). Campbell et al. (1999) 92 tested whether facial expressions that are used in Yes/No and Wh- questions in British Sign Language (BSL) can be perceived categorically, and how the processing of linguistic facial expressions compare with emotional facial expressions (Campbell, Woll, Benson, & Wallace, 1999). They found that both deaf signers and hearing non- signers showed categorical perception to emotional expressions, but only deaf signers showed categorical perception to the grammatical expressions when identified as a question marker. Evidence for the role of linguistic knowledge in visual processing can also be found in the perception of apparent motion. Previous studies on the perceptual construction of motion have shown that viewers interpret the shortest possible path in apparent motion (Wertheimer, 1912; Korte, 1915). 
However, stimuli involving biological motion can cause viewers to interpret the apparent motion as involving biologically plausible motion, even when it is not the shortest path, under specific time-windows that would permit such movements (Shiffrar & Freyd, 1990). Building upon these studies, Wilson (2001) tested viewers? perception of apparent motion using signs that involve movement in ASL. Two-touch signs that involve indirect ?hopping? motion and one-touch signs that involve direct ?sliding? motion were chosen as stimuli (see Figure 17). When presented with two images in rapid sequences, viewers perceived apparent motion. Although hearing non-signers interpreted all signs as involving sliding motion, which involves the shortest biological plausible path of movement, deaf signers of ASL interpreted hopping motion when the motion resulted in a lexical item in ASL. 93 Figure 17. Examples of signs used by Wilson (2001), with images from www.aslpro.com (top) and www.signingsavvy.com (bottom). The top row shows images taken from a video recording of BRIDGE, a two- contact sign that involves hopping motion from the wrist to the elbow. The bottom row shows images from a video recording of CREDIT- CARD, a one-contact sign that involves sliding motion from the palm and outward across the hand. When adopting a view of sensory processing where linguistic representations play an active role, an explicit theory about the format of these representations is necessary. At the computational level, Poeppel et al. (2008) support a view where words consist of a series of segments, ?each of which is a bundle of distinctive features that indicate the articulatory configuration underlying the phonological segment,? as well as syllable-level representations (Stevens, 2002; Halle, 2002; Lahiri & Reetz, 2002; F?ry & van de Vijver, 2004; Archangeli & Pulleyblank, 1994; Kabak & Idsardi, 2007). Sign languages also have sublexical units that are organized in a hierarchical way. Parameters of signs include handshapes, locations, movements, orientations, and non-manual features. Bundles of these features combine to form 94 signs, with internal structure based on features, segments, and syllables (Liddell & Johnson, 1989; Perlmutter 1992; Brentari, 1998), although modality impacts how this set of primitives can be fractionated by time. While theses analyses on sign language differ on the structural organization of the sublexical features, they all agree that signs have sublexical structure. The hierarchical organization of linguistic representations motivates a multi- time resolution implementation for sensory processing, where analyses in short and long time-scales occur in parallel. This model relies on the concept of temporal integration windows, as has been described by previous sections. In speech, this approach is supported by evidence from psychophysics, electrophysiology, and functional imaging. Describing what determines temporal integration windows, Schroeder et al. (2008:109) write: ?Because neuronal oscillations cover a wide frequency spectrum, from well below 1 Hz to well over 200 Hz, they enable the integration of inputs on many biologically relevant time scales.? 95 Figure 18. Reproduced from Schroeder, Lakatos, Kajikawa, Partan, & Puce (2008), this figure illustrates the hierarchical coupling of neural oscillations. In speech, time-scales approximately 20 ? 80 ms and 150 ? 300 ms correspond to the duration of segments and syllables, which may be reflected by activity in gamma and theta bands. 
The concurrent analysis of segments and syllables may be possible by the phase-amplitude coupling of oscillations in these two frequencies, which are prominent rhythms in the primary auditory cortex (Lakatos, Shah, Knuth, Ulbert, Karmos, & Schroeder, 2005). The involvement of delta (1-3 Hz) oscillations in this hierarchical coupling may be tied to the rhythms of prosodic intonations in speech (Munhall, Jones, Callan, Kuratate, & Vatikiotis-Bateson, 2004). Functionally, processing of these temporal signals has been shown to be subserved by the superior temporal gyrus (STG) for high-frequency fluctuations and the superior temporal sulcus (STS), with a right-hemispheric bias, for longer-duration signals (Boemio, Fromm, Braun, & Poeppel, 2005). At the algorithmic level of description, which specifies the procedure for mapping sensory signals to linguistic representations, Poeppel et al. (2008) adopt an analysis-by-synthesis model, where perception is driven by internal guesses about the upcoming representations (Halle & Stevens, 1959, 1962; Stevens & Halle, 1967; Yuille & Kersten, 2006). In this view, the perceptual system does not simply wait for the input to be completed before trying to map the signals to representations. Based on the previous segment or a minimal amount of the current signal, the system may form predictions about the possible inputs that follow, which are then compared against the incoming signal. In particular, phonological knowledge about how sounds sequence in a given language may be one basis for making such predictions (Hwang, Monahan, & Idsardi, 2010). I will assume that knowledge about the rate at which linguistic representations are generated and the constraints for how those representations are constructed is an important foundation for both speech and sign language perception, and that this is a modality-independent aspect of language processing. These three levels of analysis (computational, algorithmic, and implementational) for speech perception are thus more broadly applicable to sign language processing. This approach to language research underscores the importance of integrating knowledge about the representations of linguistic units, how they combine, the time-scales at which these representations unfold, and the temporal constraints for supporting perceptual processes at the cognitive and neural levels. 3.2 Bellugi & Fischer (1972) revisited: Beyond the rate of signs Convergence of the rate of propositions (sentences) in English and ASL despite the discrepancy in the rate of words and signs (Bellugi & Fischer, 1972) presents an interesting puzzle that raises questions about the rate of language-internal computations and what grammatical properties arise due to modality and temporal processing constraints. In the first comparison of rates in sign language and spoken language, Bellugi and Fischer had three hearing bilingual CODAs narrate a story that they knew well. Before starting the rate analysis, the researchers subtracted the times taken for pauses. When the story was told in English, the mean duration of propositions was 1.27 s, and on average 4.7 words were produced per second. When the story was told in ASL, the mean duration of propositions was 1.47 s, and on average 2.36 signs were produced per second. This means that signs are produced on average at 423-ms cycles.
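The arithmetic behind these figures can be made explicit. The short sketch below is illustrative only; it simply converts the means reported by Bellugi and Fischer into per-unit cycle durations and per-proposition counts.

```python
# Reported means from Bellugi & Fischer (1972): production rates and
# proposition durations for the same story told in English and in ASL.
english = {"units_per_s": 4.7, "proposition_s": 1.27}   # words
asl     = {"units_per_s": 2.36, "proposition_s": 1.47}  # signs

for label, d in [("English words", english), ("ASL signs", asl)]:
    cycle_ms = 1000 / d["units_per_s"]                 # average duration of one unit's cycle
    per_prop = d["units_per_s"] * d["proposition_s"]   # average units per proposition
    print(f"{label}: ~{cycle_ms:.0f} ms per unit, ~{per_prop:.1f} units per proposition")

# English words: ~213 ms per unit, ~6.0 units per proposition
# ASL signs:     ~424 ms per unit (the ~423-ms cycle noted above), ~3.5 units per proposition
# The word:sign rate ratio is 4.7 / 2.36, roughly 2:1, while proposition durations differ little.
```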
Although no statistical tests could be made with only 3 participants, the comparison of these rough numbers suggests no difference in magnitude for propositions but that the ratio of words to signs is 2:1. Propositions may be considered simple sentences or clauses, and they were measured in Bellugi and Fischer's study by counting all main verbs or predicates that had overt or covert subjects. From the description of the methodology, it appears that words were counted according to orthographic convention, and contractions like don't and it's were each counted as one word. It appears that signs were counted according to intuitions about what counts as a whole sign for native signers. Signs are regarded as complete bundles of features, including handshape, orientation, location, and movement. For example, the basic sign for LOOK can be varied to mean 'YOU-LOOK-AT-ME', 'EVERYONE-IS-LOOKING-AT-ME', 'THEY-LOOK-AT-EACH-OTHER', and 'GAZE-AT-ONE-ANOTHER-LIKE-LOVERS' depending on the number of hands, the orientation of the hands, the movement of the hands, and non-manual features. Even though these constructions convey complex meaning, each was counted as a single sign. These criteria already suggest that words and signs are not equivalent units in sentential context and that one sign of ASL may be equivalent to several words in English. Bellugi and Fischer discuss three possible reasons for why the propositional rates in these two modalities converge despite the apparent differences in the rates of words and signs: 1) doing without, 2) incorporation, and 3) body movements and facial expression. Sentences in ASL can convey unambiguous meaning with fewer items than English. They note that ASL uses "denser constructions," which is illustrated by the examples below.

English / ASL
and I went back into the kitchen (7) / RETURN TO KITCHEN (3)
So they came in (4) / ENTER (1)
I turned on the gas (5) / ME TURN-ON G-A-S (3)
I pulled open the drawer (5) / I PULL-OUT-DRAWER (2)
And I struck the match (5) / AND STRIKE-MATCH (2)
Until I finally decided to go through the gate. (9) / UNTIL DECIDE GO-THROUGH GATE (4)
OK, so they got off the streetcar (7) / AND ARRIVE GET-OFF TRAIN (4)

Table 2. Examples are adapted from Bellugi & Fischer (1972). These pairs of sentences demonstrate differences between English and ASL constructions; parenthesized numbers give the word and sign counts.

Bellugi and Fischer remark that ASL lacks redundancy because elements that are not essential to convey the message are deleted. In English, it is possible to reduce redundancy by replacing proper names with pronouns. In ASL, because information is preserved by reference to points in signing space, these nouns can often be eliminated. Because signs can incorporate location, number, manner, and shape/size features, a single sign can involve many layers of information. Through non-manual features, it is possible to layer information across a sequence of signs, so that equivalents for 'I understand' and 'I don't understand' take the same amount of time, where a head shake during UNDERSTAND changes the meaning to negation. Thus, the grammatical properties of ASL play a key role in its temporal properties. Bellugi and Fischer (1972:199) write, "It seems to us that (this) condensation [in ASL] may be a response to pressure when the rate of articulation of the language is so different from speech... [ASL] has special ways of compacting and incorporating linguistic information that, because of its nature, are different from spoken language."
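As a rough quantitative summary of Table 2 (illustrative only; the item counts are those given in the table), the sketch below computes the English-to-ASL ratio for each sentence pair and overall.

```python
# (English word count, ASL sign count) pairs from Table 2 (Bellugi & Fischer, 1972)
pairs = [(7, 3), (4, 1), (5, 3), (5, 2), (5, 2), (9, 4), (7, 4)]

ratios = [eng / asl for eng, asl in pairs]
total_eng = sum(e for e, _ in pairs)
total_asl = sum(a for _, a in pairs)

print("per-pair English:ASL ratios:", [round(r, 2) for r in ratios])
print(f"overall: {total_eng} words vs {total_asl} signs "
      f"(ratio ~{total_eng / total_asl:.1f}:1)")

# Per-pair ratios range from about 1.7 to 4.0; overall 42 words vs 19 signs, ~2.2:1,
# in line with the roughly 2:1 word-to-sign rate discussed above.
```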
Bellugi and Fischer entertain the possibility that the propositional rates in English and ASL were the same in their study because the participants were hearing bilinguals. However, in a different study with native deaf signers, Klima and Bellugi (1979) found that the rates of signs and propositions were overall similar to the ASL findings from CODAs. A form of communication that uses the visual-gestural modality while following the grammar of English offers a unique perspective on the interaction of grammar and modality. These systems are referred to as Manually Coded English (MCE), which were developed to make English visible for deaf children (Ramsey, 1989). However, findings that MCE cannot be learned naturally suggest that processing difficulties occur when grammatical structure based on a spoken language is imposed upon sign language (Supalla & McKee, 2002). One form of MCE called Signing Exact English (SEE 2) adapts ASL signs, which serve as roots. Invented and borrowed signs are added as functional morphemes or words, most of which are added linearly as in English. 88% of affixes in SEE 2 have full sign formational structure by having movement and thus are "sign-like." In ASL, IMPROVING and IMPROVEMENT involve a modification of the sign for IMPROVE, where all three are single signs. In IMPROVING and IMPROVEMENT, the inflections overlap with the root. In SEE 2, IMPROVING and IMPROVEMENT involve a sequence of two signs, IMPROVE plus an affix. In cases where a sequence like KNOW and -ING can be assimilated, so that there is only one movement, the duration of the form is cut in half compared to the unassimilated form. However, the resulting form is not a possible sign because of the relationship of the handshapes in KNOW (with the B-handshape) and -ING (with the I-handshape) (Battison, 1978). In many cases, "MCE morphology does not meet the constraints on sign structure" (Supalla & McKee, 2002:156). Giving in to time pressures can lead to phonological ill-formedness, which leads to other aspects of processing difficulty or unnaturalness. In terms of rate, Klima and Bellugi (1979) report that the average length of propositions signed in MCE is 2.8 seconds, which is almost double the duration found in ASL and English. Thus, it appears that the temporal properties of MCE violate more general constraints on language processing, not just those of grammar. Grosjean (1979) compared the rates of English and ASL, where participants were asked to speak or sign at different rates. At the normal rate, signers produced on average 1.94 signs per second and speakers produced 4.57 words per second, similar to the findings from Bellugi and Fischer (1972). Wilbur (2009) also reports that signers produced an average of 1.95 signs per second in normal conditions and 2.43 signs per second in fast conditions. Taking into consideration the amount of pauses in natural production, both studies also report that signers spend more time articulating than speakers. In other words, a higher percentage of the time spent narrating a story was filled with pauses in speaking than in signing. In a different experiment, Klima and Bellugi (1979) had signers and speakers produce a list of monomorphemic signs or words at a rate of one per second. They found that twice as much time within the one-second intervals was taken up by signing as by speaking.
Klima and Bellugi (1979:186) point out, "One might imagine, therefore, that signed sentences and their underlying propositions might normally be stretched out in time periods longer than comparable propositions in spoken language." However, because of the structural and discourse properties of ASL, as outlined by Bellugi and Fischer (1972), large mismatches are avoided. Although the average duration of signs reported in these studies is ~400-500 ms, it can depend greatly on context. As previously described, Fischer et al. (1999) test the intelligibility of rate-compressed ASL sentences and single signs. They compare the duration of five signs (ROOM, MOUNTAIN, APPLE, TELEPHONE, and FATHER) that occurred both in sentential and isolated context. The average duration of these signs was 313 ms in sentences and 553 ms in isolation, 167 ms of which was attributed to "final hold" (Liddell, 1984). Liddell (1978) analyzed the duration of the same signs appearing in different sentence positions and syntactic functions. The duration of signs is shortest in medial position (reaching as low as 233 ms) and longest for topic signs in initial position (reaching as high as 600 ms). Moreover, Friedman (1974) compared the average duration of unstressed signs (367 ms on average) and stressed signs (835 ms on average), where the longer duration of stressed signs was attributed to longer "holds," similar to the findings of Fischer et al. (1999). Measuring sign durations by taking the beginning and end of sign boundaries or by taking the length of an utterance and dividing it by the number of signs can also result in considerably different figures. The reason is that there are transition times between the signs (see Figure 19). Figure 19. Reproduced from Brentari, Poizner, & Kegl (1995) (and Brentari (1998)), this figure demonstrates sign-internal and sign-external transitions in an ASL sentence. The above sentence is WORD BLOW-BY-EYES MISS SORRY ('The word went by too quickly. I missed it, sorry'). Wilbur and Nolen (1986) analyze the rate of syllables in ASL, which are phonological units that are composed of movements (M) and holds (H) (akin to vowels and consonants, respectively, in spoken languages; consider also the opposite view, where vowels are understood to be steady states (= holds) and consonants as transition states (= movements)), following Liddell (1984) (see also Perlmutter (1992) for an account with movements (M) and positions (P)). As outlined by Sandler and Lillo-Martin (2006:218), the argument for syllables in ASL is as follows: "1) There is a prosodic unit that organizes the timing of phonetic gestures, 2) there are constraints on the content of this unit, 3) it is referred to by rules, and 4) there is distributional evidence for the following saliency hierarchy: path movement > internal movement > location." It has been proposed that most signs in ASL are monosyllabic (Coulter, 1982) and have the following configurations: HMH, MH, HM, or M, where movements are considered to be the nucleus of syllables (Perlmutter, 1992). However, when signs are connected in sentences, transitional movements also occur between the signs. Some of these inter-sign transitional movements occur between lexical signs that have their own internal (intra-sign) movements or provide movements to signs like MOTHER and NOON that do not have their own lexical movements. Taking these factors into account, Wilbur and Nolen (1986) provide a thorough analysis of syllables in ASL taken from natural conversations and prompted sentences. In conversational data where 889 syllables were measured among 3 signers, the mean duration of syllables was 250 ms.
This figure reflects the duration of syllables when inter-sign transitional movements were counted. Surprisingly, the range of syllable durations was 33-1300 ms, and the total standard deviation was 162 ms. The shortest syllables were those with only movements (mean duration 195 ms, standard deviation of 128 ms), and these occurred most frequently in the data. Overall, there was a negative correlation between syllable length and frequency, as found in spoken languages (Zipf, 1935). Compared to initial holds, which were 74 ms in duration on average, final holds were 156 ms long. Wilbur and Nolen (1986) note that the average duration of 250 ms for syllables in their study converges with Liddell's (1978) findings that monosyllabic signs taken from sentential contexts are 233-450 ms long. However, Bellugi and Fischer (1972), Klima and Bellugi (1979), and Grosjean (1979) report that approximately 2 signs are produced per second, but they are careful not to make the claim that the average sign is 500 ms in duration, although generally it can be assumed that the length (duration) of units corresponds to the periods at which they are produced. Perhaps one way to reconcile these findings is to assume some combination of two possibilities: that many signs are multisyllabic, or that many syllables are not part of the signs. Coulter (1982) has argued that most signs in ASL are monosyllabic, which Wilbur and Nolen acknowledge. Wilbur (1986) has also argued that multisyllabic signs exist as bidirectional signs, which have two movements. A sequence of movements is highly constrained, however, such that the second movement must be the opposite of the first or a 90-degree rotation of the first (Supalla & Newport, 1978). Other cases include reduplicated forms and some compounds. Although many compounds consisting of two signs fuse to become monosyllabic (Liddell, 1984; Liddell & Johnson, 1986), others retain the syllable from each of the signs. As mentioned previously, Wilbur and Nolen (1986) take transitions between signs into account in their measurement of syllables. They observe, "Signing differs from speech, where the sound stream may be discontinued while the articulators are in transition. The hands cannot be made invisible while they make transition movement" (1986:273). Among the 889 syllables that they measured, 114 syllables consisted of only transitions, where the mean duration of these transitions was 203 ms. When examining syllables with both transitional movement and lexical movement (255 cases), it was found that the ratio of the movements was 1:1. Some transitional movements provide movements to signs that do not have their own lexical movements, and these occurred in 50 cases. Overall, the total number of syllables with inter-sign transitional movements was 419, almost half of all the syllables. Whether the faster rate of syllables over signs comes from multisyllabic signs or from syllables that do not have lexical content, these findings suggest that the ratio of syllables to signs is 2:1. However, it may not be appropriate to draw this conclusion since sign rates were not analyzed by Wilbur and Nolen.
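Using only the figures reported above, the small sketch below shows how the syllable and sign rates could relate. This is a back-of-the-envelope reconciliation, not an analysis from Wilbur and Nolen, and it combines figures from different studies.

```python
# Figures reported above: mean syllable duration (Wilbur & Nolen, 1986) and
# sign rate (Bellugi & Fischer, 1972; Grosjean, 1979).
mean_syllable_s = 0.250          # 250 ms per syllable, transitions included
signs_per_s = 2.0                # ~2 signs per second

syllables_per_s = 1 / mean_syllable_s          # ~4 syllables per second
syllables_per_sign = syllables_per_s / signs_per_s

# Share of the measured syllables involving inter-sign transitional movement
transition_share = 419 / 889                   # ~0.47

print(f"~{syllables_per_s:.0f} syllables/s vs ~{signs_per_s:.0f} signs/s "
      f"-> ~{syllables_per_sign:.0f} syllables per sign")
print(f"share of syllables with inter-sign transitions: {transition_share:.0%}")

# If roughly half of the syllables are transitional, the 2:1 syllable-to-sign ratio
# could arise even if most lexical signs are monosyllabic; but, as noted above,
# sign rates were not measured in the same data, so this is only suggestive.
```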
In a different experiment, when examining the duration of syllables in elicited sentences with phrasal or compound variants of signs, they found that the syllable rate is slightly slower (where the average syllable duration was 292 ms), which is attributed to the fact that these were not taken from natural conversations. An example of a phrasal sign is FACE CLEAN, which can have the literal meaning 'clean face' as in 'He has a clean face.' The same sequence can be used as a compound, notated FACE-CLEAN, where the meaning is 'handsome' as in 'He is handsome.' The ratio of signs to syllables was measured for conditions where these forms were produced in isolation. In other words, inter-sign transitional movements were not relevant for this analysis. The ratios were 3.13 syllables per sign for a combination of simple lexical items and 3.92 syllables per sign for compounds. Wilbur and Nolen remark that a rate of 4 syllables per second is similar to syllable rates found in English (239 ms for unstressed syllables and 301 ms for stressed syllables), taking data from Adams (1979). However, a more recent analysis of English from natural conversations (the Switchboard corpus) reveals faster rates, where the mean duration of English syllables is 190 ms (Arai & Greenberg, 1998; Greenberg, Hollenback, & Ellis, 1996). Other studies report that monosyllabic English words are approximately half the duration of monosyllabic ASL signs (Emmorey & Corina, 1993; Corina & Knapp, 2006; Capek, Grossi, Newman, McBurney, Corina, Roeder, & Neville, 2009). These syllable rates in adult sentence production can also be compared to the rates found in babbling during infancy, which is considered an important stage for phonological development. As discussed in Chapter 1, the average syllable duration was found to be ~300 ms in speech babbling (Levitt & Wang, 1991; Dolata, Davis, & MacNeilage, 2008) and ~1000 ms in sign babbling (Petitto, Solowka, Sergio, Levy, & Ostry, 2004). Petitto et al. (2004) found that non-linguistic gestures of sign-exposed babies move at a frequency of ~2.5 Hz and that the gestures of children who were not exposed to sign language input move at ~3 Hz. If frequencies in babbling have any parallels in adult production, they suggest that the rhythmic properties across modalities are notably different. Although this provides a good overview of the phonological temporal dynamics of ASL in addition to the rates of signs and propositions provided by earlier studies, no systematic patterns emerge except that the global rate across the modalities is the same. Although the rate of words is double the rate of signs, since signs and words are not equivalent linguistic units, it is difficult to interpret the meaning of these results. One possibility is that signs in ASL contain, on average, double the amount of information of words in English. Individual signs can incorporate layers of information using nonconcatenative strategies, and additional information can be layered across phrases. However, some of these nonconcatenative strategies, such as reduplication to show aspect on verbs, do lengthen the duration of signs. Examples described earlier in this section demonstrate that certain signs may be much richer in morphology than single words of English. However, without a systematic study of morpheme rates across sentences in both languages, it is difficult to determine to what degree simultaneous strategies make up for differences in word-sign rates quantitatively.
Moreover, Bellugi and Fischer (1972) note that ASL sentences can "do with less" and may be less redundant than English. Thus, despite the simultaneous strategies of ASL grammar, the existence of other tactics may suggest that layering information does not sufficiently meet time pressures in language processing. An analysis of morphemes (units of meaning) as well as an analysis of syllables (units of form) would be helpful in better understanding where and how rates converge across modalities. In the only study that I am aware of that reports the rate of morphemes in a sign language, Senghas and Coppola (2001) investigate the evolution of Nicaraguan Sign Language (NSL). They describe the emergence of systematic spatial modulations in signing among individuals who were exposed to Nicaraguan Sign Language at different ages and also those who entered signing communities at different stages in the evolution of the language. Overall, the first cohort (the generation that entered the signing deaf community before 1983) showed significantly fewer spatial modulations per verb in their natural production. The second cohort showed almost double the number of spatial modulations, but only among those who entered the signing deaf community before the age of 10. They also tested whether there was a link between the use of spatial modulations and overall fluency, which they measured as signing rate. Signers from the second cohort who entered the communities before the age of 6 had the highest fluency rate of ~350 morphemes per minute (or 5-6 morphemes per second). Late learners in both cohorts had the lowest fluency, where morpheme rates were about half that figure. One conjecture is that spatial modulation emerges as a result of reaching some sort of upper limit on processing without simultaneous layering of information, a limit that is not rapid enough for full-fledged language processing. It is also possible that overall fluency and the grammaticalization that leads to aspects of structure like spatial modulation develop together but not in a cause-effect relationship. Insights from Al-Sayyid Bedouin Sign Language (ABSL), another new sign language that is still continuing to develop, give indications that verbal agreement through spatial modulations takes time to mature and become grammaticalized (Aronoff, Meir, Padden, & Sandler, 2004). Aronoff et al. (2004:35) write, "The lesson from ABSL is therefore that even the motivated morphology that we find in all established sign languages requires social interaction over time to crystallize." It is surprising that with the availability of the visuo-spatial modality, such aspects of sign languages still require time to develop rather than being exploited immediately. Although some aspects of verb agreement are found in rudimentary home sign (the first stage of language creation among deaf children who grow up without exposure to any accessible language input) (Goldin-Meadow, 1993) and even among the older generation of NSL users, they lack the systematic and extensive use found among all mature and stable sign languages. Although the development of simultaneous strategies that do not exist in spoken languages is the focus here, it is important to note that it is not true that early-exposed young-generation signers avoid all sequential strategies. In a different study, Senghas et al. (2004) describe how only young-generation native signers discretize manner and path features of movement (e.g., 'rolling down')
in natural production, whereas others use the more gestural form of expressing these features simultaneously (Senghas, Kita, & Özyürek, 2004). Since this aspect of segmentation and linearization was not present among older generations of signers, it serves as another example of language creation without rich input. Although the focus of the study by Senghas and Coppola (2001) is on spatial modulation in NSL grammar, the analysis of fluency based on morpheme rates provides a unique insight about their possible interaction. Unfortunately, information about the rate of signs, propositions, and syllables is not reported in the study. What remains unknown is the average rate of morphemes in a spoken language, and the relationship between morpheme and syllable rates in sign language and in speech. The underlying assumption about the temporal dynamics of signing has been that they are slow compared to the rapid movements of oral articulators and the fine structures of acoustic signals in speech. Meier (2002:8) summarizes this argument as: "To date, the articulatory factor that has received the most attention in the sign literature involves the relative size of the articulators in sign and speech. In contrast to the oral articulators, the manual articulators are massive. Large muscle groups are required to overcome inertia and to move the hands through space, much larger muscles than those required to move the tongue tip." However, a look at quantitative measures of velocities in sign and speech production demonstrates that the relationship between the speed of the articulators and the grammatical differences is not straightforward. Ostry and Munhall (1984) report that the average maximum velocity of tongue dorsum movements is on the order of 10 cm/s. In contrast, Wilbur (1999) reports that the peak velocity of signs is on the order of 300-400 cm/s when measured from diodes placed on the thumb and index finger and recorded by cameras. When comparing 2- and 3-dimensional traces of signing motion, Bosworth et al. (2010) find that 2D traces yield slightly slower figures for velocity (see Figure 20) (Bosworth, Dobkins, & Wright, 2010). In their study, the mean velocity of movements was ~50 cm/s and the maximum speeds were ~150 cm/s in sentence production. The differences in the measurements between Wilbur and Bosworth et al. may be attributed to differences in equipment and distance to recording devices (WATSMART and Virtual Reality InterSense, respectively) or individual variation. In Bosworth et al. (2010), the maximum velocity of one signer was as high as ~300 cm/s. Figure 20. Reproduced from Bosworth, Dobkins, & Wright (2010), this figure demonstrates the 2D movement trace for an elicited sentence containing the sign KNOW. The maximum amount of displacement in tongue dorsum raising and falling is about 1 cm (Ostry & Munhall, 1984). In signing, Wilbur (2009) reports that the average amount of displacement is about 20-30 cm. However, in the study conducted by Bosworth et al. (2010), a visual inspection of figures reveals displacements ranging from 20 to 150 cm, but mean values are not reported. In speech, there is a reliable correlation between the amplitude (maximum distance) of the tongue dorsum movement and its maximum velocity. Bosworth et al. report that since the duration and displacement of movements vary linearly, the relative speed of movements is kept constant. It is not clear from these figures how to characterize and compare the speeds in production.
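One way to see why the comparison is not straightforward is a back-of-the-envelope estimate of movement duration from displacement and peak velocity. The sketch below assumes, purely for illustration, a symmetric triangular velocity profile (so that duration is roughly 2 x displacement / peak velocity); this assumption and the particular figures plugged in are mine, not the cited authors'.

```python
# Rough movement-duration estimates under an assumed symmetric triangular
# velocity profile: distance = (peak_velocity * duration) / 2.
def movement_duration_ms(displacement_cm, peak_velocity_cm_s):
    return 2 * displacement_cm / peak_velocity_cm_s * 1000

cases = [
    ("tongue dorsum (Ostry & Munhall, 1984)", 1.0, 10.0),
    ("sign, Wilbur (1999) peak velocity",     25.0, 350.0),
    ("sign, Bosworth et al. (2010) maxima",   25.0, 150.0),
]

for label, d_cm, v_cm_s in cases:
    print(f"{label}: ~{movement_duration_ms(d_cm, v_cm_s):.0f} ms")

# tongue dorsum: ~200 ms; sign (Wilbur figures): ~140 ms; sign (Bosworth maxima): ~330 ms.
# The estimates vary widely depending on which velocity figure is used, underscoring
# that articulator speed alone does not determine how long a movement takes.
```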
Although manual movements are executed at higher velocities, they also move greater distances. Given the average peak velocity and distances of movements in each modality, it would be useful to better understand and compare how long an average movement takes in each modality. Although sign languages have rhythmic properties, they do not have a single predominant oscillator like the mandible in speech. The movements of the mandible are relatively simple, consisting only of raising and lowering. In signing, multiple joints on the hands and arms can contribute to a wide range of motions, including extensions and rotations. Although the timing of these movements associated with syllable units is rhythmic, they do not have the same cyclic property as syllables in speech, except for rotations, trilled movement, and repeated movements. Non-manual features, such as mouthing and eyebrow movements, contribute to meaning and also display rhythmic properties (Baker & Padden, 1978; Wilbur, 2009). When mouthing and signs co-occur, oral units entrain to sign syllables (Sandler & Lillo-Martin, 2006). In many sign languages, these oral features are not optional but obligatory (Boyes-Braem & Sutton-Spence, 2001). Lexically specified movements are synchronized with sign movements, but they do not co-occur with transition movements between signs. In ASL, mouthings borrowed from English can co-occur with signs. Meier (2008) reports that in cases where the English word and the ASL sign do not match in syllable count, the English word is restructured. One example is the reduction of the mouthing for finish to fish because the sign FINISH has a single outward twist of the forearm. Woll (2001) also describes the phenomenon of echo phonology in British Sign Language, where movements of the mouth and hands are synchronized and the manner of the movements is matched. Finally, the rate of letters in fingerspelling may provide some insight into the speed of fine-motor changes. Quinto-Pozos et al. (2010) report that approximately 7.5 letters can be produced per second (or 133 ms per letter) by a native signer. Although this may seem rather fast, the degree of coarticulation that takes place in fluent fingerspelling and the dropping of letters in fingerspelled signs (for example, B-N-K for bank and M-P-H-E for morpheme) suggest that these articulation rates are subject to further time pressures. Phonological reduction may be constrained by the average duration of signs and the transition time required between signs. 3.3 Perspectives from information theory If signs are produced at half the rate of words but the overall propositional rate is the same, this suggests that an individual sign contains more linguistic information than an individual word in English. Although these larger units take longer to produce, because information can be encoded simultaneously in ASL, each sign may contain an amount of linguistic information similar to what is presented sequentially (i.e., as multiple words) in the same amount of time in speech. In information theory (Shannon, 1951), information is described in terms of entropy, which is a measure of the uncertainty associated with a random variable and can be quantified by taking into account the number of values within a set and the probability of those values. For example, calculating the information content (in bits) of a letter in English text takes into account that there are 27 characters (26 letters plus space) and the probability of each letter.
Entropy is used to describe the average uncertainty of an information source, where the maximum entropy is achieved in the scenario where all letters occur with equal probability. In contrast, redundancy quantifies the predictability of the language. Empirically, letters in English are rather predictable because of differences in the frequencies of the letters and constraints on the sequences of letters that are possible. The entropy rate of English text is estimated to be 0.6 to 1.3 bits per letter (Shannon, 1951), and similar figures are reported in estimates of phonemes in speech (van de Laar, Kleijn, & Deprettere, 1997). This is well below its maximum entropy, which is estimated to be 3-3.5 bits higher (Chong, Sankar, & Poor, 2009). Chong et al. (2009) apply a similar approach to sign language by analyzing handshapes of ASL. Their list consisted of 45 different handshapes, 29 that have alphanumeric correspondence and 16 additional ones that are used in signing. Data were collected from video logs (vlogs) found on the Internet and natural conversations that were videorecorded at a deaf school. The frequencies of the 45 handshapes were then computed in order to determine the empirical entropy of the handshapes and to compare it to the maximum entropy. They report that the average entropy of a handshape is approximately 5 bits, which is not very different from a maximum possible entropy of 5.49 bits. They write, "Our findings suggest that a slow rate of sign production in ASL may be compensated for, at least in part, by a low redundancy of handshapes." Chong et al. speculate that speech requires higher redundancy (it is estimated that approximately half of the text in English can be predicted) because the auditory channel is noisier than the visual channel, but the basis for this assumption is not explained. This conclusion suggests that ASL should be more sensitive to noise since it is less redundant. However, at least three studies now suggest that ASL is more robust to temporal distortions than spoken languages. Tweney et al. (1977) report that ASL is much more resistant to temporal disruptions compared to speech (Miller & Licklider, 1950). Fischer et al.'s (1999) results show that even with compression by a factor of 6, 20-40% of signs remain intelligible. In Chapter 2, I demonstrated that ASL is much more resistant to local time-reversals than speech (Greenberg & Arai, 2001). Chong et al. consider the possibility that although English is more redundant in the sequence of phonemes, ASL achieves redundancy by holding a handshape for longer periods of time. One way to test whether these forms of redundancy are equivalent is to calculate information transfer rates, which is what I describe here. In speech, it has been estimated that 10-15 segments are produced per second (Liberman, 1996). This converges with findings that phonetic segments are on average 72 ms long (Arai & Greenberg, 1998), that there are on average 2.5 segments per syllable in English (Greenberg, Hollenback, & Ellis, 1996), and that syllables in English are on average ~200 ms long. If each phoneme contains 1 bit of information on average, the information transfer rate is approximately 10-15 bits per second. In ASL, each sign has at least one handshape and at most two handshapes. Bellugi and Fischer (1972) and Grosjean (1979) report that approximately 2 signs are produced per second.
If each handshape contains 5 bits of information on average, the information transfer rate is approximately 10-20 bits per second in regular signing. Quinto-Pozos et al. (2010) find that 7.5 letters are produced per second on average. Since fingerspelled letters include only a subset of the 45 handshapes analyzed by Chong et al. (2009), the estimate for the information content of handshapes in fingerspelling contexts alone would be lower than the estimate of 5 bits. Setting aside fingerspelled words, the information transfer rates of English and ASL might be comparable based on a phonetic analysis. Although Reed and Durlach (1998) estimate the information transfer rate differently, they reach the same conclusions about the equivalence of information transfer rates in spoken English and signed ASL. Chong et al. acknowledge that their analysis of entropy in ASL is incomplete because it does not take into account other phonological features that are essential to the identification of signs, such as location, orientation, movement, and non-manual features. Methodologically, it is more difficult to incorporate these features. They explain that orientation has too few variations and that movement has too many. In a study of categorical perception, Emmorey et al. (2003) find that phonemically distinct handshapes are perceived categorically but that phonemically distinct locations are not. The categorical/discrete versus continuous/analogical aspects of signing are still not well understood (Liddell, 2003). Chong et al. also speculate that when combinations of handshapes and motions between the dominant and non-dominant hand are accounted for, greater redundancy would be found in ASL. Depictive gestures in natural signing and the manipulation of classifier handshapes (Liddell, 2003) pose extra challenges for determining the set of phonetic features in sign languages. Nevertheless, the development of sign language corpora with annotations for phonological features will be essential to these investigations. Entropy has also been applied to understand the amount of information contained in whole words in sentences (see Figure 21). Given a sequence of words already encountered in a sentence, the following word is more informative if it is less predictable. Sentence processing is highly sensitive to frequency effects, both at the lexical level and the structural level (Hale, 2001). Figure 21. Reproduced from Hale (2001), this figure demonstrates how entropy (or "surprisal") fluctuates over the course of a sentence. Words that are more frequent overall and more predictable in context have shorter phonological forms (Zipf, 1935; Manin, 2006). Given the correlation between the length of a form and its information content, it is possible that this link applies cross-modally. Since signs on average take twice as long to produce as spoken words, they are expected to carry more information. One proposal for sentence processing is that speakers are sensitive to the amount of information per unit ("information density") comprising an utterance and try to maintain uniform information density across the utterance (Levy & Jaeger, 2007; Jaeger, 2010). This hypothesis is motivated by a principle in information theory that sending information at a constant rate is most efficient in noisy channels (Shannon, 1948; Genzel & Charniak, 2002). When the error rate is minimal, it is assumed that information transfer close to the channel's capacity is optimal.
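The entropy and transfer-rate arithmetic running through this section can be summarized in a short sketch. This is illustrative only: the per-unit entropies and unit rates are the estimates cited above, and the empirical handshape-frequency distribution of Chong et al. is not reproduced here.

```python
import math

# Maximum entropy of a unit inventory: log2(number of distinct units).
def max_entropy_bits(inventory_size):
    return math.log2(inventory_size)

print(f"27 English characters: max {max_entropy_bits(27):.2f} bits/letter")
print(f"45 ASL handshapes:     max {max_entropy_bits(45):.2f} bits/handshape")

# Empirical estimates cited above: ~1 bit per English phoneme/letter in context,
# ~5 bits per ASL handshape (Chong et al., 2009).
speech_bits_per_s = (10 * 1, 15 * 1)        # 10-15 segments/s * ~1 bit each
sign_bits_per_s = (2 * 1 * 5, 2 * 2 * 5)    # 2 signs/s * 1-2 handshapes * ~5 bits

print(f"speech: ~{speech_bits_per_s[0]}-{speech_bits_per_s[1]} bits/s")
print(f"sign:   ~{sign_bits_per_s[0]}-{sign_bits_per_s[1]} bits/s")

# Maximum entropies: log2(27) ~ 4.75 bits, log2(45) ~ 5.49 bits.
# Transfer rates: ~10-15 vs ~10-20 bits/s, i.e., broadly comparable despite the
# difference in unit rates.
```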
Findings by Chong et al. (2009) may suggest that sign language processing is more efficient than speech. Across a sentence, some words have more information than others, such that there are "peaks" and "troughs" in information density. These peaks and troughs are modulated to some degree by closed-class words that are highly frequent, are short in length, and make the categories of subsequent words more predictable. In the future, it would be informative to compare information density patterns across different modalities, since sign languages involve more simultaneous layering of information. In summary, understanding rates in natural language processing requires knowledge about the rate at which phonetic units are produced, the rate at which lexical units are produced, and the information content of each unit. The kinematics of oral and manual articulators as well as the sensory pathways in audition and vision reveal considerable differences between the communication systems. The entropy analysis by Chong et al. (2009) suggests that phonetic units in signing carry much more information than units in speech, but an extension of their analysis to the duration of the signals suggests that overall information transfer rates may be comparable. Although speech segments are more redundant than sign handshapes, sign handshapes may be as redundant over time. An analysis of spoken sentence production shows that listeners are sensitive to the predictability of upcoming words and that speakers make phonological, lexical, and syntactic decisions based on the information profile of the utterance (Hale, 2001; Levy & Jaeger, 2007). Psycholinguistic experiments show that signers are also sensitive to predictability in sentences and show neural correlates similar to those found in spoken languages, such as the N400 effect in electrophysiology (Neville, Mills, & Lawson, 1992; Capek, Grossi, Newman, McBurney, Corina, Roeder, & Neville, 2009). From an information-theoretic point of view, it remains unknown whether information density fluctuations in spoken sentence production (Figure 21) are similar in profile to those in signed utterances. Highly frequent closed-class words of English do not have direct phonological analogs in ASL. Peaks and troughs seen in sentences of English caused by these shorter words may also emerge in ASL as some parts of signs are more informative than others. Alternatively, the differences between the sequential and simultaneous flow of information across the modalities may reveal unique distributions of information density. 3.4 Words, signs, morphemes, and syllables Since the findings of Bellugi and Fischer (1972), many questions still remain about the convergence of rates across languages and the divergence of temporal properties based on modality and grammatical features. The difference in word and sign rates is difficult to interpret because they may not be equivalent linguistic units. Within spoken languages, the degree of complexity in words is represented by the analytic-synthetic continuum, where analytic languages have little to no morphological inflection on words (e.g., modern Chinese) whereas synthetic languages (e.g., West Greenlandic) are known for their morphological complexity. Modern English is considered to be closer to the analytic end of the spectrum. Meier (2002) notes that a polysynthetic language like Navajo produces fewer words per minute than English. Thus, the rate of words in English should not be generalized as a property of all spoken languages.
What has never been reported in these studies is the rate of morphemes in languages. Even though fewer words were produced per minute in Navajo than in English, how do they compare in terms of morpheme rates? How do the morpheme rates in these two spoken languages compare with ASL? Brentari (2002) argues that the typological trend among sign languages is that signs are monosyllabic and polymorphemic (Table 3). She also argues that polymorphemic and monomorphemic signs are typically not different in length.

                 Monosyllabic     Polysyllabic
Monomorphemic    Chinese          English
Polymorphemic    Sign languages   West Greenlandic

Table 3. Adapted from Brentari (2002), who describes the typological distribution of canonical word shapes.

These assumptions are reexamined throughout the current discussion in Chapter 3 because they require an examination of syllable and morpheme rates and the ratio of these rates for languages. This does not appear to be true in spoken languages, where morphologically complex words tend to have more syllables and thus are longer than morphologically simpler words (for example, morphologically can be analyzed as having 4 morphemes and 6 syllables, whereas simpler can be analyzed as having 2 morphemes and 2 syllables). A universal property of all mature sign languages is the use of spatial modulations to mark agreement and the use of classifier constructions. These forms result in great complexity of meaning but can phonologically resemble morphologically simpler signs (Brentari, 1995). Other constructions where semantic information can be layered nonconcatenatively include numeral incorporation, aspectual modulations, nominal and verbal number, and adverbial modifications (Rathmann & Mathur, 2010). Figure 22. Adapted from Mathur & Rathmann (2011), this figure demonstrates an example of numeral incorporation in ASL. Figure 23. Reproduced from Mathur & Rathmann (2011), this figure demonstrates the grammatical form for TEN DAY and the ungrammatical form TEN+DAY that would result with numeral incorporation. The latter is believed to be impossible due to phonological constraints against complex movement. Rathmann and Mathur explain that although these cases are not universal and are more open to change, they also contribute to the increased semantic complexity of constructions. The availability of space in sign language articulation does not blindly allow forms to be combined nonconcatenatively; rather, combinations are constrained by phonological and phonetic restrictions. Finally, Napoli and Sutton-Spence (2010) attribute the limit of 4 propositions that can be articulated simultaneously in sign languages to cognitive limitations, in particular visual short-term memory. When languages are described as being analytic or synthetic, this usually refers to morpheme:word ratios, where analytic languages are 1:1 and synthetic languages are several:1. Brentari (2002) classifies sign languages as being polysynthetic like West Greenlandic but argues that having the property of both monosyllabicity and polysynthesis is unique to sign languages (Table 3). A better understanding of these relationships requires a typological investigation of the ratio of syllables to morphemes. Bellugi and Fischer (1972) speculate that in addition to incorporation, body movements, and facial expression, which all involve how information is layered without sequential strategies, a possible explanation for the discrepancy in word and sign rates is that ASL can "do without."
For example, a sentence in English like 'I ate an apple' would be translated in ASL as 'EAT APPLE'. ASL (and all sign languages) allow pro-drop, especially when arguments can be understood from context. Moreover, ASL does not have phonologically expressed function words like 'an'. Finally, the past-tense information can also usually be understood from the context. In this example, the equivalent of 4 words and 5 morphemes in English can be expressed with 2 signs. This discrepancy cannot be attributed to the fact that ASL has more 'synthetic' qualities in this sentence than English. Taking into account previous work and the theoretical issues that arise, I have chosen to analyze the rates of words/signs, morphemes, and syllables in English, ASL, and Korean. In addition to replicating an analysis of words/signs in English and ASL, an analysis of morpheme rate will lead to a better understanding of the degree to which the combination of nonconcatenative morphology and 'doing without' leads to true discrepancies in the rate of lexical units in the languages. Given that English and ASL are distinct in more ways than one, it is difficult to assess whether the differences in rates are attributable to modality or to grammatical differences. I have chosen to include Korean in this analysis because it is also a pro-drop language and lacks some of the small functional words that exist in English. A perfect comparison would be between two natural languages that differ in modality but are essentially identical in grammar, but this is impossible given that typological distinctions in grammar do seem to be divided by modality. Although Manually Coded English was created to have these features, the fact that it cannot be learned naturally and is globally much slower than ASL suggests that grammatical properties of sign languages are essential for their realization in the visuo-spatial modality. Another point of interest is the syllable rate in these three languages, which allows units of form to be compared to units of meaning. The syllable rate has been measured for English by numerous studies, but morpheme rates have never been calculated using the same data. A comparison with syllable and morpheme rates in Korean contributes to a better understanding of what trends emerge by looking at typologically distinct spoken languages. Because Korean is more synthetic than English, it is expected to have a lower word rate than English but to have more morphemes per word than English. Finally, this analysis of ASL builds upon the work of Bellugi and Fischer (1972) and Wilbur and Nolen (1986). Wilbur and Nolen have provided the most thorough report of syllable production in ASL, describing the frequencies of different types of syllables and their lengths, and including sign-external transitions. As a point of comparison, the analysis provided in the present work only counts syllables based on intra-sign movements. 3.5 Rates in spoken languages: English and Korean Studies on the speed of speech have largely focused on the rate of words, syllables, or segments. Here, the goal is to gain a better understanding of the rate of words, morphemes, and syllables for cross-linguistic comparison. The first step in analyzing the rate of linguistic units in speech and sign language production was identifying appropriate materials. With large bodies of data developed for automatic speech recognition, English had the most options, but it was important to also consider whether comparable material was accessible for Korean and ASL.
Data from each of the three languages were collected from natural conversations. When looking for English materials, one question that arose was whether it would make a difference in the results for rates to use a corpus with prompted sentences (TIMIT) (Garofolo, Lamel, Fisher, Fiscus, Pallet, & Dahlgren, 1993), which were already transcribed and phonetically annotated, or to use a corpus of natural telephone conversations (CALLFRIEND, Canavan & Zipperlen, 1996a) without any annotations. Both corpora were accessed through the Linguistic Data Consortium (LDC) at the University of Pennsylvania. Whereas the speech files in TIMIT consist of individual sentences that are uttered in isolation, the speech files in CALLFRIEND consist of full 30-minute telephone conversations between two individuals. Before doing a rate analysis for CALLFRIEND (American English, corpus containing non-Southern dialects only), a set of 363 sentences was extracted from the telephone conversations (3 sentences from each of 121 individuals across 60 conversations), selected for their 1) propositional completeness, 2) lack of long pauses/breaks, and 3) lack of errors and corrections mid-sentence. The boundaries of the sentences were determined by looking at the acoustic waveforms and spectrograms, and the sentences were measured for overall duration. For TIMIT sentences, rather than blindly taking the duration of the speech files, sentences were also analyzed in a similar way by looking at the onset of the first phoneme and the conclusion of the last phoneme, because the sentences were preceded and followed by a short period of silence. 188 unique sentences were chosen from the TIMIT corpus from 188 speakers of a non-Southern dialect (to more closely match the dialects found in CALLFRIEND). Words have been described as "the free-standing unit that unifies form and meaning" (Sandler & Lillo-Martin, 2006:21), but as discussed previously, languages vary in their definitions of words, which range in complexity. Due to the lack of a consistent and linguistically well-motivated definition of words, here words were taken to be units marked by spaces in orthography. As in Bellugi and Fischer (1972), contracted forms (don't, it's, wanna) were counted as single words. Morphemes are considered to be the smallest units of meaning in language, but making judgments about morphemes is not always straightforward. Debates about the decomposability of words have a long history (see Fiorentino (2006) for an extensive discussion). I adopt the assumption that the lexicon involves structured representations and that morphological parsing is an early process of word recognition (Fiorentino & Poeppel, 2007). Psycholinguistic experiments demonstrate that different levels of analysis exist in word processing (Lehtonen, Monahan, & Poeppel, 2011). For example, a word like corner contains two potential morphemes in English (corn and -er), but because they do not compose the meaning of the word, the word is considered to be made up of just one morpheme. In on-line processing, evidence suggests that responses to semantically opaque pairs like corn and corner differ from responses to semantically transparent pairs like teach and teacher (Lehtonen, Monahan, & Poeppel, 2011). Lehtonen et al. also show that although -er in corner is not a morpheme, because it is a possible morpheme in words like teacher, corn and corner are processed differently from pairs like broth and brothel, which only involve an orthographic overlap and no possible morphological decomposition.
This three-way separation of the data demonstrates the complexity of morphological processing. A word like corner, unlike brothel, may trigger morphological decomposition, but that decomposition is rejected by subsequent analysis, so it is not analyzed in the same way as a word like teacher. Corner and teacher share a decompositional stage of analysis, which succeeds for teacher and fails for corner. Although methodologies like priming studies provide a way to probe the psychological reality of a word's subparts, it is not practical to apply them to every single word in a corpus. Understanding the morphological structure of a word may also involve some knowledge about its etymology. For example, could, would, and should are etymologically connected to can, will, and shall, but it is not clear whether native speakers decompose these words as having two parts (where -ld was historically linked to a suppletive form for the past tense). Another example is a word like height, the noun form of the adjective high (where -t(h) is linked historically to a Germanic abstract noun suffix). Because judgments about words such as these were not easy, both 'conservative' and 'liberal' judgments were made about morpheme counts. Abbreviations like ESL were counted as having 3 morphemes by the liberal count and 1 morpheme by the conservative count. Cases of irregular/suppletive forms were judged as being morphologically complex. A word like didn't was counted as having 3 morphemes: do + past + negation. The word been was counted as having 2 morphemes: be + -en. Possessive pronouns like our were counted as having two morphemes: we/us + possessive. This decision was made based on the pattern that -'s is a productive morpheme that is used with nouns. When her was used as a possessive pronoun, it was counted as having 2 morphemes, but when it was used as an object/accusative pronoun, it was counted as having 1 morpheme. This decision was made based on the pattern in English that case-marking is not productive and is only used among pronouns. Syllables in words were also measured with two estimates for similar reasons, although making syllable counts was relatively easier than making morpheme counts. A few examples that posed some difficulty include interesting (perceivable as having 3 or 4 syllables), actually (3 or 4 syllables), several (2 or 3 syllables), and you're (1 or 2 syllables). Two researchers coded the data from English, where each researcher coded approximately 50% of the sentences from each corpus. These two researchers worked together with frequent discussions to support consistency, but at this current time, inter-rater reliability has not been assessed. In all of the following figures (from English, Korean, and ASL), results are shown in density plots (using R 2.8.1, R Development Core Team (2005)), which estimate the probability density function of the underlying variable. The kernels in these density plots represent the data (length, syllables per second, etc.) from each sentence. Figure 24. Estimated probability density functions for the length in seconds of sentences in two corpora of English: TIMIT (prompted) and CALLFRIEND (conversational). A comparison of sentences from TIMIT and CALLFRIEND for English shows that rates are significantly different in prompted speech and natural conversational speech. The following figures include 'conservative' measures of morphemes and syllables. Figure 25.
Estimated probability density functions for word rates (words per second) of sentences in two corpora of English: TIMIT (prompted) and CALLFRIEND (conversational). Figure 26. Estimated probability density functions for syllable rates (syllables per second) of sentences in two corpora of English: TIMIT (prompted) and CALLFRIEND (conversational). Figure 27. Estimated probability density functions for morpheme rates (morphemes per second) of sentences in two corpora of English: TIMIT (prompted) and CALLFRIEND (conversational). As may be expected, rates were overall much faster in natural conversational speech than in prompted speech (Figures 25, 26, and 27). A calculation of the mean average syllable rate (on conservative-liberal estimates) reveals slower articulation in TIMIT (~5.0-5.1 syllables per second) than in CALLFRIEND (~6.1-6.2 syllables per second). In addition to the inherent difference between producing self-generated sentences with a communicative partner and reading unfamiliar sentences, other reasons for these differences could be attributed to 1) the oddness of the semantic content of TIMIT sentences and 2) the presence of more low-frequency words in TIMIT. In CALLFRIEND sentences, words were produced at approximately 4.7 words per second (mean average), which is similar to the results of Bellugi and Fischer (1972), where stories were narrated by 3 individuals and pauses were excluded from analysis. In contrast, the rate in TIMIT is 3.1 words per second. An examination of the ratio of syllables to words reveals that TIMIT contained words that had longer phonological forms (mean average ~1.6 syllables per word in TIMIT compared to ~1.3 syllables per word in CALLFRIEND). Although a frequency analysis was not conducted for words in TIMIT and CALLFRIEND, the trend that more frequent words have shorter phonological forms (Zipf, 1935; Manin, 2006) suggests that the words in CALLFRIEND are more highly frequent. Morpheme rates were also overall slower in TIMIT than in CALLFRIEND. The mean average (on conservative-liberal estimates) was 4.4-4.8 morphemes per second in TIMIT and 6.1-6.4 morphemes per second in CALLFRIEND. An examination of the ratio of morphemes to words reveals that TIMIT (1.4-1.6 morphemes per word) contained words that were more morphologically complex than CALLFRIEND (1.3-1.4 morphemes per word). Finally, in both corpora there is approximately a 1:1 ratio between morphemes and syllables. In TIMIT, which presumably contains lower-frequency lexemes (a lexeme being a word-like unit used to represent all variations of a word in usage) whose morphemes contain more phonological content, the ratio is slightly lower than 1:1, and in CALLFRIEND, the ratio is slightly higher than 1:1. The mean average duration of the sentences taken from CALLFRIEND was 2.37 s. The mean duration of syllables was calculated by dividing the duration of the sentences by the number of syllables in the sentences. The mean duration of syllables was approximately 162 ms. This is somewhat shorter than the 190 ms reported by Greenberg et al. (1996). This could be attributed to at least two reasons: 1) a difference in the corpora used (CALLFRIEND versus Switchboard, where two individuals discuss a specific topic for several minutes) and 2) the fact that this study chose only a small subset of the most fluent sentences in CALLFRIEND for analysis, whereas Greenberg et al. used data from full conversations that contained filled pauses and misarticulations.
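To make the rate and ratio computations reproducible in outline, the sketch below shows how per-sentence durations and counts yield the rates, ratios, and mean syllable duration reported above. The example records are hypothetical placeholders, not actual corpus data, and the dissertation's own density plots were produced in R; Python is used here purely for illustration.

```python
# Hypothetical per-sentence records: duration (s) plus word, syllable, and
# morpheme counts (conservative estimates), in the spirit of the corpus coding.
sentences = [
    {"dur": 2.1, "words": 10, "syllables": 13, "morphemes": 13},
    {"dur": 2.6, "words": 12, "syllables": 16, "morphemes": 17},
    {"dur": 1.8, "words":  8, "syllables": 11, "morphemes": 11},
]

def mean(xs):
    return sum(xs) / len(xs)

# Per-sentence rates, averaged across sentences (the quantities plotted as densities).
word_rate = mean([s["words"] / s["dur"] for s in sentences])
syll_rate = mean([s["syllables"] / s["dur"] for s in sentences])
morph_rate = mean([s["morphemes"] / s["dur"] for s in sentences])

# Ratios and mean syllable duration from totals, mirroring "sentence duration
# divided by the number of syllables".
tot = {k: sum(s[k] for s in sentences) for k in ("dur", "words", "syllables", "morphemes")}
syll_per_word = tot["syllables"] / tot["words"]
morph_per_syll = tot["morphemes"] / tot["syllables"]
mean_syll_dur_ms = 1000 * tot["dur"] / tot["syllables"]

print(f"rates: {word_rate:.1f} words/s, {syll_rate:.1f} syll/s, {morph_rate:.1f} morph/s")
print(f"ratios: {syll_per_word:.2f} syll/word, {morph_per_syll:.2f} morph/syll; "
      f"mean syllable duration ~{mean_syll_dur_ms:.0f} ms")
```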
As explained previously, Korean was chosen for analysis because it is a spoken language that is typologically different from English by being morphologically more complex, being a pro-drop language, and having fewer small functional words (like a and the) and thereby being grammatically more similar to ASL. A Korean version of the CALLFRIEND corpus (Canavan & Zipperlen, 1996b) with Yale Romanized transcription is also available through the LDC. 378 sentences were extracted from 128 speakers following the same criteria as used for English (fluency, lack of errors and corrections mid-sentence, and lack of long pauses/breaks). Again, words were counted based on orthography. In other words, words were equivalent to eojeols, which are the spacing units in Korean orthography. The Romanization used periods (.) to mark syllable boundaries and spaces to mark word boundaries. Similar to English, words in Korean are taken to be free-standing units that can vary in morphological complexity. Conservative and liberal estimates of morpheme and syllable counts were measured for each word. For example, the topic form of the second person pronoun is ne.nun (?you-TOPIC?) with 2 syllables but it is often reduced to nen and perceivable as 1 syllable in fast speech. Case markers were 135 always counted as morphemes. Examples of words that had different conservative and liberal morpheme counts were hak.kyo (?school?), which was counted as consisting of either 1 or 2 morphemes, and pi.ngwus.ta (?to mock?), which was counted as consisting of either 2 or 3 morphemes. The data from Korean was coded by one researcher, and at this current time, inter-rater reliability has not been assessed. The results from Korean (as compared to conversational data of English) are as follows. The sentences that were extracted from the two corpora were similar in length (Figure 28). Figure 28. Estimated probability density functions for length in seconds of sentences from conversational data in English and Korean. 136 Figure 29. Estimated probability density functions for word rates (words per second) of sentences from conversational data in English (a more analytic language) and Korean (a more synthetic language). As predicted, Korean had a lower rate of words per second because Korean is more synthetic than English (Figure 29). The results show that the mean average rate is 3.1 words per second (compared to 4.7 words per second in English). However, this does not mean that Korean is slower than English. The mean syllable rate was 7.2-7.3 (conservative-liberal) per second, which is slightly higher than English (6.1- 6.2 syllables per second) (see Figure 30). The mean duration of Korean syllables was approximately 138 ms (compared to ~162 ms in English). This may be attributed to the fact that English allows consonant cluster onsets and codas, whereas syllables in Korean are simpler. For example, a long syllable in English like script (CCCVCC), would have to be pronounced with 4 syllables ([s?k?r?pt?] = CVCVCVCCV) with 137 epenthesized vowels in Korean. Japanese, like Korean, has simpler syllable phonotactics than English, and Arai and Greenberg (1998) show that the mean average of syllables in Japanese are slightly shorter than in English. Figure 30. Estimated probability density functions for syllable rate (syllables per second) of sentences from conversational data in English and Korean. 138 Figure 31. 
Estimated probability density functions for morpheme rates (morphemes per second) of sentences from conversational data in English and Korean. The mean morpheme rate was 5.8-6.0 per second in Korean (compared to 6.1- 6.4 per second in English) (see Figure 31). As a language that is more synthetic than English, Korean was expected to have a higher ratio of morphemes to words than English. Results show that on average, there are 1.9 morphemes per word in Korean (compared to 1.3-1.4 morphemes per word in English). An examination of the ratio of syllables to words (~2.3 syllables per word) reveals that Korean has words containing more syllables (compared to 1.3 syllables per word in English). Finally, the ratio of morphemes to syllables is 1:1.2, which is slightly lower than the 1:1 ratio found in English. 139 Although English and Korean are typologically distant languages, similar trends emerge. The main difference between the languages is in the word rate. However, when looking at the smallest unit of meaning, both show rates of approximately 6 morphemes per second. Although the syllable rate is slightly faster in Korean and the morpheme to syllable ratio is slightly lower in Korean, this is most likely due to the simpler syllable structure in Korean. Although Korean does not have small functional words like a and the in English, it has case markers on nouns and also richer morphology on verbs, resulting in the morpheme rates to closely converge. Overall, the ratio of morphemes to syllables in both languages is approximately 1:1. 3.6 Rates in sign language: ASL revisited The goal of this study was to replicate previous work that have examined the rate of signs in natural ASL production and extend the analysis to morphemes and syllables within the signs. Sentences that matched the fluency criteria used for English and Korean were taken from natural conversations of ASL collected by Ceil Lucas and colleagues. Lucas?s corpus was filmed in the 1990s to study sociolinguistic variations of ASL across the United States. The videos involve free conversations among deaf participants who already know each other and interview sessions with a researcher. The free conversation sessions were recorded without the presence of any researcher. In the interviewed segments, a deaf African-American researcher moderated groups composed of deaf African-American participants. For 140 the purposes of this study, 179 sentences were taken from 21 participants who are native ASL users. Sign language linguistics students identified a set of full, fluent sentences within the conversations, which were labeled using ELAN software. These research assistants were instructed to use their intuition about the beginning and end of sentences by doing a frame-by-frame analysis on the first and last signs. Group discussions and viewing of the videos supported consistency in the data, but at this current time, inter-rater reliability has not been assessed. Each sign in a sentence was first given an English gloss, and sign rates were calculated based on these glosses. For each sign, annotation tiers were then created so that the number of morphemes and syllables could be counted. Morphemes were counted in two ways ? with a ?conservative? or ?liberal? estimate. 
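Before turning to the detailed counting criteria, the sketch below illustrates how per-sign counts under the two schemes might be tabulated and converted into per-sentence rates; the glosses, counts, sentence duration, and field names are hypothetical placeholders rather than the actual ELAN tiers or corpus values.

# Hypothetical per-sign annotation for one sentence (illustrative values only).
signs <- data.frame(
  gloss          = c("TEACHER", "LIKE", "SCHOOL"),
  morphemes_cons = c(1, 1, 1),   # conservative count
  morphemes_lib  = c(2, 1, 1),   # liberal count (e.g., TEACHER as TEACH + PERSON)
  syllables      = c(1, 1, 2)    # counted as produced, not from citation forms
)
duration_s <- 1.4                # hypothetical duration of this sentence

# Per-sentence rates under each counting scheme
sign_rate       <- nrow(signs) / duration_s
morph_rate_cons <- sum(signs$morphemes_cons) / duration_s
morph_rate_lib  <- sum(signs$morphemes_lib) / duration_s
syll_rate       <- sum(signs$syllables) / duration_s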
Before starting the annotation process, it was decided that plain/uninflected verbs like LIKE and HAVE would be counted as having 1 morpheme, agreement/indicating verbs such as SHOW and ASK would be counted as having 2 morphemes (one for the root and one for agreement), and that spatial/locative verbs such as PUT and DRIVE would be counted as having 2 morphemes (one for the root and one for movement). On these verbs, aspectual marking was counted as one morpheme, and aspectual marking that showed number was counted as having an addition morpheme. Depiction verbs were counted as having 2 morphemes, one for the classifier handshape and one for movement. Although these criteria were decided before the annotation process, the vast majority of the verbs found in this set of sentences were plain and uninflected. 141 Possessive pronouns were counted as having 2 morphemes, one for the palm orientation for indexation and one for the open handshape marking possession. Facial inflections that were used in questions were counted as one morpheme. An expression with noun incorporation such as TWO-MONTHS was counted as having 2 morphemes. The sign for TWO-OF-US was counted as having 2 morphemes. The sign for AGE-THREE was counted as having 2 morphemes. The sign for EVERY- FRIDAY, where the sign for FRIDAY is held in downward movement, was counted as 2 morphemes. There were two cases when the researchers could not identify the sign of short gestures, and these gestures were labeled ?gesture? and counted as one morpheme each. Liberal versus conservative estimates were used in cases where the etymology of a sign was known to be a compound. For example, the sign for HOME evolved from the combination of the sign for EAT (contact at the chin) and BED (contact at the cheek). HOME was counted as 2 morphemes in the liberal estimate and 1 morpheme in the conservative estimate. The sign for WIFE was counted as 2 morphemes (WOMAN+MARRY) in the liberal estimate and 1 morpheme in the conservative estimate. The sign for TEACHER is traditionally considered to consist of 2 morphemes, one for TEACH and one for an ?-er?-like affix that is linked with the sign for PERSON. In natural signing, TEACHER is signed with one fluid motion where separate components for TEACH and PERSON become hard to distinguish. Thus, TEACHER was counted as having 2 morphemes in the liberal estimate and 1 morpheme in the conservative estimate. The sign for PARENT is the combination of the signs for MOTHER and FATHER. PARENT was counted as having 2 142 morphemes in the liberal estimate and 1 morpheme in the conservative estimate. Fingerspelled words consist of a sequence of letters, each of which represents the letter but as a whole also represents a word. The sign for HIGHSCHOOL, which is a sequence of H and S, was counted as having 2 morphemes in the liberal estimate and 1 in the conservative estimate. For all fingerspelled words, the liberal estimate was the number of letters and the conservative estimate was 1. Syllables were counted based on the number of movements that occurred within the sign and were based on how they were produced in the video, not citation forms. For example, SCHOOL was sometimes produced with 1 or 2 movements (1 or 2 syllables). Each token was labeled the way it was produced. In another case, the sign for HERE was signed with 1 syllable in one sentence, and when it was emphasized, it was signed with 3 syllables. 
As has been discussed in the sign language literature, the majority of signs in these sentences were monosyllabic. Examples of disyllabic signs that occurred in this set of sentences included CANCEL and NEVER. In ASL, nominalization of verbs can be achieved through reduplication; as an example, the sign for AIRPORT was the reduplicated version of FLY and was produced with 2 syllables. In cases where a sign involved more than one syllable, it was usually through repetition of a movement, as in SOMETIMES, VACATION, WORK, FEEL, YOUNG, and TECHNOLOGY. These reduplicated movements are usually produced in a restrained manner. The sign for SIGN was produced with 2 syllables in some cases, and there was one token in which it was produced with 4 syllables. When the W handshape was waved three times for WEDNESDAY, it was counted as 3 syllables, and when the M handshape was waved two times for MONDAY, it was counted as 2 syllables. Syllables in fingerspelled words were generally counted by the number of transitions between the letters but were sometimes counted lower because of coarticulation of letters. A gesture that was used to indicate "HEART-POUNDING" was produced with 8 syllables.

The results from ASL are presented together with English and Korean in the following discussion and figures (Figures 32-35). The results show that ~2.3 signs are produced per second, replicating the findings from Bellugi and Fischer (1972), Grosjean (1979), and Klima and Bellugi (1979) for ASL (see Figure 33). The main reason word rates are compared here is that previous studies have given much attention to differences between English and ASL at this level. However, languages define the word as a unit differently, and word rates here do not tell us much about modality-based differences. As discussed earlier for the English and Korean data, even two spoken languages can show significant differences in their word rates. An analysis of a more synthetic spoken language, such as Navajo or West Greenlandic, is predicted to show rates more similar to ASL.

Figure 32. Estimated probability density functions for length in seconds of sentences from conversational data in English, Korean, and ASL.

Figure 33. Estimated probability density functions for word/sign rates (words or signs per second) of sentences from conversational data in English, Korean, and ASL. This comparison of word and sign rates replicates the findings from Bellugi and Fischer (1972) for English and ASL. A comparison with Korean demonstrates that word rates depend on the grammatical properties of a language.

Figure 34. Estimated probability density functions for syllable rates (syllables per second) of sentences from conversational data in English, Korean, and ASL. Syllable rates in ASL may be the basis for the temporal integration window of ~250-300 ms found in Experiment 1 in Chapter 2.

Figure 35. Estimated probability density functions for morpheme rates (morphemes per second) of sentences from conversational data in English, Korean, and ASL. This figure demonstrates that English and Korean, two spoken languages with distinct grammars, have the same morpheme rate (~6 per second), in contrast with the morpheme rate in ASL (~3 per second).
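The per-unit ratios discussed below follow arithmetically from these per-second rates. The sketch works through that relationship using the approximate mean rates reported in this chapter (with midpoints chosen from the conservative-liberal ranges); because these are aggregate means rather than per-sentence averages, the results only roughly match the reported ranges.

# Approximate mean rates reported in this chapter (units per second);
# morpheme rates use midpoints of the reported conservative-liberal ranges.
rates <- data.frame(
  language   = c("English", "Korean", "ASL"),
  word_rate  = c(4.7, 3.1, 2.3),     # words or signs per second
  syll_rate  = c(6.15, 7.25, 3.1),   # syllables per second
  morph_rate = c(6.25, 5.9, 3.0)     # morphemes per second
)
rates$morph_per_word <- rates$morph_rate / rates$word_rate   # ~1.3, ~1.9, ~1.3
rates$syll_per_word  <- rates$syll_rate / rates$word_rate    # ~1.3, ~2.3, ~1.4
rates$morph_per_syll <- rates$morph_rate / rates$syll_rate   # ~1.0, ~0.8, ~1.0
rates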
The difference between faster English word rates and slower ASL sign rates has now been discussed widely in the literature, along with speculations on how simultaneous encoding of information in ASL signs (through greater morphological complexity) and more condensed ways of expressing meaning (through "doing without") may contribute to similar propositional rates. However, in order to test these assumptions, an analysis of morpheme rates in these languages is necessary. By liberal counting methods, morphemes were produced at ~3.0 per second, and by conservative counting methods, morphemes were produced at ~2.5 per second. These results were surprising given that the rates in English and Korean were both approximately 6 morphemes per second and that Senghas and Coppola's (2001) analysis of rates in Nicaraguan Sign Language reports 5-6 morphemes per second among native signers. A detailed discussion of the theoretical and methodological considerations for why these morpheme rate estimates are considerably lower is given in the following conclusion section. However, the present results suggest that strategies for "doing without" may play a bigger role than simultaneous morphology in reaching the same propositional rates across modalities. To test Brentari's (2002) assumptions presented in Table 3, the ratio of morphemes to words was examined: there are approximately 1.3-1.4 morphemes per sign in ASL. This is the same ratio found in English (also 1.3-1.4 morphemes per word) and slightly lower than the 1.9 morphemes per word found in Korean.

As explained by Wilbur and Nolen (1986), the articulators in signing cannot be hidden while in transition from one sign to another. Since Wilbur and Nolen already provide a thorough analysis of syllables from an articulatory point of view in which all types of movements were measured, here only intra-sign movements were counted to provide syllable rate estimates. Thus, the mean average number of syllables per second was predicted to be lower than that reported by Wilbur and Nolen (~4 syllables per second). In this study, approximately 3.1 syllables were produced per second. This suggests that approximately 25% of the movements during sentence production do not contribute to the articulation of signs. Similar to the time-scales seen in morpheme rates, among these three languages ASL has a significantly slower syllable rate than English or Korean. These results are consistent with other studies reporting that monosyllabic English words are approximately half the duration of monosyllabic ASL signs (Emmorey & Corina, 1993; Corina & Knapp, 2006; Capek, Grossi, Newman, McBurney, Corina, Roeder, & Neville, 2009). When examining the ratio of syllables to signs, it was found that there are approximately 1.4 syllables per sign. When examining the mean average ratio of morphemes to syllables, a liberal morpheme count resulted in an average estimate of 0.96 morphemes per syllable and a conservative morpheme count resulted in an average estimate of 0.81 morphemes per syllable. In other words, these mean average ratios are very similar in range to the values found for English and Korean.

Figure 36. The comparison of morpheme:syllable ratios in English, Korean, and ASL suggests that, globally, morphemes and syllables are processed at approximately the same rate. However, the results from ASL are different from spoken languages in that the ratios reveal a trimodal distribution.
This may be attributed to properties unique to sign languages, such as productive use of reduplication (resulting in ratios lower than 1:1) and productive use of spatial modulations (resulting in ratios higher than 1:1), in addition to simple signs.

However, as seen in Figure 36, the ratios in ASL show a unique trimodal distribution across sentences, with some sentences falling below 1:1 and others above it. Sign languages differ from spoken languages by having productive use of reduplication, where a sign can be repeated multiple times, and by allowing more compacting of information through simultaneous strategies. Despite these varied options, ASL follows the pattern of English and Korean, where global rates of morphemes and syllables are approximately the same. The need to expand the sample size of ASL sentences and to assess the inter-rater reliability of morpheme and syllable estimates presents some methodological challenges that must be addressed before these findings can be adopted conclusively. Moreover, this area of investigation, which tries to understand the temporal dynamics of linguistic processes in production and perception, requires a better theoretical consensus on how to count all of these units (words/signs, morphemes, and syllables) and compare them. Nevertheless, the emerging trend from this first attempt to study all of these rates together suggests that units of form (syllables) and meaning (morphemes) unfold at approximately the same time scales in all languages.

3.7 Conclusion

By examining the rates of words, signs, morphemes, and syllables, this study provides new insights into the universal time properties of language production and also into differences that arise due to grammar and modality. The results from English and ASL converge with previous studies that have examined word, sign, and syllable rates in these languages (Bellugi & Fischer, 1972; Grosjean, 1979; Wilbur & Nolen, 1986; Emmorey & Corina, 1993; Corina & Knapp, 2006). The results from Korean syllables confirm models of speech production based on other spoken languages (Greenberg, Hollenback, & Ellis, 1996; Arai & Greenberg, 1998). The unique contribution of the present work is the demonstration of the relationship between the physical dynamics of language production and representational units of meaning. Taken together, these findings reveal consistent patterns in language processing, although the particular rates may differ.

Bellugi and Fischer's (1972) original work comparing the rate of a spoken language (English) and a signed language (ASL) concluded that at the word/sign level, signed languages are twice as slow as spoken languages, but that at the propositional/sentence level, the rates across the modalities are the same. They speculated that the convergence of global rates despite the discrepancy of local rates is due to differences in the grammatical properties of the two languages. Later work (Klima & Bellugi, 1979) examining a signing system that maintains a grammatical structure similar to English verified that without the special grammatical properties of a true sign language, a manual communication system is significantly slower. The present results demonstrate that the word-sign comparison is not very meaningful when one considers that even among spoken languages, the amount of linguistic information within a word can vary greatly, as traditionally represented by the analytic-synthetic continuum.
A comparison of word rates in English (~ 5 words per second) and Korean (~ 3 words per second) reveals that word rates are not indicative of major differences due to modality but grammar and how word boundaries are determined in languages. Nevertheless, an analysis of morpheme rates in English, Korean, and ASL indicates that Bellugi and Fischer?s conclusion about rate differences due to modality, where spoken languages are twice as fast as signed languages, still presents a deep puzzle. Morpheme rates are ~ 6 153 morphemes per second in English and Korean and ~ 3 morphemes per second in ASL. Moreover, this work goes beyond Bellugi and Fischer?s study by analyzing the rate of syllables among three languages. It also complements Wilbur and Nolen?s (1986) study focusing on syllable rates of ASL but differs from their work by focusing on intra-sign movements (or syllable nuclei) that are involved in the articulation of signs, whereas they also included inter-sign transitional movements. Syllable rates reveal the physical dynamics in production and also serve as units for sensory integration in perception. In phonological theory, syllables serve as sublexical units to which constraints and rules apply. Similar to the notion of syllables in spoken languages, syllables in sign languages organize the timing of phonetic segments and arrange them into a sonority/saliency hierarchy. Wilbur and Nolen have speculated that syllable rates are the same across spoken and signed languages. However, the results presented here suggest that time-scales of syllables across the modalities are different ? ~6-7 syllables per second in English and Korean, and ~3 syllables per second in ASL (or ~4 syllables per second according to Wilbur and Nolen). These rate differences are consistent with the differences in the frequency of syllables found in babbling, where vocal babbling is faster than manual babbling. Nevertheless, a consistent pattern that emerges is that the ratio of morphemes to syllables is approximately 1:1 in both modalities. In English, there are certainly polysyllabic and monomorphemic words, such as apple and kitchen. However, there are also numerous highly frequent monosyllabic and multimorphemic words, such as 154 went and men. The same pattern holds in Korean. As Brentari (2002) has described, in some ways ASL can be described as a language that is monosyllabic and polymorphemic because it has a rich system of simultaneous morphology that exploits the use of space. However, it also has an inventory of bisyllabic signs (like CANCEL and NEVER) and cases in normal usage where monosyllabic signs are reduplicated to polysyllabic forms. Among all these languages, many morphemes are monosyllabic, and monomorphemic-polysyllabic cases are balanced with polymorphemic-monosyllabic cases. Bellugi and Fischer (1972) listed three reasons for how propositions/sentences in ASL can contain similar amounts of semantic information despite having fewer signs/words than English: 1) doing without, 2) incorporation, and 3) body movements and facial expression. Based on the results of the present study, which took into account the incorporated information in signs by measuring morpheme rates, the factor that seems to play the biggest role in the convergence of rates in spoken and signed languages appears to be the idea of ?doing without,? which Bellugi and Fischer characterize as a way of reducing redundancy and increasing information density. 
In doing a morpheme rate analysis, this study was not able to replicate the findings from Senghas and Coppola (2001), who measured morpheme rates as an indicator of fluency. Among the group who used the full-fledged version of Nicaraguan Sign Language, the average rate was 350 morphemes per minute, or ~6 morphemes per second. Because the study focused on the use of spatial modulations in the grammar and did not elaborate on the details of the rate analysis, it is not 155 possible to determine whether these differences in results are due to the difference between ASL and NSL or a difference in methodologies on how morphemes were counted. In addition to assessing inter-rater reliability for all these data, future analyses on the coded data will benefit from considering alternative ways of counting morphemes, especially in ASL. Determining how to count morphemes presents challenges in both spoken and signed languages. Theories of syntax and morphology in generative grammar posit the presence of phonetically null elements that serve functional roles in derivations (Embick & Noyer, 2007; Baker, 1996). Here, only morphemes that were phonetically realized in some way were counted. For example, men was counted as having 2 morphemes even though the regular plural suffix is not attached because of a phonetic change to the root. The same approach was taken when analyzing ASL, with most attention given to the manual gestures and where facial features were taken into account in question-marked constructions. It is possible that different criteria could have resulted in a higher estimate of morpheme counts. For example, it was decided that agreeing and spatial verbs would be counted with 2 morphemes, 1 for the root and 1 for the agreement or spatial feature. Another approach would have been to count these as having at least three morphemes, the verb root, and subject and object for agreement verbs, and the source and goal locations in spatial verbs. However, the vast majority of the verbs found in the sentences (that were selected before the annotation process) were plain/uninflecting, and it is predicted that this revision would not significantly change the results. de Beuzeville, Johnston, & Schembri (2009) report similar patterns for plain verbs in 156 Australian Sign Language. Perhaps one way to increase the number of constructions involving morphologically richer verbs would be to have participants view videos with actions involving many of these verbs and then discuss them with other participants. Another potential way of increasing morpheme counts in ASL is to take into greater consideration the derivational processes described by Padden and Perlmutter (1987) that change the movement of a sign. Repeated circular movements can change regular adjectives to mean ?characteristically ___?. Small, quick movements that are reduplicated forms activity nouns from verbs, as in pairs such as SIT-CHAIR. Although it seems relatively simple to systematically count signs like CHAIR as consisting of 2 morphemes (SIT+NOUN), it becomes a tricky issue for nouns like CHURCH and NURSE that phonologically have reduplicated noun forms without corresponding verbs. Figure 37. Reproduced from Padden & Perlmutter (1987), where reduplicating circular movement turns the adjective QUIET to mean ?characteristically quiet?, or taciturn. 157 Figure 38. Reproduced from Aronoff, Meir, & Sandler (2005), demonstrating a complex ASL classifier construction: ?A person walks forward, (dragging) a dog squirming behind.? 
Perhaps classifier constructions present the greatest challenge in understanding how many units of meaning can be captured in visual imagery. Liddell (2003) provides a useful discussion of these issues. DeMatteo (1977) and others argue that classifier constructions (or "classifier predicates") are analogical rather than discrete, and that morphemic representations of these constructions are not appropriate. In contrast, Supalla (1982) has proposed that these constructions can be analyzed as a highly complex, productive, multimorphemic system. Liddell himself has argued for a hybrid of these models in which handshapes have lexical status but the use of these handshapes is gradient/analogical. Liddell cautions against attributing morpheme status to metaphorically expressed, depicting movements. For example, he points out that [rl] is not considered a morpheme in English even though there are words like curl, swirl, whirl, twirl, furl, and gnarl, which all have meanings related to round, twisted shapes. Dudis (2011) explains that the issues that make depicting verbs hard to analyze morphologically also apply to agreement/indicating verbs, since they utilize correspondences in space.

Iconic aspects of signs are highlighted as one of the key modality effects in language and pose interesting challenges for understanding how meaning is composed. For example, mental verbs and nouns, such as THINK, KNOW, and DREAM, tend to involve articulations near the forehead. One possibility is to assume a morpheme for MIND, but it is impossible to distinguish whether such morphemes are computed compositionally or whether the forehead is exploited as an iconic place of articulation in phonology. Signs like BELIEVE and AGREE have been described as originating from compounds: THINK-MARRY and THINK-SAME, respectively. In the case of THINK-MARRY, there is a change from the "1"-handshape for THINK to the "C"-handshape for MARRY. It is now common to see uses of BELIEVE involving handshape assimilation, where the "C"-handshape starts near the forehead. Understanding the etymology of this sign motivates a bimorphemic analysis, but there may come a point where this compositional aspect gets lost in on-line processing.

Finally, any future work examining the morpheme rate in a sign language should provide a more careful analysis of non-manual features, which, in addition to eyebrow raising/lowering, include eye-gaze, body shifts, and mouthing. Facial articulations can provide lexical information that adds to the meaning of a sentence. For example, when mouthed at the same time as the verb, the "TH" expression (tongue between the teeth) means "carelessly" and the "MM" expression (protrusion of the lips) means "with relaxation and enjoyment" (Corina, Bellugi, & Reilly, 1999). Eye-gaze and body shifts may have provided a phonetic cue for pronouns in cases where the argument was assumed to be "null." In order to catch these subtleties, which may have been lost in these annotations, a corpus with high video quality is necessary. Taking all these factors into account may show that ASL also displays >6 morphemes per second, suggesting that morpheme rates are universal. If so, the ratio of morphemes to syllables may be >1:1, which may be a unique property of sign languages. However, if transitional movements between signs are also factored in, as proposed by Jantunen (2010), the ratios may still remain consistent.
Another way to compare rates of spoken and signed languages in the future may be to compare only open-class/lexical (where lexical is contrasted with functional) morphemes. For example, in a sentence like ?I ate an apple,? although there are 5 morphemes total (I-eat-past-an-apple), it only contains 2 lexical morphemes (eat-apple), like the ASL sign EAT-APPLE. It is possible that ASL may have more phonetically null morphemes than English or Korean. As discussed by Lillo-Martin (1991) and Fischer et al. (1999), ASL may be more discourse-dependent than English, where the meaning of individual sentences is harder to recover in isolation. Because Korean is a pro-drop language, it may be considered more discourse-dependent than English, but morpheme rates were in the same time-scale as English. Although Lillo-Martin (1991) has suggested that ASL is discourse- dependent like Chinese, Japanese, and Korean, more significant differences in the degree of discourse-dependence may be determined by modality. The present study did not conduct an analysis of propositional rate for the following reasons: 1) the materials used for the three languages were not matched for semantic content, and 2) the high likelihood that propositional rates across English 160 and ASL are comparably equal given the task of simultaneous interpreting by professionals. Although discrepancies may exist for particular constructions, the global rates are generally assumed to be the same. Padden (2000:179) summarizes this view by saying, ?Languages of different modalities organize timing, prosody and syllable structure differently even if linguistic content is similar. However, over a span of time, the amount of information in any language, signed or spoken, is roughly equivalent.? The slowness of artificially created signing systems adds further support to the idea that natural language processing occurs within a certain range of time constraints. An analysis of the rate of signing or speech in highly-skilled interpreting and comparison with the rate of the original production may be a useful way to study how global rates become equivalent across the modalities. It is expected that some short constructions in ASL require long English translations, and vice versa. However, to more accurately capture these patterns and fully understand universal time properties, an examination of more sign languages, especially those with different grammatical properties, is needed. For example, Japanese Sign Language is reported to have gender marking on verbs, and Taiwan Sign Language is reported to have auxiliary verbs (Padden, 2000). Aside from words/signs and propositions, languages also have phrasal units, which are intermediate levels of structure. Due to time constraints, it was not possible to look at phrasal units at the time of the study. Nevertheless, I can speculate about how spoken and sign languages compare at intermediate time scales. Nespor and Sandler (1999) describe how similar principles of dividing sentences into 161 prosodic and intonational phrases applies to spoken and signed languages. Although there are debates about the degree to which this isomorphism holds, it has been shown that there are phonological constituents that correspond to syntactic constituents (Nespor & Vogel, 1986; Selkirk, 1984). In sign language production, eyeblinks have been recognized as occurring at syntactic boundaries and discourse transitions (Baker & Padden, 1978; Bahan & Supalla, 1995). 
Nespor and Sandler (1999) provide an analysis of Israeli Sign Language, where cues for prosodic and intonational phrase boundaries are taken from facial features (brows, eyes, cheeks, mouth, tongue, head tilt, mouthing), body shifts, and temporal cues (reduplications, pauses, and speed and size of movements). Although they do not report the time durations of prosodic phrases, which are embedded in intonational phrases, on average their examples show prosodic phrases with 2 signs and intonational phrases with 3 signs. Boyes-Braem (1999) provide an analysis of prosodic rhythms among early and late learners of Swiss German Sign Language. The examples in her work show the time-course of sentences, where signs and prosodic units are labeled. A rough estimate based on the measurements she provides suggests that a prosodic unit is, on average, about 1 second long. Some of these findings may converge with reports of speech, where prosodic information is conveyed at rates of 1?3 Hz (Munhall, Jones, Callan, Kuratate, & Vatikiotis-Bateson, 2004). The intuition behind the finding that global rates in languages are equivalent is that the rate of informational transfer in language is relatively consistent. In this sense, the capacity of the communication channel in language may be amodally determined. The objective in transmitting information is to ensure that the message is 162 conveyed through a noisy channel, in the shortest amount of time, and with the lowest probability of error. Resistance to error and rate are traditionally considered to be opposing factors, where conveying minimal information with each fragment is the most error resistant strategy but also the slowest. Chong et al. (2009) demonstrate that the ratio of the information bits in units of ASL and English is approximately 5:1. Nevertheless, English is not overall slower than ASL. An extension of their analysis to calculating information transfer rates suggests that they may be quite similar, although the channels that are particular to different modalities may determine how many information bits are transmitted at a time. Based on these characteristics, auditory and visual processing of language may be differentially sensitive to noise in the channel, as demonstrated by the experiments in Chapter 2. The finding that units of linguistic form and meaning unfold at approximately the same time-scales has broader implications for language processing. This suggests that sensory integration and extraction of meaning proceed in parallel. In addition to research from neuroscience (Hickok & Poeppel, 2007), a typological investigation of patterns in language processing can lead to better models for the architecture of computational and neural networks for language. Although there is still much room for improvement in our understanding of how many meaningful units get phonetically realized, how meaning is constructed, and the discrete nature of meaningful units (Embick & Noyer, 2007; Liddell, 2003), this first attempt to compare units of form and meaning highlights the importance of taking into consideration the time properties of phonological and morphological processing, which are temporally tightly linked. 163 4 Conclusion 4.1 Overview In Chapter 2, I described evidence for larger temporal integration windows in sign language perception than in speech. In Chapter 3, I summarized findings that support the claim that units of form and meaning are produced in periods of longer time scales in sign language production than in speech. 
Despite these differences that are putatively aligned with modality, universal patterns emerge as well. In both English and ASL experiments, intelligibility of the sensory signal falls drastically with severe time distortions created with local reversals. The hypothesis that sensitivity to time distortions is dependent on the size of representational units in the signal has been confirmed in two ways, by comparing the results across English and ASL, and by comparing the results within each language for normal and compressed sentences. Taken together with the findings from Chapter 3, as well as other studies on production rates, the temporal integration windows implicated by these perception experiments corresponds to syllables in ASL. In this concluding chapter, I discuss temporal patterns in language processing more broadly, providing a synthesis of key findings from speech and sign language research, considering the implications, and outlining future directions. 164 4.2 More than meets the eye Sign language perception involves much more than processing visual signals produced by the sign articulators. It is guided by linguistic knowledge about sign language grammar and sensitivity to how signals unfold in time. Temporal integration windows in language processing do not seem to arise just from the properties of a particular sensory system or just from a special property of language but from the interaction of the two. The difference between ~ 50 ? 60 ms windows in speech and ~ 250 ? 300 ms windows in sign language clearly demonstrate the effect of modality. The window of ~ 250 ? 300 ms in sign language perception for sentences played at normal rates in this work is attributed to the perceiver?s integration of the visual signal according to syllabic units in ASL (present results from Chapter 3, as well as Wilbur & Nolen (1986)). Studies of compressed and locally-time reversed sentences in both modalities have now shown that the durations over which the signal is integrated must be flexible to a certain extent and adjust to the rate of the incoming linguistic information. Of course, time-compression studies of spoken and sign language (Foulke & Sticht, 1969; Foulke, 1971; Ahissar, Nagarajan, Ahissar, Protopapas, Mahncke, & Merzenich, 2001; Fischer, Delhorne, & Reed, 1999) at increasing rates also demonstrate the limitations of this flexibility, but similar findings from spoken and sign language suggest that perceptual bottlenecks are modality-independent. The results of Experiment 2, where the duration of temporal integration windows was proportionally reduced by half with sentences compressed by a factor 165 of 2, parallel to the findings in speech (Figueroa, 2009; Stilp, Kiefte, Alexander, & Kluender, 2010), point towards common mechanisms in the auditory and visual processing of language. Stilp et al. (2010) argue that the findings from locally reversed speech support explanations based on cochlear-scaled spectra. However, the present results from sign language demonstrate the need for more general models, where perceptual processes are more broadly driven by sensitivity to the rate of incoming information. In studies of low-level visual processing, temporal resolution in vision is ~20 ms (Chase & Jenner, 1993). In an EEG visual MMN paradigm, temporal windows of 150-170 ms in duration are reported (Czigler, Winkler, Pat?, V?rnagy, Weisz, & Bal?zs, 2006). The results of Experiment 1 in Chapter 2 implicate longer windows of ~ 250 ? 
300 ms for sign language processing, suggesting that the linguistic nature of a perceptual task can extend the duration of windows for sensory integration. As described in Chapter 3, studies from categorical perception in signing (Emmorey, McCullough, & Brentari, 2003; Baker, Idsardi, Golinkoff, & Petitto, 2005; Best, Mathur, Miranda, & Lillo-Martin, 2010) and perception of apparent motion (Wilson, 2001) have also shown that sign language knowledge guides visual processing. At the algorithmic level of language processing, I adopt the assumption that perceptual processes are guided by internal guesses about the upcoming representations (Halle & Stevens, 1959, 1962; Stevens & Halle, 1967; Yuille, & Kersten, 2006; Poeppel, Idsardi, & van Wassenhove, 2008). The analysis of rates in natural production describes what the patterns that influence perception might be. Part of integrating the sensory signal over certain time windows is driven by the 166 expectation for representations unfolding over those durations. When the sensory signal is manipulated in such a way that those expectations are violated, cognitively restoring the signals becomes much more difficult. Sentence processing in both spoken and sign languages requires the ability to track rapidly changing sensory signals and integrate them skillfully over long durations. As Foulke and Sticht (1969) note in their review of compression studies, there are cases where performance on the identification of words is lower than overall comprehension of sentences, and where it is also higher. The results in Chapter 2, where intelligibility of locally-reversed input falls sharply at 267 ms reversals and plateaus at ~50% at reversals of ~500 ms and greater, reflect the demands of phonological processing in sentence processing. In a separate pilot study that was designed by Clifton Langdon, we tested the intelligibility of locally reversed single signs and found that accuracy was higher than 50% for most signs. Although many signs are recoverable, when reversals exceed a certain size, it is likely that ?un-doing? the motion is not automatic and requires deliberate effort. When all signs in a rapid sequence are distorted in such way, capturing each sign using concerted strategies becomes much more challenging. As Mayberry and Fischer (1989) describe late learners, difficulty in sentence processing can be attributed to phonological bottlenecks (late L1 versus L2 learners of ASL were not distinguished in their study). Late learners are believed to be much less efficient at phonological encoding, which has consequences for many other aspects of language processing. Experiment 1 results suggest that local reversals cause disruptions in the automatic recognition of phonological information that is encoded through time. However, spatially encoded 167 phonological information provides a buffer that makes signed sentences more robust to time distortions than speech. Late learners are characterized as a group that has difficulty with efficient phonological encoding for even normal sentences. Experiment 3 results show that late L2 learners of ASL are much more sensitive to distortions in the signal than native signers. In addition to theories of sensory processing, theories of representations of linguistic units and knowledge about how they combine are integral to complete models of language processing. 
In particular, assumptions about the status of linguistic primitives motivate psycholinguistic and neurolinguistic investigations of how they unfold in real time (Poeppel, Idsardi, & van Wassenhove, 2008). In turn, considerations about the time-scales at which these units are processed may help better inform theories about representations for features, segments, syllables, morphemes, phrases, and sentences.

4.3 Hierarchical coupling in sign language processing?

In speech perception, it has been proposed that endogenous rhythms in the gamma (30-50 Hz) and theta (4-7 Hz) bands serve critical roles for the processing of segments and syllables. More broadly, rhythmic aspects of many biological functions are associated with the frequencies of neural oscillations. Given the findings that the rate of syllables and morphemes in ASL is approximately 3 per second, and that ~250-300 ms durations are critical temporal integration windows in perception, neural activity in the delta (1-3 Hz) band is implicated for sign language processing.

As is emphasized in the multi-time resolution model of speech perception (Poeppel, 2003), temporal integration windows need not be viewed as serially organized frames for processing. In sign language, different levels of representation are also evident in theories of segments, syllables, prosodic units, and intonational/discourse units. Even in the case of fluent fingerspelling, the letters do not come as a simple sequence but are structured into "chunks" that have been referred to as movement envelopes (Akamatsu, 1982). Thus, while delta oscillations are by no means the only important neural activity, the new psychophysical results presented here strongly suggest that they may have a privileged status in sign language processing.

The analysis of fine structure in speech that operates at fast rates is attributed to oscillations in the gamma band and bilateral activations in the superior temporal gyrus (STG) of the auditory cortex (Boemio, Fromm, Braun, & Poeppel, 2005). Aside from some trilled movements where the temporal direction is nondistinctive, sign language does not involve fluctuations at such high frequencies. Nevertheless, in experiments that tested the perception of meaningful lexical signs and meaningless (but phonetically plausible) signs, bilateral activation in STG was found only for deaf signers and not for hearing nonsigners (Petitto, Zatorre, Gauna, Nikelski, Dostie, & Evans, 2000). Although the findings from deaf signers may point to explanations in which STG is more generally sensitive to some aspects of visual processing and not just auditory processing, the differences from hearing nonsigners suggest that the activation was driven by the nature of higher-order processing of the visual signals, such as phonological processing, lexical access, and subsequent integration into other computations. Electrocorticographic gamma activity has been used to study the neuroanatomy and processing dynamics of speech and sign language production (Crone, Hao, Hart, Boatman, Lesser, Irizarry, & Gordon, 2001), with results that are fairly consistent with other imaging studies demonstrating overlaps in the functional organization of language-processing areas across modalities. Aside from the special role that it may have for sensory selection in speech perception, gamma activity is more broadly associated with feature binding and attention (Singer & Gray, 1995; Fries, Nikolic, & Singer, 2007; Schroeder & Lakatos, 2008).
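One way to make the link between these frequency bands and the integration windows discussed above concrete is to convert band frequencies into cycle durations (period in ms = 1000 / frequency in Hz), as in the minimal sketch below.

# Cycle durations for the oscillatory bands mentioned above.
bands <- data.frame(
  band  = c("delta", "theta", "gamma"),
  lo_hz = c(1, 4, 30),
  hi_hz = c(3, 7, 50)
)
bands$longest_cycle_ms  <- 1000 / bands$lo_hz   # 1000, 250, ~33 ms
bands$shortest_cycle_ms <- 1000 / bands$hi_hz   # ~333, ~143, 20 ms

# The ~3 units-per-second syllable and morpheme rates found for ASL in Chapter 3
# correspond to a cycle of roughly 1000 / 3 = ~333 ms, i.e., within the delta band
# and close to the ~250-300 ms integration windows from Experiment 1.
1000 / 3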
The current findings about the time properties of sign language processing suggest that the brain operates in a rhythmic mode, and more specifically, that neural activity entrains to the low frequency rhythms of signing. Based on the models of oscillatory coupling, especially where gamma synchronies contribute to enhancements in the processing of task-relevant events (Schroeder & Lakatos, 2009) and attention in visual information processing (M?ller, Gruber, & Keil, 2001), future work may also find evidence for the critical role of gamma activity in sign language processing for sensory selection as well as higher-order processing as in speech. Future work investigating the temporal properties of signing and the neural basis for these dynamics requires use of methodologies with high temporal resolution, such as EEG and MEG, complemented by high temporal resolutions measures of sign articulation. It may be predicted that phase patterns of endogenous rhythms in the delta band will be correlated with the sign language intelligibility, where successful processing of the visual signals requires continuous segmentation and integration of the input in ~300 ms temporal windows. At these low frequency rates, the dynamics 170 of sensory processing in spoken and signed languages may converge. However, these low frequency rates may play a greater role in sign language processing because lexical and prosodic information are processed together at these time-scales. This prediction may be consistent with the prosodic model of sign language phonology (Brentari, 1998:22), who argues that ?ASL exploits paradigmatic constraints in a greater range of phenomena than do spoken languages.? Finally, understanding the nature of the relationship between the neuronal oscillations that subserve language- independent functions and those that entrain to the sensory input in language processing should be a broader goal in this research. 4.4 Innate sensitivity to rhythms in language Sensitivity to rhythms in language is attested in the earliest stages of language acquisition, where newborns are born preferring the voice and language of their mother (DeCasper & Fifer, 1980; Mehler, Jusczyk, Lambertz, Halsted, Bertoncini, & Amiel-Tison, 1988). An analysis of newborns? cry melodies have shown that their productions reflect the prosodic contours of their mother?s language (Mampe, Friederici, Christophe, & Wermke, 2009). After birth, prosodic information in speech continues to shape the language acquisition for young children, for word segmentation (Jusczyk, Houston, & Newsome, 1999) and learning syntactic structure (Gleitman & Wanner, 1982). Infants seem to prefer input where prosodic contours are made salient through infant-directed speech (Cooper & Aslin, 1990; Werker & 171 McLeod, 1989). Babbling, one of the earliest stages of language production, is marked by its rhythmic qualities. The importance of rhythm in sign language processing is now evident in a wide variety of cases. Babbling is no longer considered to be a precursor to speech because of the biomechanics of the mandible but to all languages (Petitto & Marentette, 1991). Deaf infants growing up in signing environments also prefer ?motherese? versions of the input (Masataka, 2003). The sensitivity to rhythmic aspects of the visual signal does not arise from auditory deprivation. Hearing children who are born to deaf parents and thus exposed to signing also manually babble (Petitto, Holowka, Sergio, & Ostry, 2001). 
These manual gestures are distinct from other manual movements that might be typical of general motor development because they are produced in the signing space, are produced at unique frequencies, and appear only among sign-exposed infants. Evidence that sensitivity and preference for linguistic input is partially innate and not driven by exposure is presented by Krentz and Corina (2008): 6-month-old hearing infants who had never been exposed to sign language show a preference for looking at videos of signing over videos of communicative gestures that are not linguistic. Even fingerspelling, which may be considered a sequence of handshapes representing letters of the English alphabet, shows hierarchical organization and rhythmic properties. The acquisition of fingerspelling by young deaf children reflects their recognition of movement envelopes, where fingerspelled words are analyzed as whole units rather than as individual handshapes (Padden & LeMaster, 1985; Andrews, Leigh, & Weiner, 2004).

Rhythmic characteristics also distinguish native and non-native signers. In subjective ratings, the cues that judges used to determine whether a signer was native or non-native were handshape, facial expression, rhythm, and lexical choices (Kantor, 1978). In a quantitative measurement of the production of native and non-native signers of Swiss German Sign Language, Boyes-Braem (1999) found that native signers use side-to-side movement of the torso according to prosodic and discourse units in the signed sentences, and that this was lacking among late learners. Among the three late learners, the one who had some limited exposure for one year at an early age had more of these left-right movements than the other two, who had no early exposure (see Figure 39). The results suggest that late learners follow the prosodic patterns of spoken German (their first language) rather than those of sign language. This production study stands in contrast with perceptual studies in which nonsigners had similar sensitivity to sign language prosodic cues (Brentari, González, Seidl, & Wilbur, 2011; Fenlon, Denmark, Campbell, & Woll, 2007). Thus, although some aspects of prosodic rhythms in signing may be perceptually salient and not require sign language knowledge, it is interesting that these characteristics do not become automatic in production for late learners who have had extensive exposure to signing.

Figure 39. Reproduced from Boyes-Braem (1999) (panels labeled Early Learner and Late Learner), demonstrating the difference between early and late learners of Swiss German Sign Language in their lateral torso movements while signing.

By continuing to better understand the rhythmic aspects of sign language production, future research can address the temporal characteristics of typical and atypical development. Studying the spectral characteristics of signing (Foulds, 2004) can also lead to better models of what perceptual cues, aside from grammatical organization, distinguish linguistic from nonlinguistic gesture. Finally, such guidelines may help better establish the status of iconic gestures, which seem to straddle these boundaries, in visuo-spatial communication.

4.5 Channel capacity for sign language

Understanding the rate at which linguistic information is transmitted has had practical applications for designing communication devices.
The greater bandwidth required for videophones compared to telephones leads to the over-simplistic belief that sign language requires larger channel capacities in natural processing. Chong et al. (2009) demonstrate that a phonetic unit that is realized in some fragment of time in ASL contains 5 times the amount of information compared to a phonetic unit in English. Based on this calculation, estimating the bit rate per second in English and ASL based on production rates of words and signs showed that global information transfer rates are the same. In an independent information theoretic analysis, Reed and Durlach (1999) also reach the conclusion that auditory processing of English and visual processing of ASL involve the same information transfer rate. Among all the communication systems they analyze (which also included Morse code though different modalities and Braille), the only other system that had comparable information rates with spoken English (auditory form) and signed ASL (visual form) was reading (visual form). Notably, the visual and tactile forms of spoken English and the tactile form of ASL had significantly lower rate measurements. In a study examining whether it is possible to transmit signs using the bandwidth of one telephone line, Tartter and Knowlton (1981) examined the intelligibility of signs produced with 27 moving spots. This technique has been used to study the gross patterns of biological motion (Johansson, 1973). In signing, 13 175 retroreflective tapes were anchored to gloves worn by each hand and 1 on the nose to provide a reference for place of articulation. 27 moving spots were sufficient to allow two pairs of deaf subjects to have conversations, although there was some difficulty with understanding fingerspelling. In other studies, spatial image compressions and coding schemes have shown similar results, where videos can be substantially compressed while conveying intelligible messages in sign languages (Sperling, Landy, Cohen, & Pavel, 1985; Abramatic, Letellier, & Nadler, 1982; Pearson, 1981). In a more recent study examining the compressability of sign language video files, Foulds (2004) approaches the bandwidth requirements from both the perceptual and biomechanical perspectives. Transmission of video with a limited bandwidth involves a trade-off of spatial resolution with frame rates. He explains that most efforts on sign language communication systems have focused on how to achieve lossy spatial compression while preserving temporal information. In perception, high frame rates are necessary to surpass the critical flicker frequency. However, from a kinematic point of view, critical information for sign language perception may be encoded more sparsely. In a separate pilot study, Foulds measures the spectral characteristics of sign language motion by using a sensor that tracks the right index finger of a signer who produced a list of 20 ASL signs. Convergent with the results of the rate analysis presented in Chapter 3, he found that most of the spectral energy is in the lower frequency range of 0-3 Hz. Based on these findings, Foulds estimated that a frame rate of 6 frames per second may be sufficient to capture the kinematic information necessary for sign intelligibility, the higher standard of 30 frames per second (0-15 Hz bandwidth) is 176 necessary to avoid flickers in perception. Foulds uses a method that smoothly interpolates the lower bandwidth to the standard 30 frames per second. 
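Foulds's frame-rate estimate can be read as an application of the standard sampling (Nyquist) criterion, under which a signal band-limited to B Hz requires a sampling rate of at least 2B samples per second; the sketch below is my reconstruction of that arithmetic, not Foulds's own formulation.

# Nyquist-style reading of the frame-rate estimates discussed above:
# a signal band-limited to B Hz needs a sampling rate of at least 2 * B.
kinematic_bandwidth_hz <- 3      # most spectral energy of signing motion (Foulds)
min_frame_rate_fps     <- 2 * kinematic_bandwidth_hz    # 6 frames per second

perceptual_bandwidth_hz <- 15    # 0-15 Hz, the band carried by standard video
standard_frame_rate_fps <- 2 * perceptual_bandwidth_hz  # 30 frames per second

On this reading, a 0-3 Hz kinematic band explains why 6 frames per second can preserve intelligibility, while 30 frames per second remains necessary only to avoid perceptual flicker.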
The results of an intelligibility experiment, in which original videos (with a 0-15 Hz bandwidth) were compared to stick-figure animations (with 0-15 Hz and 0-3 Hz bandwidths), showed that temporal compression by a factor of 5 (to 6 frames per second) preserved the intelligibility of the stimuli. Foulds concludes that "Earlier reported limitations were imposed by human perception and are not determined by the kinematic bandwidth of human movement associated with sign production." Foulds's measurement of the spectral characteristics of signing motion should be extended to articulations of conversational sentences in future studies comparing the dynamics of the sensory signal to the rhythms of neural oscillations, as discussed earlier.

4.6 Availability of two communication channels?

Given the apparently large differences between the vocal-auditory and manual-visual modalities, the convergence of rates in spoken and signed languages is remarkable. What happens when both modalities/channels are available to a language user? Bimodal bilinguals are individuals who are fluent in a spoken and a signed language, like English and ASL. In natural conversations, bimodal bilinguals have been observed to produce code-blended constructions, even while communicating with English-speaking monolinguals (Pyers & Emmorey, 2008). Based on the finding that bimodal bilinguals used ASL-appropriate facial expressions while speaking English, Pyers and Emmorey propose that "This result provides evidence for a dual-language architecture in which grammatical information can be integrated up to the level of phonological implementation." In a different study, Casey and Emmorey (2009) found that bimodal bilinguals produce more iconic gestures than nonsigners and that actual signs are used from time to time.

The fact that both channels are available does not necessarily mean that information can be conveyed at a faster rate. In a production task (Emmorey, Petrich, & Gollan, 2009), English-ASL bilinguals' performance on picture-naming tasks in ASL-only, English-only, and code-blending conditions was compared to that of English monolinguals and ASL monolinguals. The reaction times of the English monolinguals and ASL monolinguals were the same. The reaction times of the bimodal bilinguals in the English-only condition were similar to those of the English monolinguals. However, responses in the ASL-only and code-blending conditions were significantly slower, and the reaction times for these two conditions were the same. The results suggest that production in the non-dominant language (ASL) is usually slower, and in the code-blending condition, where reaction times in English and ASL match, the slower response is attributed to time-locking with the slower language. These findings suggest that the vocal and manual articulators are not independent in simultaneous production. However, in a perceptual experiment, where participants had to make semantic judgments about words given in English, ASL, or both languages (code-blended), the fastest reaction times were attested in the code-blended condition. Thus, in the perceptual channel, the use of both modalities has a facilitating effect, but in the production channel, it has a cost.
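One compact way to state the time-locking account of the production cost, offered here only as an illustrative formalization rather than as a claim made in that form by the cited study, is that a code-blended response can be initiated no faster than its slower component:

$$ RT_{\mathrm{blend}} \approx \max\!\left(RT_{\mathrm{speech}},\, RT_{\mathrm{sign}}\right), $$

which matches the observed pattern in which code-blended naming latencies equal the ASL-only latencies rather than falling between the two single-language conditions.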
In development, a comparison of bilinguals acquiring two spoken languages (French and English) and bilinguals acquiring a spoken and a signed language (French and Langue des Signes Québécoise, LSQ) has demonstrated that early linguistic milestones in each language are similar (Petitto, Katerelos, Levy, Gauna, Tétreault, & Ferraro, 2001). Language mixing occurred to varying degrees in both groups. Even at early stages of acquisition, bimodal bilingual children are found to produce simultaneous constructions of French and LSQ: 94% of the cases where the languages were mixed involved simultaneous language mixing. Although both channels can be exploited by bimodal bilinguals, simultaneous production seems to occur only in constrained ways. In 89% of the simultaneous mixing cases, the sign and the word were lexically congruent. In the other 11%, where the signs and words had different meanings, the meanings were cohesive; for example, "ça ressemble" was uttered in French at the same time that MOUCHOIR was signed in LSQ, resulting in the sentence "This resembles a [facial tissue]". The average length of these children's utterances when mixing the languages was around 3 words, the same as for utterances without language mixing. In other words, the availability of two channels did not mean that these children would produce utterances with double the complexity.

More recent work on the development of bimodal bilingualism, among children in the U.S. learning English and ASL and children in Brazil learning Brazilian Portuguese and Libras, also demonstrates both the constrained and the variable aspects of natural, simultaneous use of languages (Quadros, Lillo-Martin, & Chen Pichler, 2010). Quadros et al. propose that "multiple kinds of blending are possible with multiple articulators," but that "one proposition is one computation with intermodal expression." Cases where two separate propositions are uttered in the two languages are never attested. However, corresponding words and signs are not always produced together, and mismatches between the two languages were common. The results that I have presented suggest that spoken and signed languages operate at different time scales for transmitting lexical items, which may be the cause of these mismatches. These observations from language acquisition should be contrasted with the findings from adults (Emmorey, Petrich, & Gollan, 2009), where code-blending resulted in slower reaction times, attributed to the time-locking of the faster, dominant language (English) with the slower language (ASL).

The simultaneous use of a spoken language and a signing system for expressing whole sentences is called simultaneous communication, or SimCom. Given that there is no pair of a spoken and a signed language with identical grammars, SimCom usually involves a signed version of a spoken language. SimCom does not arise naturally and was originally designed for use in deaf education settings. Quadros et al.'s work on natural forms of simultaneous production in early acquisition shows that some mismatches between the modalities can be tolerated. However, the lack of natural, full-fledged simultaneous systems suggests that mismatches created by the differences between the grammars of a spoken and a signed language cannot be tolerated (Wilbur & Petersen, 1998). Even with the substitution of a natural sign language by an artificial signing system that follows the grammar of the spoken language, errors made in both languages while using SimCom reflect processing costs.
A common observation in SimCom is that signed English can become inaccurate due to omissions of signs. Marmor and Petitto (1979) report that there were errors in 90% of the signed English sentences produced by teachers. One possibility is that these errors in signed English result from the fact that it is not a natural sign language and operates at a time-scale that is globally too slow. However, Hyde and Power (1991) also report that accuracy in Australasian Signed English in SimCom comes at the expense of decreased naturalness in speech production, with much slower prosody. Interestingly, accuracy and rate of speech were correlated: individuals with faster rates were also more accurate.

In Bellugi and Fischer's (1972) study, production rates were analyzed for ASL-alone, English-alone, and simultaneous signing-speaking conditions. They explain that the participants had extensive experience and were highly skilled at simultaneous production, but they do not specify whether the signing was closer to ASL or to signed English. From their description, it appears that each language was affected by the other, where translations of ASL signs resulted in unnatural lexical choices in English, and vice versa. In the simultaneous condition, there were more errors in both languages and more time spent pausing during the narration than in the one-language conditions. Somewhat consistent with the results of the picture-naming task reported by Emmorey, Petrich, and Gollan (2009), the rate of speaking in the simultaneous condition was slower than in the speaking-alone condition, but the rates of signing were the same.

Wilbur and Petersen (1998) investigate the modality interactions with respect to temporal properties in SimCom. Consistent with previous studies, they find that sentences produced in English only and in ASL only take approximately the same amount of time. Consistent with Klima and Bellugi's (1979) results on signed English, Wilbur and Petersen find that sentences in signed English take considerably longer. The novel finding, however, is that SimCom requires durations that are longer than in the speech-alone condition but shorter than in the sign-alone condition. In other words, speech production is slowed down in SimCom but signing is sped up. Given that speech and signing occur at different time-scales, SimCom forces the systems to be time-aligned, which incurs costs in time (for speech) and accuracy (for signing, as speeding up results in increased sign omissions). As suggested by Fischer, Delhorne, and Reed (1999), as well as Foulke and Sticht (1969), bottlenecks in language processing are likely to be rooted in factors beyond motor articulation or sensory processing. In the case of SimCom, even when there is redundancy in meaning, there is minimal overlap in phonological form, so the sublexical units produced in each language carry their own bits of information. Generating this greater amount of information has a high cost in production, although processing inputs with redundancy in meaning is advantageous in perception. These bodies of work combined suggest that channel capacities in language processing arise more from cognitive constraints than from the articulatory-perceptual interface.

4.7 Rates in production and time-course of recognition

Evidence from several studies now shows that signs are produced at a rate of 2-3 per second in ASL, implying that each sign occupies a period of roughly 400 ms.
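The arithmetic behind these rate and information-rate claims can be made explicit with a small sketch. Only the 2-3 signs/s production rate and the five-fold per-unit information ratio come from the discussion above; the per-unit bit value and the derived English unit rate are illustrative placeholders, not figures reported in the cited studies.

    # Back-of-the-envelope sketch of the rate/period and bit-rate arithmetic.
    # Only the 2-3 signs/s rate and the 5x per-unit information ratio come from
    # the text; b_english and the derived English unit rate are placeholders.

    def period_ms(rate_per_s):
        """Convert a production rate (units per second) into an average period (ms)."""
        return 1000.0 / rate_per_s

    for rate in (2.0, 2.5, 3.0):
        print(f"{rate:.1f} signs/s -> ~{period_ms(rate):.0f} ms per sign")

    # If one ASL phonetic unit carries k times the information of one English unit,
    # equal global bit rates require only that English units be produced k times
    # as fast: r_asl * (k * b) == (k * r_asl) * b.
    k = 5.0            # per-unit information ratio (Chong et al., 2009)
    b_english = 1.0    # bits per English unit (arbitrary placeholder)
    r_asl = 2.5        # ASL units per second (from the 2-3 signs/s estimate)
    r_english = k * r_asl
    print(r_asl * (k * b_english) == r_english * b_english)   # True: identical bits/s

On this arithmetic, a slower unit rate can be offset by a higher information load per unit, which is how the equal global transfer rates reported above can hold.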
Although sign durations can be variable, this figure is not consistent with other ways of measuring the average duration of signs. Signs produced in isolation can be somewhat longer (>500 ms), whereas signs excised from sentences can be shorter (~250 ms), depending especially on their position within the sentence. The discrepancy between periods determined by rate analysis and durations determined by direct measurement is attributed to the fact that there are transitions external to the signs.

Results from perceptual tasks suggest that the time-course of identifying a sign is much shorter than 400 ms. Emmorey and Corina (1990) used a gating task with signs presented in isolation. Participants were asked to identify signs (and report how confident they were about their guesses) after viewing videos of a sign, where one video frame was added with each successive presentation. On average, 240 ms of a sign contained enough information for accurate identification. These results are similar to the findings from speech, where although the whole unit is stored (in memory or the lexicon), recognition in perception requires processing only up to the point where the unit becomes distinct from other units. Grosjean (1981) found that signs presented in isolation can be recognized from approximately their first half. Extending these results, Clark and Grosjean (1982) tested signs produced within sentences and measured recognition times with or without the sentence context. When presented in context, signs could be recognized from the first 40% of the sign. When analyzing how far into a sign each of the four formational parameters (orientation, location, handshape, and movement) could be identified, they found that sign recognition was linked to the identification of movement (Grosjean, 1981; Clark & Grosjean, 1982). Although the four parameters could be isolated at around the same point within a sign, movement took the longest.

If the average period per sign is about 400 ms, but the true duration of the sign is in fact shorter, and only about 50% of the sign has to be viewed for lexical recognition, this suggests that much of the sign period is not contributing lexical information, which is somewhat puzzling. Clark and Grosjean do not explain how the onset of the sign was measured, especially with respect to how much of the transitional movement from the previous sign was included. As discussed earlier in Chapter 3, sign-internal movements are distinguished from sign-external movements. Jantunen (2010) explains that "Standard theory treats transitions as nonlinguistic, unintentional, meaningless, automatic, nonsalient, unmodifiable, holistic, etc. (e.g. Wilbur 1990, Perlmutter 1990, Wilcox 1992, van der Hulst 1993)." However, Jantunen argues that this characterization of transition movements, and traditional annotations of sign boundaries, need to be revised on the basis of two experiments (see Figure 40). In the first experiment, Jantunen studied the biomechanics of signing by measuring acceleration peaks during sentence production and found that the movement dynamics of sign-internal movements and sign-external transitions were quite similar. In the second experiment, he tested the intelligibility of video clips created by excising the signs and concatenating the remaining transition frames in a sentence. More than 60% of such "signless" video clips were understandable.
Given the saliency of these transitions in terms of both phonetic attributes and meaningful content, Jantunen proposes that they should be viewed as internal parts of signs.

Figure 40. Reproduced from Jantunen (2010), demonstrating the acceleration peaks in the biomechanics of both hands while signing, annotated for traditional sign boundaries and transitions between signs.

If Jantunen's model were adopted, it would support the idea that the duration of signs and the periods derived from signing rates are in fact the same. Nevertheless, it still raises the question of why signs are produced over such long periods when the time course of lexical recognition is shorter by as much as 50%. From an information-theoretic perspective, some amount of redundancy is not only expected but desirable. A potentially meaningful connection is that the redundancy of printed English is also reported to be approximately 50% (Chong, Sankar, & Poor, 2009). Jantunen's model also suggests that the idea that most signs are monosyllabic needs to be revised. Future investigations examining the relationship between the rates of form and meaning will need to consider these new ways of counting phonological units as well as morpheme units. Moreover, interpreting all movements as sign-internal would mean that sign languages are not different from spoken language in that all phonetic components are meaningful, even though the articulators are not hidden. Combining Jantunen's methods of measuring the patterns of acceleration peaks (Jantunen & Takkinen, 2010) and Foulds's (2004) methods of measuring spectral frequency will lead to a better understanding of the rhythmic characteristics of signing.

4.8 General conclusions

The goal of this dissertation was to investigate temporal integration windows and the rates at which form and meaning unfold in language processing from a cross-linguistic and cross-modal perspective. Psychophysical experiments using locally time-reversed sentences demonstrated that the temporal integration windows that capture direction-sensitive input are much longer in sign language (~250-300 ms) than in speech (~50-60 ms). Despite the differences in these absolute values, the universal pattern across languages is that temporal integration windows are sensitive to the size of representational units in language. This was demonstrated by the reduction of temporal integration windows in proportion to the degree of time compression of the sentences and by the comparison between English and ASL. The analysis of production rates from corpus data of natural conversations also contributed to a better understanding of the temporal dynamics in language production that might shape expectations in the perceptual process. A common mechanism for mapping sensory signals onto abstract representations used for higher-order linguistic computations is integration over time-scales that match the rate of the linguistic units. The construction of meaningful representations may also operate at the same time-scale as syllables, since the ratio of syllables to morphemes is approximately 1:1. Although there may be different requirements in the technological implementation, the auditory and visual channels for language processing seem to involve similar global rates of information transfer and the same amount of redundancy.
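For concreteness, the local time-reversal manipulation summarized above can be sketched as follows: the signal is divided into successive windows of a fixed duration, and each window is reversed in place while the order of the windows is kept intact. This is a minimal sketch; the function name, sampling rate, and example values are assumptions for illustration, not the exact stimulus-preparation pipeline used in the experiments.

    # Minimal sketch of local time-reversal over fixed-duration windows.
    import numpy as np

    def locally_reverse(signal, window_ms, sample_rate_hz):
        """Reverse successive windows of `window_ms` milliseconds in place,
        keeping the order of the windows themselves unchanged."""
        win = max(1, int(round(sample_rate_hz * window_ms / 1000.0)))
        out = signal.copy()
        for start in range(0, len(signal), win):
            out[start:start + win] = signal[start:start + win][::-1]
        return out

    # Dummy 1 s "signal" sampled at 1000 Hz, reversed in 50 ms windows
    # (comparable to the window sizes at which speech remained intelligible).
    dummy = np.arange(1000, dtype=float)
    print(locally_reverse(dummy, window_ms=50, sample_rate_hz=1000)[:5])
    # -> [49. 48. 47. 46. 45.]: the first window has been flipped.

For signed sentences, the same operation applies over video frames rather than audio samples, with window sizes on the order of the syllable- and sign-length durations discussed above.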
Studying the rates at which linguistic units are produced and the time-windows over which this information is integrated leads to a better understanding of how information is chunked and organized when processed through a particular channel. The degree to which information is encoded sequentially or simultaneously affects the time-scales of integration windows. In speech, the unintelligibility of sentences in which fragments were reversed at durations exceeding the average size of segments demonstrates the importance of the temporal direction of the segments. In sign language, although syllables can also be decomposed into segmental units (Liddell, 1984; Perlmutter, 1992), the robustness of signed sentences to reversal sizes up to the length of syllables and signs suggests that the way the temporal direction of these segments is encoded differs in nature from that of segments in speech. These results are consistent with the findings of Wilbur and Allen (1991), who argue against internal structure in ASL syllables.

The difference in performance between early and late learners of ASL supports the view that part of being a native user of a language is the ability to efficiently decode the sensory signal and map it onto lexical representations. Part of having robust phonological representations may be having tolerance for distortions in the signal. Following the work on bilingualism in spoken languages, future work investigating the effect of developmental factors on sensory integration should examine the role of noisy environments and noisy inputs in processing sign language. In the case of temporal distortions, late learners of a sign language are more vulnerable than early learners; this remains untested among late learners of a spoken language. In the case of snow-like visual noise, Mayberry and Fischer (1989) found that early and late learners were equally vulnerable. In spoken languages, late bilinguals are more vulnerable to noise and reverberation than early bilinguals, who are in turn more vulnerable than monolinguals (Rogers, Lister, Febo, Besing, & Abrams, 2006). It is possible that modality, type of noise, age of acquisition, and bilingualism all interact to produce different sensitivities to noise. Understanding the effect of noise on communication for late learners is particularly relevant in the sign language community, where 95% of deaf individuals are born to hearing parents and thus are not exposed to signing from birth. For example, although a noisy channel of communication may allow successful communication between two native signers, it may not be adequate for late signers. Given the findings that early auditory deprivation leads to a reorganization of visual attention toward peripheral fields of vision (Bavelier, Dye, & Hauser, 2006), future research should also investigate the effect of peripheral noise on cognitive processing for both early and late language learners.

Understanding how information is chunked through time also has implications for working memory. Advantages in many cognitive functions are associated with larger capacities for holding information, integrating its contents, and comparing it to other sets of information (Baddeley, 2003; Duncan, Seitz, Kolodny, Bor, Herzog, & Ahmed, 2000).
Many factors contribute to working memory function, such as attention (Engle, 2002; Conway, Cowan, & Bunting, 2001; Kane & Engle, 2003) and inhibition or filtering mechanisms (Vogel, McCollough, & Machizawa, 2005); strategies that can expand working memory capacity include rehearsability (Baddeley, 2003; Gathercole & Baddeley, 1993; Wilson & Emmorey, 1997) and "chunking" (Miller, 1956). Findings that using ASL results in shorter short-term memory spans than using English implicate a difference between auditorily based and visually based representations in how they take up resources within working memory (Boutla, Supalla, Newport, & Bavelier, 2004). These results were consistent between deaf signers and English monolinguals as well as within hearing bilinguals who sign. More specifically, it is possible that the sequential nature of units in speech-based input, and the processing of these units over smaller time-scales, results in higher spans when span is measured serially.

Among the many consequences of delayed language exposure, individuals who are late learners of a first language have smaller working memory capacities than their early-learning counterparts (Mayberry, 1993). Newport (1990) explains that the acquisition process and the errors made by late learners reflect a lack of understanding of the internal structure of signs. By acquiring language through the development of memory, early learners may learn the discrete components of signs even though they are produced simultaneously. The ability to analyze language in a finer-grained way may contribute to a larger working memory capacity that can also be exploited for other cognitive functions.

Knowledge about how linguistic information is chunked when viewed through the eyes may also be relevant to research in reading, especially among deaf children for whom vision is the primary channel for communication. A key challenge in deaf education is improving the rates and achievement levels of literacy. For hearing children, written print corresponds to a language that they already speak. For deaf signers learning to read, the process involves learning the grammar of a second language. Increasing evidence for the importance of early language exposure suggests that reading skills depend on strong language foundations (Mayberry, del Giudice, & Lieberman, 2011; Wilbur, 2000). Most of the focus in literacy efforts has been on phonological coding and awareness skills (Wang, Trezek, Luckner, & Paul, 2008; Allen, Clark, del Giudice, & Koo, 2009). Since spoken language in its natural usage operates over short time scales that are not relevant in sign language processing, one might also imagine that this poses extra challenges. In reading, eye movement patterns differ with experience: hearing children (inexperienced readers) exhibit frequent small saccades, whereas hearing adult readers (skilled readers) exhibit fixations over relatively long time scales (200-250 ms on average) and mean saccade sizes of 7-9 letter spaces (Rayner, 1998). However, it is possible that different early language experiences, especially the use of distinct time windows for integrating language input, may shape the process of learning to read differently for hearing and deaf children. Ahissar et al. (2001) note that understanding temporal response patterns to sensory signals is highly relevant to many cognitive functions. In particular, they mention that individuals with "poor successive-signal processing"
in audition and vision tend to be poor readers, and that they are more vulnerable than good readers to time compression of sentences.

By examining perspectives from speech perception and sign language processing, as well as information theory, grammatical theory, development, and neuroscience, and by summarizing new experimental work, I have demonstrated how critical temporal dynamics are for language processing and outlined new challenges for future research. Besides making specific contributions to our understanding of temporal integration windows and rates in language processing through cross-linguistic comparisons, this work supports an approach to language processing that takes into account the representations of linguistic units, the information in those units, and the time course over which they unfold. In addition to comparisons in grammar and functional organization in the brain, temporal relationships in on-line language processing demonstrate universal patterns. Key differences also contribute to the model of the architecture of language, where interaction with the sensori-motor interfaces results in unique properties in each modality. The time properties seen in language processing, which impact the grammar of spoken and signed languages, may be best understood from the perspective of the temporal dynamics of underlying neural processes, a claim that motivates future interdisciplinary work in speech and sign language research.

Bibliography

Abel, S. M. (1972). Discrimination of temporal gaps. Journal of the Acoustical Society of America, 52, 519-524. Abramatic, J. F., Letellier, P. H., & Nadler, M. (1982). A narrow-band video communication system for the transmission of sign language over ordinary telephone lines. In T. S. Huang (Ed.), Image sequence processing and dynamic scene analysis (pp. 314-316). New York: Springer-Verlag. Adams, C. (1979). English Speech Rhythm and the Foreign Learner. The Hague: Mouton. Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, 98, 13367-13372. Akamatsu, C. T. (1982). The acquisition of fingerspelling in pre-school children. Unpublished doctoral dissertation, University of Rochester, Rochester. Allen, T. E., Clark, M. D., Del Giudice, A., Koo, D., Lieberman, A., Mayberry, R., & Miller, P. (2009). Phonology and reading: A response to Wang, Trezek, Luckner, and Paul. American Annals of the Deaf, 154(4), 338-345. Andrews, J., Leigh, I., & Weiner, M. (2004). Deaf People: Evolving Perspectives From Psychology, Education, And Sociology. Boston: Allyn & Bacon. Arai, T., & Greenberg, S. (1997). The temporal properties of spoken Japanese are similar to those of English. In Proceedings of Eurospeech: Vol. 2, 1011-1014. Arai, T., & Greenberg, S. (1998). Speech intelligibility in the presence of cross-channel spectral asynchrony. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, 933-936. Archangeli, D., & Pulleyblank, D. (1994). Grounded Phonology. Cambridge, MA: MIT Press. Aronoff, M., Meir, I., & Sandler, W. (2005). The paradox of sign language morphology. Language, 81(2), 301-344. Aronoff, M., Meir, I., Padden, C., & Sandler, W. (2004). Morphological universals and the sign language type. In G. Booij & J. van Marle (Eds.), Yearbook of Morphology 2004 (pp. 19-39). Kluwer Academic Publishers.
192 Baddeley, A. (2003). Working memory and language: an overview. Journal of Communication Disorders, 36, 189?208. Bahan, B., & Supalla, S. (1995). Line segmentation and narrative structure: A study of eyegaze behavior in American Sign Language. In K. Emmorey & J. Reilly (Eds.), Language, gesture and space (pp.171-191). Hillsdale: Lawrence Erlbaum Associates. Baker, C., & Padden, C. (1978). Focusing on the nonmanual components of American Sign Language. In P. Siple (Ed.), Understanding language through sign language research (pp.27-57). New York: Academic Press. Baker, M. C. (1996). The Polysynthesis Parameter. Oxford University Press. Baker, S. A., Idsardi, W. J., Golinkoff, R. M., & Petitto, L. A. (2005). The perception of handshapes in American Sign Language. Memory & Cognition, 33(5), 887?904. Battison, R. (1978). Lexical Barrowing In American Sign Language. Silver Spring, MD: Linstok. Bavelier, D., Dye, M. W. G., & Hauser, P. C. (2006). Do deaf individuals see better? Trends in Cognitive Sciences, 10(11), 512?518. Beasley, D. S., Forman, B. S., & Rintelmann, W. F. (1972). Perception of time- compressed CNC monosyllables by normal listeners. Journal of Audiology Research, 12, 71?75. de Beuzeville, L., Johnston, T. & Schembri, A. (2009). The use of space with indicating verbs in Australian Sign Language: A corpus-based investigation. Sign Language & Linguistics 12(1), 53-82. Beck, M. (1998). Morphology and its interfaces in second language knowledge. Amsterdam: Benjamins. Bellugi, U., & Fischer, S. (1972). A comparison of sign language and spoken language: Rate and grammatical mechanisms. Cognition, 1(3), 173-200. Best, C. T., Mathur, G., Miranda, K. A., & Lillo-Martin, D. (2010). Effects of sign language experience on categorical perception of dynamic ASL pseudosigns. Attention, Perception, & Psychophysics, 72(3), 747-762. Bettger, J. G. (1992). The effects of experience on spatial cognition: Deafness and knowledge of ASL. Doctoral dissertation, University of Illinois, Urbana-Champaign. Bialystok, E. (2001). Bilingualism in development: Language, literacy, and cognition. Cambridge University Press. 193 Boemio, A., Fromm, S., Braun, A., and Poeppel, D. (2005). Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nature Neuroscience, 8, 389?395. Bonvillian, J. D., & Folven, R. J. (1993). Sign language acquisition: Developmental aspects. Psychological Perspectives on Deafness, 1, 229. Bosworth, R.G., Dobkins, K.R., & Wright, C.E. (2010). Analysis of visual properties in American Sign Language. Presentation given at the 10th Theoretical Issues in Sign Language Research Conference, Purdue University, West Lafayette, IL. Boudreault, P., & Mayberry, R. I. (2006). Grammatical processing in American Sign Language: Age of first-language acquisition effects in relation to syntactic structure. Language and Cognitive Processes, 21(5), 608?635. Boutla, M., Supalla, T., Newport, E.L., & Bavelier, D. (2004). Short- term memory span: Insights from sign language. Nature Neuroscience, 7, 997?1002. Boyes-Braem, P., & Sutton-Spence, R. (2001). The Hands are the Head of the Mouth. Hamburg, Germany: Signum. Boyes-Braem,P. (1999). Rhythmic temporal patterns in the signing of deaf early and late learners of Swiss German Sign Language. Language and Speech, 42, 177-208. de Boysson-Bardies B. (1993). Ontogeny of language-specific syllabic productions. In B. de Boysson-Bardies, S. de Schonen, P.W. Jusczyk, & P. 
McNeilage (Eds.), Developmental Neurocognition: Speech and Face Processing in the First Year of Life (pp.353-363). Dordrecht, Netherlands: Kluwer. de Boysson-Bardies B. (1999). How Language Comes to Children: From Birth to Two Years. Cambridge, MA: MIT Press. Bradlow, A. R., & Bent, T. (2002). The clear speech effect for non-native listeners. Journal of the Acoustical Society of America, 112, 272-284. Bradlow, A. R., Kraus, N., & Hayes, E. (2003). Speaking clearly for children with learning disabilities: sentence perception in noise. Journal of Speech, Language, and Hearing Research, 46(1), 80-97. Brentari, D. (1995). Sign language phonology: ASL. In J. Goldsmith (Ed.) The Handbook of Phonological Theory (pp.615?639). Oxford, England: Blackwell. Brentari, D. (1998). A Prosodic Model of Sign Language Phonology. Cambridge, MA:MIT Press. 194 Brentari, D. (2002). Modality differences in sign language phonology and morphophonemics. In R. Meier, K. Cormier, & D. Quinto-Pozos (Eds.) Modality and Structure in Signed and Spoken Languages (pp.35?64). Oxford University Press. Brentari, D. (2006). Effects of language modality on word segmentation: An experimental study of phonological factors in a sign language. Papers in laboratory phonology, 8, 155?164. Brentari, D., Gonz?lez, C., Seidl, A., & Wilbur, R. (2011). Sensitivity to visual prosodic cues in signers and nonsigners. Language and Speech, 54(1), 49-72. Brentari, D., Poizner, H., & Kegl, J. (1995). Aphasic and Parkinsonian signing: differences in phonological disruption. Brain and Language, 48(1), 69?105. Budding, C., Hoopes, R., Mueller, M., & Scarcello, K. (1995). Identification of foreign sign language accents by the deaf. In L. Byers & M. Rose (Eds.) Gallaudet University Communication Forum, Vol. 4 (pp.1-16). Washington, DC: Gallaudet University Press. Busch, N. A., Dubois, J., & VanRullen, R. (2009). The phase of ongoing EEG oscillations predicts visual perception. Journal of Neuroscience, 29(24), 7869-7876. Buus, S., Florentine, M., Scharf, B., & Can?vet, G. (1986). Native French listeners? perception of American-English in noise. Proceedings of Inter-noise, 86, 895?898. Buzs?ki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304(5679), 1926-1929. Campbell, R., Woll, B., Benson, P.J., & Wallace, S.B. (1999). Categorical processing of faces in Sign. Quarterly Journal of Experimental Psychology (52A), 62?95. Canavan, A., & Zipperlen, G. (1996a). CALLFRIEND American English-Non- Southern Dialect. Linguistic Data Consortium, Philadelphia. Canavan, A., & Zipperlen, G. (1996b). CALLFRIEND Korean. Linguistic Data Consortium, Philadelphia. Capek, C. M., Grossi, G., Newman, A. J., McBurney, S. L., Corina, D., Roeder, B., & Neville, H. J. (2009). Brain systems mediating semantic and syntactic processing in deaf native signers: Biological invariance and modality specificity. Proceedings of the National Academy of Sciences, 106(21), 8784 -8789. Casey, D. S., & Emmorey, K. (2009). Co-speech gesture in bimodal bilinguals. Language and Cognitive Processes, 24(2), 290?312. 195 Chase, C., & Jenner, A.R. (1993). Magnocellular visual deficits affect temporal processing of dyslexics. Annals of the New York Academy of Sciences 682, 326-329. Cheek, A., Cormier, K., Repp, A., & Meier, R. P. (2001). Prelinguistic gesture predicts mastery and error in the production of early signs. Language, 77(2), 292? 323. Chen Pichler, D. (2006). The development of sign language. In K. de Bot & R.W. Schrauf (Eds.) 
Language Development over the Lifespan (pp. 217-241). New York, NY: Routledge. Chong, A., Sankar, L., & Poor, H. V. (2009). Frequency of Occurrence and Information Entropy of American Sign Language. arXiv:0912.1768. Cicourel, A., & Boese, R. (1972). Sign language acquisition and the teaching of deaf children. American Annals of the Deaf, 1771(1), 27-33. Clark, L. E., & Grosjean, F. (1982). Sign recognition processes in American Sign Language: The effect of context. Language and Speech, 25(4), 325-340. Conlin, K. E., Mirus, G. R., Mauk, C., & Meier, R. P. (2000). The acquisition of first signs: Place, handshape, and movement. In C. Chamberlain, J. Morford, & R. Mayberry (Eds.), Language Acquisition by Eye (pp. 51?69). Mahwah, NJ: Lawrence Erlbaum. Conway, A.R.A., Cowan, N., & Bunting, M.F. (2001). The cocktail party phenomenon revisited: The importance of WM capacity. Psychonomic Bulletin & Review, 8, 331-335. Cooper, R. P., & Aslin, R. N. (1990). Preference for infant-directed speech in the first month after birth. Child Development, 61(5), 1584?1595. Corina, D. P., Bellugi, U., & Reilly, J. (1999). Neuropsychological studies of linguistic and affective facial expressions in deaf signers. Language and Speech, 42, 307. Corina, D. P., & Hildebrandt, U. C. (2002). Psycholinguistic investigations of phonological structure in ASL. In R. Meier, K. Cormier, & D. Quinto-Pozos (Eds.), Modality and Structure in Signed and Spoken Languages (pp.88?111). Cambridge University Press. Corina D.P., & Knapp H.P. (2006). Lexical retrieval in American Sign Language production. In L.M. Goldstin, D.H. Whalen, & C.T. Best (Eds.), Papers in Laboratory Phonology 8: Varieties of Phonological Competence (pp 213?239). Mouton de Gruyter: Berlin. 196 Corina, D. P., Poizner, H., Bellugi, U., Feinberg, T., Dowd, D., & O?Grady-Batch, L. (1992). Dissociation between linguistic and nonlinguistic gestural systems: A case for compositionality. Brain and Language, 43(3), 414?447. Coulter,G. R. (1982). On the nature of ASL as a monosyllabic language. Paper prsented at the Annual Meeting of the Linguistic Society for America, San Diego, CA. Cowan, N. (1995). Sensory memory and its role in information processing. In G. Karmos, M. Moln?r,V. Cspe, I., Czigler, J.E. Desmedt (Eds.), Perspective of Event- Related Potentials Research, EEG Supplement 40 (pp.21-31). New York: Elsevier. Crone, N. E., Hao, L., Hart, J., Boatman, D., Lesser, R. P., Irizarry, R., & Gordon, B. (2001). Electrocorticographic gamma activity during word production in spoken and sign language. Neurology, 57(11), 2045-2053. Czigler, I., Winkler, I., Pat?, L., V?rnagy, A., Weisz, J., & Bal?zs, L. (2006). Visual temporal window of integration as revealed by the visual mismatch negativity event- related potential to stimulus omissions. Brain Research, 1104(1), 129?140. Davis, S., & McCroskey, R. (1980). Auditory fusion in children. Child Development, 51, 75-80. DeCasper, A. J., & Fifer, W. P. (1980). Of human bonding: Newborns prefer their mothers? voices. Science, 208(4448), 1174-1176. Deiber, M. P., Missonnier, P., Bertrand, O., Gold, G., Fazio-Costa, L., Iba?ez, V., & Giannakopoulos, P. (2007). Distinction between perceptual and attentional processing in working memory tasks: a study of phase-locked and induced oscillatory brain dynamics. Journal of Cognitive Neuroscience, 19(1), 158?172. DeKeyser, R. M. (2000). The robustness of critical period effects in second language acquisition. Studies in Second Language Acquisition, 22(4), 499-533. DeKeyser, R. M. (2005). 
What Makes Learning Second-Language Grammar Difficult? A Review of Issues. Language Learning, 55(S1), 1?25. DeMatteo, A. (1977). Visual imagery and visual analogues in American Sign Language. In L. Friedman (Ed.), On the other hand: New perspectives on American Sign Language (pp. 109-136). New York: Academic Press. Dixon, N. F., & Spitz, L. (1980). The detection of auditory visual desynchrony. Perception, 9, 719?721. Dolata, J. K., Davis, B. L., & MacNeilage, P. F. (2008). Characteristics of the rhythmic organization of vocal babbling: Implications for an amodal linguistic rhythm. Infant Behavior and Development, 31(3), 422?431. 197 Dudis, P.G. (2011). Response: Some observations on form-meaning. In G. Mathur & D.J. Napoli (Eds.), Deaf Around the World (pp. 83-95). Oxford University Press. Duncan, J., Seitz, R. J., Kolodny, J., Bor, D., Herzog, H., Ahmed, A. (2000). A neural basis for general intelligence. Science, 289(5478), 457-460. Eimas, P. D., & Miller, J. L. (1980). Contextual effects in infant speech perception. Science, 209(4461), 1140-1141. Elbers, L. (1982). Operating principles in repetitive babbling: a cognitive continuity approach. Cognition, 12(1), 45?63. Elliott, L. L. (1979). Performance of children aged 9 to 17 years on a test of speech intelligibility in noise using sentence material with controlled word predictability. Journal of the Acoustical Society of America, 66, 651?653. Elliott, L. L., & Katz, D. R. (1980). Children?s pure-tone detection. Journal of the Acoustical Society of America, 67, 343?344. Embick, D., & Noyer, R. (2007). Distributed morphology and the syntax/morphology interface. In G. Ramchand & C. Reiss (Eds.), The Oxford Handbook of Linguistic Interfaces (pp.289?324). Oxford University Press. Emmorey, K. (1995). Processing the dynamic visual-spatial morphology of signed languages. In L.B. Feldman (Ed.), Morphological Aspects of Language Processing: Crosslinguistic Perspectives (pp.29-54). Mahwah, NJ: Lawrence Erlbaum Associates. Emmorey, K., & Corina, D.P. (1990). Lexical recognition in sign language: Effects of phonetic structure and morphology. Perceptual and Motor Skills, 71, 1227-1252. Emmorey, K, & Corina D. (1993). Hemispheric specialization for ASL signs and English words: Differences between imageable and abstract forms. Neuropsychologia 31(7), 645? 653. Emmorey, K., & Kosslyn, S. M. (1996). Enhanced image generation abilities in deaf signers: A right hemisphere effect. Brain and Cognition, 32(1), 28-44. Emmorey, K., Bellugi, U., Friederici, A., & Horn, P. (1995). Effects of age of acquisition on grammatical sensitivity: Evidence from on-line and off-line tasks. Applied Psycholinguistics, 16, 1-23. Emmorey, K., Corina, D.P., & Bellugi, U. (1995). Differential processing of topographic and referential functions of space. In K. Emmorey & J. Reilly (Eds.), Language, Gesture, and Space (pp. 43-62). Mahwah, NJ: Lawrence Erlbaum Associates. 198 Emmorey, K., Klima, E., & Hickok, G. (1998). Mental rotation within linguistic and non-linguistic domains in users of American Sign Language. Cognition, 68(3), 221- 246. Emmorey, K., Kosslyn, S. M., & Bellugi, U. 1993. Visual imagery and visual-spatial language: Enhanced imagery abilities in deaf and hearing ASL signers. Cognition, 46, 139-181. Emmorey, K., Luk, G., Pyers, J. E., & Bialystok, E. (2008). The source of enhanced cognitive control in bilinguals. Psychological Science, 19, 1201?1206. Emmorey, K., McCullough, S., & Brentari, D. (2003). Categorical perception in American Sign Language. 
Language & Cognitive Processes, 18, 21-45. Emmorey, K., Mehta, S., & Grabowski, T. J. (2007). The neural correlates of sign versus word production. Neuroimage, 36(1), 202?208. Emmorey, K., Petrich, J., & Gollan, T. (2009). Simultaneous production of American Sign Language and English costs the speaker but benefits the perceiver. In Paper Presented at the 7th International Symposium on Bilingualism, Utrecht, The Netherlands. Emmorey, K., Thompson, R., & Colvin, R. (2009). Eye gaze during comprehension of American Sign Language by native and beginning signers. Journal of Deaf Studies and Deaf Dducation, 14(2), 237. Engel, A.K., Fries, P., and Singer, W. (2001). Dynamic predictions: oscillations and synchrony in top-down processing. Nature Reviews Neuroscience, 2, 704?716. Engle, R.W. (2002). Working Memory Capacity as Executive Attention. Current Directions in Psychological Science, 11(1), 19-23. Fallon, M., Trehub, S. E., & Schneider, B. A. (2000). Children?s perception of speech in multitalker babble. Journal of the Acoustical Society of America, 108, 3023-3029. F?nelon, V. S., Casasnovas, B., Simmers, J., & Meyrand, P. (1998). Development of rhythmic pattern generators. Current Opinion in Neurobiology, 8(6), 705?709. Fenlon, J., Denmark, T., Campbell, R., & Woll, B. (2008). Seeing sentence boundaries. Sign Language & Linguistics, 10(2), 177?200. F?ry, C. & van de Vijver, R. (2004). The syllable in optimality theory. Cambridge University Press. 199 Figueroa, V. (2009). Representaciones fonol?gicas en el procesamiento del lenguaje: modalidad de input, restricciones temporales y correlatos neurofisiol?gicos. Unpublished doctoral dissertation, Pontificia Universidad Cat?lica De Chile. Figueroa, V., Howard, M., Idsardi, W., & Poeppel, D. (2009). Rate and local reversal effects on speech comprehension. Abstract in The Neurobiology of Language Conference, October 2009, Chicago, IL. Fiorentino, R. (2006). Lexical Structure and the Nature of Linguistic Representations. Doctoral dissertation, University of Maryland, College Park. Fiorentino, R., & Poeppel, D. (2007). Compound words and structure in the lexicon. Language and Cognitive processes, 22(7), 953?1000. Fischer, S. D., Delhorne, L. A., & Reed, C. M. (1999). Effects of rate of presentation on the reception of American Sign Language. Journal of Speech, Language, and Hearing Research, 42(3), 568-582. Flege, J. E., MacKay, I.R.A., & Meador, D. (1999). Native Italian speakers? perception and production of English vowels. Journal of the Acoustical Society of America, 106(5), 2973-2987. Foulds, R. A. (2004). Biomechanical and perceptual constraints on the bandwidth requirements of sign language. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 12(1), 65?72. Foulke, E. (1971). The perception of time compressed speech. In D. Horton & J. Jenkins (Eds.), Perception in language (pp.79-107). Pittsburgh, PA: Pittsburgh University Press. Foulke, W., & Sticht, T. G. (1969). Review of research on the intelligibility and comprehension of accelerated speech. Psychological Bulletin, 72(1), 50?62. French, N. R., & Steinberg, J. C. (1947). Factors governing the intelligibility of speech sounds. Journal of the Acoustical Society of America, 19, 90-119. Friedman. L. (1974). On the physical manifestation of stress in the American Sign Language. Unpublished manuscript, University of Calironifa, Berkeley. Fries, P., Nikolic, D., & Singer, W. (2007). The gamma cycle. Trends in Neurosciences, 30(7), 309?316. 
Furman, O., Dorfman, N., Hasson, U., Davachi, L., & Dudai, Y. (2007). They saw a movie: Long-term memory for an extended audiovisual narrative. Learning & Memory, 14(6), 457 -467. 200 Garofolo, J., Lamel, L., Fisher, W., Fiscus, J., Pallet, D., Dahlgren, N. (1993). Darpa, TIMIT, Acoustic-phonetic continuous speech corpus. (NISTIR Publication No. 4930). Washington, DC: US Department of Commerce. Gathercole, S.E. and Baddeley, A.D. (1993). Working Memory and Language. Erlbaum. Genzel, D., & Charniak, E. (2002). Entropy rate constancy in text. In Proceedings of the association of computational linguistics (pp. 199?206), Philadelphia, PA. Ghez, C. & Krakauer, J. (2000). The organization of movement. In E.R. Kandel, J.H. Schwartz, T.M. Jessel (Eds.), Principles of Neuroscience. New York: McGraw-Hill. Ghitza, O., & Greenberg, S. (2009). On the possible role of brain rhythms in speech perception: Intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica, 66(1-2), 113?126. Giraud, A. L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S. J., & Laufs, H. (2007). Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron, 56(6), 1127?1134. Goldin-Meadow, Susan (1993). When does gesture become language? A study of gesture used as a primary communication system by deaf children of hearing parents. In K.R. Gibson & T. Ingold (Eds.), Tools, Language and Cognition in Human Evolution (pp.63?85). Cambridge University Press. Green, D.M. (1971). Temporal auditory acuity. Psychological Review, 78, 540-551. Green, K. P., & Miller, J. L. (1985). On the role of visual rate information in phonetic perception. Perception & Psychophysics, 38(3), 269?276. Greenberg, S. (1996) Understanding speech understanding: towards a unified theory of speech perception. In W.A. Ainsworth & S. Greenberg (Eds.), Proceedings of the ESCA Tutorial and Advanced Research Workshop on the Auditory Basis of Speech Perception (pp.1-8). Keele University, UK. Greenberg, S., & Arai, T.!(2001). The relation between speech intelligibility and the complex modulation spectrum. In the 7th International Conference on Speech Communication and Technology, Scandinavia (pp. 473? 476). Greenberg, S., Hollenback, J. and Ellis, D. (1996) Insights into spoken language gleaned from phonetic transcription of the switchboard corpus. Proceedings of the International Conference on Spoken Language Processing, pp. S24-27. Grosjean, F. (1979). A study of timing in a manual and a spoken language: American Sign Language and English. Journal of Psycholinguistic Research, 8(4), 379 ? 405. 201 Grosjean, F. (1981). Sign and word recognition: A first comparison. Sign Language Studies, 32, 195-219. Hale, J. (2001). A probabilistic early parser as a psycholinguistic model. In Proceedings of the North American Association of Computational Linguistics. Halle, M. & Stevens, K. N. (1959). Analysis by synthesis. In W. Wathen-Dunn & L.E. Woods (Eds.) Proc. Seminar on Speech Compression and Processing, Vol. 2, paper D7. Halle, M. & Stevens, K. N. (1962). Speech recognition: a model and program for research. Reprinted in Halle, 2002. Halle, M. (2002). From memory to speech and back: papers on phonetics and phonology 1954?2002. Berlin, Germany: Mouton de Gruyter. Heiman, G. W., & Tweney, R. D. (1981). Intelligibility and comprehension of time compressed sign language narratives. Journal of Psycholinguistic Research, 10(1), 3? 15. Henry, W. G. (1966). 
Recognition of time compressed speech as a function of word length and frequency of usage. Unpublished doctoral dissertation, Indiana University. Hickok, G., Bellugi, U., & Klima, E. S. (1998). The neural organization of language: Evidence from sign language aphasia. Trends in Cognitive Sciences, 2(4), 129?136. Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5), 393?402. Holcombe, A. O. (2009). Seeing slow and seeing fast: two limits on perception. Trends in Cognitive Sciences, 13(5), 216?221. van der Hulst, H. (1993). Units in the analysis of signs. Phonology, 10, 109-241. Hwang, S.-O., Monahan, P. J., & Idsardi, W. J. (2010). Underspecification and asymmetries in voicing perception. Phonology, 27(2), 205?224. Hyde, M.B., & Power, D.J. (1991). Teachers? use of simultaneous communication: Effects on the signed and spoken components. American Annals of the Deaf, 136(5), 381-387. Jackson, C. (1989). Language acquistion in two modalities: The role of nonlinguistic cues in linguistic mastery. Sign Language Studies, 62, 1-21. 202 Jaeger, T.F. (2010). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1), 23?62. Jantunen, T. (2010). On the role of transitions in SL or: What?s wrong with the sign? Presentation given at the 10th Theoretical Issues in Sign Language Research Conference, Purdue University, West Lafayette, IL. Jantunen,T. & Takkinen, R. (2010). Syllable structure in sign language phonology. In D. Brentari (Ed.), Sign Languages (pp.312-331). Cambridge University Press. Jensen, J. K., Neff, D. L., & Callaghan, B. P. (1987). Frequency, intensity, and duration discrimination in young children. Asha, 29, 88. Jensen, O., & Lisman, J. E. (2005). Hippocampal sequence-encoding driven by a cortical multi-item working memory buffer. Trends in Neurosciences, 28(2), 67?72. Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Attention, Perception, & Psychophysics, 14(2), 201?211. Johnson, J. S., & Newport, E. L. (1989). Critical period effects in second language learning: The influence of maturational state on the acquisition of English as a second language. Cognitive Psychology, 21(1), 60?99. Jusczyk, P. W. (1997). The discovery of spoken language. Cambridge, MA: MIT Press. Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39, 159?207. Kabak, B., & Idsardi, W. J. (2007). Perceptual distortions in the adaptation of English consonant clusters: Syllable structure or consonantal contact constraints? Language and Speech, 50(1), 23-52. Kane, M. J., & Engle, R. W. (2003). Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General, 132(1), 47-70. Kantor, R. (1978). Identifying native and second language signers. Communication and Cognition, 11, 39-55. Kimura, M., Schr?ger, E., Czigler, I., & Ohira, H. (2010). Human visual system automatically encodes sequential regularities of discrete events. Journal of Cognitive Neuroscience, 22(6), 1124?1139. Klatt, D. H. (1975). Voice onset time, frication, and aspiration in word-initial consonant clusters. Journal of Speech and Hearing Research, 18, 686?706. 203 Klein, W., & Dittmar, N. (1979). Developing grammars: The acquisition of German syntax by foreign workers (Vol. 1). Berlin: Springer. 
Klima, E. S., & Bellugi, U. (1979). The signs of language. Cambridge, MA: Harvard University Press. Klima, E. S., Tzeng, O. J. L., Bellugi, U., Corina, D., & Bettger, J. G. (1996). From sign to script: effects of linguistic experience on perceptual categorization (Tech. Rep. No. INC-9604). Institute for Neural Computation, University of California, San Diego. Kohlrausch, A., P?schel, D., & Alphei, H. (1992). Temporal resolution and modulation analysis in models of the auditory system. In M.E.H. Schouten (Ed.) The Auditory Processing of Speech: From Sounds to Words (pp.85?98). Berlin/New York: Mouton de Gruyter. Korte, A. (1915) Kinematoskopische Untersuchungen. Zeitschrift fuer Psychologie, 72, 194-296. Krentz, U. C., & Corina, D. P. (2008). Preference for language in early infancy: The human language bias is not speech specific. Developmental Science, 11(1), 1?9. Kroll, J. F., Bobb, S. C., & Wodnieka, Z. (2006). Language selectivity is the exception, not the rule: Arguments against a fixed locus of language selection in bilingual speech. Bilingualism: Language and Cognition, 9, 119?135. Kuhl, P. K., Tsao, F. M., & Liu, H. M. (2003). Foreign-language experience in infancy: Effects of short-term exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences, 100(15), 9096-9101. Kurtzrock, G.H. (1957). The effects of time and frequency distortion upon word intelligibility. Speech Monographs, 24, 94. Kushalnagar, P., Hannay, H. J., & Hernandez, A. E. (2010). Bilingualism and Attention: A Study of Balanced and Unbalanced Bilingual Deaf Users of American Sign Language and English. Journal of Deaf Studies and Deaf Education, 15(3), 263- 273. Ladefoged, P. (2005). Vowels and consonants: An introduction to the sounds of languages (Vol. 1). Wiley-Blackwell Publishing. Lahiri, A. & Reetz, H.(2002). Underspecified recognition. In C. Gussenhoven, N. Werner, & T. Rietveld (Eds.) Laboratory Phonology 7 (pp.637-676). Berlin: Mouton de Gruyter. 204 Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., & Schroeder, C. E. (2005). An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology, 94(3), 1904-1911. van de Laar, V., Kleijn, W. B., & Deprettere, E. (1997). Perceptual entropy rate estimates for the phonemes of American English. IEEE International Conference on Acoustics, Speech, and Signal Processing, 3, 1719?1722. Lehtonen, M., Monahan, P. J., & Poeppel, D. (2011). Evidence for Early Morphological Decomposition: Combining Masked Priming with Magnetoencephalography. Journal of Cognitive Neuroscience, (Early Access), 1?14. Levitt, A., & Wang, Q. (1991). Evidence for language-specific rhythmic influences in the reduplicative babbling of French- and English-learning infants. Language and Speech, 34(3), 235?249. Levy, R., & Jaeger, T. F. (2007). Speakers optimize information density through syntactic reduction. In B. Schl?kopf, J. Platt, & T. Hoffman (Eds.) Advances in neural information processing systems (NIPS), Vol. 19 (pp. 849?856). Cambridge, MA: MIT Press. Liberman, A. M., & Whalen, D. H. (2000). On the relation of speech to language. Trends in Cognitive Sciences, 4(5), 187?196. Liberman, A.M. (1996). Speech: A special code. Cambridge, MA: MIT Press. Liddell, S.K. (1978). Non-manual signs and relative clauses in American Sign Language. In P. Siple (Ed.) Understanding language through sign language research (pp.59-90). New York: Academic Press. Liddell, S.K. (1984). 
THINK and BELIEVE: Sequentiality in American Sign Language. Language, 60, 372-392. Liddell, S.K. (1990). Structures for representing handshape and local movement at the phonemic level. In S. Fischer & P. Siple (Eds.) Theoretical Issues in Sign Language Research (pp.37-65). Chicago:Chicago University Press. Liddell, S. K. (2000). Blended spaces and deixis in sign language discourse. In D. McNeil (Ed.), Language and gesture (pp.331-357). Cambridge University Press. Liddell, S.K. (2003). Sources of meaning in ASL classifier predicates. In K. Emmorey (Ed.) Perspectives on classifier constructions in sign language (pp.199- 220). Mahwah, NJ: Lawrence Erlbaum Associates. 205 Liddell, S.K., & Johnson, R.E. (1986). American Sign Language compound formation processes, lexicalization, and phonological remnants. Natural Language and Linguistic Theory, 8, 445-513. Liddell, S.K. & Johnson, R. (1989). American Sign Language: the phonological base. Sign Language Studies, 64, 195-277. Lillo-Martin, D. (1991). Universal Grammar and American Sign Language: Setting the Null Argument Parameters. Studies in Theoretical Psycholinguistics. Dordrecht: Kluwer. Lillo-Martin, D. (1999). Modality effects and modularity in language acquisition: the acquisition of American Sign Language. In W.C. Ritchie & T. K. Bhatia (Eds.) Handbook of Language Acquisition (pp.531-567). San Diego, CA: Academic Press. Lisker, L. (1975). Is it VOT or a first-formant transition detector. Journal of the Acoustical Society of America, 57(6), 1547?1551. Lisker, L., Abramson, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word (20) 384?422. Locke, J. L. (1983). Phonological acquisition and change. New York: Academic Press. Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron, 54(6), 1001-1010. MacNeilage, P. F. (1998). The frame/content theory of evolution of speech production. Behavioral and Brain Sciences, 21(4), 499?546. MacNeilage, P. F., & Davis, B. L. (2001). Motor mechanisms in speech ontogeny: phylogenetic, neurobiological and linguistic implications. Current Opinion in Neurobiology, 11(6), 696?700. MacWhinney, B. (2006). Emergent fossilization. In Z. Han & T. Odlin (Eds.), Studies of fossilization in second language acquisition (pp. 134-156). Clevedon, UK: Multilingual Matters. Mampe, B., Friederici, A. D., Christophe, A., & Wermke, K. (2009). Newborns? cry melody is shaped by their native language. Current Biology, 19(23), 1994?1997. Manin, D. (2006). Experiments on predictability of word in context and information rate in natural language. Journal of Information Processes, 6(3), 229?236. 206 Marian, V., & Spivey, M. (2003). Competing activation in bilingual language processing: Within-and between-language competition. Bilingualism: Language and Cognition, 6(2), 97?116. Marmor, G. S. & Petitto, L. A. (1979). Simultaneous communication in the classroom: How well is English grammar represented? Sign Language Studies, 3, 99- 136. Marr, D. (1982). Vision. San Francisco, CA: Freeman. Masataka, N. (1992). Motherese in a signed language. Infant Behavior and Development, 15(4), 453?460. Masataka, N. (2003). The onset of language. Cambridge University Press. Massaro, D. W., Cohen, M. M., & Smeele, P. M. (1996). Perception of asynchronous and conflicting visual and auditory speech. Journal of the Acoustical Society of America, 100, 1777?1786. Mathur, G., & Rathmann, C. (2011). 
Mayberry, R. I. (1993). First-language acquisition after childhood differs from second-language acquisition: The case of American Sign Language. Journal of Speech and Hearing Research, 36, 51–68.
Mayberry, R. I. (2007). When timing is everything: Age of first-language acquisition effects on second-language learning. Applied Psycholinguistics, 28(3), 537–549.
Mayberry, R. I., & Eichen, E. (1991). The long-lasting advantage of learning sign language in childhood: Another look at the critical period for language acquisition. Journal of Memory and Language, 30, 486–512.
Mayberry, R. I., & Fischer, S. D. (1989). Looking through phonological shape to lexical meaning: The bottleneck of non-native sign language processing. Memory & Cognition, 17(6), 740–754.
Mayberry, R. I., & Lock, E. (2003). Age constraints on first versus second language acquisition: Evidence for linguistic plasticity and epigenesis. Brain and Language, 87, 369–383.
Mayberry, R. I., del Giudice, A. A., & Lieberman, A. M. (2011). Reading achievement in relation to phonological coding and awareness in deaf readers: A meta-analysis. Journal of Deaf Studies and Deaf Education, 16(2), 164–188.
Mayo, L. H., Florentine, M., & Buus, S. (1997). Age of second-language acquisition and perception of speech in noise. Journal of Speech, Language, and Hearing Research, 40(3), 686–693.
McGrath, M., & Summerfield, Q. (1985). Intermodal timing relations and audio-visual speech recognition by normal-hearing adults. Journal of the Acoustical Society of America, 77, 678–685.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–747.
Meador, D., Flege, J. E., & Mackay, I. R. A. (2000). Factors affecting the recognition of words in a second language. Bilingualism: Language and Cognition, 3, 55–67.
Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., & Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition, 29(2), 143–178.
Meier, R. P. (1987). Elicited imitation of verb agreement in American Sign Language: Iconically or morphologically determined? Journal of Memory and Language, 26(3), 362–376.
Meier, R. P. (2002). Why different, why the same? Explaining effects and non-effects of modality upon linguistic structure in sign and speech. In R. P. Meier, K. Cormier, & D. Quinto-Pozos (Eds.), Modality and structure in signed and spoken languages (pp. 1–25). Cambridge University Press.
Meier, R. P. (2006). The form of early signs: Explaining signing children's articulatory development. In M. Marschark, B. Schick, & P. Spencer (Eds.), Advances in sign language development by deaf children (pp. 202–230). Oxford University Press.
Meier, R. P. (2008). Channeling language: Review of Wendy Sandler & Diane Lillo-Martin (2006). Natural Language and Linguistic Theory, 26, 451–466.
Meier, R. P., Cormier, K., & Quinto-Pozos, D. (Eds.). (2002). Modality and structure in signed and spoken languages. Cambridge University Press.
Meier, R. P., & Newport, E. L. (1990). Out of the hands of babes: On a possible sign advantage in language acquisition. Language, 66, 1–23.
Meier, R. P., & Willerman, R. (1995). Prelinguistic gesture in deaf and hearing infants. In K. Emmorey & J. Reilly (Eds.), Language, gesture and space (pp. 391–409). Hillsdale: Lawrence Erlbaum Associates.
Merigan, W. H., & Maunsell, J. H. R. (1993). How parallel are the primate visual pathways? Annual Review of Neuroscience, 16(1), 369–402.
Miller, G. A. (1951). Language and communication. New York: McGraw-Hill.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Miller, G. A., & Isard, S. (1963). Some perceptual consequences of linguistic rules. Journal of Verbal Learning and Verbal Behavior, 2(3), 217–228.
Miller, G. A., & Licklider, J. C. R. (1950). The intelligibility of interrupted speech. Journal of the Acoustical Society of America, 22, 167–173.
Miller, G. A., Heise, G. A., & Lichten, W. (1951). The intelligibility of speech as a function of the context of the test materials. Journal of Experimental Psychology, 41(5), 329–335.
Miller, J. L., & Liberman, A. M. (1979). Some effects of later-occurring information on the perception of stop consonant and semivowel. Attention, Perception, & Psychophysics, 25(6), 457–465.
Mills, J. H. (1975). Noise and children: A review of literature. Journal of the Acoustical Society of America, 58, 767–779.
Milner, B. (1971). Interhemispheric differences in the localization of psychological processes in man. British Medical Bulletin, 27, 272–277.
Mirus, G., Rathmann, C., & Meier, R. (2001). Proximalization and distalization of sign movement in adult learners. In V. Dively, M. Metzger, S. Taub, & A. M. Baer (Eds.), Signed languages: Discoveries from international research (pp. 103–119). Washington, DC: Gallaudet University Press.
Mitchell, R. E., & Karchmer, M. A. (2002). Chasing the mythical ten percent: Parental hearing status of deaf and hard of hearing students in the United States. Sign Language Studies, 4, 128–163.
Morford, J. P., & MacFarlane, J. (2003). Frequency characteristics of American Sign Language. Sign Language Studies, 3(2), 213–225.
Morford, J. P., Wilkinson, E., Villwock, A., Piñar, P., & Kroll, J. F. (2010). When deaf signers read English: Do written words activate their sign translations? Cognition, 118(2), 286–292.
Morford, J., & Mayberry, R. (2000). A reexamination of "Early Exposure" and its implications for language acquisition by eye. In C. Chamberlain, J. Morford, & R. Mayberry (Eds.), Language acquisition by eye (pp. 111–128). Mahwah, NJ: Lawrence Erlbaum.
Müller, M. M., Gruber, T., & Keil, A. (2001). Modulation of induced gamma band activity in the human EEG by attention and visual information processing. International Journal of Psychophysiology, 38, 283–299.
Munhall, K. G., Gribble, P., Sacco, L., & Ward, M. (1996). Temporal constraints on the McGurk effect. Attention, Perception, & Psychophysics, 58(3), 351–362.
Munhall, K. G., Jones, J. A., Callan, D. E., Kuratate, T., & Vatikiotis-Bateson, E. (2004). Visual prosody and speech intelligibility. Psychological Science, 15(2), 133–137.
Näätänen, R. (1992). Attention and Brain Function. Hillsdale, NJ: Lawrence Erlbaum Associates Publishers.
Nabelek, A. (1988). Identification of vowels in quiet, noise, and reverberation: Relationships with age and hearing loss. Journal of the Acoustical Society of America, 84, 476–484.
Napoli, D. J., & Sutton-Spence, R. (2010). Limitations on simultaneity in sign language. Language, 86(3), 647–662.
Nespor, M., & Sandler, W. (1999). Prosody in Israeli Sign Language. Language and Speech, 42, 143–176.
Nespor, M., & Vogel, I. (1986). Prosodic Phonology. Dordrecht: Foris.
Neville, H. J., Mills, D. L., & Lawson, D. S. (1992). Fractionating language: Different neural subsystems with different sensitive periods. Cerebral Cortex, 2(3), 244–258.
Newport, E. (1990). Maturational constraints on language learning. Cognitive Science, 14, 11–28.
Newport, E. L., & Meier, R. P. (1985). The acquisition of American Sign Language. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nilsson, M., Soli, S. D., & Sullivan, J. A. (1994). Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. Journal of the Acoustical Society of America, 95, 1085–1099.
Nittrouer, S., & Boothroyd, A. (1990). Context effects in phoneme and word recognition by young children and older adults. Journal of the Acoustical Society of America, 87, 2705–2715.
Oller, D. K., & Eilers, R. E. (1988). The role of audition in infant babbling. Child Development, 59(2), 441–449.
Oller, K., Wieman, L., Doyle, W., & Ross, C. (1976). Infant babbling and speech. Journal of Child Language, 3, 1–12.
Olsho, L. W., Schoon, C., Sakai, R., Turpin, R., & Sperduto, V. (1982). Auditory frequency discrimination in infancy. Developmental Psychology, 18(5), 721–726.
Ostry, D. J., & Munhall, K. G. (1985). Control of rate and duration of speech movements. Journal of the Acoustical Society of America, 77(2), 640–648.
Oyama, S. (1976). A sensitive period in the acquisition of a non-native phonological system. Journal of Psycholinguistic Research, 5, 261–285.
Padden, C. A. (1988). Interaction of morphology and syntax in American Sign Language. New York, NY: Garland.
Padden, C. A. (1991). The acquisition of fingerspelling by deaf children. In P. Siple & S. Fischer (Eds.), Theoretical issues in sign language research (pp. 191–210). Chicago, IL: University of Chicago Press.
Padden, C. A. (2000). Simultaneous interpreting across modalities. Interpreting, 5(2), 171–187.
Padden, C. A., & LeMaster, B. (1985). An alphabet on hand: The acquisition of fingerspelling in deaf children. Sign Language Studies, 47, 161–172.
Padden, C. A., & Perlmutter, D. M. (1987). American Sign Language and the architecture of phonological theory. Natural Language & Linguistic Theory, 5(3), 335–375.
Pandey, P. C., Kunov, H., & Abel, S. M. (1986). Disruptive effects of auditory signal delay on speech perception with lipreading. Journal of Auditory Research, 26(1), 27–41.
Parasnis, I., Samar, V. J., Bettger, J. G., & Sathe, K. (1996). Does deafness lead to enhancement of visual spatial cognition in children? Journal of Deaf Studies and Deaf Education, 1(2), 145–152.
Pearson, D. E. (1981). Visual communication systems for the deaf. IEEE Transactions on Communications, 29, 1986–1992.
Perlmutter, D. M. (1990). On the segmental representation of transitional and bidirectional movements in ASL phonology. In S. Fischer & P. Siple (Eds.), Theoretical Issues in Sign Language Research (pp. 67–80). Chicago: University of Chicago Press.
Perlmutter, D. M. (1992). Sonority and syllable structure in American Sign Language. Linguistic Inquiry, 23(3), 407–442.
Perrett, D. I., Rolls, E. T., & Caan, W. (1982). Visual neurons responsive to faces in the monkey temporal cortex. Experimental Brain Research, 47(3), 329–342.
Petitto, L. A. (1987). On the autonomy of language and gesture: Evidence from the acquisition of personal pronouns in American Sign Language. Cognition, 27(1), 1–52.
Petitto, L. A., & Marentette, P. F. (1991). Babbling in the manual mode: Evidence for the ontogeny of language. Science, 251, 1493–1496.
Petitto, L. A., Katerelos, M., Levy, B. G., Gauna, K., Tétreault, K., & Ferraro, V. (2001). Bilingual signed and spoken language acquisition from birth: Implications for the mechanisms underlying early bilingual language acquisition. Journal of Child Language, 28(2), 453–496.
Petitto, L. A., Zatorre, R. J., Gauna, K., Nikelski, E. J., Dostie, D., & Evans, A. C. (2000). Speech-like cerebral activity in profoundly deaf people processing signed languages: Implications for the neural basis of human language. Proceedings of the National Academy of Sciences, 97(25), 13961–13966.
Petitto, L., Holowka, S., Sergio, L., & Ostry, D. (2001). Language rhythms in baby hand movements. Nature, 413, 35.
Petitto, L., Holowka, S., Sergio, L., Levy, B., & Ostry, D. (2004). Baby hands that move to the rhythm of language: Hearing babies acquiring sign languages babble silently on the hands. Cognition, 93, 43–73.
Picheny, M. A., Durlach, N. I., & Braida, L. D. (1985). Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech. Journal of Speech and Hearing Research, 28(1), 96–103.
Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as "asymmetric sampling in time". Speech Communication, 41, 245–255.
Poeppel, D., Idsardi, W. J., & van Wassenhove, V. (2008). Speech perception at the interface of neurobiology and linguistics. Philosophical Transactions of the Royal Society London B, 363, 1071–1086.
Pöppel, E. (1997). A hierarchical model of temporal perception. Trends in Cognitive Sciences, 1(2), 56–61.
Portnoff, M. (1981). Time-scale modification of speech based on short-time Fourier analysis. IEEE Transactions on Acoustics, Speech and Signal Processing, 29(3), 374–390.
Pyers, J. E., & Emmorey, K. (2008). The face of bimodal bilingualism. Psychological Science, 19(6), 531–535.
Quadros, R. M., Lillo-Martin, D., & Chen Pichler, D. (2010). Two languages but one computation: Code-blending in bimodal bilingual development. Presentation given at the 10th Theoretical Issues in Sign Language Research Conference, Purdue University, West Lafayette, IN.
Quinto-Pozos, D. (2010). Rates of fingerspelling in American Sign Language. Poster given at the 10th Theoretical Issues in Sign Language Research Conference, Purdue University, West Lafayette, IN.
R Development Core Team. (2005). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available at http://www.r-project.org.
Ramsey, C. (1989). Language planning in deaf education. In C. Lucas (Ed.), The sociolinguistics of the deaf community (pp. 123–146). San Diego, CA: Academic Press.
Rathmann, C., & Mathur, G. (2010). Two types of nonconcatenative morphology in signed languages. Presentation given at the 10th Theoretical Issues in Sign Language Research Conference, Purdue University, West Lafayette, IN.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.
Reed, C. M., & Durlach, N. I. (1998). Note on information transfer rates in human communication. Presence, 7(5), 509–518.
Rogers, C. L., Lister, J. J., Febo, D. M., Besing, J. M., & Abrams, H. B. (2006). Effects of bilingualism, noise, and reverberation on speech perception by listeners with normal hearing. Applied Psycholinguistics, 27(3), 465–485.
Rosen, R. (2004). Beginning L2 production errors in ASL lexical phonology. Sign Language Studies, 7, 31–61.
Rosen, S. (1992). Temporal information in speech: Acoustic, auditory, and linguistic aspects. Philosophical Transactions of the Royal Society B, 336, 367–373.
Rosenzweig, M. R., & Postman, L. (1957). Intelligibility as a function of frequency of usage. Journal of Experimental Psychology, 54(6), 412–422.
Ross, J. R. (1967). Constraints on variables in syntax. Doctoral dissertation, MIT.
Saberi, K., & Perrott, D. R. (1999). Cognitive restoration of reversed speech. Nature, 398(6730), 760.
Saltzman, E., & Byrd, D. (2000). Task-dynamics of gestural timing: Phase windows and multifrequency rhythms. Human Movement Science, 19(4), 499–526.
Sandler, W., & Lillo-Martin, D. (2006). Sign language and linguistic universals. Cambridge University Press.
Schroeder, C. E., & Lakatos, P. (2009a). Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences, 32(1), 9–18.
Schroeder, C. E., & Lakatos, P. (2009b). The gamma oscillation: Master or slave? Brain Topography, 22(1), 24–26.
Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences, 12(3), 106–113.
Selkirk, E. O. (1986). Phonology and syntax: The relation between sound and structure. Cambridge, MA: MIT Press.
Senghas, A., & Coppola, M. (2001). Children creating language: How Nicaraguan Sign Language acquired a spatial grammar. Psychological Science, 12(4), 323–328.
Senghas, A., Kita, S., & Özyürek, A. (2004). Children creating core properties of language: Evidence from an emerging sign language in Nicaragua. Science, 305(5691), 1779–1782.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 623–656.
Shannon, C. E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30(1), 50–64.
Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270, 303–304.
Shiffrar, M., & Freyd, J. J. (1990). Apparent motion of the human body. Psychological Science, 1(4), 257–264.
Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience, 18(1), 555–586.
Smith, Z. M., Delgutte, B., & Oxenham, A. J. (2002). Chimaeric sounds reveal dichotomies in auditory perception. Nature, 416(6876), 87–90.
Sperling, G., Landy, M. S., Cohen, Y., & Pavel, M. (1985). Intelligible encoding of ASL image sequences at extremely low information rates. Computer Vision, Graphics, and Image Processing, 31, 335–391.
Stevens, K. N., & Halle, M. (1967). Remarks on analysis by synthesis and distinctive features. In W. Wathen-Dunn (Ed.), Models for the perception of speech and visual form (pp. 88–102). Cambridge, MA: MIT Press.
Stevens, K. N. (2002). Toward a model for lexical access based on acoustic landmarks and distinctive features. Journal of the Acoustical Society of America, 111(4), 1872–1891.
Stilp, C. E., Kiefte, M., Alexander, J. M., & Kluender, K. R. (2010). Cochlea-scaled spectral entropy predicts rate-invariant intelligibility of temporally distorted sentences. Journal of the Acoustical Society of America, 128(4), 2112–2126.
Stokoe, W. C. (1960). Sign language structure: An outline of the visual communication systems of the American Deaf. Studies in Linguistics, Occasional Papers 8. Silver Spring, MD: Linstok Press.
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.
Summerfield, Q. (1981). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance, 7(5), 1074–1095.
Supalla, S. J. (1991). Manually Coded English: The modality question in signed language development. In P. Siple & S. D. Fischer (Eds.), Theoretical issues in sign language research (pp. 85–109). Chicago: University of Chicago Press.
Supalla, S. J., & McKee, C. (2002). The role of Manually Coded English in language development of deaf children. In R. Meier, K. Cormier, & D. Quinto-Pozos (Eds.), Modality and structure in signed and spoken languages (pp. 143–165). Cambridge University Press.
Supalla, T. R. (1982). Structure and acquisition of verbs of motion and location in American Sign Language. Doctoral dissertation, University of California, San Diego.
Supalla, T. R., & Newport, E. (1978). How many seats in a chair? The derivation of nouns and verbs in American Sign Language. In P. Siple (Ed.), Understanding language through sign language research (pp. 181–214). New York: Academic Press.
Tallal, P., Miller, S., & Fitch, R. H. (1993). Neurobiological basis of speech: A case for the pre-eminence of temporal processing. Annals of the New York Academy of Sciences, 682, 27–47.
Tartter, V. C., & Knowlton, K. C. (1981). Perception of sign language from an array of 27 moving spots. Nature, 298, 676–678.
Theunissen, F., & Miller, J. P. (1995). Temporal encoding in nervous systems: A rigorous definition. Journal of Computational Neuroscience, 2(2), 149–162.
Tweney, R. D., Heiman, G. W., & Hoemann, H. W. (1977). Psychological processing of sign language: Effects of visual disruption on sign intelligibility. Journal of Experimental Psychology: General, 106(3), 255–268.
Van Rullen, R., & Koch, C. (2003). Is perception discrete or continuous? Trends in Cognitive Sciences, 7(5), 207–213.
Viemeister, N. F., & Wakefield, G. H. (1991). Temporal integration and multiple looks. Journal of the Acoustical Society of America, 90, 858–865.
Vihman, M. M. (1996). Phonological development: The origins of language in the child. Wiley-Blackwell.
Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal individual differences in controlling access to working memory. Nature, 438(7067), 500–503.
Wallace, A. B., & Blumstein, S. E. (2009). Temporal integration in vowel perception. Journal of the Acoustical Society of America, 125, 1704–1711.
Wang, Y., Trezek, B. J., Luckner, J., & Paul, P. V. (2008). The role of phonology and phonologically related skills in reading instruction for students who are deaf or hard of hearing. American Annals of the Deaf, 153(4), 396–407.
Wanner, E., & Gleitman, L. R. (1982). Language acquisition: The state of the art. Cambridge University Press.
Warren, R. M. (1999). Auditory Perception. Cambridge University Press.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45(3), 598–607.
Werker, J. F., & McLeod, P. J. (1989). Infant preference for both male and female infant-directed talk: A developmental study of attentional and affective responsiveness. Canadian Journal of Psychology/Revue canadienne de psychologie, 43(2), 230–246.
Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7(1), 49–63.
Werker, J. F., & Tees, R. C. (2005). Speech perception as a window for understanding plasticity and commitment in language systems of the brain. Developmental Psychobiology, 46(3), 233–251.
Werker, J. F., Gilbert, J. H., Humphrey, K., & Tees, R. C. (1981). Developmental aspects of cross-language speech perception. Child Development, 52, 349–355.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung [Experimental studies on the seeing of motion]. Zeitschrift für Psychologie, 61, 161–265.
Wightman, F., Allen, P., Dolan, T., Kistler, D., & Jamieson, D. (1989). Temporal resolution in children. Child Development, 611–624.
Wilbur, R. B., & Nolen, S. B. (1986). Duration of syllables in ASL. Language & Speech, 29(3), 263–280.
Wilbur, R. B. (1999). Stress in ASL: Empirical evidence and linguistic issues. Language & Speech, 42, 229–250.
Wilbur, R. B., & Allen, G. D. (1991). Perceptual evidence against internal structure in American Sign Language syllables. Language and Speech, 34(1), 27–46.
Wilbur, R. B., & Petersen, L. (1998). Modality interactions of speech and signing in simultaneous communication. Journal of Speech, Language, and Hearing Research, 41(1), 200–212.
Wilbur, R. B., & Zelaznik, H. N. (1997). Kinematic correlates of stress and position in ASL. Paper presented at the Annual Meeting of the Linguistic Society of America, Chicago, IL.
Wilbur, R. B. (1986). Why syllables? An examination of what the notion means for ASL research. Oral paper presented at the Conference on Theoretical Issues in Sign Language Research, Rochester, NY.
Wilbur, R. B. (2000). Phonological and prosodic layering of non-manuals in American Sign Language. In K. Emmorey & H. Lane (Eds.), The signs of language revisited: An anthology to honor Ursula Bellugi and Edward Klima (pp. 215–243). Mahwah, NJ: Lawrence Erlbaum Associates.
Wilbur, R. B. (2009). Effects of varying rate of signing on ASL manual signs and nonmanual markers. Language and Speech, 52, 245–285.
Wilcox, S. (1992). The phonetics of fingerspelling. Philadelphia: John Benjamins.
Wilson, M. (2001). The impact of sign language expertise on perceived path of apparent motion. In M. D. Clark & M. Marschark (Eds.), Context, Cognition, and Deafness (pp. 38–48). Washington, DC: Gallaudet University Press.
Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memories during sleep. Science, 265(5172), 676–679.
Wilson, M., & Emmorey, K. (1997). A visuospatial "phonological loop" in working memory: Evidence from American Sign Language. Memory and Cognition, 25, 313–320.
Wilson, M., Bettger, J. G., Niculae, I., & Klima, E. S. (1997). Modality of language shapes working memory: Evidence from digit span and spatial span in ASL signers. Journal of Deaf Studies and Deaf Education, 2, 150–160.
Wingfield, A., Lombardi, L., & Sokol, S. (1984). Prosodic features and the intelligibility of accelerated speech: Syntactic versus periodic segmentation. Journal of Speech and Hearing Research, 27(1), 128–134.
Woll, B. (2001). The sign that dares to speak its name: Echo phonology in British Sign Language. In P. Boyes Braem & R. Sutton-Spence (Eds.), The Hands are the Head of the Mouth (pp. 87–90). Hamburg, Germany: Signum.
Yabe, H., Tervaniemi, M., Sinkkonen, J., Huotilainen, M., Ilmoniemi, R. J., & Näätänen, R. (1998). Temporal window of integration of auditory information in the human brain. Psychophysiology, 35(5), 615–619.
Yost, W. A., Popper, A. N., & Fay, R. R. (1993). Human psychophysics. Springer.
Yuille, A., & Kersten, D. (2006). Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences, 10(7), 301–308.
Zampini, M., Guest, S., Shore, D. I., & Spence, C. (2005). Audio-visual simultaneity judgments. Attention, Perception, & Psychophysics, 67(3), 531–544.
Zeng, F. G., Nie, K., Stickney, G. S., Kong, Y. Y., Vongphoe, M., Bhargave, A., Wei, C. G., & Cao, K. (2005). Speech recognition with amplitude and frequency modulations. Proceedings of the National Academy of Sciences, 102, 2293–2298.
Zipf, G. K. (1935). The psycho-biology of language. Oxford, England: Houghton Mifflin.