ABSTRACT 
 
 
 
 
Title of Dissertation: CROSS-LINGUISTIC DIFFERENCES IN THE 
LEARNING OF INFLECTIONAL 
MORPHOLOGY: EFFECTS OF TARGET 
LANGUAGE PARADIGM COMPLEXITY  
 
  
 Ekaterina Solovyeva, Doctor of Philosophy, 
2020 
  
Dissertation directed by: Professor Robert M. DeKeyser, Second 
Language Acquisition program 
 
 
Inflectional morphology poses significant difficulty to learners of foreign languages. 
Multiple approaches have attempted to explain it through one of two lenses. First, 
inflection has been viewed as one manifestation of syntactic knowledge; its learning has 
been related to the learning of syntactic structures. Second, the perceptual and semantic 
properties of the morphemes themselves have been invoked as a cause of difficulty. 
These groups of accounts presuppose different amounts of abstract knowledge and quite 
different learning mechanisms. On syntactic accounts, learners possess elaborate 
architectures of syntactic projections that they use to analyze linguistic input. They do not 
simply learn morphemes as discrete units in a list; instead, they learn the configurations 
of feature settings that these morphemes express. On general-cognitive accounts, learners 
do learn morphemes as units, each with non-zero difficulty and more or less 
independent of the others. The "more" there is to learn, the worse off the learner. 
This dissertation paves the way towards integrating the two types of accounts by testing 
them on cross-linguistic data. This study compares learning rates for languages whose 
inflectional systems vary in complexity (as reflected in the number of distinct inflectional 
endings): German (lowest), Italian (high), and Czech (high, coupled with morpholexical 
variation). Written learner productions were examined for the accuracy of verbal 
inflection on dimensions ranging from morphosyntactic (uninflected forms, non-finite 
forms, use of finite instead of non-finite forms) to morpholexical (errors in root 
processes, application of wrong verb class templates, or wrong phonemic composition of 
the root or ending). Error frequencies were modeled using Poisson regression. 
Complexity affected accuracy differently in different domains of inflection production. 
Inflectional paradigm complexity was facilitative for learning to supply inflection, and 
learners of Italian and Czech were not disadvantaged compared to learners of German, 
despite their paradigms having more distinct elements. However, the complexity of verb 
class systems and the opacity of morphophonological alternations did result in 
disadvantages. Learners of Czech misapplied inflectional patterns associated with verb 
classes more than learners of German; they also failed to recall the correct segments 
associated with inflections, which resulted in more frequent use of nonexistent forms.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
CROSS-LINGUISTIC DIFFERENCES IN THE LEARNING OF INFLECTIONAL 
MORPHOLOGY: EFFECTS OF TARGET LANGUAGE PARADIGM 
COMPLEXITY 
 
 
 
 
by 
 
 
Ekaterina Solovyeva 
 
 
 
 
 
Dissertation submitted to the Faculty of the Graduate School of the 
University of Maryland, College Park, in partial fulfillment 
of the requirements for the degree of 
Doctor of Philosophy 
2020 
 
 
 
 
 
 
 
 
Advisory Committee: 
Professor Robert M. DeKeyser, Chair 
Professor Steven J. Ross 
Dr. Polly O'Rourke 
Dr. Amir Zeldes 
Dean's Representative: Professor Ralph Bauer 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
© Copyright by 
Ekaterina Solovyeva 
2020 
 
 
Acknowledgements 
Clichés are clichés for a reason. It does take a village, and mine extends in space and 
time. I have a lot of thanks to give: 
I thank my advisor, Robert DeKeyser, for the gifts of close reading, intellectual 
engagement, and generously letting me explore and speculate. 
Steven Ross, Polly O'Rourke, Amir Zeldes, and Ralph Bauer for insightful questions 
and stimulating thoughts about the meaning of the data and its limitations. 
The support provided by the dissertation grant from the National Foreign Modern 
Language Teachers Association jointly with the National Council of Less Commonly 
Taught Languages. 
Countless students of German, Italian, and Czech who, by choice or out of 
necessity, braved the journey of language learning and participated in the data collection 
that served as the basis of the corpus used in my study. 
At my first academic home in the US, the University of Northern Iowa: Ardith 
Meier, for modeling a life of curiosity, service, and high standards (as well as assuring 
me that I am not going to "get stupider"). Siegrun Wildner, John Balong, Reinhold 
Bubser, Otto Maclin, for giving me the tools and space to engage in scholarship and 
giving me my start on this path. 
My cohort at the University of Maryland, College Park: Ilina Kachinske, Stephen 
O'Connell, Susan Benson. The camaraderie of our first year is among my fondest 
memories. 
My tribe at the University Career Center: Rachel Wobrak, Becky Weir, Erin Rooney-
Eckel, Pamela Allen, Erin Brault. I had a home on this campus thanks to all of you.  
The Graduate School Writing Center, Linda Macrì, for providing the space, support, 
and snacks for the practice of scholarship throughout all stages of this and other projects.  
Zach Hebert, for coffee, walks, eye rolls, your humor and sanity. 
Rene Jones: I'd say that our friendship cannot be measured, but it can: in minutes, 
pages written and edited, deadlines blown, late nights, and trees. 
My oldest friends: Maya Kosova, Olga Vasileva, Olga Avsyukevich. Your 
friendship has taken many forms since we met, but what has not changed is your warmth, 
sincerity, and you just "getting" me. 
My parents, Lyubov Solovyova and Sergei Solovev, for buying me all those books 
throughout the years, unwavering support of my ambitions, and making peace with my 
nomadic life. 
  
Table of Contents 
Acknowledgements
List of Tables
List of Figures
Introduction: What Makes Inflectional Morphology Hard?
Chapter 1: Regularities in L1 and L2 Learning of Inflectional Morphology
1.1 Morphological development in L1
1.2 Morphological development in L2: English
1.3 Morphological development in L2: other target languages
Chapter 2: Accounts of Morphological Development
2.1 Syntactic Competence
2.2 General-cognitive Approaches
Chapter 3: Research Questions and Motivations
3.1 Benefits of Paradigm Complexity
3.2 Potential Trade-offs between Learning and Processing
3.3 The Current Research
3.4 Target languages and their inflectional systems
Chapter 4: Methods – Corpus Study of Written Learner Productions
4.1 Data Source and Learner Backgrounds
4.2 Procedure
4.3 Error Categories and Their Significance
4.4 Cleaning and Coding of Data
Chapter 5: Results – Cross-linguistic Differences in Inflection Error Frequency
5.1 Regression Model Specification and Model Selection
5.2 Regression Model Results
5.3 Cross-Validation
Chapter 6: Results – Production of Verbal Inflection in German: Phonological Environments
6.1 Methods
6.2 Results
6.3 Conclusions
Chapter 7: Discussion and Conclusions
7.1 Key research aims and findings
7.2 Theoretical implications and takeaways
7.3 Limitations to consider in future research
7.4 Contributions
References
 List of Tables 
Table 1 Syllable structure and permissible coda clusters in target and first languages
Table 2 Corpus frequencies of inflected forms in Czech (spoken, written)
Table 3 Written frequencies of inflected forms in a web corpus of Italian
Table 4 Frequencies of German inflected forms in a written corpus
Table 5 Comparison of rank orders of inflected forms in written German, Italian, and Czech
Table 6 Summary of key differences between the morphological systems of target languages
Table 7 Merlin corpus statistics: Number of texts re-rated at each CEFR level
Table 8 Error types adopted in the coding scheme, with examples from each TL
Table 9 Examples of data excluded during data cleaning
Table 10 Structure of the Data
Table 11 Variables used in the analysis
Table 12 Summary of Poisson Model Dispersion and Model Fit Values
Table 13 Regression model results predicting error rates in German, Italian, and Czech
Table 14 Significance of model contrasts integrated by variable and interaction
Table 15 Contributions of interaction terms to model fit (assessed by single term deletions)
Table 16 Summary of pairwise comparisons of error rates between target languages by type
Table 17 Rank orders of error types by target language and proficiency level
Table 18 Rank orders of error types by target language, averaged across all proficiency levels
Table 19 Prediction accuracy of regression models when tested on unseen test data
Table 20 First language backgrounds of learners in the sample
Table 21 Classification schemes for predicate type
Table 22 Effects of predicate type (four coding schemes) on inflection accuracy
Table 23 Effects of syllabicity on accuracy of production
Table 24 Effects of previous segment class on accuracy of inflectional ending: Obstruents versus sonorants
Table 25 Effect of previous segment on inflection accuracy: Manner of articulation
Table 26 Effect of following segment on inflection accuracy
Table 27 Joint effects of phonological environment on inflection accuracy
Table 28 Combined effects of syllabicity of ending and phonological environment on inflection accuracy
List of Figures 
Figure 1. Learner L1 backgrounds by Target L2, aggregated across all proficiency levels.
Figure 2. L1-TL contingency table Chi-square test residuals (left) and their % contribution to total statistic (right).
Figure 3. Model residuals plotted against fitted values for the model predicting error counts from: target language, CEFR, error type, and interactions (TL*error type; TL*CEFR; CEFR*error type).
Figure 4. Plots of aggregated model effects. Top panel: target language by error type interaction; middle panel: target language by proficiency interaction; bottom panel: proficiency by error type interaction.
Figure 5. Model-predicted rates by type for German, Italian, and Czech across proficiency levels A2 through B1+.
Figure 6. Relativized (per number of texts) observed frequencies of error types in German, Italian, and Czech across proficiency levels A2-B1+.
Figure 7. Summary of pairwise comparisons among error type rates within each TL, averaged across all proficiency levels.
Figure 8. Cross-over pattern in morphosyntactic and morpholexical errors depending on target-language complexity.
Figure 9. Error rates by type and target language across CEFR proficiency levels.
Figure 10. Interaction between class of following phonological segment (x axis) and previous phonological segment (y axis) in affecting inflection accuracy.
  
 
Introduction: What Makes Inflectional Morphology Hard? 
Morphological difficulties have been perhaps the most salient hallmark of adult 
language learning since the inception of its study. Morphology and morphosyntax 
continue to dominate our notions of L2 proficiency, ultimate attainment, and fossilization 
(e.g., DeKeyser, 2000; Johnson & Newport, 1989; Lardiere, 1998), even as our thinking 
on communicative competence evolves and is enriched by considerations of 
sociocultural, pragmalinguistic factors, or phenomena at the interface between pragmatics 
and syntax, among others.1 
Some of the claims downplaying morphosyntactic difficulties are based on the 
first emergence of grammatical forms used contrastively, rather than a preponderance of 
grammatical forms used correctly, or even on the learners' ability to successfully 
comprehend the feature of interest (as demonstrated through sentence interpretation tasks, 
for example). Thus, a feature considered "acquired" based on this type of analysis may, 
in fact, fail to be realized in the majority of learner productions. Such assertions also 
emphasize the primacy of competence over performance, arguably distorted by "noise", 
such as processing limitations, retrieval failure, and memory bottlenecks. And yet, learner 
errors are not as random as would be expected on account of processing resource 
breakdowns.  
1 Recent examinations of near-native learners' difficulties, by contrast, have emphasized 
phenomena at the interface of syntax with pragmatics and semantics as main areas of 
difficulty, not morphosyntax and syntax per se (e.g., Sanchez, Camacho, & Ulloa, 2010; 
Sorace, 2011; Sorace & Filiaci, 2006). However, considering the small proportion of learners 
who reach near-nativeness among a vast majority who do not, it seems fair to say that 
morphology and syntax are far from trivial for the average learner. 

It is this non-randomness in learners' errors that has been invoked (White, 2003, 
p. 196) to argue against a global breakdown in abstract syntactic competence (e.g., 
Clahsen, 1988; Meisel, 1997) as the root cause of persistent morphological errors. 
Attempts to explain morphological difficulties by positing gaps in lexical learning and 
memory retrieval are also insufficient upon closer inspection. They seem to merely 
replace the phenomenon to be explained with a new one, leaving us with a different 
formulation of the same question: "Why are some inflected forms more easily accessible 
(readily retrievable) than others?" 
The question of how accurate use of inflected forms develops is interesting in its 
own right, whether one accepts it as reflecting syntactic competence or calls it by any 
other name. Regardless of its place as part of syntactic competence or outside of it, 
target-like use of inflected forms requires extensive learning. Even though such learning 
has been outsourced to the lexicon in fairly recent syntactic thinking in SLA 
(Herschensohn, 2001; Lardiere, 1998), the need for it is undisputed even in the strongest 
Universal Grammar literatures (Lidz & Gagliardi, 2015; Yang, 2002).  
The details of this learning have been filled in, to a certain extent, by general-
cognitive proposals (DeKeyser, 2005; Goldschneider & DeKeyser, 2001), accounts 
invoking the limitations of the production processor (Pienemann, 2015), and 
psycholinguistic approaches that posit developmental shifts between the reliance on 
whole-form storage of inflected forms and compositional assembly (from former to 
latter: Clahsen & Felser, 2006; from latter to former: Gor & Jackson, 2013; Portin, 
Lehtonen, & Laine, 2007).2 However, these theoretical pieces do not add up to a 
complete puzzle, not least because they only hint at how their explanations might fit with 
the others when some pattern in the data cannot be explained from within a theory itself.  

2 The two directions are not mutually exclusive in the sense that within one learner, 
representations of both kinds are likely to coexist and develop non-linearly. The two positions 
taken by psycholinguistic researchers and cited here claim that the overall progression tends to be 
in one or the other direction. 

The task at hand, therefore, is to study the progression of this learning, moving 
from the consideration of difficulty intrinsic to a particular morpheme towards 
accounting for the acquisition of inflectional paradigms, that is, systems of contrasts 
among morphemes. Accounting for the totality of forms to be learned is a way to 
respect (and model) current linguistic descriptions of the adult native speaker endstate, 
which treat it as a system of structural relations between elements and not lists of 
independent elements, following a long-standing structuralist tradition (e.g., Saussure, 
1966).  
Even though evidence on second language (L2) morphological development has 
by now accumulated for a number of target languages (TLs) other than English, an 
explicit comparison of rates of morphological learning has not been pursued. Research 
programs that include multiple TLs are often carried out to validate an existing (and 
one-TL-based) account of difficulty (e.g., Pienemann, 2003) rather than to seek 
potentially disconfirming evidence or to accommodate cross-linguistic data in a 
principled way. In this sense, the spirit of multi-language research efforts, such as those 
based on the Processability Theory (PT) or the Shallow Structure Hypothesis (SSH), is 
anything but cross-linguistic.  
One of the dimensions on which TLs differ in ways potentially consequential for 
theory building is paradigm complexity. By presenting radically different learning 
problems, languages ranging in paradigm richness can serve as a testing ground for 
competing theories of linguistic complexity or even competing conceptualizations of the 
learning mechanism itself. Considering entire linguistic systems, rather than isolated 
grammatical phenomena, has the advantage of reflecting more closely the reality of 
learning: the language presents itself to the learner at once, even in conditions of the 
strictest instructional control over input. Even when such control is present, instruction of 
the focus-on-forms (Long, 1991) variety tends to respect paradigms: it is hard to imagine 
a pedagogical approach, for instance, that would selectively rely on drills for just one 
feature combination (e.g., second person singular).  
A cross-linguistic study, therefore, holds promise of theoretical significance, 
beyond reflecting more fully the diversity of real-life language learning contexts. As the 
review of accounts of grammatical complexity will show (Chapter 2), conceptualizations 
of L2 morphological difficulty make assumptions about the very nature of grammatical 
learning, which may have been inherited from descriptions of the learning of English. 
Deliberately extending existing theoretical accounts to other TLs can subject theoretical 
accounts to additional scrutiny that may help with the pursuit of a transition account 
(Gregg, 1996) of SLA. The pressure to account for TL differences revealed by such 
comparisons can refine the accounts of morphological difficulty to a level that is 
sufficient for their ultimate integration into a coherent account of SLA.  
The present proposal focuses on verbal morphology, owing to verbs' special 
status as the hub of sentential meaning, dictating argument structure, thematic roles, and 
case assignment. Verbal inflectional morphology is a common denominator for a study 
with a cross-linguistic focus, since it deals with units of close to universal semantic 
meaningfulness, such as present and past tense, in contrast to the distinctions that are 
layered with semantic complexity, such as case. Verbal morphology has been the most 
widely studied due to its role in syntactic processes: verbs are typically predicating 
(Gentner, 1982) and not referring; they are overall more involved in syntactic processes 
than nouns (Dressler, Stephany, Aksu-Koc, & Gillis, 2007, p. 68).3 Perhaps owing to this, 
learning theories have been proposed with far greater enthusiasm for the acquisition of 
verbal, than nominal, inflectional morphology (e.g., Clahsen & Felser, 2006; Ullman, 
2004) and tasked, in addition, with representing lexical and combinatorial processes (e.g., 
irregular and regular verbal forms, respectively). While I will be invoking data from the 
psycholinguistic literature on single- versus dual-mechanism processing of inflected 
words, I will do so only insofar as it adds insight to comparisons of different TLs and 
relates to learnability, ignoring the debates internal to this literature.   
3 Even though nouns are also influenced by contextual factors (for example, their case), their 
role in syntactic processes such as agreement is one of providing the "inputs" for the verb to 
agree with, where it is the verb that does the agreeing in response to the noun. 

Chapter 1 summarizes the empirical facts on morphological development, from 
early morpheme order studies to recent corpus analyses. The review will include data 
from diverse TLs as much as possible, both in L1 and L2 development. As will become 
clear throughout the review, scholars disagree on what the facts to be explained are. This 
is not surprising in the absence of predictions guided by a learning theory. In Chapter 2, 
I will present an overview of approaches to morphological difficulty, focusing on the 
accounts that emphasize syntax-morphology interdependencies, followed by those based 
on general-cognitive mechanisms and processing principles. In particular, I will focus on 
spelling out the assumptions made by both accounts about the nature of the learning 
mechanism transitioning the learner from one stage to the next. In Chapter 3, I will 
argue that examining rates of growth for different target languages can push theories to 
be more explicit in the learning mechanisms they posit. This may lead to more accurate 
descriptions of the data and to novel insights into the nature of learning in adults, 
potentially testing whether it proceeds in piecemeal fashion or whether elements in a 
complex system are acquired in a way that reflects their similarities at an abstract level. I 
will conclude Chapter 3 by presenting the research questions of the study and by 
providing descriptions of the relevant aspects of target-language grammars (German, 
Italian, and Czech). In Chapter 4, I will describe the research methods and the data 
sources, as well as the error taxonomy and its application to the data during data coding. 
Chapters 5 and 6 report the results: Chapter 5 concentrates on the results of comparisons 
among the target languages with respect to the proportions of different error types. 
Chapter 6 focuses on the production of inflection in L2 German, examining the data 
through the lens of interlanguage phonological processes. Chapter 7 concludes this 
dissertation by offering key takeaways and theoretical implications while noting the 
study's limitations.   
Chapter 1: Regularities in L1 and L2 Learning of Inflectional 
Morphology 
There are well-documented sequences in children's L1 and adult L2 
morphological development. Although the exact ordering of morphological features in L2 
development has differed from study to study, there is general agreement about broad 
patterns in the data (Dulay & Burt, 1973; R. Ellis, 1994, 2015; Mitchell & Myles, 2004), 
sometimes referred to as "Long's Law" (e.g., Ellis, 2015). The sources of discrepancies 
among studies in the learning orders they propose are multiple and include differences in 
learners' L1s, as well as differences in experimental task demands, which, in turn, 
stem from different theoretical perspectives on what constitutes "learning" and 
"knowledge". For example, approaches that endeavor to characterize the nature of L2 
syntactic competence generate their evidence from sentence interpretation and 
grammaticality judgments, whereas a more applied, testing, or skills-based perspective 
would include accurate production or aspects of learner performance (e.g., speed). 
This chapter will first characterize the regularities in child L1 acquisition, 
particularly highlighting any cross-linguistic and typological differences. Then it will 
review the work on the so-called "morpheme orders" identified in the learning of English 
as a second language. Finally, it will summarize findings from studies of learners of TLs 
other than English, which have employed both production data and psycholinguistic measures. 
1.1 Morphological development in L1 
The existence of broad regularities in L2 acquisition (and the very desire to find 
them) parallels observations of developmental sequences in L1 acquisition. In L1 
acquisition, morphological development is characterized by the presence of root 
infinitives (RIs): seemingly non-finite forms that lack overt morphological marking and 
are produced where a finite form is required. RIs have been attested across typologically 
diverse languages (Rizzi, 1993/1994; Wexler, 1994). Notably, RIs, which are errors of 
omission, are more common than errors of commission (supplying wrong inflection). The 
gradual disappearance of RIs in development has led some researchers to conclude that 
the syntactic projections supporting finiteness mature over time, even though others have 
disagreed with RIs' characterization as infinitival in the first place (Phillips, 1995).  
Whether truly non-finite or tacitly finite but lacking overt morphological markers, 
RIs lack surface morphology and vary cross-linguistically both in prevalence and the age 
at which they disappear from child productions (Phillips, 1995). The length of RI 
persistence in child speech has been linked to the relative complexity of the language's 
inflectional paradigm (Legate & Yang, 2007). 
Transcending paradigm complexity, robust differences have also been attested 
along typological lines. Morphological systems are acquired earlier in agglutinative 
languages than in fusional languages, as early cross-linguistic comparisons of L1 
acquisition showed (Slobin, 1985). This finding has been extended through more cross-
linguistic comparisons (Laaha & Gillis, 2007), in which languages were not merely 
construed as representing distinct idealized types (e.g., "agglutinating") but, following 
principles of quantitative typology (Hempel & Oppenheim, 1936, cited in Dressler, 2007, 
pp. 3, 5), as possessing different levels of the typological property of interest, 
"agglutination" or "inflection" (Dressler et al., 2007). For example, within the 
"inflecting" group of languages, the graded nature of inflection as a typological property 
was taken into account. French, German, and Dutch were not merely considered "weakly 
inflecting", and Greek, Croatian, and Russian "strongly inflecting". Rather, the 
differences were treated as continuous properties, with French, for instance, being less 
inflecting than German, or Russian less inflecting than Greek. Within the same language 
type (e.g., weakly or strongly inflecting), morphological systems of languages whose 
inflectional paradigms were richer were acquired faster. Among the weakly inflecting 
languages, children acquiring German and Dutch outpaced children acquiring French; 
among the strongly inflecting languages, children acquiring Greek developed inflection at 
a faster rate than did the Russian and Croatian children (Stephany, Voeikova, 
Christofidou, Gagarina, Kovacevic, Palmovic, & Hrzica, 2007, p. 46). 
Comparisons were also conducted between typological groups: agglutinating, 
weakly inflecting, and strongly inflecting. Both paradigmatic and syntagmatic richness 
were considered: paradigmatic richness refers to the number of structural choices made 
available by a language, whereas syntagmatic richness refers to the average length of 
morpheme sequences in words. For the development of inflection in the verbal domain, 
only paradigmatic morphological richness was predictive of rate of development. By
contrast, in the nominal domain syntagmatic richness, expressed as the average number of
affixes per word, mattered as well (Xanthos, 2007, p. 64). Spearman correlation values
between paradigm richness attested in the input and children's speed of development
were quite high: for verbs, 0.76 (paradigmatic richness), p = 0.003; for nouns, 0.93
(paradigmatic), p < 0.001, and 0.77 (syntagmatic), p = 0.02. Notably, the roles of
transparency, uniformity, and salience could not be consistently identified (Dressler et al.,
2007, p. 70; Xanthos, 2007, p. 64). This suggests that these properties may act as
tiebreakers, or that their role is more easily identified when languages within the same
typological group are examined.
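To make the rank-correlation measure concrete, the following minimal sketch computes Spearman's rho from scratch. The richness and speed values are invented purely for illustration and are not the figures reported by Xanthos (2007); the function names are likewise hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: paradigmatic richness of the input and
# children's speed of development, one pair per language.
richness = [1.2, 1.5, 2.1, 2.4, 3.0]
speed = [0.10, 0.14, 0.12, 0.20, 0.25]
rho = spearman_rho(richness, speed)
```

Because the measure depends only on ranks, a high rho indicates that languages ordered by richness are ordered similarly by speed of development, without assuming a linear relationship between the two.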
These results put observations about the role of language typology in acquisition 
from earlier studies on firm quantitative ground. In earlier work, the advantages of 
agglutination were noted and explained in a general-cognitive light as stemming from the 
transparency of form-meaning mappings (Peters, 1997, p. 181; Slobin, 1985, p. 1216). 
The more recent studies conducted by Dressler and colleagues (2007) propose that 
paradigm richness may also contribute to this developmental advantage, possibly by 
exerting communicative pressure on the child to pay attention to subtle differences in 
meaning between forms (p. 9). Both explanations may be valid, considering that 
communicative pressure alone does not necessarily predict a difference between 
agglutinating and strongly inflecting languages.  
1.2 Morphological development in L2: English 
The morpheme order studies yielded the observation that, often independent of 
instruction, learners tend to acquire the grammatical features of TL English in a similar 
order (Brown, 1973; Dulay & Burt, 1973; Larsen-Freeman, 1975; Pica, 1983). Despite 
two frequent criticisms leveled against the early work on acquisition orders, the findings of
morpheme order studies are still considered valid (Larsen-Freeman & Long, 1991).  
The first criticism concerns their cross-sectional nature; the second is their 
reliance on arbitrary criteria to determine learning, such as a 90% accuracy rate. 
Addressing the first problem, a number of subsequent studies tracked individual learners 
longitudinally and yielded similar findings (e.g., Dyson, 2009; Lardiere, 1998) both 
across different L1s in the learning of English and across different L2s, including German 
and Swedish (Pienemann, 2005). Thus, acquisition orders are not merely an artifact of 
averaging across multiple learners. However, the potential for discrepancies between the 
performance of any one learner picked at random and the acquisition orders captured in 
aggregate also exists. Insofar as studies oriented at describing the orders do not make 
commitments to a particular learning procedure, operating on specified inputs to produce 
a range of expected outputs, expecting their results to adequately capture developmental 
idiosyncrasies of single learners will be a recipe for disappointment. As such, these 
discrepancies may not speak to the presence or absence of orders in learning but 
underscore the need for principled criteria in deciding which of them are noise and
which would invalidate the notion of orderly development. 
Concerning the second criticism, different criteria have been proposed in the 
literature to make morpheme studies less vulnerable to tracking the noise in data that 
arises from learner performance. For example, the emergence criterion (Meisel, Clahsen, 
& Pienemann, 1981) relies on the first instances of reliable, contrastive use of a 
morpheme, ignoring the lingering optionality that may persist in learner productions for 
years. However, neither emergence nor the achievement of some level of accuracy 
indexes any learning phenomenon interesting in its own right. Rather, each is a snapshot in
time of the underlying movement driven by the operation of learning processes. The
utility of both lies in their ability to reflect an underlying growth curve that is driven by
the operation of learning mechanisms, which then produce distinct "orders" (or rankings
of morphemes by accuracy) at whichever points one samples along this trajectory.
Therefore, any discrepancies in orders that arise as a byproduct of one's choice of the
points to be sampled (90% accuracy, first emergence) are not altogether surprising, and 
neither invalidate nor prove the existence of a universal learning mechanism. Differences
in orders (or accuracy rankings) will then emerge because of differences in growth rates,
much like lines that will cross if their slopes are different. 
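The crossing-lines argument can be illustrated with a small sketch. It assumes, purely for illustration, that accuracy grows linearly over time; the two morphemes, their starting accuracies, and their growth rates are hypothetical values, not estimates from any study.

```python
def accuracy(start, rate, t):
    """Linear growth in accuracy from `start`, capped at 1.0."""
    return min(1.0, start + rate * t)


# Hypothetical morphemes: (starting accuracy, growth per time unit).
morphemes = {
    "morpheme_A": (0.40, 0.02),  # higher start, slower growth
    "morpheme_B": (0.10, 0.08),  # lower start, faster growth
}


def ranking(t):
    """Morphemes ordered from most to least accurate at time t."""
    return sorted(morphemes, key=lambda m: accuracy(*morphemes[m], t), reverse=True)


early = ranking(2)  # sampled before the two lines cross
late = ranking(8)   # sampled after the two lines cross
```

Under these assumptions the same pair of growth curves yields opposite accuracy rankings at the two sampling points, even though nothing about the underlying learning process has changed.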
These caveats aside, a consensus seems to have emerged from studies relying on 
error rates in learner productions that morphological errors in L2 are more varied than 
those attested in child L1 and include erroneous inflections, not just omissions (Bruhn de 
Garavito, 2004; Jia & Fuse, 2007; Mezzano, 2003; Morales, 2014; Prevost & White, 
2000). At the same time, some of these findings have been invoked in the syntactic
literature (e.g., White, 2003, p. 196) to make comparative claims about the prevalence of 
omission and substitution. In particular, their authors argue that learners predominantly 
omit inflection, but when they do inflect, they tend to inflect correctly (Grondin & White, 
1996; Haznedar & Schwartz, 1997; Ionin & Wexler, 2002; Prevost & White, 2000b; 
White, 2002). 
Other developmental patterns attested across a
number of studies concern the relative difficulty of bound morphology compared to
morphological features expressed through free morphemes (Dyson, 2009; Jia & Fuse, 
2007; Lardiere, 1998a, b; Vainikka & Young-Scholten, 1996; Zobl & Liceras, 1994). 
This distinction is appealing because it transcends theoretical boundaries (UG, 
processing, general-cognitive accounts) by drawing on the salience of free morphemes as 
freestanding words in an adult's mind (VanPatten, 2004).
Looking at error rates on verbal inflectional morphology, what stands out is the 
overall low accuracy regardless of length of residence. For instance, Lardiere's (1998)
subject Patty (LOR at first recording was 10 years; at last recording, 18 years), whose first
languages were two varieties of Chinese (Hokkien, Mandarin), correctly marked tense 
only about 34% of the time (through inflection or tense-expressing auxiliary). This rate 
stayed steady over the course of the 8.5 years that elapsed between the first and 
subsequent recordings.  
This pattern holds even in studies that included younger learners, who cannot 
possibly be deemed to have entered a fossilized state. In a study conducted within the 
processability theory framework, Dyson (2009) analyzed the productions by two 
adolescent, beginner-level learners of English as an L2 who were native speakers of 
Chinese. The measurements were spread over the course of nine months, and both 
participants had received some English instruction in their home country prior to 
relocating to Australia. Only one of the two learners reached the emergence criterion on 
third person singular by the sixth (and last) measurement, achieved through the
suppliance of three correct forms out of 51 available obligatory contexts. The other 
learner did not reach the criterion, supplying zero correct forms on four measurements, 2 
(out of possible 46) on measurement 4, and 1 (out of 43) on the very last occasion. Tense 
marking did not fare much better: the first learner showed emergence of irregular
past-tense marking at the second measurement but the regular -ed rule only at the 4th (based
on one correct token and two overgeneralizations), continuing at that level at
measurement 5 (one token only), and producing no tokens whatsoever at measurement
six. The second learner demonstrated the emergence of irregular past at measurement
four (with 10 tokens) but produced no tokens of regular -ed past over nine months.
Similarly low accuracy was reported in a study comparing early- and late-starters 
of English as an L2, with ages at arrival between 5 and 16, who were also native
speakers of Mandarin (Jia & Fuse, 2007). The study started tracking participants three 
months after arrival in the United States and continued over five years. By the 16th testing 
session administered at the end of five years of residence in the U.S., only three out of 10 
participants had achieved 80% or higher accuracy on third person singular, four out of 10 
on the irregular past tense, and none of the 10 on regular past tense. These data were 
obtained from spontaneous productions during interviews with the researchers.  Over 
90% of errors in both domains were errors of omission, not wrong inflection. 
Ruling out a purely phonological explanation of these difficulties, it has been 
shown that learners with L1 backgrounds other than Chinese, including Korean (Johnson 
& Newport, 1989), Hungarian (DeKeyser, 2000), and Russian (DeKeyser, Alfi-Shabtay, 
& Ravid, 2010), among others, experience difficulty with English inflectional 
morphology. These difficulties have been demonstrated on a variety of tasks that are
deemed to be less processing-intensive than spontaneous production, such as
grammaticality judgments.
Somewhat obscuring the regularities, L1 influences have been shown to be one 
source of discrepancies in the learning orders (Goldschneider & DeKeyser, 2001;
Murakami & Alexopoulou, 2015). For example, there are departures from the
generalization that free morphemes are learned earlier than bound ones that can be 
attributed to L1 influences.  The relationships between L1 and L2 are far from 
straightforward and do not boil down to simple transfer. For example, learners of English 
who are L1 speakers of Chinese master third person singular -s before regular past tense,
even though neither feature is expressed morphologically in Chinese (Jia & Fuse, 2007; 
Luk & Shirai, 2009). By contrast, L1 speakers of Korean were among those learners for 
whom regular past-tense marking (-ed) had one of the highest target-like use (TLU)
percentages (along with L1 speakers of Turkish, Japanese, Russian, German and French),
even though Korean lacks this feature too. Speakers of Spanish, surprisingly, had one of
the lowest TLU rankings for third person singular -s among all L1 groups, even though
Spanish marks this feature combination (Murakami & Alexopoulou, 2015).
Most importantly, even those learners whose L1s did have morphological features 
comparable to those of the TL, while showing higher accuracy than speakers of L1s 
lacking them, were well short of 100% accuracy even at the highest proficiency 
examination levels (Murakami & Alexopoulou, 2015: CPE, i.e., Cambridge English:
Proficiency, equivalent to the C2 level in the CEFR framework). Therefore, any account
of the facilitating role of L1-L2 overlap has to consider that ceiling level. On the one
hand, the absence of a morpheme in the L1 seems capable of depressing learners'
accuracy. For example, progressive -ing, plural -s, and possessive -s are considered
among the easiest based both on earlier morpheme order studies and on theoretical 
grounds (as belonging to lower-level syntactic projections), yet they exhibited the 
strongest influences from the presence or absence of comparable features in the L1. 
Conversely, third person singular -s, one of the hardest features of English as an L2,
showed the least variation among the L1s examined. This points to an attenuating effect 
of the L1, not one erasing other dimensions of difficulty altogether.  
Conclusions. The original formulation of acquisition orders explained their 
existence as the result of a universal learning mechanism. Arguably, such a mechanism is 
conceptually separate from the ingredients supplied to it by the L1 and the input. The 
operation of the mechanism on variable prior knowledge (L1 grammar and the weights of 
hypotheses about grammar it supplies) and variable input would result in noticeably 
different accuracy rankings. Therefore, even the absence of uniform orders in the data 
need not mean that the operating learning mechanisms are any different.  
Even demonstrable L1 influences on accuracy orders are not incompatible with 
the notion of general acquisition orders. Depending on the learning mechanism one 
assumes (e.g., MacWhinney, 1989; Yang, 2002; cf. Pinker, 1984), the L1 may merely 
change the weights of learners' prior hypotheses about the TL grammar, in a Bayesian
sense. This would mean that hypotheses with lower prior probabilities would require 
more evidence from input to successfully influence grammar building, leading to slower 
acquisition. A learner whose L1 marks tense morphologically would, by default, carry
the expectation of such marking over as their prior hypothesis about the TL, which would then be
strengthened or left without support by the input. By contrast, a learner with no 
knowledge of past-tense morphological marking from their L1 would have no reason to 
postulate this feature for the TL a priori and would take longer to learn it.  
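The Bayesian intuition above can be sketched numerically. The example below is only an illustration under simplifying assumptions: each relevant piece of input is taken to multiply the odds in favor of a hypothesis by a fixed likelihood ratio, and the prior values, the ratio, and the threshold are all hypothetical rather than drawn from any of the cited models.

```python
def observations_needed(prior, likelihood_ratio=2.0, threshold=0.95):
    """Count observations until the posterior reaches `threshold`.

    Each observation multiplies the odds in favor of the hypothesis
    by `likelihood_ratio` (Bayes' rule in odds form).
    """
    odds = prior / (1.0 - prior)
    target = threshold / (1.0 - threshold)
    count = 0
    while odds < target:
        odds *= likelihood_ratio
        count += 1
    return count


# Hypothetical priors: the L1 marks tense morphologically (feature expected)
# versus the L1 does not mark it (feature unexpected).
with_l1_support = observations_needed(prior=0.50)
without_l1_support = observations_needed(prior=0.05)
```

With identical input and an identical updating rule, the learner starting from the lower prior needs more observations to reach the same level of confidence, which is the sense in which the L1 could slow acquisition without changing the learning mechanism itself.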
The discrepancies in orders revealed by different studies may never be resolved, 
nor is such a resolution a prerequisite for successful theory building. Without postulating 
a learning mechanism that produces close-enough approximations of the empirically 
observed patterns, not much will be gained from additional data. Large-scale 
examinations of the kind supplied by corpus studies illustrate the limits of putting 
description before theory, and scale alone does not always translate into more confidence 
in the results. On balance, without a general idea of what kind of a learning mechanism 
lurks behind the differences, it is hard to evaluate which of them may be informative and 
which ones are noise. Finally, it is also not clear whether the "naturalness" of any orders
stems from "nature" or results from range restriction of sorts, reflecting prevalent patterns
in the development of learners of English, as opposed to any target language. More recent 
thinking in this area has shifted towards elucidating the properties of morphemes 
(ranging from perceptual to syntactic) that could account for the observed orders, rather 
than insisting that it is particular grammatical features that are acquired in a fixed 
sequence (e.g., DeKeyser, Alfi-Shabtay, Ravid, & Shi, 2017). 
1.3 Morphological development in L2: other target languages 
Work on TLs other than English has been sparse. Rather than originating as a 
research topic in its own right, the inclusion of other TLs has been a by-product of the 
multitude of theoretical approaches enlisted to explain the acquisition of grammar in L2, 
each largely concerned with its own agenda.
The acquisition of Spanish was studied from a Processability Theory (PT) 
perspective by Bonilla (2015). Applied to Spanish, PT predicts first the emergence of 
plural marking on lexical heads (manzanas, 'apples'), followed by intraphrasal agreement
within the determiner phrase (las manzanas, 'the-fem-pl', cf. 'apple-s'), followed by
interphrasal agreement (agreement marking -s). This order was upheld in oral productions of
21 L2 learners of Spanish at beginner, intermediate, and advanced levels. In all learners, 
syntactic manifestations of the levels, in line with Pienemann (2004), emerged before the 
corresponding morphology. Indirectly, this also supports the notion of bound morphology 
being more "difficult". With respect to agreement marking specifically, only five learners
out of 21 reached stage four (interphrasal agreement).  Proficiency levels are not detailed 
for individual learners: instead, the author reports that there were seven learners at each 
level, corresponding to approximately 180, 750, and 895 hours of instruction (beginner, 
intermediate, and advanced levels, respectively), and an additional year abroad for the 
advanced level. While it is unknown what the proficiency levels were of the learners who 
achieved each stage of agreement marking, one can estimate that at least two of the 
advanced students did not reach the acquisition criterion for stage-four features. Since the 
study did not focus on types of errors, it is impossible to draw further conclusions about 
the rates of errors of different kinds. 
In an analysis of L2 French that did separate erroneously supplied morphological 
marking from uninflected forms, Herschensohn (2001) provides a breakdown of errors by 
two intermediate-level learners, one of whom completed a six-month study abroad 
program. While the number of participants was very small, the study involved sizeable 
samples of learner discourse with a high number of obligatory contexts for the forms of 
interest. Over the course of three interviews spanning six months, the suppliance of 
correct inflectional morphology in obligatory contexts in the present (and past, in 
parentheses) tenses increased in both learners: 88% (45%) and 74% (10%) at the first
interview, 86% (56%) and 89% (88%) at the second interview, and 96% (79%) and
98% (97%) at the last, third interview. The second (and more accurate) participant was 
the one who had studied abroad. Even though the accuracy is already rather high, it is 
even higher if one ignores errors that involved ellipsis. Setting ellipsis errors aside,
38% of the remaining 60 errors involved the substitution of present-tense forms for
past-tense forms, while the rest (62%) were inflectional errors in a narrow sense. Of
these 37 inflectional errors, fewer than half (16/37) were infinitival forms where finite
forms were required, slightly fewer (14/37) involved incorrect morphological marking,
and a handful (7/37) were substitutions of forms marked for the wrong person-number
features.4
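The arithmetic behind this breakdown can be checked with a short sketch. The counts of 60 total errors and 16, 14, and 7 inflectional errors come from the passage above; the count of present-for-past substitutions is not stated directly and is inferred here by subtraction, and the category labels are shorthand introduced for illustration.

```python
# Error counts as reported above for Herschensohn (2001), ellipsis errors excluded.
TOTAL_ERRORS = 60
inflectional = {
    "infinitive where finite required": 16,
    "incorrect morphological marking": 14,
    "wrong person-number form": 7,
}

inflectional_total = sum(inflectional.values())
# Not reported directly in the text: inferred by subtraction.
tense_substitutions = TOTAL_ERRORS - inflectional_total


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


overall = {
    "present for past": percent(tense_substitutions, TOTAL_ERRORS),
    "inflectional": percent(inflectional_total, TOTAL_ERRORS),
}
within_inflectional = {
    label: percent(count, inflectional_total)
    for label, count in inflectional.items()
}
```

The resulting overall shares match the 38% and 62% given in the text, and the within-category shares show that no single inflectional error type accounts for a majority of the 37 cases.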
Studies involving learners of German and French (Prévost & White, 2000) and
Spanish (Bruhn de Garavito, 2003; Mezzano, 2003) are in agreement with these generally
high accuracy rates. Although these studies employed different methods and learners of
different ages, and did not equate proficiency (three years of exposure for
French, less than two for German), tentative observations can be drawn. Rates of
non-suppliance of inflection were higher for French than German (cited in Morales, 2014, p.
91), even though the corresponding syntactic operation (verb movement) had been 
acquired by then. For French, non-suppliance reached 23-24%, while for German the 
figure was 10-16%. Not all finite forms that were supplied were correct, and incorrectly 
provided forms were more prevalent among learners of German (~12%) than among 
learners of French (4-5%).  
By contrast, lack of finiteness marking in Spanish amounted to only 4% after just 
24 hours of exposure in a beginner-level course (Mezzano, 2003), while erroneous 
agreement was also encountered (12% after 88 hours of classroom instruction).
Person-marking forms were used interchangeably, with singular forms tending to be used in
place of plural (Mezzano, 2003). Similarly low error rates on agreement were reported by 
Bruhn de Garavito (2003): only 10%, including both substitutions of inflected forms for
one another and infinitives in place of finite forms. A study of learners of Russian
showed similarly high levels of accuracy on verbal inflection (~80%; Tkachenko &
Chernigovskaya, 2010).
                                                 
4 Despite the low number of participants, the study had qualitative depth by virtue of its 
longitudinal design, which resulted in a high number of instances of the features studied.  
This pattern of relatively high accuracy contrasts with the difficulties with 
inflection demonstrated by L2 learners of English (reviewed in the previous section) and 
echoes observations on aphasia across speakers of different first languages. Similar to 
learners of English as an L2, English-speaking aphasia patients have been reported to 
predominantly omit, rather than substitute, inflectional endings (Grodzinsky, 1984;
Gorema, 1998). By contrast, speakers of languages in which uninflected forms result in
non-words predominantly make substitution errors. Examples include Italian (Miceli,
Mazzucchi, Menn, & Goodglass, 1983), Greek (Kehayia, 1990; Kehayia, Jarema, &
Kadzielawa, 1990), and Hebrew and Arabic (Mimouni & Jarema, 1997), in which affixes
are discontinuous and are inserted into a consonantal root. Therefore, the error patterns
reported for English and constituting the "natural order" may be idiosyncratic to a poorly
inflected language.5
At least on some accounts (see Chapter 2), the errors are linked to developing 
syntactic knowledge, whereas on others they are caused by retrieval or access failures. 
Thus, researchers have attempted to isolate the difficulty and only examine compositional 
processes at the word level, thereby separating it from any processing demands imposed 
by production or syntactic computations within a sentence. This approach has yielded a 
separate set of facts to be explained and integrated with the findings obtained using other 
paradigms.  
                                                 
5 Even though the exact phonological realizations of morphemes differ from language to 
language, the "natural order" implies some measure of cross-linguistic relevance. Considering
that in English a number of morphemes are realized identically (third person -s, possessive -s,
and noun plural marker -s), any putative explanations of their relative difficulty would logically
have to be based on grammatical meaning, at least in part.  
Decomposition in L2 processing of inflected forms. Single-word morphological 
processing research has centered on characterizing the decomposition and whole-form 
storage of inflected words, and the changes in the relative reliance on them throughout L2 
development. As such, learning orders of morphological features have not been of 
interest. Although cross-linguistic comparisons are as rare in this literature as in the 
research from other paradigms, the results of psycholinguistic studies that examined 
different TLs lend themselves more easily to comparisons, owing to the narrower 
variation in their methods.  
On the one hand, many studies of English as a TL have failed to show 
morphological decomposition of inflected forms, as reflected by the magnitude of 
priming effects in masked priming tasks (Kirkici & Clahsen, 2013; Neubauer & Clahsen, 
2009; Silva & Clahsen, 2008). A handful of studies that used paradigms that allow for 
conscious perception of the prime, such as cross-modal priming, did sometimes show
decomposition in L2 English (Basnight-Brown, Chen, Hua, Kostic, & Feldman, 2007; 
Feldman et al., 2010). By contrast, a growing literature on other TLs includes studies that 
have demonstrated decomposition in learners of French (masked priming: Coughlin & Tremblay,
2015), early bilinguals in Finnish (Lehtonen & Laine, 2003), early (Lehtonen et al., 2006)
and late bilinguals in Swedish (Portin et al., 2007), and learners of Spanish as early as at
the intermediate level (masked priming: Foote, 2015; Presson, Sagarra, MacWhinney, &
Kowalski, 2012) but not at the advanced level (naming: Bowden, Gelfand, Sanz, & Ullman, 2010).
Two opposing predictions have been made with respect to developmental 
trajectories in the reliance on decomposition versus whole-form storage in the L2. On the 
one hand, dual models that posit separate neurocognitive mechanisms for the two kinds 
of processes specify that learners move from whole-form storage to decomposition as 
their proficiency increases, even though decomposition may not be achieved fully 
(Clahsen, Felser, Neubauer, Sato, & Silva, 2010; Ullman, 2004). On the other hand, a 
different body of psycholinguistic studies (mainly employing morphologically richer 
languages) has claimed that the direction is the opposite: initially, learners decompose
inflected forms but eventually move on to retrieving them whole as increased experience 
with them allows for the proceduralization of compositional rules (Portin, Lehtonen, & 
Laine, 2007). Of course, it is also possible that there are multiple shifts between the two 
throughout development, and these switches are just too granular to be detected by 
research programs that sample learners at discrete points on the developmental trajectory. 
It is unclear how robust the psycholinguistic insights are where cross-linguistic 
differences are concerned, considering the overall low number of such studies and their 
inevitable reliance on learners at higher proficiency levels. For a priming study, one has 
to recruit learners who are familiar with the inflected forms of interest and have 
accumulated sufficient lexical knowledge to support 30-50 items per condition (including 
less-frequent items). However, ideas have been put forward regarding how the processing 
of inflected forms may differ in languages differing in typology and morphological 
complexity. For instance, it has been argued that in languages with richer paradigms the 
application of inflection is a graded, probabilistic process based on the recognition of 
arbitrary subclasses (Gor & Jackson, 2013).6 Thus, producing or comprehending an
inflected form entails the application of several processes involving both inflection and
phonological alternations.

6 Ultimately, it is a question best solved empirically: if inflection is indeed applied in a graded
fashion in inflectionally rich languages, learners would be expected to make errors in the
application of a default (usually "regular") pattern to novel items, applying subrules and
alternations where none are needed ("irregularizing", so to speak).
In my view, rather than positing a categorical difference between the richer and 
poorer languages, one may instead view inflection in both as a confluence of processes. 
The distinction between the core and the periphery is particularly called for in this 
context. While it may be obscured by alternations, the "core" process involved (in the
TLs studied) is that of combining a stem (in whatever form is revealed after the 
phonological processes) and an affix. Morphophonological alternations can be seen as 
falling in the middle of a continuum from full suppletion to subregular patterns (with 
rule-like status) and regarded as the periphery. Viewed through a connectionist lens, the 
affixation pattern should be a more reliable cue for the learner, since it applies uniformly 
and categorically, whereas any subrules will, by definition, have lower frequencies and, 
therefore, lower cue reliability (Presson et al., 2012).  
Conclusions. Decades of research on L2 grammatical development have 
produced some foundational insights into the course of learning of inflectional 
morphology, creating the overall perception of it as a lengthy uphill climb that is destined 
to stop short of the goal. Even though these developmental regularities have been deemed 
to reflect a single universal acquisition order (itself stemming from universal learning 
principles), they were largely generalized from data from English L2 learners. While 
studies on other target languages (TLs) also exist, they are less numerous and have been 
conducted in efforts to validate existing acquisition theories, as opposed to specifically 
focusing on what might be similar and different in the course of learning these different 
TLs.  
Despite the lack of studies explicitly comparing different TLs, some differences 
appear to emerge across single-TL studies, however coarse cross-study comparisons may 
be (complicated by differences in the TLs, learners' L1s and ages, and contexts of
learning). Learners of TLs other than English appear to achieve higher accuracy on
inflectional morphology earlier in development. Thus, learners' difficulties with
inflectional morphology in these morphologically richer TLs are not merely "scaled up"
difficulties experienced by learners of English, and the accuracy they achieve would be
radically underpredicted if one were to simply extrapolate from English learners' data.
Psycholinguistic data are more mixed but suggest intriguing possibilities with respect to 
learners of morphologically richer languages relying less on whole-form storage of 
morphologically complex words than learners of English.  
Most studies that have examined production accuracy did not concern themselves 
with the specific ways in which learners' utterances deviated from the target. However,
the few cases that took this direction can spark new lines of inquiry that are theoretically 
meaningful. Even those studies that have interpreted different error types as theoretically 
relevant (e.g., Herschensohn, 2001) have done so ad hoc. Some potentially relevant 
distinctions that have emerged concern omission of inflectional endings, substitution of 
inflected forms for one another, and uses of non-finite forms in place of finite ones. 
Meanwhile, a focus on error types has the potential for formulating fine-grained 
predictions with respect to how learning paths may be different for different TLs. 
In this dissertation, therefore, cross-linguistic data will be analyzed with the 
express purpose of drawing comparisons between learning processes as reflected by 
different error types.  In Chapter 2, I will apply existing theoretical accounts of L2 
learning to the task of explaining the data summarized above, particularly contrasting the 
predictions of syntactic and general-cognitive accounts with respect to the existence of 
any cross-linguistic differences, their nature, and implications for the learning 
mechanisms they assume.  
  
Chapter 2: Accounts of Morphological Development 
Theoretical accounts of L2 learning difficulty tend to emphasize either
issues of syntactic representation versus performance or general-cognitive processing 
properties, such as salience broadly construed. I will review each approach in turn, first 
summarizing the positions within the respective literatures, focusing, where possible, on 
cross-linguistic data, and conclude by laying out that approach's assumptions about the
process of learning, as well as any extensions it permits to cross-linguistic comparisons. I 
will additionally highlight any outstanding issues within each approach that could benefit 
from cross-linguistic analysis and a detailed examination of learner error types. 
2.1 Syntactic Competence 
Several syntactic accounts have been proposed to explain L1 and L2 data, 
situating the acquisition and learning of morphology within the development of 
grammatical competence. These accounts take differing stances with respect to the nature 
of the knowledge learners start out with ("the initial state"), the role of Universal
Grammar in the learning process, as well as the scope of its influence, and the interaction 
between the syntactic properties of the L1 and those of the L2. Due to its interest in 
linguistic representation over performance, the UG tradition has centered on linking 
patterns in learner performance (e.g., errors in the use of overt IM) to the syntactic 
representations underlying their behavior or on denying such links. The specifics of the 
morphology-syntax links have been a matter of intense internal debate in this literature, 
and the issue of their relative sequencing and, thus, causal relationship, has been theory-
carrying. 
At one extreme, learner difficulties with IM have been taken to reflect underlying 
representational deficits, which are either viewed as permanent (Clahsen, 1988) or 
gradually disappearing. At the other end of the spectrum, accounts such as the Missing 
Surface Inflection Hypothesis (Haznedar, 1997) have posited intact syntactic 
representations in learners and placed the source of the difficulty with IM solely at the 
stage of PF (phonetic form). That is, the difficulty lies in mapping the feature-marked 
forms (correctly generated by the syntax) to morphological forms of the specific lexical 
item. Effectively, this dissolves any ties between observed morphology and abstract 
syntax by absolving syntax from any responsibility for learner errors. 
Positioned between these two extremes are theoretical accounts that do 
acknowledge the connection between morphology and syntactic representations and have 
concerned themselves with clarifying which one drives the other. First, according to early 
influential views, morphological forms in both L1 and L2 development give rise to the 
functional projections that enable them: e.g., in English it is the copulas and third-person 
singular -s that trigger the emergence of the AgrP (agreement phrase), 
complementizers that trigger the emergence of the CP (complementizer phrase), and so on. This 
means that learners should first achieve accuracy on inflected forms and then start 
exhibiting facility with the syntactic phenomena associated with those respective 
projections (e.g., word order operations).  
For instance, according to the Minimal Trees Hypothesis (Vainikka & Young-
Scholten, 1996), in early grammars only lexical projections are present, while functional 
projections are missing altogether, and syntactic structure is not projected above the level 
of the VP (verb phrase). Exposure to morphologically complex forms through input then 
allows the learner to gradually build those higher projections. This proposal was meant to 
account for opposing patterns in child L1 and adult L2 acquisition: while morphological 
features expressed through affixes emerge earlier than free morphology in child L1 
acquisition, in L2 learning the pattern is reversed and it is bound morphemes that present 
a challenge. Another implementation of this idea ("morphology causes syntax"), the 
Valueless Features Hypothesis (Eubank, 1994), proposes that the syntactic projections 
themselves are present from the beginning but their syntactic features are "inert": not 
specified as either strong or weak, and thus reliant on exposure to morphologically inflected 
forms to be determined as such.  
Zobl and Liceras (1994), in contrast, argue that all functional projections are 
present from the start and the difficulties arise due to the marked way of merging 
inflections with lexical heads in English, which requires the lowering of affixes from IP 
onto V, compared to the unmarked way requiring the raising of V to I (e.g., French). 
Even though this proposal treats all features at the same syntactic level as having the 
same difficulty, its merit may be in postulating an asymmetry in difficulty between 
movement operations and, by virtue of that, in allowing for cross-linguistic differences in 
L2 learning difficulty.  
Building on the distinction between lexical and functional projections, Hawkins 
(2001) brings more nuance to this approach. In keeping with Vainikka and Young-
Scholten (1996), he contends that the initial state for L2 may be comprised of lexical 
projections transferred from L1 (which may persist or rapidly restructure under the 
influence of L2 input), whereas functional projections emerge gradually. The gradual 
manner of development applies both to the emergence of projections in relation to one 
another (IP before CP) and to sets of morphemes expressed at the level of the same 
functional projection. For example, head-complement features develop earlier (Aux be) 
than non-local binding relations (tense), with specifier-head relations (agreement of I 
with its specifier, the subject) being last. Allowing for this granularity among features 
belonging to the same level allows Hawkins to incorporate L1 influence into the model. 
He argues that L1 transfer will facilitate learning at the stage when it is relevant and the 
necessary projections are in place. He uses as an example the development of third 
person singular -s and past-tense -ed in Spanish and Japanese L1 learners. When both 
groups of learners have an underspecified IP that is only used as a landing site for moved 
auxiliary verbs, both groups perform poorly on third person singular, even though 
transfer would be expected for Spanish speakers. When learners begin to realize 
morphology in IP, however, Spanish L1 speakers are able to benefit from L1 transfer and 
become more accurate on third-person singular than on past tense -ed, whereas Japanese 
L1 learners remain equally inaccurate on both features. 
Conversely, another family of accounts shows that syntax may be in place before 
the morphology that relies on it. Demonstrations to this effect have gone hand-in-hand 
with arguing not merely for syntax before morphology, but for a complete break between 
the two. For example, in one report an endstate learner (Lardiere, 1998a, 1998b; L1 Chinese) 
mastered nominative and accusative case assignment, adverb and negation placement, 
and non-null subjects near perfectly, all taken to be indicative of robust syntactic 
projections well above the VP, while tense and agreement were still impaired. Similar 
observations have been made about an L1-Turkish child learner of English (Haznedar & 
Schwartz, 1997) and a sample of L1-Russian child learners (Ionin & Wexler, 2010). 
Notably, unbound morphology (such as auxiliaries and copulas) was supplied more 
accurately than were bound inflected forms. Studies conducted in the processability 
framework further exemplify this sequencing pattern, even though PT does not directly 
predict this difference (Bonilla, 2015; Dyson, 2009). Arguably, word-movement 
operations may be available to learners without involving much syntax per se: it is 
impossible to say with certainty whether the processes by which learners arrive at 
seemingly "nativelike" word orders or case, for example, are syntactic in the same way as 
a native speaker's or whether these nativelike productions are the product of pattern-
matching strategies of a general-cognitive kind (Bley-Vroman, 1997). 
While syntactic accounts have captured an important generalization in the data 
(higher accuracy on syntactic than morphological phenomena), their explanations of it 
miss the mark in several ways. First, relegating morphological errors to the PF or the 
morpholexicon does not account for the regularities in learner errors so painstakingly 
documented. As argued by Franceschina (2001), if the existence of patterns in learner 
errors is used to argue against a global syntactic breakdown in the L2 (otherwise errors 
would be randomly distributed), the same reasoning should apply to the morphology and 
PF modules. If the L2 PF is deficient, it should be across the board, which is not the case, 
as the studies reviewed in Chapter 1 showed. In other words, why would the phonological 
module respect syntactic distinctions, such as the difference between case and tense, for 
instance (Lardiere, 1998)?  
Claims of native-like competence (despite optionality in performance) in the 
endstate (Lardiere, 1998; 2006; White, 2003), enabled by continued access to UG, 
exonerate syntax as the source of difficulty but also enable an inconvenient restatement 
of the problem. From asking "why does L2 syntax not generate appropriately inflected 
forms?" one is left wondering why L2 morphology (or morpholexicon, or PF) sometimes 
does and at other times does not generate target-like inflected forms, largely following 
syntactic distinctions. The answer to this question matters from both theoretical and 
practical standpoints, and claiming that the knowledge is "there" but we cannot observe 
or measure it offers little by way of implications for instruction, testing, or materials 
design. Such reasoning is also inconsistent conceptually: any successful suppliance of 
morphology is attributed to the operation of target-like syntax, whereas any departures 
from the target are somehow not syntactic but morphological or PF-related. Without any 
means to independently determine the source of any given utterance, such explanations 
are not credible. 
One promising feature of syntactic theories that can resolve this contradiction is 
their acknowledgment of morpholexical learning as a necessary addition to UG and 
syntactic knowledge. However, the specifics of such learning are poorly understood and 
have largely been left outside the scope of syntactic development theories in L2. Learning 
theories developed in the L1 acquisition literature have been more explicit and have 
integrated UG with input-driven distributional learning (Lidz & Gagliardi, 2015; Pinker, 
1984; Yang, 2002). Their focus on learnability, rather than solely competence and 
performance, provides a way to account for the regularities in errors by linking them to 
aspects of the grammar that generated them without dismissing them as being solely 
reflective of performance. Extending this approach to L2 learning can help L2 syntactic 
theories navigate some of the contradictions described above. 
Employing a different syntactic formalism, one rooted in lexical-functional 
grammar, processability theory (PT; Pienemann, 2005) has linked syntactic complexity 
to processing difficulty. It posits an acquisition order proceeding from features expressed 
at the level of lexical items (e.g., noun plurals in English), to the within-phrasal level 
(e.g., agreement between article and noun), and, finally, the level of interphrasal links 
(e.g., subject-verb agreement). Thus, difficulty increases with the need for feature 
coordination across phrases, compared to the difficulty of phenomena that are expressed 
locally on lexical heads. This approach has generated considerable cross-linguistic evidence but 
has focused on cross-validating the proposed acquisition orders on different TLs, without 
an interest in any differences in rates of development not accommodated or predicted by 
it.  
Even though PT posits an explicit processing machinery, it employs the processor 
to recast syntactic phenomena in psychological terms, as opposed to detailing how it 
serves as a learning mechanism. The relation that PT posits between ease of processing 
and ease of acquisition is one of identity. The two need not be the same, as has been 
claimed for L1 (Pinker, 1984): for example, producing agreement across intervening 
material is memory-costly even for adults fully competent in the L1, as exemplified by 
agreement attraction errors such as The key to the cabinets *are on the table. For a child, 
errors would be expected on such structures even after they have been "acquired". In 
contrast, ease of acquisition is a function of the availability of relevant evidence in the 
input.  
While it is easy to dismiss any production errors as caused by processing 
bottlenecks when focusing on one target language, a true test for a purely processing-
driven explanation is its ability to hold across multiple TLs. Learners of all TLs should be 
equally susceptible to processing difficulties and memory breakdowns. Any residual 
differences among TLs would be attributable to learnability and the differences in the 
TLs' surface realizations of morphology, such as the number of overtly expressed 
morphemes, number of homonymous morphemes, or the morphemes' perceptual 
properties. After all, why should it be more difficult for learners of English to produce an 
utterance such as My child walks to school with correct inflectional marking than it would 
be for a learner of German? 
Conclusions. Despite discrepancies among the L2 morpheme order studies, a 
recurring theme across several research literatures has emerged concerning the higher 
difficulty of bound morphemes compared to free morphemes (e.g., auxiliaries) (e.g., 
Pienemann, 2005; Vainikka & Young-Scholten, 1996; Zobl & Liceras, 1994). True to 
their mission of characterizing linguistic competence, linguistic theories have put more 
weight on emphasizing what is common, rather than unique, to learning different TLs: 
the development (or preexisting presence) of syntactic projections, the interface between 
syntax and phonological form, or the differences and similarities between L1 and TL.  
Extensions of syntactic theories to predicting the comparative ease (and difficulty) 
of learning for different TLs are less clear and fraught with contradictions. To the extent 
that the core syntactic architecture proposed by linguistic theory is shared among 
languages, so should be learning difficulty. Thus, third-person singular marking should 
cause comparable difficulty for learners of different TLs from a syntactic standpoint, 
even though it could be additionally complicated by the idiosyncrasies of different TLs' 
morpholexicons. On the other hand, by virtue of presupposing abstract syntactic 
knowledge in learners, these theories leave room for synergies of the kind proposed in the 
L1 acquisition literature (Legate & Yang, 2007; Yang, 2002), which arise from the 
activation of abstract morphosyntactic features whenever morphemes are encountered. 
For example, encountering a verb overtly marked for second-person singular would 
contribute to the learning of third-person singular, by strengthening the "singular" 
feature. On this proposal, the parameters of learners' grammars change in response to 
encountering inflected forms, not just the lexical entries of the verbs. This approach 
reconciles a common syntactic architecture shared by different TLs with room for cross-
linguistic differences that would be due to the specifics of their morphemes' realizations. 
On this view, TLs with inflectional paradigms in which more contrasts are marked 
overtly would have lower learning difficulty than those depriving learners of overt 
evidence in input. Theories of L2 syntax have not pursued this approach and have mainly 
placed learners' underperformance on inflection at the level of morpholexicon, not 
morphosyntax.  
Another point of uncertainty in a cross-linguistic extension is whether any 
learnability benefits afforded by the presence of multiple overtly marked morphemes 
would be offset by the increased need for morpholexical learning. For example, having 
six distinct inflected forms in the present tense, rather than just two (as in English, one 
of which is zero-marked), may strengthen the [+Tense] grammar faster. However, it 
may also mean that for each lexical verb six inflected forms need to be learned (along 
with any potential phonological alternations), as opposed to just one.7  All of these 
                                                 
7 This does not imply that all six forms would be necessarily learned together or intentionally. 
Over time, however, all six would have to be integrated by the learner into the mental lexicon for 
successful production. 
apparent extensions are highly speculative and have not, to my knowledge, been 
advanced from within these literatures themselves. 
None of the L2 syntactic theories have integrated learnability into their accounts 
of how grammatical competence is acquired and performed in the domain of inflection. 
Such an integration would advance theory building by linking the two and explaining 
how imperfect performance may arise from a relatively sophisticated grammar. However, 
a syntactic lens has much to offer to a characterization of L2 learning, especially in 
developing a taxonomy of learner errors. The distinction between competence and 
performance, or a tripartite split into competence, performance, and learnability may 
provide a path to viewing certain error types as reflecting the acquisition of abstract 
principles of grammar (e.g., agreement), others as reflecting the accessibility of inflected 
forms during production, and yet another group as being indicative of the accumulation 
of morpholexical knowledge.  
2.2 General-cognitive Approaches 
The linguistic accounts reviewed in the previous section acknowledge the need 
for learning, in addition to any innate syntactic knowledge they posit, and have to be 
supplemented by proposals detailing such learning. The general-cognitive approach (GC) 
can supply this missing piece of the puzzle by specifying the factors influencing 
morpholexical learning, which has been outside the scope of linguistically oriented 
theories. One alternative approach to explaining learning sequences has been to study the 
properties of the morphemes in question and classify them along dimensions such as 
perceptual salience (e.g., syllabic status), homonymy, transparency of form-to-meaning 
mapping, and so on. Similar factors have been proposed for child L1 acquisition but as 
part of broader learning theories rooted in the syntactic tradition (Pinker, 1984; Yang, 
2002). 
In SLA, an influential account positing a general cognitive explanation was 
advanced by Goldschneider and DeKeyser (2001). On this view, a feature's salience is a 
confluence of factors that include its syntactic function along with perceptual, semantic, 
and frequency characteristics. Together these variables explained as much as 70% of the 
variance in acquisition orders of English morphemes in ESL.  
This approach differs from syntactic theories in its key assumptions about 
learning. First, by situating learning difficulty in individual morphemes one tacitly 
assumes that morphemes are both adequately segmented in the input and then learned 
one-by-one. This contrasts with the notion held by some syntactic theorists that 
learners entertain abstract hypotheses at a level above any individual elements, namely that of 
grammatical features and parameter settings for the TL grammar (cf. Legate & Yang, 
2007). The item-by-item assumption is evident from the fact that only the frequency counts of individual 
morphemes matter for predicting a feature's learning trajectory, whereas any indirect 
evidence available in the input does not. Such indirect evidence is supplied by the 
oppositions entered into by individual morphemes: in the simplest case, the opposition 
between a morpheme itself and its absence. Furthermore, the effects of frequency are 
considered without reference to any typological patterns, unlike in the L1 acquisition 
literature, where the effects of frequency are deemed to vary depending on the learning 
mechanism one posits. For example, on Pinker's (1984) proposal the learning mechanism 
needs radically different frequencies for agglutinated versus fusional forms to achieve 
equivalent strength of knowledge. Even though it is acknowledged by the GC proposal 
that some typological configurations may result in higher salience (e.g., agglutinative 
morphology, because each morpheme maps transparently onto only one feature), any 
facilitation resulting from it has not been elaborated or weighed against the other aspects 
of salience.  
The second assumption is a summative, flat view of the effects of saliency factors 
without postulating any hierarchy among them. Because all predictors are treated equally, 
the model has descriptive but not explanatory power. If feature A scores high on 
variables 1, 2 but not 3, the predicted effect is identical to a low score on variable 1 
coupled with high scores on 2 and 3. The inputs to morphological development included 
in the model capture notable regularities in the data  and allow the formulation of fine-
grained predictions of learning difficulty if one compares one morpheme to another. 
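The consequence of a flat, summative view can be illustrated with a minimal sketch. The predictor names, the 0/1 scores, and the equal unit weights below are hypothetical and chosen only for illustration; they are not the coding scheme or the estimated weights of the published model.

```python
from typing import Dict


def flat_salience(scores: Dict[str, float]) -> float:
    """Sum predictor scores with equal weight, i.e., with no hierarchy among them."""
    return sum(scores.values())


# Hypothetical profiles: 1.0 stands for a "high" score, 0.0 for a "low" score.
# Feature A is high on variables 1 and 2 but not 3.
FEATURE_A = {"variable_1": 1.0, "variable_2": 1.0, "variable_3": 0.0}
# Feature B is low on variable 1 but high on variables 2 and 3.
FEATURE_B = {"variable_1": 0.0, "variable_2": 1.0, "variable_3": 1.0}

if __name__ == "__main__":
    # Both profiles receive the same total, so the flat model cannot tell them apart.
    print(flat_salience(FEATURE_A), flat_salience(FEATURE_B))
```

Under these assumptions the two profiles are indistinguishable, which is the sense in which a flat model describes the data without ranking the factors it includes.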
Extending this model to comparisons between entire morphological systems (of 
different TLs) is not straightforward and would require adjustments that go beyond the 
original proposal, considering that it was not formulated for cross-linguistic use. Since 
difficulty is a property of individual morphemes on this view, TLs with a higher number 
of them in their inflectional paradigm should have an overall slower rate of learning. This 
is because even in the most beneficial scenario each additional morpheme (expressed 
overtly) has a non-zero difficulty score. Conversely, a TL with fewer overtly marked 
morphological contrasts would seem preferable.  
Due to the flat structure within the set of predictors it proposes, the GC model 
formalized in 2001 and 2005 (Goldschneider & DeKeyser, 2001; DeKeyser, 2005) 
mathematically implies no difference between a TL with many morphemes of low 
individual difficulty and a TL with only a few high-difficulty ones (as in English). To build 
into the model any synergies (such as the added transparency from agglutination), one 
would presumably lower difficulty scores for all the morphemes, but it is unclear by how 
much.8 In contrast with syntactic accounts, the GC model has shown less interest, in 
assessing learning difficulty, in the structural relationships among morphemes, including 
the possibility that difficulty may well lie in a morpheme's very uniqueness as the only 
overtly marked form in a paradigm, not just its low frequency per se. 
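A small sketch may make the arithmetic behind this cross-linguistic extension explicit. It assumes, as a reading of the GC proposal rather than as part of it, that the difficulty of a paradigm is the plain sum of the difficulties of its overt morphemes; the function name and all scores are hypothetical.

```python
from typing import Sequence


def paradigm_difficulty(morpheme_difficulties: Sequence[float]) -> float:
    """Total difficulty of a paradigm as the plain sum over its overt morphemes."""
    return sum(morpheme_difficulties)


# Hypothetical scores, chosen only so that the totals can be compared.
MANY_EASY = [0.5] * 6   # six overt endings, each of low difficulty
FEW_HARD = [1.5, 1.5]   # two overt endings, each of high difficulty

if __name__ == "__main__":
    # The two paradigms tie, although their internal structure differs.
    print(paradigm_difficulty(MANY_EASY), paradigm_difficulty(FEW_HARD))
    # Adding any overt morpheme with non-zero difficulty raises the total.
    print(paradigm_difficulty(MANY_EASY + [0.25]))
```

On this additive reading, an extra overt morpheme can only raise the total, and the total carries no information about how the morphemes relate to one another within the paradigm.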
Similar to syntactic accounts, the GC approach has not pursued examining error 
types, even though interesting possibilities arise from attempting to use GC principles to 
predict error differences cross-linguistically. On an extremely speculative reading of it, it 
appears consistent with predicting infinitival or unmarked forms in place of inflected 
forms: not using any morpheme would seem easier than a morpheme with non-zero 
difficulty. At the same time, substitutions of inflected forms for one another are also not 
antithetical to this proposal. However, on my reading, substitutions would have to be 
directed towards more salient morphemes, even though it is not clear which salience 
dimensions would be prioritized (e.g., morphological, phonological, or syntactic).  
For example, if it is claimed that third-person singular (English -s, German -t) is 
difficult because it is non-syllabic, bound, and semantically redundant, learners could be 
predicted to produce any number of alternative "easier" forms, in theory, ranging from 
uninflected forms to a competitor that is syllabic (e.g., -en) or more frequent. On the 
other hand, substitutions that do not go in the direction of increased salience would be 
                                                 
8 In a similar vein, the GC account does not explain task-related discrepancies: why 
would the same morpheme with all the same properties be more easily comprehended or 
judged as grammatical than it is produced? Any accounts of such differences are of 
necessity ad hoc and external to the proposal. 
 
harder to account for. Any uses of forms that do not exist, furthermore, would be the 
most difficult to accommodate.  
Conclusions. The theoretical approaches to L2 grammatical difficulty 
summarized in this chapter have provided complementary perspectives on morphological 
learning but have yet to be integrated. Their integration, however, is not straightforward.  
The two approaches (general-cognitive, syntactic) are not mutually exclusive: the 
need for distributional learning is undisputed in the syntactic literature, whereas general-
cognitive approaches include a syntactic dimension of difficulty. While being broadly 
complementary, the syntactic and GC lines of research differ in their fundamental views 
of what it is that learners learn. On the GC view, it is individual morphemes (or form-
meaning mappings); on the syntactic view, it is abstract features and parameter settings. 
Thus, when applied to the same data, syntactic and general cognitive approaches can 
jointly provide a new perspective on the nature of the learning mechanism in L2. 
To achieve this, the path forward could be to select one domain of learner 
behavior that is meaningful to most parties, such as oral spontaneous production, or 
written production, and then spell out the patterns of performance that would be expected 
under each theoretical proposal. In doing so, a focus on different error types and cross-
linguistic comparison would be instrumental to pushing each approach to maximum 
clarity about its predictions, which are currently hard to extrapolate. 
Cross-linguistic comparisons can be informative for separating learning difficulty 
and processing difficulty, which is something that both approaches could benefit from. 
While language processing systems share largely the same working memory and 
production architectures (Levelt, 1989), their linguistic systems present vastly differing 
amounts of learning data in their inflectional paradigms. Therefore, difficulties stemming 
from processing bottlenecks should be similar across TLs, assuming similar processing 
demands for any given structure. Conversely, differences across TLs would speak in 
favor of deeper learnability challenges associated with different TLs' systems. 
Additional insight can be gained from considering learner errors. As 
pointed out in Chapter 1, detailed accounts of error types have not been pursued in 
previous research; nor have the theories outlined in this chapter made references to error 
types and what they would signify within the respective frameworks. When applied on 
top of a cross-linguistic approach, an additional analysis of errors would make it possible to test 
deeper and finer-grained hypotheses about the L2 learning processes. For example, if 
learners of Italian are more accurate at producing verbs inflected for the third person 
singular than are learners of English, does that mean they have acquired more robust 
representations of agreement, the exact lexical items tested, or is the inflectional marker 
more salient? Posing this question cross-linguistically would not be sufficient.  
An error analysis in this context, however, could offer answers. One could 
speculate that errors related to the use of uninflected or infinitival forms in inflectionally 
rich TLs should match those observed in learners of English if the source of the difficulty 
lies in processing resource limitations. If, however, learners of inflectionally richer TLs 
make other kinds of errors more readily (e.g., substituting other inflected forms), a 
learnability benefit can be inferred for the richer TL. By contrast, different frequencies of 
errors related to the use of nonexistent inflected forms could reflect the processing costs of 
richer inflectional paradigms, such as more competition in the mental lexicon during the 
retrieval of inflected forms. 
The purpose of this dissertation is, therefore, to produce data of a kind not 
examined or generated before: data that are both cross-linguistic and granular at the level 
of error types. At a minimum, this chapter has shown that both theoretical approaches can 
use these kinds of data to refine their predictions and test some of the hypotheses that 
have been advanced about the sources of learners' problems with inflectional 
morphology. In addition, when both theoretical approaches are applied to the same data, 
their assumptions about the nature of learning can also be tested against each other. 
  
Chapter 3: Research Questions and Motivations 
Chapter 1 showed that cross-linguistic L2 acquisition data are exceptionally 
sparse, and any comparisons have to be pieced together from disparate studies originating 
from a variety of SLA research fields, driven by different agendas and utilizing different 
methods. Despite this sparseness, some tentative observations can be drawn from the 
handful of studies that have investigated diverse target languages (TLs). As Chapter 2 
demonstrated, neither syntactic nor general-cognitive accounts of L2 learning can explain 
these observations or formulate predictions as to what the data "should" look like if a 
cross-linguistic comparison were to be attempted. Nevertheless, cross-linguistic research, 
coupled with a detailed examination of learner errors, can offer insight into the learning 
of L2 morphology that cannot be achieved by either syntactic or general-cognitive 
theories based on single target-language data alone. 
This chapter will first recapitulate key gaps in the research on L2 morphological 
learning, especially as they relate to predicted difficulties of the morphological systems 
of different TLs. It will connect these gaps to the research reported in this dissertation. 
Finally, the target languages selected for the current research will be described from the 
standpoint of their verbal inflectional morphology and the systems of verb classes. 
3.1 Benefits of Paradigm Complexity 
Data on the learning of TLs other than English (presented in Chapter 1) appear to 
suggest that learners achieve higher accuracy at producing inflectional morphology 
earlier in development than do learners of English. However, the obvious caveat here is 
that English and other TLs have not been compared head-to-head within the same studies, 
to the best of my knowledge. The apparent advantage for TLs other than English is not predicted by either syntactic or general-cognitive 
accounts.  
According to the former, morpholexical learning should occur gradually and, as 
implied by its item-by-item nature, should be slower the more there is to learn. On my 
reading of the latter account (as formalized in DeKeyser, 2005, and Goldschneider & 
DeKeyser, 2000), the difficulty of learning an inflectional paradigm can be expressed as 
arising bottom-up from the individual difficulties of the morphemes in question. This 
implies that a greater total number of morphemes should result in slower learning overall.  
However, a number of studies across different subfields of language science have 
indicated that a longer "to-do" list may not be detrimental to the task of learning. In L1 
acquisition, similarly counterintuitive patterns of findings have been dealt with by 
positing competing grammars that coexist in a child's mind (Yang, 2002). These 
grammars presumably include conflicting settings for a given parameter (for example, 
[+Tense] for a grammar resembling English or [-Tense] for a grammar resembling 
Chinese) and compete until one wins.  
For example, a child acquiring English receives evidence in favor of the [+Tense] 
setting not only from past-tense forms, but also, indirectly, from present-tense forms 
that express agreement, which strengthens the tensed grammar until it wins the 
competition with the grammar without tense (Guasti, 2002; Legate & Yang, 2007). The 
rankings of languages based on the amount of evidence compatible with a tensed and a 
tenseless grammar were shown to mirror the rates of disappearance of bare infinitives in 
child language (Legate & Yang, 2007). Similar proposals, couched in terms of competing 
grammars, have also been advanced for L2 learning (Amaral & Roeper, 2014), though 
without gaining traction or advancing hypotheses as to how grammar competition is 
resolved.  
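The logic of grammar competition can be made concrete with a minimal sketch of a two-grammar linear reward-penalty learner, the kind of update associated with variational learning models. Everything specific here is an assumption made for illustration: the function names, the learning rate `gamma`, the starting probability, the threshold, and the two evidence rates are hypothetical and are not taken from Yang (2002) or Legate and Yang (2007). The sketch tracks the expected probability of the [+Tense] grammar, assuming that the [+Tense] grammar parses every token while the [-Tense] grammar fails only on overtly tense-marked tokens.

```python
def expected_step(p_tensed: float, marked_rate: float, gamma: float) -> float:
    """Expected probability of the [+Tense] grammar after one input token."""
    p, q = p_tensed, 1.0 - p_tensed  # q: probability of the [-Tense] grammar
    # Case 1: [+Tense] is selected (prob. p); it parses the token and is rewarded.
    after_tensed_success = p + gamma * (1.0 - p)
    # Case 2: [-Tense] is selected and the token is overtly marked; [-Tense]
    # fails and is penalised, so probability mass shifts to [+Tense].
    after_tenseless_failure = 1.0 - (1.0 - gamma) * q
    # Case 3: [-Tense] is selected and the token is unmarked; [-Tense]
    # succeeds and is rewarded, so [+Tense] loses probability mass.
    after_tenseless_success = 1.0 - (q + gamma * (1.0 - q))
    return (p * after_tensed_success
            + q * marked_rate * after_tenseless_failure
            + q * (1.0 - marked_rate) * after_tenseless_success)


def steps_to_threshold(marked_rate: float, gamma: float = 0.02,
                       p_start: float = 0.5, threshold: float = 0.95,
                       max_steps: int = 100_000) -> int:
    """Count input tokens until the expected [+Tense] probability reaches threshold."""
    p = p_start
    for step in range(max_steps):
        if p >= threshold:
            return step
        p = expected_step(p, marked_rate, gamma)
    return max_steps


if __name__ == "__main__":
    # Hypothetical shares of overtly tense-marked tokens in two kinds of input.
    print(steps_to_threshold(marked_rate=0.6), steps_to_threshold(marked_rate=0.1))
```

Under these assumptions the three cases combine into a net expected gain of `gamma * marked_rate * (1 - p)` per token, so input with a larger share of overtly marked forms drives the tensed grammar to the threshold in fewer steps.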
Further evidence suggesting that complexity, broadly construed, may be 
beneficial during learning has originated from cognitive science and information 
theory. In artificial grammar learning (Thompson & Newport, 2007), pseudo-syntactic 
frames consisting of syllables (cf., English am__ing) were learned better if the variability 
of the intervening syllables was higher.  Maximally diverse filler syllables likely 
highlighted the stability of the frame bookending them. Extended to L2 morphology, this 
could mean that encountering lexical items inflected in more diverse ways may 
strengthen learners? lexical representations. This insight parallels observations from L1 
acquisition, where the diversity of contexts in which parents used finiteness morphemes 
(e.g., is) predicted children?s productive use not only of the morpheme in question nine 
months later, but also of other finiteness markers with the same feature specification 
(third-person singular ?s) 15 months later (Rispoli & Hadley, 2012). Although here the 
?complexity? in these examples extends to the contexts surrounding the elements of 
interest, the value of such facilitatory effects lies in opening the door to speculation about 
any effects of paradigmatic complexity more narrowly. 
The benefit of richer inflectional paradigms may lie in putting communicative pressure 
on the learner and inducing attention to morphological elements, forcing the learner to 
discover the semantic distinctions they encode (for child L1 acquisition, see Dressler et al., 
2007). Offering an L2 parallel to this proposal, an artificial L2 learning experiment 
(Fedzechkina, Jaeger, & Newport, 2012) showed that the learning of differential object 
marking (DOM) was facilitated (as reflected by higher accuracy scores) when the morphemes 
signaling it were the only cues to thematic roles. Conversely, learning was impeded when 
the information encoded by DOM was recoverable from word order as well. These results 
generally echo the notion of semantic salience proposed under general cognitive accounts 
(e.g., VanPatten, 2004): if a morpheme is not critical for comprehension, the semantic 
dimension of its salience is lowered, and learning happens at a slower rate.  
When applied to distinctions between morphemes in the same paradigm (such as 
present-tense indicative) across different languages, it is unclear whether communicative 
pressure would predict differential outcomes for individual morphemes within it. Since 
inflectionally rich languages also tend to be PRO-drop, which makes morphology more 
essential to comprehension, more controlled studies will be needed to separate the impact 
of paradigm richness per se from the confounding communicative pressure introduced by 
missing overt subjects. 
On the connectionist account, the presence of multiple non-zero marked forms in 
the paradigm may increase the validity of inflectional morphemes as cues to meaning. In 
this literature, learning of inflectional paradigms is a process of strengthening form-
meaning mappings (Kempe & MacWhinney, 1998). For example, Russian and German 
systems of noun inflections in the nominative and the accusative cases differ in the 
number of unique inflected forms (higher in Russian), the average uniqueness of 
inflections, that is, the ratio of unique inflected forms to the total number of possible forms 
(higher in German), as well as the validity of inflection as a cue to thematic roles (defined 
as the reliability with which agent/patient roles can be predicted from the case).  
In Russian, case is expressed synthetically, whereas in German it is marked by the 
morphology of the article (and sometimes also by an affix). Learners of 
both languages completed a forced-choice picture identification task, where they had to 
select the picture representing the agent of a sentence they just heard. Controlling for 
familiarity with their respective L2s, learners of Russian made fewer errors than did 
learners of German, whose accuracy improved, however, with additional self-reported 
experience. This led the researchers to conclude that case marking is learned faster in 
Russian. In addition, a connectionist network simulation replicated the advantage of 
Russian in correctly selecting the second noun as the agent in a case-marked OVS 
sentence. This demonstrates the joint benefits of paradigm richness and cue reliability, 
which were conflated in this study.  
3.2 Potential Trade-offs between Learning and Processing 
There are two opposing psycholinguistic implications of paradigm complexity for 
online processing. On the one hand, current psycholinguistic models suggest that the net 
effect of neighborhood and cohort density should be negative rather than facilitative. For 
example, in models of spoken word recognition (e.g., Marslen-Wilson & Tyler, 1980; 
Marslen-Wilson & Zwitserlood, 1989), increased cohort size or neighborhood size (Luce 
& Pisoni, 1998) results in slower recognition due to the competition of lexical items 
during selection. On the other hand, in models of speech production, the presence of 
phonologically similar competitors facilitates word naming once the lemma has been 
selected (Lupker, 1982; Meyer & Schriefers, 1991). 
When considering the psycholinguistic repercussions of complexity, one 
inevitably ventures into lexical territory and the interplay between phonology and 
morphology, or, more broadly, the array of tools that express morphological contrasts (e.g., 
root vowel changes and quasi-regularities associated with verb classes). Therefore, 
inflectional paradigm complexity may differentially affect phenomena that are closer to 
the lexical or the combinatorial ends of the spectrum. 
By contrast, when one examines learning, as opposed to processing, some 
psycholinguistic evidence points to denser lexical neighborhoods being beneficial. In L1 
acquisition, root infinitives have been observed to be less frequent with verbs from dense 
neighborhoods than with those from sparse ones (Hoover, Storkel, & Rice, 2012). The 
authors interpreted their findings as indicating that verbs from more dense neighborhoods 
are represented in the mental lexicon with more phonological detail and are, therefore, 
more accessible in production. They also proposed that the existence of phonologically 
similar words pushes the learner to differentiate any given word's phonological form 
more finely. By extension, this would apply to inflected forms stored in the lexicon as 
well and afford learnability benefits in TLs that contain more such forms. Learnability 
evidence from artificial language learning studies also suggests that high input variability 
may facilitate regularization (Hudson Kam & Newport, 2007, 2009). Admittedly, 
regularization of this kind is a double-edged sword: whereas the learning of the most 
regular pattern is facilitated, input is essentially being coerced toward it, resulting in 
intake that does not match the input's distributional properties.  
The question of tradeoffs, therefore, will depend on one's definition of learning. For 
example, if by learning we mean the disappearance of bare (unmarked) or infinitival 
forms from learners' productions, then exposure to varied inflected forms in input may be 
facilitative. If learning is equated with producing inflected forms correct down to the last 
letter, then this type of input may be adverse insofar as a greater number of forms in the 
mental lexicon leads to greater competition during retrieval. Thus, the distinction between 
psycholinguistic processing and outcomes of the learning process can be studied by 
recourse to error types. 
3.3 The Current Research 
The most obvious gap in research that this dissertation seeks to address is the very lack of 
a set of cross-linguistic phenomena to explain. A bona fide cross-linguistic examination 
of morphological learning can provide theoretically relevant insights, and specifically: 
- Refine syntactic and general-cognitive approaches by serving as a testing ground 
for their predictions, especially with respect to processing versus learnability; 
- Test the assumptions about the learning process implicit in syntactic and general-
cognitive theories against the same set of data, particularly any increases in 
learning difficulty that accompany increases (or decreases?) in the complexity of 
a TL's morphological system. 
In addition, a close focus on learner productions and the ways in which they deviate from 
the target can lend further insights into learning: 
- Link aspects of performance on the same task to different mental processes and 
levels of grammatical knowledge or its abstraction (cf. the common practice of 
using performance on different tasks to tap different types or levels of 
knowledge): e.g., morpholexical versus morphosyntactic learning; learning of 
abstract principles related to agreement versus flawless online production of the 
exact inflected form required; 
- Characterize the learning of entire grammatical systems (of varying complexity) 
rather than of single morphemes. 
Thus, the present research has combined a cross-linguistic focus with an interest in 
learner errors.  
Research questions and hypotheses. To contribute to existing accounts of L2 
morphological development and resolve some of their contradictions, the present research 
has carried out cross-linguistic comparisons of rates of learning in L2 learners. The goal 
of this dissertation is to explicitly contrast the repercussions of TL paradigm richness for 
the overall use of inflected versus non-inflected forms, as well as for the retrieval of 
correct forms during production. This research endeavors to provide insight into the 
following questions:  
1. Are rates of learning of L2 inflectional morphology proportionate to the 
complexity of the TL morphological system, expressed as the number of distinctions 
made in the paradigm? 
a. As the TL morphological richness increases, will learners become 
proportionately less accurate in written production at marking agreement 
through inflectional morphology on the verb? 
b. Does morphological richness impact different kinds of knowledge 
differently? That is, are there differences among TLs differing in complexity for 
errors involving more combinatorial, rule-like processes, compared to 
errors involving aspects of inflection that depend on lexical knowledge 
(instantiated as knowledge about lexical items)? 
2. Are there tradeoffs between the effects of morphological richness on the 
mastery of agreement as a concept and its influence on online retrieval (potentially 
negative)? 
a. Does the percentage of uninflected forms, substituted inflected forms, and 
nonexistent forms (e.g., formed following a compositional rule correctly but 
failing to respect subregular morphophonological patterns) differ among 
languages of different morphological richness? 
b. Will this ratio change at different speeds among the different TLs? 
Specifically, will completely uninflected (or seemingly infinitival) forms 
drop off more sharply (i.e., starting at lower proficiency levels) in richer 
languages than in poorer languages?  
3. Can interlanguage phonological processes account for the pattern of errors in 
German? 
a. Does accuracy differ between inflectional endings that are phonologically 
felicitous, compared to endings that are phonologically marked or 
typically dispreferred in the interlanguage? 
b. When errors of substitution occur, do they flow in the direction from more 
phonologically complex to less phonologically complex realizations? 
Due to the exploratory nature of this research, detailed hypotheses were impossible to 
formulate from the standpoint of either syntactic or general-cognitive accounts. 
3.4 Target Languages and Their Inflectional Systems 
To answer the research questions, TLs of varying paradigmatic complexity were 
selected for which publicly accessible learner corpora were available through the 
Merlin project, which includes learner productions in German, Czech, and Italian.  
In previous research (Kempe & MacWhinney, 1998), complexity was 
conceptualized along three dimensions. First, of interest is the number of dimensions 
expressed in a given paradigm: for example, such common ones as number, person, 
and gender, as well as less commonly grammaticized distinctions, such as animacy. Second, the 
interaction of the number of dimensions with the number of levels each dimension can 
take (e.g., for gender, either masculine and feminine or masculine, feminine, and neuter) 
can be expressed as their product, which reflects the total number of cells in the paradigm. 
For example, for German, which expresses the dimensions of person (with three levels) and 
number (two levels) on the verb, but not gender, that total is six. Third, the number of 
cells containing unique elements can be expressed relative to the total of all cells (or 
forms): in German, there are three cells representing the combinations of persons with the 
plural number, but two out of the three combinations are marked by the same inflection on 
the verb, -en, yielding only two unique forms. This would produce a uniqueness ratio of 
2/3 = 0.67, that is, two distinct affixes for three possible cells. 
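The two calculations just described (the number of cells as a product of levels, and the uniqueness ratio) can be made concrete with a short sketch. The function names are hypothetical and introduced only for illustration. The -en endings follow the description above; the second-person plural ending -t is supplied as an assumption based on standard German conjugation.

```python
from math import prod


def paradigm_cells(levels_per_dimension):
    """Total number of cells: the product of the levels of each dimension."""
    return prod(levels_per_dimension.values())


def uniqueness_ratio(endings_by_cell):
    """Number of distinct endings divided by the number of cells."""
    return len(set(endings_by_cell.values())) / len(endings_by_cell)


# German verbal agreement: person (three levels) and number (two levels).
german_dimensions = {"person": 3, "number": 2}

# German present-tense plural endings; "-t" for 2pl is an assumption
# taken from standard German conjugation, not from the passage above.
german_plural = {"1pl": "-en", "2pl": "-t", "3pl": "-en"}

print(paradigm_cells(german_dimensions))          # 6
print(round(uniqueness_ratio(german_plural), 2))  # 0.67
```

Adding a further dimension, such as gender with three levels, would multiply the number of cells accordingly, which is why the cell count alone overstates complexity when many cells share a form.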
In the present study, any references to "complexity" refer to the structural 
complexity of the TL, that is, the number of choices (distinct morphemes) contained in 
its morphological paradigm, and not putative difficulty for the learner. The descriptions 
of each TL's complexity below will focus on the indicative mood, since the subjunctive 
or even the imperative are restricted both in use and in pedagogical emphasis. By the time 
the subjunctive is introduced in instruction, the development of inflection and agreement 
should be well underway. Following this logic, the passive voice was also excluded, since 
it is an advanced structure which tends to be expressed analytically. 
The remainder of the chapter will first introduce the verbal class systems of each 
TL, moving to brief summaries of their tense systems (as relevant to the present 
research), and finally presenting each tense paradigm and the number of morphological 
distinctions they express.  
Inflectional systems of the target languages: verb classes. The three languages 
represent major language groups within the Indo-European family (Germanic, Romance, 
and Slavic) and express similar ranges of grammatical meaning, such as tense, aspect, and 
agreement. All three use the Latin alphabet. In verbal conjugation, all three have a system 
of verb classes, which give rise to morphophonological sub-regularities. In the case of 
German, there is a regular/irregular distinction, which is the most visible in the formation 
of the preterite and the past participle (which is used to form the perfect, pluperfect, 
future perfect, and the passive voice), but is also evident in root vowel changes in the 
present. The vowel changes in the present tense stem from a distinct phonological 
process, Umlaut, and involve vowel harmony conditioned by the no longer surviving 
phonology of the affix. By contrast, the differences in vowels between the infinitive, 
preterite, and past participle go back to Proto-Indo-European processes of Ablaut. 
In the case of Italian, the three verb classes are signaled by thematic vowels, 
resembling the system of Latin; in addition, there is a small number of truly irregular 
verbs (e.g., have, go, be). Finally, Czech has an even more elaborate verbal class 
system: depending on the linguistic analysis used, as few as four and as many as six 
classes are identified (e.g., Janda & Townsend, 2002). Regardless of the merits of each 
analysis, it is fair to argue that the Czech system is the most intricate of the three. Since 
arbitrary verb classes are present in all three languages, it bears emphasizing that the 
recognition of class membership by learners is not the focus of the present research. One 
could argue that within the general domain of inflection these sub-regularities of root 
transformations represent a more lexicalized process than the application of inflectional 
morphemes. For this reason, it is our view that the application of inflectional endings is 
and should be treated as a procedure separable from phonological root and suffix 
alternations at morpheme boundaries. This is why any inflected forms that only differ 
from each other by some element signaling class membership (e.g., the thematic 
vowel in Italian or the suffix in Czech) will not be counted as distinct as long as the 
inflectional ending stays constant regardless of class. 
Inflectional systems of the target languages: tenses. Verbal inflection is 
expressed in a network of tenses, which are sometimes formed synthetically and 
sometimes analytically in the TLs. It is the synthetic forms that are the focus of the 
present study. All three languages have at least two synthetic tenses. In German, it is the 
present and the preterite; the perfect and pluperfect are expressed analytically through the 
combination of an auxiliary (forms of have, be) with the past participle which remains 
unchanged. Since have and be are the only components of this form that are conditioned 
by agreement and are likely to be highly overlearned, we will not consider these two 
tenses as contributing to paradigm richness in a major sense. Likewise, the analytical 
future in German is formed with the auxiliary werden (cf. English "will") and the 
infinitive, and will not be factored into the total complexity picture. In Italian, there are 
four synthetic tenses: the present, the imperfect past, the absolute past, and the simple 
future. Similar to German, there is also a perfect and a pluperfect tense, and a future 
perfect, differing from each other only in the tense of the auxiliary. However, in contrast 
to German, it is not only the auxiliaries but also the participles that agree with the subject 
in gender and number, expressed through four distinct forms (masculine singular, 
feminine singular, masculine plural, feminine plural), but only with unaccusative verbs 
requiring the auxiliary essere. Since the participle forms are identical for the three tenses, 
we can count the four participle forms once. In Czech, there is a synthetic present, which 
has a present meaning for imperfective verbs and a future meaning for perfective ones; 
the past is expressed analytically through a combination of the auxiliary be and a past 
participle, agreeing with the subject in person/number and in gender/animacy/number, 
respectively. Therefore, maintaining consistency with the treatment of participles in 
Italian, we can add this tense to Czech's overall complexity but limit it to the six distinct 
participle forms. The analytical future in Czech and German was not considered because 
it is formed as a combination of an auxiliary and the infinitive. In both languages, the 
grammatical present tense has the capacity to express future meanings (in Czech, 
obligatorily so for perfective verbs). 
In the present tense, all three languages express six number-person combinations 
in the sense that these are non-zero marked. However, in German the first- and third-
person plural forms are homophonous with the infinitive, whereas the second person 
plural form (informal) has the same inflectional ending as the third person singular but, 
for irregular verbs only, differs from it in the root vowel. Thus, if we de-emphasize root 
processes, the second person plural would not be counted as a distinct form, leaving a 
total of three different morphemes unambiguously different from the infinitival template. 
Another possibility is to acknowledge that the root differences would merit a separate 
count and even to count the two other plural forms once, resulting in a total count of up to 
five distinct forms. In Italian, all six person-number combinations are expressed by 
distinct morphemes, none of which resemble the infinitive morphology. Even though the 
thematic vowels preceding inflectional morphemes vary by verb class, only six forms 
were counted as contributing to overall complexity (instead of eighteen?six endings 
multiplied by three possible thematic vowels). Finally, Czech expresses all six person-
number contrasts through forms that are distinct from each other and the infinitive. The 
calculation of the number of forms contributing to complexity is made less 
straightforward by the interaction of inflectional endings and verb classes. In contrast to 
Italian, where verb classes mostly determine alternations in the root and do not influence 
the choice of inflectional morphemes, in Czech the endings themselves differ depending 
on class. For example, the first-person singular form can end in vowel + -m (for two 
classes), -i, or -u. All other person-number combinations are more uniform and end in a 
vowel suffix followed by an ending (same for all classes). On the most conservative 
calculation, we can ignore the vowel differences between two verbal classes (before -m), 
in parallel to the choice made for the Italian present tense, where we also ignored 
thematic vowel differences. This amounts to three forms of the first person singular, one 
of the second person singular (collapsed across all vowel suffixes), three possibilities for 
the third person singular, one each for first and second person plural (with different 
vowels preceding the ending), and at least two broadly distinct forms of the third person 
plural, one ending in -ou and the other in -(vowel+j)í. This is a total of 11 forms. 
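The conservative Czech tally above is simple arithmetic, and a brief sketch makes the per-cell counts easy to check. The dictionary and variable names are hypothetical; the counts themselves are those given in the preceding paragraph, and the Italian figures repeat the six endings and three thematic vowels mentioned earlier.

```python
# Distinct present-tense forms per person-number cell in Czech,
# using the conservative counts given in the text.
czech_present_forms = {
    "1sg": 3,  # vowel + -m, -i, -u
    "2sg": 1,  # collapsed across vowel suffixes
    "3sg": 3,
    "1pl": 1,
    "2pl": 1,
    "3pl": 2,
}
czech_total = sum(czech_present_forms.values())
print(czech_total)  # 11

# Italian present tense: six endings are counted, rather than the product
# of six endings and three thematic vowels.
italian_endings = 6
italian_thematic_vowels = 3
print(italian_endings * italian_thematic_vowels)  # 18
```

Keeping the counts per cell, rather than only the total, makes it straightforward to recompute the figure under a less conservative analysis.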
In the synthetic past tense, both German and Italian use some combination of the 
root, a suffix expressing tense, and person-number endings. In German, both the first- 
and third-person singular forms are zero-marked, on a strict morphological analysis. But 
because the suffix expressing tense, -(e)te, ends in -e, and the person-number endings are 
identical to those of the present tense, the appearance is created that the final vowel of the 
tense suffix marks first-person singular. There is no such illusion for irregular verbs, 
which do not add a suffix but use the preterite form plus person-number ending (cf. 
steal-stole). For these irregular verbs, the first- and third-person singular forms are 
identical and not marked overtly. Therefore, in both regular and irregular cases these 
forms can either not be counted at all or counted as one. This would give a total count 
ranging from three distinct forms with overt morphology (second person singular, 
first/third person plural, second person plural) to four (first/third person singular, second 
person singular, first/third person plural, second person plural). Because of the presence 
of the tense-expressing suffix, even the first- and third-person plural forms do not 
resemble the infinitive as much as their present-tense counterparts, so counting them at 
least once seems fair. 
In Italian, the synthetic imperfective tense forms follow the same general "recipe" 
as in German, represented in six distinct forms: a stem (root plus thematic vowel) is 
followed by a suffix (-v) and an ending representing the person-number combination. The 
inflectional endings in the imperfect tense are identical to those of the present indicative. 
In both Italian and German, the formation of past-tense forms essentially amounts to 
agglutination, raising an interesting contrast to English. Where past-tense marking is 
omitted in English, it is inherently uncertain whether tense or agreement is to blame. In 
German and Italian, by contrast, a dissociation is possible where either the tense- or 
agreement-bearing morphemes can be omitted or otherwise incorrectly rendered. In 
addition, the Italian absolute past adds six inflected forms to a learner's plate, which are 
derived using a different set of person-number inflectional endings.  
In the past tense in Czech, in addition to person and number, gender and animacy 
are also expressed. This form blurs the distinction between synthetic and analytical 
expression: even though there is an analytical component to it in the form of an auxiliary 
(be) expressing person-number, gender is expressed on the lexical verb. There are three 
distinct forms in the singular (for the masculine, feminine, and neuter genders) and three 
forms in the plural (one for masculine animate, another for masculine inanimate 
or feminine, and a third for neuter), totaling six forms. It was thus deemed 
more reasonable to include this form in the consideration of overall complexity rather 
than rule it out the same way that German perfect tense was ruled out, for instance, in 
which only the auxiliary changes but not the participle. To maintain consistency with 
Italian and German, only the number of distinct forms of the participle (conditioned by 
the grammatical gender of the agreeing subject) was considered as a contributor to 
overall complexity, not the product of distinct auxiliary by distinct participle forms. Thus, 
there are four distinct forms of the participle.  
Finally, the Italian simple future tense was also considered as adding to the 
overall paradigm complexity of Italian. This tense marks all six person-number contrasts, 
which, however, are expressed as cliticized (and reduced) forms of the verb have. 
In addition to the structural descriptions of target languages just presented, the 
frequencies of inflected forms were also taken into account (see section Corpus 
frequencies of inflected forms below), to control for the possibility that forms inflected 
for certain feature combinations may be more frequent in some TLs than in others. Such 
differences may arise, for instance, due to Czech and Italian being PRO-drop languages. 
The data on German were obtained from the DeReKo corpus (Das Deutsche 
Referenzkorpus) maintained by the Institute of German Language in Mannheim (Institut 
für Deutsche Sprache, 2017). Data on Italian were extracted from the general reference 
corpus of Italian, Coris / Codis (Corpus di Italiano Scritto; Rossini Favretti, Tamburini, 
& De Santis, 2002). Finally, the frequencies for Czech are reported based on the Czech 
National Corpus (Křen et al., 2015). 
Phonological realizations of inflectional morphemes. Any comparisons 
centering on paradigm complexity are complicated by the fact that phonological 
realizations of morphemes differ as well. These differences in phonological realization 
can result in consonant clusters of different length and complexity, interacting with the 
different restrictions on permissible syllable structures in learners' L1s, as well as 
L1-independent markedness constraints on consonant clusters in the interlanguage. Since all 
TLs are suffixing (German and Italian strongly, Czech weakly) (Dryer, 2013b), the 
combination of roots ending in consonants with inflections expressed as consonants can 
create clusters that can be disproportionately located in codas (unless the inflection is 
syllabic). This potential imbalance towards codas poses an additional burden because 
codas are both more constrained typologically than syllable onsets in the number and 
identity of the consonants they allow (e.g., Blevins, 1996; Goldsmith, 2011, p. 190; 
Selkirk, 1980) and pose a greater difficulty for L2 learners (Anderson, 1987; Eckman, 
1986; Sato, 1984) than do onsets. 
The properties of the target languages' syllable structures, as well as those of 
learners' first languages, are summarized in Table 1. Syllable complexity, according to 
the World Atlas of Language Structures (WALS; Maddieson, 2013), is a broad classification 
that is nevertheless intuitive: at the simplest level, there are languages that only allow the 
(C)V syllable type (with an optional consonant onset); moderately complex languages permit 
CCV and CVC syllables (with CCVC being the most complex combination allowed) 
but pose strict restrictions on what the second consonant in the onset can be; complex 
syllable structures are those which permit three or more consonants in the onset and/or do 
not restrict the nature of the second consonant as narrowly as in the moderate group, 
while allowing two or more consonants in the coda. 
Even though not all languages of interest are represented in the Syllable Structure 
map of WALS, its classification is still included in the table as a starting point. The 
learner metadata provided by the Merlin corpus (Wisniewski et al., 2013) do not 
distinguish between European and Brazilian Portuguese, nor do they specify the varieties 
of Arabic spoken by learners. 
Table 1 Syllable structure and permissible coda clusters in target and first languages 

Language | Syllable Complexity (WALS classification) | Allowed Codas 
German | Complex [1] | Maximum onset is CCC (C1 has to be /s/ or /ʃ/); maximum coda: sonorant in nucleus plus CCC, with CC in appendix outside syllable (coronal obstruents) [15]. 
Czech | Complex [4] | Both Moravian and Bohemian: CCCC maximum in onset, CCC in coda (rare in practice). 
Italian | Moderately complex [3] | Codas with monophthong nuclei only: r l m n; fricatives or stops only as a result of gemination. 
Russian | Complex [1] | Maximum onset: CCCC; CCCC possible in coda if C1 is a liquid, CCC more common [10]. Fewer combinatorial restrictions [9]. 
Polish | Complex [1] | Up to six consecutive consonants: onsets of maximum CCCC, codas of CCCCC [12]. 
Slovak | Complex [5] | Maximum onset: CCCC; maximum coda: CCC (only 4 attested); 53 possible two-consonant clusters [5]. /l/ and /r/ can form nucleus (are syllabic) [6]. 
French | Complex [1] | Onset optional, maximum is CCC; coda of CCC possible but very restricted, CC more common [13]. 
Spanish | Moderately complex [1] | CC onsets only. Coda optional: any single consonant except palatals; /s/ most frequent [2]. 
Portuguese | Moderately complex | Few syllable-final consonants allowed, restricted. European variety allows more clusters than Brazilian [8]. 
Hungarian | Complex [1] | CCC onsets possible but rare, CC more common; both found in foreign words; CCC coda possible [16]. 
Arabic* | Variable; Egyptian: Complex [1] | Modern Standard Arabic: maximum onset is C; coda CC [14]. Highly dialect-specific: Moroccan allows complex clusters, particularly in onsets [14]. 
Chinese | Mandarin, Cantonese: Moderately complex | Mandarin: in coda only glides, nasals, or /r/ as a result of affixation; no obstruents. 
Turkish | Moderately complex [1] | Generally no CC onsets. CC codas restricted: sonorant + obstruent; voiceless fricative + oral plosive; /ks/ [7]. 

Notes. Sources: 1. Maddieson (2013). 2. Núñez-Cedeño (2007). 3. Hall (1944). 4. 
Šimáčková, Podlipský, & Chládková (2012). 5. Gregová (2011). 6. Pouplier & Beňuš 
(2011). 7. Kornfilt (2013). 8. Parkinson (2009). 9. Davidson & Roon (2008). 10. Halle 
(1959). 11. Chew (2003). 12. Sadowska (2012). 13. Dell (1995). 14. Hamdi, Ghazali, & 
Barkat-Defradas (2005). 15. Grijzenhout & Joppen (1998). 
Overwhelmingly, the first languages of learners sampled in the corpus belong at 
least to the moderately complex type in their syllable structures. It is not clear to what 
extent the differences in syllable complexity impact written production. Considering that 
task and register effects on consonant cluster simplification tend to favor more deliberate, 
monitored speech (Dickerson, 1974; Gatbonton, 1975), it seems that writing an essay for 
a proficiency exam would exert similar kinds of pressure and discourage cluster 
simplification. Furthermore, neither consonant cluster simplification nor similar 
hypotheses, such as the prosodic transfer hypothesis (Goad, White, & Steele, 2003), 
would explain oversuppliance of inflection. Nor do they explain any asymmetries in 
accuracy for morphemes with the same phonological realization, such as English plural 
vs. third person singular -s. Although limited, there is evidence that second language 
learners are more likely to simplify clusters formed by inflectional morphemes, rather 
than those occurring at the ends of monomorphemic words, in contrast to native speakers 
(Bayley, 1996; Saunders, 1987; Wolfram & Hatfield, 1984), which means that the 
simplification is sensitive to morphological phenomena and is not exhaustively explained 
by phonology. 
Corpus frequencies of inflected forms. To rule out the possibility that some 
inflected forms are more frequent in some target languages than in others, a corpus 
analysis was conducted. The availability of high-profile, representative reference corpora 
varied for the target languages, and written corpora were the most easily available. 
Therefore, the following analysis will primarily focus on written corpus data. In each 
case, the most authoritative corpus sources with the most relevant annotations were 
prioritized. 
Table 2 Corpus frequencies of inflected forms in Czech (spoken, written) 

Form           Written frequency (ipm)   Rank   Spoken frequency (ipm)   Rank
Present Tense
1 Pers. Sg.     9,809.93                 3rd    28,334.69                2nd
2 Pers. Sg.     1,558.75                        10,935.59                3rd
3 Pers. Sg.    33,192                    1st    38,158.94                1st
1 Pers. Pl.     4,649.37                         7,074.83
2 Pers. Pl.     2,682.46                         1,879.84
3 Pers. Pl.    10,077.94                 2nd     8,155.04
Future Tense
1 Pers. Sg.       247.97                 3rd     1,016.71                2nd
2 Pers. Sg.        81.09                           541.68
3 Pers. Sg.     1,547.35                 1st     2,472.14                1st
1 Pers. Pl.       202.95                           594.49                3rd
2 Pers. Pl.       144.93                           494.55
3 Pers. Pl.       482.98                 2nd       560.07

Note. Frequencies are relativized, in items per million (ipm), calculated as raw frequency 
(number of search results) divided by corpus size and multiplied by 1,000,000. 
 
For Czech, both spoken and written corpora were available. SYN2015 is a written 
representative corpus of 100 million tokens. It is compiled from traditional (as opposed 
to web-crawled) sources, such as fiction, non-fiction, and periodicals dating from 2010 to 
2014. It is lemmatized, morphologically tagged, syntactically annotated, and published 
within the Czech National Corpus framework (Křen et al., 2016). ORAL is a reference 
corpus of informal spoken Czech, containing over 500 hours of conversations between 
friends and family recorded between 2002 and 2011, or over five million tokens 
(Kopřivová et al., 2017). It is lemmatized and annotated for the same morphosyntactic 
features as the SYN2015 written corpus. 
Present tense. Both in written and spoken language use, the most frequent form 
used was the third-person singular (Table 2). The second-most frequent inflected form in 
the written corpus was third-person plural, followed by first-person singular. In the 
spoken corpus, the first-person singular was ranked second, higher than in the written 
corpus, whereas the second-person singular took third place, also ranking higher than in 
written use. These results likely reflect the sampling of speech situations/genres in the 
spoken corpus, which included spontaneous exchanges between two interlocutors and 
provided many contexts for the use of second-person singular forms ("informal" you). 
Future tense. Third-person singular continued to be the most frequent form in 
both written and spoken Czech (Table 2), followed by third-person plural and first-person 
singular in written Czech and by first-person singular and first-person plural in spoken 
Czech. These rankings of the future-tense forms were exactly the same as in the present 
tense in the written corpus; they deviated only by one position in the spoken corpus, 
where the third-most frequently used form was now the first-person plural, and not the 
second-person singular, as in the present.  
The present- and future-tense counts were added together for the purpose of 
representing them in the joint table comparing frequencies in all target languages (Table 
5). 
For Italian, the corpus that came the closest to the desired annotation depth was 
PAISÀ, a web corpus of 250 million tokens, compiled from texts around 2010 (Lyding, 
Stemle, Borghetti, Brunello, Castagnolli, Dell'Orletta, Dittmann, Lenci, & Pirelli, 2014). 
It is richly annotated for parts of speech with morphosyntactic properties, as well as 
dependency relations. It is not balanced or representative, but due to the nature of the text 
genres included (blogs, Wikipedia entries) its register can be considered intermediate 
between that of a traditional written corpus (e.g., one based on fiction and periodicals) 
and that of spoken conversation. The counts reported in Table 3 are combined for all tenses. 
  
Table 3 Written frequencies of inflected forms in a web corpus of Italian 

Form           Frequency (ipm)   Rank
1 Pers. Sg.     1375.41          3rd
2 Pers. Sg.      319.04
3 Pers. Sg.    33270.48          1st
1 Pers. Pl.      804.47
2 Pers. Pl.      197.95
3 Pers. Pl.     9699.24          2nd

Note. Frequencies are relativized, in items per million (ipm), calculated as raw 
frequency (number of search results) divided by corpus size and multiplied by 1,000,000. 
 
The DWDS corpus of German, created at the Berlin-Brandenburg Academy of 
Sciences, was used to look up the frequencies of inflected forms. To concentrate on the 
most recent usage patterns, we focused on the sub-corpus comprising texts from the 
2000s, DWDS-Kernkorpus 21, a differentiated, but not balanced, collection of texts from 
newspapers, fiction, journalism, and scientific research literature (Geyken, 2007). The 
corpus contains 15,462,297 tokens from 12,186 documents and was automatically tagged 
for parts of speech but not morphological features. 
  
Table 4 Frequencies of German inflected forms in a written corpus 

Form           Frequency (ipm)   Rank
1 Pers. Sg.    3446.19           3rd
2 Pers. Sg.     758.23
3 Pers. Sg.    5432.18           1st
1 Pers. Pl.     932.78
2 Pers. Pl.      16.08
3 Pers. Pl.    5106.55           2nd

Note. Frequencies are relativized, in items per million (ipm), calculated as raw frequency 
(number of search results) divided by corpus size and multiplied by 1,000,000. 
 
The absence of morphosyntactic annotation necessitated additional strategies for 
obtaining valid data. The iterative process of refining the search criteria is detailed below 
for the first person singular; only its results are reported for the other person-number 
combinations, unless there were additional changes to the search criteria prompted by 
challenges particular to any given form. 
As a starting point of the search, I combined syntax searching for parts of speech 
(finite thematic verbs, finite auxiliary verbs, and finite modal verbs) with regular 
expressions referring to letters corresponding to German verbal affixes (e.g., -e, -st) in 
word-final positions. This process yielded results that could be further improved: for 
example, this search returned false positives, such as third-person singular preterite and 
subjunctive, which are indistinguishable from first-person singular forms based on their 
surface attributes alone.  
An inspection of the first 100 results from this search revealed that only 56 out of 
100 tokens were, in fact, first-person singular forms; one form was a participle, and the 
others were third-person preterite and subjunctive forms falsely identified as first-person 
singular based on their surface attributes. Based on the low accuracy of the search, the 
criteria were augmented to specify that the verb also had to co-occur with the personal 
pronoun (ich, "I"). To determine the appropriate window within a sentence to specify, we 
counted the number of times within the same sample of 100 results that the subject and 
verb were n words apart. Even though there were extreme cases where the verb and its 
subject were 9, 10, or 13 words apart, in the vast majority of cases (71%) the subject and 
verb were directly adjacent (that is, one word apart), and in a further 17% of them the 
distance ranged between two and four words. A window of six words covered 94% of the 
sample and was deemed an appropriate minimum. The final search criteria looked for 
occurrences of words ending in -e and tagged as finite thematic verbs within eight 
positions of the pronoun "ich" in either direction (left or right). The wider eight-position 
window was chosen to make the search more permissive and allow for contingencies 
such as the inclusion of an adverb or a prepositional phrase between the subject and the 
verb. 
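The co-occurrence criterion can be illustrated with a minimal Python sketch. This is not the DWDS query syntax used in the study; it is a toy re-implementation over a list of (word, tag) pairs. The tag label VVFIN (finite full verb in the STTS tag set), the function name, and the example sentence are assumptions introduced for illustration only.

```python
# Illustrative sketch only: a toy version of the window-based criterion,
# not the actual DWDS search. Tag names follow STTS (an assumption).
FINITE_THEMATIC = "VVFIN"


def first_person_hits(tagged_tokens, window=8):
    """Return (pronoun_index, verb_index) pairs in which a finite thematic
    verb ending in -e lies within `window` positions of the pronoun 'ich'."""
    hits = []
    for i, (word, _tag) in enumerate(tagged_tokens):
        if word.lower() != "ich":
            continue
        start = max(0, i - window)
        stop = min(len(tagged_tokens), i + window + 1)
        for j in range(start, stop):
            candidate, tag = tagged_tokens[j]
            if j != i and tag == FINITE_THEMATIC and candidate.lower().endswith("e"):
                hits.append((i, j))
    return hits


# Hypothetical example sentence: "Morgen besuche ich meine Eltern"
sentence = [("Morgen", "ADV"), ("besuche", "VVFIN"), ("ich", "PPER"),
            ("meine", "PPOSAT"), ("Eltern", "NN")]
print(first_person_hits(sentence))  # [(2, 1)]
```

Because the window extends in both directions, the same function covers both the SV and the VS word order.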
This reduced the number of hits from 109,023 (with false positives) to 21,489. A 
sample of 100 hits revealed 14 incorrectly included forms, improving the accuracy of the 
search from 56% to 86%. These incorrectly returned cases involved duplicates that were 
produced when the same verb token was counted twice: first with its true subject and a 
second time with another instance of "ich" in a following clause. For example, the 
sentence I like to cook, but I prefer to eat out when I am tired would produce the 
hits I like, I prefer, and I am, which match the purpose of our investigation, 
but also like [to cook, but] I and prefer [to eat out when] I. In German all such cases 
involve a comma, so the solution was to search for all such instances separately and then 
subtract their count (3787) from the total of 21,489. This yielded a count of 17,702, or 
1144.85 items per million.  
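The arithmetic behind this figure can be retraced with a short Python sketch, using only the counts and the corpus size reported in the text; the function and variable names are illustrative assumptions.

```python
# Retracing the reported calculation; names are illustrative.
CORPUS_TOKENS = 15_462_297  # size of the DWDS-Kernkorpus 21, as reported above


def items_per_million(raw_count, corpus_size=CORPUS_TOKENS):
    """Relativize a raw hit count to items per million tokens."""
    return raw_count / corpus_size * 1_000_000


hits_with_pronoun = 21_489   # hits after adding the pronoun criterion
comma_duplicates = 3_787     # hits spanning a comma, counted separately

corrected_hits = hits_with_pronoun - comma_duplicates
print(corrected_hits)                               # 17702
print(round(items_per_million(corrected_hits), 2))  # 1144.85
```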
In addition, I searched for forms of modal and auxiliary verbs that do not end in -e, 
such as kann, muss, soll, darf, weiß, specifying the same criteria for co-occurrence with 
an overt subject pronoun as for finite thematic verbs. This was necessary only for the first 
person singular, because all the other person-number combinations in German are 
expressed through identical, non-zero inflectional endings on both thematic and modal 
verbs. 
The same process was followed to obtain counts of second-person singular forms 
(ending in -st or -ßt). In this case, misclassification issues were due to the overlap of 
second- and third-person singular surface forms for stems ending in -s: for example, 
forms such as erweist or schießt (from erweis-en and schieß-en, respectively). However, 
this problem was mitigated by restricting the search to forms occurring in the vicinity of 
the second-person singular pronoun ("du"), within the same span as that described for 
first-person singular above. No separate search was necessary for forms of modal verbs, 
since their endings match those of thematic verbs. 
For the third person singular, it was not possible to restrict potential subjects in 
the same way as for the first and the second person: subjects can be expressed both 
through pronouns and noun phrases, but in the absence of morphosyntactic annotations 
on the nouns the search cannot be restricted to singular nouns only. Thus, as an 
approximation, we searched only for tokens of finite verbs ending in -t occurring within 
the same window as a closed class of pronouns: personal pronouns in the nominative 
case (specified as an exhaustive list) and certain indefinite pronouns (e.g., "somebody", 
"nobody", "one": jemand, niemand, man). In addition, the same search procedure was 
applied to return instances of modal verbs that do not end in -t: the same list of 
pronominal subjects was specified within the same span of words from the verb (eight). 
This method, admittedly, underestimates the frequency of third-person singular forms in 
written discourse. Since it is the relative prevalence of inflected forms, not their absolute 
frequencies, that is at the heart of this analysis, this was deemed an appropriate 
compromise. 
For the first person plural, which is homonymous with third- and second-person 
plural (formal), we specified as criteria the co-occurrence with the pronoun "we", but also 
with the phrases "I and_", "_and I". In the case of "I and_", we expanded the search region 
following it (preceding the verb) from eight to ten positions, to allow for two-word 
determiner-noun phrases, such as "I and my sister", or "I and the neighbor". The number 
of results with this type of subject was very low, 15, and was corrected manually to 
exclude wrong matches that captured fragments of adjacent clauses. 
The search for forms of the second person plural was complicated by the fact that 
the second person plural pronoun is homophonous with the feminine possessive, ihr 
("her"), and the dative of the feminine pronoun sie ("she"). This meant that applying the 
criteria used for the other forms (specifying the inflectional ending -t in the vicinity of a 
pronoun) returned many false positives, including strings where ihr was preceded by a 
preposition or modified a noun which was the true subject of a third-person singular verb. 
With third person singular verbs also ending in -t, the difference between the third person 
singular and second person plural is often in the presence or absence of root vowel 
processes: for example, compared to the citation form of the verb "to read", lesen, the 
third person singular form is marked both by the ending -t and by the vowel change from 
e to ie, liest ("[he/she/it] reads"), whereas the second person plural only has the -t ending 
(ihr lest). It was impossible to narrow down the criteria sufficiently through the search 
syntax alone, but for a handful of modal verbs that have this vowel pattern it was feasible 
to specify exclusion criteria and rule out their third person forms, specified individually. 
From the results produced by this process, two samples of 100 tokens each were 
examined: one with the search results for the SV word order and the other for the VS 
word order. Within each sample, the number of true second-person plural forms was 
tallied and divided by 100 to produce an accuracy estimate for this search method. The 
total number of hits returned by the search was then multiplied by this percentage to 
produce an adjusted estimate of the form's occurrence. The accuracy of this search for 
the SV word order was 11%, or 11 target forms out of 100 results produced; for the VS 
word order, the figure was 5%. 
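The adjustment described above amounts to scaling the total number of hits by the accuracy observed in the sample. A minimal Python sketch follows; the sample accuracies (11 and 5 out of 100) come from the text, whereas the total hit counts are hypothetical placeholders, because the raw totals for these searches are not reported here.

```python
def adjusted_count(total_hits, true_in_sample, sample_size=100):
    """Scale the raw hit count by the share of true target forms in the sample."""
    return total_hits * true_in_sample / sample_size


# Hypothetical totals, for illustration only (not reported in the text).
hypothetical_sv_hits = 2000
hypothetical_vs_hits = 1000

print(adjusted_count(hypothetical_sv_hits, true_in_sample=11))  # 220.0
print(adjusted_count(hypothetical_vs_hits, true_in_sample=5))   # 50.0
```

The adjusted counts for the two word orders would then be summed before being relativized to items per million.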
For the third person plural, the procedure was the same as for the third person 
singular but with the personal pronoun "they" (sie) and the plural definite article (die) 
specified as desired neighbors within the same eight-word window (in the case of sie) or 
within nine words, in the case of die, to accommodate the following noun. Just as was the 
case for the third person singular, this method underestimates the true rate of occurrence 
of third person plural forms, because it does not account for subjects that are bare plural 
nouns (without a definite article) or subjects that are expressed as coordination 
constructions of two singular nouns (der/das_ und der/das_), except when one of the 
coordinated nouns is a feminine noun with a definite article, which is homonymous with 
the plural definite article. 
Samples of 100 results were examined for accuracy for the SV and VS word-
order searches. In the SV sample, 58 tokens were true matches. A further 33 tokens were 
not, in fact, third person plural forms but matched the search criteria due to appearing 
adjacent to the pronouns specified; these would all be ruled out by the corrective search 
procedure specified above, which searches for sequences containing a comma; out of 
these 33, only three would be wrongly excluded by this procedure. Thus, the percentage 
of correct matches would be 88%, or [58 + (33-3)]/100. The remaining nine incorrect 
search results matched the criteria superficially but were not true third person plural 
forms: for example, they involved verbs ending in -en appearing close to an accusative 
use of sie (feminine) or with a subject that included a plural noun followed by a 
coordination construction with I, leading to a first person plural interpretation. There 
were no additional restrictions that we could specify to rule out such cases. 
In the VS sample, 33% of the cases were correct matches outright. A further 
50% were incorrectly returned forms captured by the permissiveness of the search, all of 
which would be ruled out by the procedure excluding sequences with commas. Only one 
token would have been falsely excluded based on this search.  Finally, 16 tokens 
represented cases that would not be affected by excluding sequences with commas: 
mostly, they were first-person plural verbs followed by a direct object with the definite 
article die (accusative of feminine and plural). Combined, these results represented an 
83% accuracy of the search results. 
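The two accuracy figures can be retraced as follows. The sketch reflects one reading of the tallies reported above: in the SV sample the three wrongly excluded tokens are counted among the 33 comma cases, whereas in the VS sample the single wrongly excluded token is tallied separately from the 50 comma cases. The function and key names are illustrative assumptions.

```python
def search_accuracy(true_matches, correctly_filtered, sample_size=100):
    """Percentage of sample tokens handled correctly once the comma-based
    correction is applied (true matches plus correctly filtered false hits)."""
    return 100 * (true_matches + correctly_filtered) / sample_size


# Tallies as reported for the two samples of 100 tokens.
sv = {"true": 58, "comma_cases": 33, "wrongly_excluded": 3, "unaffected_errors": 9}
vs = {"true": 33, "comma_cases": 50, "wrongly_excluded": 1, "unaffected_errors": 16}

# Sanity check under the reading described above: each sample sums to 100.
assert sv["true"] + sv["comma_cases"] + sv["unaffected_errors"] == 100
assert (vs["true"] + vs["comma_cases"] + vs["wrongly_excluded"]
        + vs["unaffected_errors"]) == 100

sv_accuracy = search_accuracy(sv["true"], sv["comma_cases"] - sv["wrongly_excluded"])
vs_accuracy = search_accuracy(vs["true"], vs["comma_cases"])
print(sv_accuracy, vs_accuracy)  # 88.0 83.0
```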
Separately, the same search was conducted for the form sind ("are"), since it 
would evade the search for forms ending in -en. The results were then added to the 
counts of forms ending in -en, producing the numbers reported in Table 4. 
In all three languages, the first four positions in the frequency ranking of inflected 
forms were identical (Table 5), with the third person singular being the most frequent, 
followed by the third person plural and the first person singular. The least frequent form 
in German and Italian was the second person plural, whereas in Czech it was second to 
last, outranking the second person singular. 
Table 5 Comparison of rank orders of inflected forms in written German, Italian, and 
Czech 

                    Target Language
Ranking of forms    German                      Italian                     Czech
1, most frequent    3rd person singular (she)   3rd person singular (she)   3rd person singular (she)
2                   3rd person plural (they)    3rd person plural (they)    3rd person plural (they)
3                   1st person singular (I)     1st person singular (I)     1st person singular (I)
4                   1st person plural (we)      1st person plural (we)      1st person plural (we)
5                   2nd person singular (you)   2nd person singular (you)   2nd person plural (you Pl.)
6, least frequent   2nd person plural (you Pl.) 2nd person plural (you Pl.) 2nd person singular (you)
 
These data indicate that there are few, if any, material differences in the frequency 
of occurrence of the inflected verbal forms among the target languages. Any 
discrepancies in learner success on these forms cannot be explained by differences in 
input properties, at least in written usage. 
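The rank orders in Table 5 can be re-derived from the written frequencies in Tables 2 through 4. The following Python sketch does so, summing the Czech present- and future-tense counts as described above; the short form labels (e.g., "3sg") and variable names are illustrative assumptions.

```python
# Written frequencies (ipm) copied from Tables 2-4.
czech_present = {"1sg": 9809.93, "2sg": 1558.75, "3sg": 33192.0,
                 "1pl": 4649.37, "2pl": 2682.46, "3pl": 10077.94}
czech_future = {"1sg": 247.97, "2sg": 81.09, "3sg": 1547.35,
                "1pl": 202.95, "2pl": 144.93, "3pl": 482.98}

written_ipm = {
    "German": {"1sg": 3446.19, "2sg": 758.23, "3sg": 5432.18,
               "1pl": 932.78, "2pl": 16.08, "3pl": 5106.55},
    "Italian": {"1sg": 1375.41, "2sg": 319.04, "3sg": 33270.48,
                "1pl": 804.47, "2pl": 197.95, "3pl": 9699.24},
    # Present and future counts are added together for Czech.
    "Czech": {form: czech_present[form] + czech_future[form]
              for form in czech_present},
}

# Order the forms from most to least frequent within each language.
ranking = {lang: sorted(freqs, key=freqs.get, reverse=True)
           for lang, freqs in written_ipm.items()}

for lang, order in ranking.items():
    print(lang, order)

# The first four positions coincide; only the last two differ for Czech.
top_four = {tuple(order[:4]) for order in ranking.values()}
print(len(top_four) == 1)  # True
```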
Table 6 Summary of key differences between the morphological systems of target 
languages 

                     Target Language
Attribute            German                     Italian                     Czech
Synthetic tenses     2                          4                           1
Analytic tenses      3, pers.-num. agreement    3, pers.-num. agreement     2, pers.-num. agreement
                     on auxiliary               on auxiliary, participle    on auxiliary, participle
                                                (gender/number)             (gender/number/animacy)
Gender               not expressed              participles                 participles
Frequency of         5th: you-Sg.;              5th: you-Sg.;               5th: you-Pl.;
inflected forms      6th: you-Pl.               6th: you-Pl.                6th: you-Sg.
Verb classes         2, effect on participle    3 (thematic vowels), no     4-6, effect on endings
                     endings                    effect on endings           (footnote 9)
 
Conclusions. Integrating these observations on paradigm complexity across the 
multiple tense, mood, and voice combinations (Table 6), one notes the general ranking 
from German at the least complex end (with minimal complications in the form of 
regular/irregular differences), to Italian (complex with additional complications in the 
form of regular and multiple irregulars), to Czech at the most complex pole, with the 
most numerous distinctions and with the highest number of perturbations caused by 
lexical class groupings and the alternations they condition. Even though there are a few 
nuances in operationalizing complexity, which result in slightly different counts, it is the 
relative complexity of the three TLs that matters for the present study. 
On a morpheme-by-morpheme view of learning espoused by general cognitive 
accounts, accuracy on inflection should be highest (or achieved earliest) in German, 
followed by Italian and Czech, potentially differing in the relative ordering of Italian and 
Czech learners in the present versus the past tense. By contrast, if forms interact in 
development, and richness is beneficial (or at least not detrimental) to learnability, 
production accuracy would be positively related to the number of distinct morphological 
forms in the target language.   
Chapter 4: Methods - Corpus Study of Written Learner 
Productions 
The present chapter will describe the methods employed in the study, including 
data sources, elicitation tasks employed, and the cleaning and coding of data. 
Additionally, background information is provided about the distribution of learners' L1s 
among the target languages, with the goal of ruling out potential differences that could 
bias the results. The procedures involved in the actual analysis of the data are presented 
in detail in Chapter 5, considering the iterative nature of the decisions made at each step 
and their close connection to the results. 
4.1 Data Source and Learner Backgrounds 
Written production data by learners of German, Italian and Czech were obtained 
from the Merlin corpus (Wisniewski et al., 2013), which spans the levels from A1 to C1 
of the CEFR proficiency framework and contains essays written as part of foreign 
language proficiency exams in the respective target languages. The written essay 
responses were collected by the Merlin project and subsequently rerated by its staff 
utilizing the CEFR guidelines. The test-takers were speakers of a variety of L1s, and the 
implications of this diversity for the results of this study will be addressed in the 
following sub-section (Learner L1s). Despite this variation, L1 influence alone does not 
exhaustively explain grammatical difficulties in the second language, as was shown in 
Chapter 1. 
The texts in German and Italian were the writing sections of proficiency exams 
developed and administered by the provider telc; the Czech texts came from proficiency 
tests administered by the test center at the Institute of Language and Preparatory Studies 
at Charles University in Prague. Both providers are members of the Association of 
Language Testers in Europe and are audited by this organization (Texts and test 
institutions, n.d.).  
Consistent with the CEFR approach to testing, the writing tasks were designed to 
be representative of everyday language use and to have a communicative impetus in the 
form of a prompt. For example, a task in the informal register included responding to a 
friend's invitation to visit and asking what kind of birthday present their child would 
prefer. To elicit language use in the formal register, for example, the learner would 
respond to a mock-up of a job ad and inquire about a few topics specified in the prompt. 
Performance elicited by such tasks can be expected to be monitored but also relatively 
spontaneous, due to the task's communicative relevance and lack of focus on accuracy 
alone. 
Since the essays that provided the data had been collected in the language testing 
context, the intended proficiency levels of the tests did not always match the proficiency 
levels of learners as rated by test raters (Table 7). For example, a test-taker of a B1 
proficiency exam in Czech may have been rated higher (e.g., B1+) or lower (e.g., A2) 
than the stated level. All references to learner proficiency in this paper denote proficiency 
as rated by raters, not the proficiency levels targeted in the tests.  
 
Table 7 Merlin corpus statistics: Number of texts re-rated at each CEFR level 

Target                 CEFR Level
Language               A1     A2     A2+    B1     B1+    B2     B2+    C1
German    Texts        57     199    107    217    115    219    73     42
          Sentences    280    1334   977    2322   1583   3274   1071   603
Czech     Texts        1      76     112    90     75     72     9      4
          Sentences    5      787    1728   1668   1116   1103   124    73
Italian   Texts        29     289    92     341    53     2      0      0
          Sentences    180    2475   901    4314   707    20     0      0
 
Because data were extremely sparse for Czech and Italian at the A1 level and 
Italian at the B2 level (as indicated by the low counts of learner texts and sentences), only 
data from levels A2 through B1+ were included in the study. This was done to maximize 
the comparability of the data and to explore the full range of variation associated with 
linguistic typology at each level. 
Learner texts in the corpus came tokenized, lemmatized, and automatically tagged 
for parts of speech, morphological features, and syntactic dependencies. In addition, 
manual annotations were supplied by the Merlin project for "target hypotheses": 
minimally corrected versions of language produced by the learners. For example, a 
learner's sentence such as "Yesterday I *walk home" would receive the target hypothesis 
annotation of "Yesterday I walked home".  
Learner L1s. Learners' first languages varied among the target languages. To 
pinpoint the sources of this variation and to determine whether the L1s were distributed 
in ways that could jeopardize the analysis of learners' accuracy, a chi-square analysis of 
independence was conducted. To this end, L1s were aggregated into seven categories 
(Figure 1), and their counts were cross-tabulated against the TLs. The groupings were 
meant to reflect any relatedness of the L1s or any commonalities in their morphological 
systems.  
The first and, perhaps, least interesting group comprised learner texts for 
which L1 information was either not reported (i.e., not collected) or reported as "Other". 
Secondly, a number of Romance languages (Spanish, Italian, Portuguese, French) were 
represented in the sample, particularly among learners of Italian and German. 
Counts of learners who were L1 speakers of Slavic languages were tabulated together and 
included Russian (more common among learners of Czech and German), Polish (more 
common among learners of German and Italian), Czech (only a handful of learners of 
German and Italian), and Slovak (one learner of Czech). Next, English and Chinese were 
grouped together owing to their relatively poor inflectional morphology. This decision 
was motivated by the low number of learners with Chinese as their L1 (10), which would 
have resulted in expected cell counts less than five for each TL if Chinese had been kept 
as a separate category. The grouping with English resulted in higher expected counts and 
allowed the use of the chi-square test of independence on the cross-tabulation of L1s. 
Also grouped together, despite being unrelated, were Turkish and Hungarian, on the 
grounds of both being agglutinating languages with rich morphological systems and 
neither being related to any of the TLs or the other L1s. Finally, Arabic formed one of the 
two single-language categories, owing to its unique morphological properties 
(discontinuous, interlocking roots and patterns); the other single-language category was 
German. On the one hand, its morphology was considered richer than that of English (or 
Chinese), making their grouping undesirable; on the other, it was not meaningfully 
related to any of the remaining L1s.  
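The concern about expected cell counts for Chinese can be illustrated with a short Python sketch. The row total of 10 comes from the text. The column totals are an assumption: they are taken to be the numbers of texts at levels A2 through B1+ (638 for German and 353 for Czech, as reported for these groups in this chapter, and 775 for Italian, summed from Table 7). The actual totals entering the original cross-tabulation may have differed.

```python
# Assumed column totals (texts at levels A2 through B1+); see the caveat above.
group_sizes = {"German": 638, "Czech": 353, "Italian": 775}


def expected_counts(row_total, column_totals):
    """Expected cell counts under independence: row total * column total / N."""
    grand_total = sum(column_totals.values())
    return {tl: row_total * n / grand_total for tl, n in column_totals.items()}


expected = expected_counts(10, group_sizes)  # 10 learners with Chinese as L1
for tl, value in expected.items():
    print(f"{tl}: {value:.2f}")

# Under these assumptions every expected count falls below five.
print(all(value < 5 for value in expected.values()))  # True
```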
 
Figure 1. Learner L1 backgrounds by Target L2, aggregated across all proficiency 
levels. 
The resulting contingency table of target languages and L1 languages and 
language groups was submitted to a chi-square test of independence, conducted in R 
using the chisq.test function from the base stats package. The chi-square test 
revealed a significant association between TL and L1 group: χ²(12) = 858.3, p < .001. 
To examine the sources of this association, Pearson residuals were obtained for each cell 
and plotted in proportion to their magnitude (Figure 2, left panel) using the corrplot 
package (Wei & Simko, 2017). In addition, for each residual a percentage score of its 
influence on the total chi-square statistic was calculated, according to the formula: 
(squared residual / chi-square statistic) × 100%. The results of this procedure are 
represented in Figure 2 (right panel). Notably, German as the L1 was only represented 
among learners of Italian and Czech and was logically impossible as a cell value for 
German as the TL. 
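The residual-based decomposition can be sketched in plain Python as follows. This is not the R code used in the study, and the 3 × 7 table of counts is entirely hypothetical (the observed cell counts are not reproduced in this section); only the formulas follow the description above. The p-value is omitted to keep the sketch free of external libraries.

```python
def chi_square_independence(table):
    """Return (statistic, df, Pearson residuals, % contributions) for a
    two-way contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Pearson residual: (observed - expected) / sqrt(expected)
            res_row.append((observed - expected) / expected ** 0.5)
        residuals.append(res_row)
    statistic = sum(r * r for row in residuals for r in row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    # Contribution of each cell: squared residual / statistic * 100%
    contributions = [[100 * r * r / statistic for r in row] for row in residuals]
    return statistic, df, residuals, contributions


# Hypothetical counts: rows are TLs (German, Italian, Czech),
# columns are the seven L1 groups. Not the data from the study.
hypothetical_table = [
    [120, 150, 200, 36, 60, 72, 0],
    [40, 300, 250, 5, 110, 20, 50],
    [30, 20, 250, 12, 10, 6, 25],
]

statistic, df, residuals, contributions = chi_square_independence(hypothetical_table)
print(df)  # 12
print(round(sum(sum(row) for row in contributions), 6))  # 100.0
```

With three target languages and seven L1 groups, the degrees of freedom are (3 - 1) × (7 - 1) = 12, matching the value reported above, and the cell contributions sum to 100% by construction.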
Figure 2. L1-TL contingency table Chi-square test residuals (left) and their % 
contribution to total statistic (right). 
Italian had an overrepresentation of learners whose L1s were agglutinating and 
whose L1 was German, compared to what would be expected based on chance alone; it 
also had fewer than expected speakers of Arabic, English or Chinese, and Slavic 
languages as an L1, as well as those learners for whom L1 information was absent. 
German, by contrast, had an overrepresentation of these exact groups: Arabic, English / 
Chinese, and Unknown L1 backgrounds. Czech had a higher representation of learners 
with Slavic L1 backgrounds and, less so, German; underrepresented were learners whose 
L1s were agglutinating or Romance languages, as well as Arabic. 
Next, an inspection of the percentage contribution of these differences to the total 
chi-square statistic revealed that the chi-square test was driven primarily by the absence 
of German L1 among learners of German, as well as the high proportion of learners of 
German for whom L1 information was not reported. The patterns in the distributions of 
L1s among the TLs were examined in closer detail because of their potential ability to 
invalidate any findings of differences among TLs. First, vastly different proportions of 
L1s closely related to the TL could mean that the TLs with higher concentrations of such 
learners would be at an advantage overall or with respect to particular error types. 
Second, equally problematic would have been a situation where the TLs would have 
vastly differing proportions of learners with backgrounds in some of the "richer" L1s, as 
opposed to some of the inflectionally "poorer" L1s.  
With respect to the first possibility, none of the L1s in the sample were closely 
related to German. By contrast, among learners of Czech, having an L1 background in 
one of the Slavic languages was more common than expected. Finally, among learners of 
Italian L1 knowledge of a Romance language was represented at a frequency consistent 
with the expected count. Romance languages were, in fact, more common among learners 
of German than Italian. The contributions of the differences in numbers of Romance and 
Slavic languages as L1s to the total chi-square score were not sizeable (Figure 2, right 
panel). One potential effect of these differences could be an advantage for the speakers of 
the related L1s on those aspects of inflected forms that relate to their phonological 
makeup, due to the presence of shared roots or inflectional endings (between the L1 and 
the TL). On such an account, therefore, learners of German would commit more 
phonological errors, owing to their not speaking L1s closely related to it; by contrast, 
learners of Italian and Czech, each with a group of speakers of related L1s, would be 
more familiar with some of the roots or inflectional endings and would confuse their 
phonological makeup less frequently. 
With respect to the second possibility (involving differences in proportions of L1 
speakers of inflectionally "richer" or "poorer" languages among the TLs), the only group 
of L1s that could be considered noticeably "poorer" than the rest was English and 
Chinese. Overall, there were few learners with these L1s, and they were concentrated in 
the groups of learners of Czech and German, representing 3.4% (12 out of 353) and 5.6% 
(36 out of 638) of these groups, respectively. The residual values showed that English 
and Chinese were overrepresented among learners of German and underrepresented 
among learners of Italian (Figure 2, left), but both contributed only minimally to the total 
chi-square statistic (Figure 2, right).  
Thus, a pattern of findings where German learners show a higher rate of 
phonological errors than learners of Italian and Czech could indicate a bias introduced 
by the distribution of L1s in the sample. In this case, the absence of learners with L1s 
closely related to German would mean that they could not capitalize on any phonological 
similarities, in contrast to those learners of Czech and Italian who were speakers of 
Slavic and Romance L1s, respectively. Similarly, a pattern of errors where German 
learners demonstrate a higher prevalence of bare, uninflected forms than learners of 
Italian could have as its source the higher proportion of learners with English and 
Chinese as their L1s in the German corpus than in the Italian corpus. This difference, 
nevertheless, would not be expected to affect the use of infinitival forms instead of finite 
ones, owing to the fact that in all TLs infinitival forms are morphologically marked and, 
therefore, supplying this marking would not reflect L1 transfer. 
  
4.2 Procedure 
First, the Merlin corpus was queried through its online interface (ANNIS; Krause & 
Zeldes, 2016) to extract errors of types that were broadly relevant to the present research.  
The frequencies for each error type were obtained for each target language at each CEFR 
level separately. The following search terms were used: EA_type = "G_Agr" for errors 
annotated in the corpus as agreement errors; G_Inflect_Inexist_type = "verb" for errors 
annotated as uses of inexistent inflected forms of verbs; G_Verb_compl_type = "ch" for 
errors involving the wrong form of a complement or auxiliary. In addition, a search was 
conducted for the "sentence" annotation to return the number of sentences in each 
sub-corpus, with the goal of using the sentence counts to relativize the observed number 
of errors for each TL-proficiency combination when plotting the data. 
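The relativization step described above can be illustrated with a minimal sketch. The counts below are invented for illustration and are not Merlin data; the function name and the per-100-sentences scale are likewise assumptions, since the text does not specify how the plotted rates were scaled.

```python
# Hypothetical counts, invented purely for illustration (not Merlin data).
error_counts = {("German", "A2"): 40, ("Italian", "A2"): 25}
sentence_counts = {("German", "A2"): 800, ("Italian", "A2"): 1000}


def errors_per_100_sentences(errors, sentences):
    """Return the error rate per 100 sentences for each (TL, CEFR) cell."""
    rates = {}
    for cell, n_errors in errors.items():
        n_sentences = sentences[cell]
        if n_sentences <= 0:
            raise ValueError(f"no sentences recorded for {cell}")
        rates[cell] = 100.0 * n_errors / n_sentences
    return rates


rates = errors_per_100_sentences(error_counts, sentence_counts)
for cell, rate in sorted(rates.items()):
    print(cell, round(rate, 2))
```

Dividing by the sentence count makes sub-corpora of different sizes comparable on a plot, which raw error frequencies are not.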
Second, the errors were coded by speakers of the target languages according to 
the scheme described in the following section (Error categories and their significance). 
The coding procedure was piloted on the German sub-corpora A1 and A2, which served to finalize 
the categories of errors to be used for all TLs. For Czech and Italian data, coding was 
carried out by target-language experts, who were either native speakers of these 
languages or had academic training in them which was coupled with experience residing 
in a community where the target language was the primary language. This step also 
entailed data cleaning, such as removal of duplicates or instances that reflected a lack of 
lexical or orthographic knowledge rather than the knowledge of inflectional morphology 
of the TL. 
Finally, the data were modeled using the Poisson regression approach. Data were 
analyzed broadly following the steps outlined in Hilbe (2016) for modeling count data, 
proceeding from assessing the dispersion of the data to choosing a final best-fitting 
Poisson regression model, examining its coefficients, interactions and pairwise contrasts. 
The resulting models were then validated on out-of-sample data. The specifics of these 
steps and the conclusions drawn from them will be presented in Chapter 5. 
4.3 Error Categories and Their Significance 
This section links the source data annotations to the categories of analysis 
developed for this study, based on theoretically driven interpretations of the processes 
that could have generated the errors. The error categories are presented in Table 8 with 
examples from each TL illustrating them. The present section focuses on a conceptual 
description of the coding scheme and the rationale behind treating certain groups of 
errors separately. The practical application of the coding scheme and the decisions made 
in less clear-cut situations are the subject of the following section (Cleaning and Coding 
of Data). 
Three error annotations in the Merlin corpus were relevant to this research and 
served as the foundation for the tailored classification described in the remainder of this 
section: agreement errors, inexistent inflection errors, and verb complement errors. The 
"agreement" annotation in the corpus captured substitutions of one inflected form for 
another, regardless of how the two forms related to each other. By contrast, in the present 
study the directionality of these substitutions was deemed meaningful. Substituting an 
infinitive or bare (uninflected) form where a finite, inflected one is required can speak to 
the strength of the "Tense" feature in the underlying grammar, according to some 
accounts proposed for child acquisition of English as an L1 (e.g., Yang, 2002). However, 
substitutions between two finite inflected forms would not warrant such an interpretation, 
since both forms (the target form and the one substituted for it) could be considered to 
reflect a learner grammar that expresses Tense. One could speculate that substitutions 
of this kind may be traceable to lexical selection (on the assumption that inflected forms 
are stored unanalyzed in the mental lexicon) or to the application of a wrong 
compositional rule. To honor the different conclusions that can be drawn from these 
errors, tokens annotated as "agreement" errors in the corpus were separated into 
substitution, infinitive, and bare form errors in the study. 
Going in the opposite direction, learners also sometimes use finite forms in 
contexts where non-finite ones are required: for example, on verb complements (with a 
modal auxiliary, or in analytical tense-aspect and mood forms). For comparison, in 
English this would amount to saying I want to *reads that book or She has *reads that 
book. In the Merlin corpus, such instances were annotated as "Verb complement" errors. 
This tag was applied without differentiating the type of complement required, and it 
also covered incorrect auxiliaries, again without taking into account the 
direction of the error or what specifically made it wrong. For instance, uses of the 
infinitive in lieu of a past participle, or selection of the wrong auxiliary, were tagged as 
"Verb complement" errors. In this study, however, overuse of inflection formed a separate 
category and covered the use of finite forms or participles in contexts requiring an 
infinitive. By contrast, uses of the infinitive in lieu of a participle were captured by the 
"infinitive" category described above, whereas auxiliary selection errors were dropped 
from the analysis altogether.  
The third error annotation ("inexistent inflection") in the corpus encompassed any 
departures from target-like orthography, as well as any inaccuracies in the segmental 
composition of the form, whether they concerned the root, any suffixes, or endings, 
as long as they resulted in an inexistent form. This annotation was applied to errors vastly 
differing in their proximity to the target form. On the one hand, some forms were mostly 
on the right track and could be traced to a misapplication of rules and patterns present in 
the TL grammar. For example, the past participle form *gekaufen (used in place of 
gekauft) does not exist in German, yet it follows the familiar inflectional template of past 
participle formation for irregular verbs (ge- plus -en) and is easily interpretable both with 
respect to the lexical verb and its intended grammatical meaning. On the other hand, 
other errors deviated further from the target and may have stemmed from a failure to 
recall the phonological composition of the root or ending. For instance, this could occur 
as a result of phonological processes such as consonant cluster simplification or the 
vagueness of lexical representations themselves. In this study, therefore, the "Inexistent 
inflection" tag was separated into errors of verb class (misapplication of inflectional 
patterns associated with a verb class), root (wrong application or non-application 
of a root alternation process), bare (dropped inflectional ending), and 
phonological (incorrectly specified segments of the root or ending, unless 
associated with another verb class or existing root alternation process). Admittedly, 
phonological errors partially overlap with orthographic errors. More detailed reasoning 
behind separating the two is presented in the following section (Cleaning and Coding of 
Data). 
  
Table 8 Error types adopted in the coding scheme, with examples from each TL

Substitution
    German:  A2, English: Ich *gratuliert max zum er geburtstag hat.
             B1, French: Wir *werde das bei dir feiern.
    Italian: A2, Hungarian: *Vogliano (vogliamo) mangiare cibo culiniare.
    Czech:   A2, German: Kdy *bude (budeš) na nadra???

Infinitive
    German:  A2, Arabic: und hute ich *treffen mit michael bei eine Kafftref in stadt um zwei Uhr mittag.
             B1, L1 Other: Wenn man Deutsch lernet, dann *bekommen man einfache ein Job.
    Italian: A2, German: Da lunedi a venerdi ho *frequentare (frequentato) un corso italiano.
    Czech:   A2, German: Bych *t??it (těšila se) m? velmi.

Overuse
    German:  A2, English: Am n?schte Wochenende möchte ich bie dir *komme.
             B1, Spanish: Wurde ich Freizeit *habe?
    Italian: A2, German: Vuole *apprende (apprendere) un po' d' italiano.
             B1, Hungarian: Vorrei ti *aiuto (aiutarti).
    Czech:   A2, L1 Other: V kolik hodin bude to *za??n? (začne)?

Root
    German:  A2, L1 not reported: Kannst du bei mir *hiffen (helfen).
             B1, L1 not reported: Deshalb möchte ich mich *bewarben (bewerben).
             B1, L1 not reported: Vielleicht, du *k?nnst (kannst) zum meinen Geburtstag kommen.
    Italian: A2, German: E cosa *potriamo (possiamo) fare la sera?
    Czech:   A2, Russian: [..] *sv?t? (svítí) sl?nce.

Class
    German:  A2, L1 not reported: [..] dann hat im krankenhaus *geblibt (geblieben).
             B1, L1 not reported: Ich habe dein Brief *bekommet (bekommen).
    Italian: A2, Turkish: I tailandese *cuoca molto bene.
             B1, Hungarian: Mi piace *lavorere (lavorare) in gruppo con altri.
    Czech:   A2, German: *Muse? (Musíš mě) meho navst?vit dopoledne.

Bare
    German:  A2, Spanish: Wenn du deine ausbildung Fertig *gemach (gemacht) Haste.
             B1, Spanish: Ich *arbeit (arbeite) gans toll im Team [..]
    Italian: B1, Hungarian: ma io non sono potuto *giocar (giocare) a tennis.
    Czech:   A2, German: Co to *stuj (stoj)?

Phonological
    German:  A2, Arabic: Ich *Fr?ch (freue) mich Für anne [..]
             B1, Russian: Wieleicht *möchtes (möchtest) du die Reise machen?
    Italian: A2, German: O che cosa *possimo (possiamo) fare la sera?
    Czech:   A2, English: Hraja (Hrají) si na pl??i a vypadaj?

Note. Errors relevant to the present study are marked with asterisks. Corrected 
forms of the verbs are listed in parentheses. 
 
4.4 Cleaning and Coding of Data 
After error tokens were extracted from the Merlin corpus (see Procedure for the 
search syntax used), data were inspected and cleaned. For German data, this was done by 
the researcher; Czech and Italian data were cleaned by the target language experts who 
also conducted the coding. First, some cases were excluded either because they were 
duplicates or because the departure from the target form was deemed not to involve 
inflection or a process broadly associated with inflection. Table 9 summarizes types of 
cases that were excluded and provides illustrative examples, whereas the remainder of 
this section will present each in turn, in narrative form. Finally, some common decision 
points that occurred during coding will be presented for each error category. As a rule of 
thumb, both data exclusion and data coding decisions were motivated by the following 
principles: minimizing interpretation of learner intent; focusing on inflectional endings 
before anything else; excluding errors of selection and semantics (e.g., appropriate lexical 
elements; auxiliaries; tense and mood). 
Exclusions. The types of departures from target-like use presented in this section 
were not considered errors, as long as they were the only departures from target-likeness 
for a given token. When they co-occurred with another error that was relevant to this 
study, they were ignored (but noted), and the verb token was classified on the basis of the 
other, relevant, error. For example, an instance classified as a misspelling (German: Ich 
*arbite) was not counted among any error category if its agreement was correct. 
However, an instance of Ich *arbiten was classified as an infinitive use error with an 
additional misspelling notation, due to the verb's being apparently non-finite (or plural), 
whereas Ich *arbitet was considered a substitution error (based on the inflectional ending 
associated with the wrong person-number combination). 
The category "Ambiguous" captured those tokens for which it was impossible to 
determine with certainty what the learner was trying to convey. For example, in German 
some tokens were ambiguous with respect to their part of speech, or multiple verbs were 
strung together without it being obvious how they related to each other. Rather than 
engaging in mind-reading, the data coders stayed as firmly as possible within the realm of 
what was uttered and, when conflicting interpretations presented themselves, preferred to 
exclude the token altogether.  
Misspellings and typos ("Typo", Table 9) were likewise not considered errors. This 
group included instances such as single consonants where double were required, and vice 
versa; missing or superfluous diacritics (including on vowels); missing or superfluous 
umlaut (in German), as long as the presence or absence of the umlaut was not a legitimate 
root alternation process. For example, the token of bisüche (instead of besuche) was 
considered correct because in the indicative mood there is no alternation involving u and 
ü among thematic verbs (Table 9). By contrast, the modal verb müssen does have a vowel 
alternation (with u, in some person-number combinations), and, therefore, instances of 
müss were classified as errors (superfluous root process). Tokens involving the letters e 
and i, and their combinations (ei, ie) and transpositions, were treated as correct, as long 
as the identity of the word could still be determined and unless the change of the letters 
was part of an existing alternation. For example, *arbite (instead of arbeite) was treated 
as a correct token, because there is no ei-i root process in the present tense of the 
indicative. By contrast, *hilfen instead of helfen was treated as an error classified as a 
superfluous root alternation (because e/i is a legitimate root alternation in the indicative 
present tense and in the lexical family sharing the root). Similarly, the a/ä alternation is 
represented among thematic but not modal verbs, meaning that an instance such as *fähre 
(instead of fahre) was classified as an error of superfluous root change (because some 
person-number combinations of this verb include ä), whereas a form such as *känn 
(instead of kann), by contrast, was classified as a non-existent root, since no 
person-number combinations of the verb exist with the vowel ä.  
The next group of exclusions involved stylistic and pragmatic errors. In Czech, 
uses of Common (vernacular) Czech forms were excluded and not considered errors, 
taking into account the fact that learners' histories and experiences in TL speech 
communities were unknown. Pragmatic errors concerned the use of "formal" and 
"informal" forms of address: for example, uses of verb forms agreeing with du 
("informal" you) when a formal one agreeing with Sie ("formal" you) was called for, and 
vice versa. Stylistic errors included any non-targetlike uses that appeared to be 
situationally determined or involved the knowledge of the "educated" usage prescribed 
for the TL. For example, in German there is the so-called "introductory es" which can 
function as a false subject (as opposed to referring to a singular entity in the third person). 
In these cases, the verb agrees with the "true" subject that follows, not the "es" 
pronoun, as in Es kamen-Pl. 18 Besucher (not: Es *kam-Sg. 18 Besucher). When 
learners instead used "es" as the target of agreement, such tokens were excluded. Other special 
cases of agreement in German involve complex predicates with nominals, which can 
create confusing situations depending on the grammatical number of both the subject and 
the predicate nominal. These cases were also considered to be well outside the scope of 
"mainstream" agreement in L2 learning and were, therefore, excluded. 
Exclusions related to selection included choices of wrong lexical items but also 
the choice of a wrong auxiliary, which in Italian and German, in part, depends on the 
knowledge of detailed semantic properties of verbs. For example, the forms requiring a 
choice between haben and sein in German are the perfect and pluperfect tenses, and those 
requiring a choice between essere and avere in Italian are the passato prossimo and 
trapassato prossimo. In a 
similar vein, uniquely to Italian and Czech, cases involving wrong gender marking on the 
participle (in analytic tenses) were excluded. This was because accurate agreement 
production in these cases requires lexical knowledge of the grammatical gender of the 
nouns they agree with as much as a command of inflection proper. Non-targetlike uses of 
verb tense and mood forms were also not considered errors, as long as they were 
correctly marked for agreement: appropriately choosing tense and mood was deemed a 
problem that is more semantic in origin. 
  
Table 9 Examples of data excluded during data cleaning

Reason for exclusion
    Examples

Ambiguous
    G, A2, Turkish: wann *ist deine Kinder *sein (not clear).
Typo
    G, A2, Arabic: Ich *bisüche (besuche) Wir nechst Woche.
    It, A2, German: Mentre I miei genitori *preferisconno (preferiscono) che lei aspetter? ancora.[..].
Auxiliary selection
    G, A2, L1 Unknown: [..] dann *hat (ist) im krankenhaus *geblibt.
    It, A2, German: Non *ha (è) cambiato tanto solo qualcosa.
Gender
    It, A2, German: perche mio capo ha *fissata (fissato) una riunione [..]
    Cz, A2, L1 Unknown: Jak? hezk? *byla (bylo) po?as?!
Analytical form
    G, A2, Russian: Sie *ist (no verb needed) *wonnen in Haus.
    Cz, A2+, German: Ve kolik hodin *bude? pr?jet (přijedeš) na n?dra???
Superfluous inflected element
    G, A2+, L1 Unknown: [Es ist] sehr gut zu horen, dass du die Prüfung bestanden *gemacht (no participle needed, missing auxiliary).
Missing or wrong elements (reflexive, conditional particles, verb prefixes)
    It, A2, Polish: Io *voreii *invitare (invitarti) *ti a cena.
    Cz, A2+, German: *Bychom (bych) se t??ila, [..]
    Cz, A2, German: *Platim (Zaplatím) se kdy? sedjeme.
Tense
    It, A2, German: [..] non ci *abbiamo *visto (vediamo) Io sono ritornata.
    Cz, A2+, German: *Budu (byla) bych cel? vikendu.
Lexical
    G, A2, L1 Unknown: Ich *nahme Fahrkarte Bus gekauft.
    It, A2, French: Come lo *savete *stiamo (siamo) partiti in Italia [..]
Stylistic
    G, A2+, L1 Unknown: Meine ganze Familie *leben (lebt, singular) in Serbien.
    G, A2, Hungarian: [..] wann es *ist (sind) zwei Söhne, [..]
    Cz, A2+, L1 Unknown: [..] jestli m?te parkovi?t? (proto?e *jedem /jedeme/ autem).

Note. Errors are denoted by asterisks (*); target forms are listed in parentheses 
directly following the errors. 
Also excluded were cases when superfluous elements were added, such as 
unnecessary auxiliaries or semi-lexical verbs (e.g., "give", "do") that appeared to have an 
intended aspectual meaning. These cases were considered "correct" as long as agreement 
was marked appropriately on one of the elements, and the lexical verb in the infinitive 
was not counted among infinitive errors. This was because with the asynchronous 
medium of written essays it is impossible to draw the line between any self-corrections 
and complex (but non-targetlike) predicates.  
Unique to Czech was the exclusion of non-targetlike uses that involved wrong, 
missing, or superfluous prefixes (that can carry aspectual meaning in Czech), as well as 
wrong, missing, or superfluous reflexive and conditional particles. In German and Italian, 
close analogs of this error type were omissions of the reflexive pronoun. In addition, in 
Italian there were also uses of the full infinitive followed by a personal pronoun instead 
of an infinitive followed by a postclitic reflexive pronoun: *invitare ti instead of invitarti. 
These instances were excluded from the analysis. 
Coding choices. The category of substitution errors was restricted to substitutions 
within the same mood and tense. For instance, in German the imperative of many verbs 
in the second person singular corresponds to their stem (e.g., gib! geh!), but when such 
forms were encountered in syntactic contexts requiring a form of the indicative, they 
were classified as bare ([er] gib-t, [es] geh-t) as long as an overt subject was present, 
not as substitutions of a wrong mood form. 
For infinitive errors, the coding of Italian and Czech data was mostly 
straightforward, as long as an overt subject or discursive information made it 
unambiguously clear what agreement was required. In German, the infinitive and 1st and 
3rd person plural forms are homonymous. When faced with wrong agreement involving 
forms ending in -en, I considered it parsimonious to interpret them as infinitives and not 
plural forms, unless discursive evidence suggested otherwise. If some instances of forms 
ending in -en were, in fact, plurals and not infinitives, one would expect the counts of 
infinitive errors in German to be inflated at the expense of substitution errors, which 
would be artificially depressed, compared to Italian and Czech. However, as Chapter 5 
shows, substitution error counts did not significantly differ among the TLs, whereas the 
other error types did, strongly suggesting that those differences did not occur as a result 
of any underestimation of substitution errors. 
The category of bare forms included instances of fully omitted inflectional 
endings. Any suffixes or thematic vowels that preceded the inflectional ending (with 
reference to the target form) could be either retained or also omitted for the token to be 
considered in this category. For example, in Italian this category included such cases as 
those where the final -e of an infinitive was truncated but the stem was otherwise 
preserved: mangiar instead of mangiare. Mangia, however, was classified either as a 
substitution or overuse of inflection (if used instead of an infinitive). 
Inflection overuse errors encompassed the uses of finite forms where non-finite 
ones were required, as, for example, participles in analytical tense or voice forms, or 
infinitives in analytical tenses or in complex predicates (with modal or semi-lexical 
verbs). In addition, the uses of participles instead of infinitives were also categorized as 
inflection overuse. This was done because in all three TLs the formation of the past 
participle necessitates considerable morphological transformations of an extent similar to 
inflecting a finite form for person-number features. Thus, excluding the uses of these 
complex forms where an infinitival one suffices would have underestimated learners' 
facility with inflection, despite its overextension. 
Root errors included cases with either superfluous or missing root 
transformations. The boundaries around this category were drawn broadly: besides root 
vowel alternations (e.g., vowel change e-ie in German or o-uo in Italian), it included 
errors on irregular verbs and verbs whose paradigms involve suppletion, even though 
those are not productive morphophonological processes of the contemporary TL 
grammar. For example, German irregular verbs haben and sein both involve considerable 
departures from the segmental makeup of the infinitive in some forms: e.g., sein 
(infinitive) vs. bin (1 pers. sg.), ist (3 pers. sg.), sind (1, 3 pers. pl.), war (1 pers. sg. 
preterite), gewesen (past participle). Whether root transformations were missing or 
superfluous was decided relative to the infinitive form: for example, the use of *habst 
(instead of hast) was considered an instance of a missing root transformation, since the 
learner retained the segment present in the infinitive (b) too faithfully. Most decision 
points in this category concerned differentiating these errors from typos and misspellings, 
as was discussed in the previous section (Exclusions).  
Verb class errors were instances of verbs being inflected according to an 
inappropriate inflectional template associated with a different verb class. In Italian, verb 
class membership is discernible in the infinitive and is signaled by thematic vowels 
following the root. In German, there are no overt cues to verb class membership on the 
infinitive: instead, verb class is reflected in the choice of affixes on the past participle and 
sometimes correlates with root vowel changes. In Czech, verb class is identifiable from 
either the infinitive or the third-person plural of the non-past tense by the suffix or the 
final segments of the root (Janda, p. 34). In addition, for some irregular verbs that have 
suppletive forms in their paradigms (e.g., German sein), any attempts by learners to 
"regularize" them were classified as verb class errors.  
In German, the challenge with appropriately defining the category of verb class 
errors was in the overlap between the morphology of some participles and the 
morphology of infinitives. While most German verbs form the participle with both a 
prefix and a suffix, the prefix (ge-) is the most reliable way to identify past participles, 
because the suffixes (-t for regular verbs and -en for irregular) are homophonous with 
person-number endings. However, some verbs do not take on the prefix: verbs that 
already have a prefix only add the -en or -t suffix, depending on their membership in the 
irregular or regular class (bekomm-en, begegn-et); regular verbs ending in the productive 
suffix -ieren do not take the prefix either and take on the suffix (-t). This means that for 
verbs that do not take on the prefix, errors in the suffix are ambiguous between an attempt to 
derive the past participle using the wrong verb class template and a use of the infinitive, 3rd 
person singular, or 1st/3rd person plural form. In these cases, the deciding factor was the 
presence or absence of an auxiliary verb that could set an unambiguous expectation for 
the use of a particular tense. Without an auxiliary, it was parsimonious to interpret such 
instances as present-tense forms with wrong person-number features, as opposed to 
positing a missing auxiliary accompanied by an incorrectly inflected participle. 
Finally, phonological errors included instances of non-existent forms, sometimes 
composed of recognizable elements of the TL system and other times of ones that are 
absent in it altogether (e.g., antwortes, möchtes, möcht, fruch). The boundaries around 
this category were defined primarily in terms of exclusion, aiming to house errors not 
already captured by the remaining error categories. In particular, non-existent errors were 
distinguished from root errors in that root errors involved existing root alternation 
processes applied incorrectly and not merely any inaccuracy in the root, which would 
fall under the non-existent category. 
It is impossible to determine with any certainty what process would generate such 
instances and what exact lexical representations may underlie them. Chapter 6 explores 
one such possibility: namely, that, at least for German data, some errors can be 
accounted for by phonological processes such as consonant cluster simplification. 
However, for the purposes of the main analysis (Chapter 5), this category made it 
possible to isolate such cases with minimal speculation about their origins and status in 
the learners' grammars.  
 
Chapter 5: Results - Cross-linguistic Differences in 
Inflection Error Frequency 
The present chapter will describe the data analytic process, starting with the 
structure of the data and the variables used in modeling it. Second, it will lay out the 
procedures of model specification and selection, including alternative models tested and 
their fit, and the testing of key assumptions involved in modeling count data.  Third, the 
results of the best-fitting model will be presented, including the follow-up analyses 
conducted on significant interaction terms. Where applicable, developmental differences 
will be pointed out. However, any references to proficiency-related changes are to be 
taken with caution, since the data are cross-sectional. Finally, the chapter will explore the 
power of the model it proposes to predict previously unseen data, rather than achieving 
the best fit to the data on whose basis it was specified to begin with.  
As laid out in Chapter 4, errors of different types may have different origins and 
may potentially reflect different knowledge states and properties of learners' grammars. 
In addition, the relative prevalence of some error types over others may call into question 
the assumptions of theories of learning. In Yang's proposal, which was formulated based 
on the presence versus absence of inflectional marking in English, it was forms with and 
without overt inflectional marking that were informative for the learner (with respect to 
the value of the Tense parameter in the target grammar). Accordingly, the presence of 
verbs with and without overtly marked inflection was taken as evidence of what a 
learner's parameter value might be at any given time. Considering that the TLs in the 
present study allow far fewer cases of verbs with no overt inflectional marking (e.g., the 
German imperative is a rare exception), infinitive forms (used incorrectly in place of 
finite forms) were interpreted in a similar manner to bare forms, as suggesting a weak 
underlying knowledge of tense and inflection.  
The process of coding learner production data was completed by target language 
(TL) experts (described in Chapter 4) and yielded counts of errors of seven types at each 
proficiency level within each TL. Thus, the data had the structure represented in Table 
10. Despite data from learners at levels A1 and B2 being available in the Merlin corpus, 
only data from proficiency levels A2 through B1+ were used because they were available 
for all three TLs. At the A1 level, the corpus contained only one text by a learner of 
Czech, and level B2 contained only two texts by learners of Italian.  
Translated into the language of statistical models, the research questions posed in 
Chapter 4 can be expressed as assessing the significance of variables predicting the 
number of verb inflection errors and the comparative fit of models including or omitting 
them. The predictors are: target language, proficiency level (cross-sectional), and error 
type; their attributes are summarized in Table 11. 
 
Table 10 Structure of the Data
Target Language   CEFR     Error Type     Count   N of texts in corpus
German            A2       Substitution   N       N
German            A2       Infinitive
German            A2       Overuse
German            A2       Root
German            A2       Class
German            A2       Bare
German            A2       Phonological
German            A2+...   ...
German            ...B1+   ...
Italian           A2...    ...
Italian           ...B1+   ...
Czech             A2...    ...
Czech             ...B1+   ...
Note. Levels of variables have been omitted to convey the overall structure of the 
dataset. The actual numbers of levels for each variable are listed in Table 2. 
 
The analyses reported in the present chapter were conducted using R statistical 
software (R Core Team, 2019), specifically the glm function, which is part of the stats 
package distributed with base R. It estimates generalized linear models by treating one 
level of each factor variable as the baseline and providing regression coefficients for the 
remaining levels, which are to be judged against that baseline. To facilitate the 
interpretation of the regression coefficients, the levels to serve as the baseline were 
selected as follows. For the target language variable, German was selected as the 
baseline: its system of verb inflectional endings has the fewest distinct endings, and thus 
its paradigm can be considered the poorest among the languages studied. Therefore, 
regression coefficients for Italian and Czech can be interpreted as reflecting the relative 
benefits or penalties to accuracy due to their higher inflectional richness. Second, for the 
CEFR variable, the A2 proficiency level was retained as the baseline, meaning that the 
exponentiated coefficients for levels A2+ or B1, for example, can be interpreted as 
multipliers adjusting the predicted number of errors compared to A2. Finally, for the 
error type variable, 
substitution was selected as the baseline category because substitutions were the most 
frequent error category among all target languages. 
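The logic of baseline (treatment) coding and of reading coefficients as multipliers can be sketched as follows. This is an illustrative sketch only: the helper function is hypothetical, and the coefficient value is invented for the example rather than taken from the models reported in this chapter. The analyses themselves were run with glm in R.

```python
import math


def treatment_code(level, levels):
    """Dummy-code `level` against the baseline, which is levels[0]."""
    if level not in levels:
        raise ValueError(f"unknown level: {level}")
    # One indicator per non-baseline level; the baseline is all zeros.
    return [1 if level == other else 0 for other in levels[1:]]


cefr_levels = ["A2", "A2+", "B1", "B1+"]  # baseline listed first

baseline_row = treatment_code("A2", cefr_levels)  # [0, 0, 0]
b1_row = treatment_code("B1", cefr_levels)        # [0, 1, 0]

# Hypothetical log-scale coefficient for B1 (illustrative, not a result).
beta_b1 = -0.5
# Exponentiating gives the multiplier applied to the A2 error rate.
multiplier = math.exp(beta_b1)

print(baseline_row, b1_row, round(multiplier, 3))
```

Under this coding, a negative coefficient yields a multiplier below 1 (fewer predicted errors than at the baseline), and a positive coefficient yields a multiplier above 1.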
 
  
Table 11 Variables used in the analysis
Variable name     Role        Type           Number of Levels (baseline listed first)
Frequency         Response    Count          N/A
Number of texts   Offset      Log of count   N/A
Target language   Predictor   Factor         3: German, Czech, Italian
CEFR level        Predictor   Factor         4: A2, A2+, B1, B1+
Error type        Predictor   Factor         7: substitution, infinitive, overuse, root, class, bare, other phonological
 
5.1 Regression Model Specification and Model Selection 
The first goal of modeling was to assess the suitability of a Poisson model for the 
data and to determine whether any adjustments to the final models would be needed. One 
of the basic assumptions in applying Poisson modeling to count data is that the variance 
is equal to the mean, implying that as the mean number of events increases, so does the 
variance. However, count data are frequently overdispersed: their variance exceeds the 
mean, which invalidates the Poisson model by distorting the standard errors of model 
estimates (and thus the picture of their statistical significance). Values of the 
dispersion statistic above 1 indicate overdispersion and warrant modeling approaches that 
adjust the standard errors, such as quasi-Poisson models, robust standard errors, or a 
negative binomial model. However, overdispersion is sometimes only apparent and signals 
that potentially relevant predictors or interactions have not been included in the model, 
which is why it is necessary to test a variety of model specifications. 
Dispersion was assessed using the function P__disp in the COUNT package for R 
(Hilbe, 2016) and calculated as the sum of squared Pearson residuals (the Pearson Chi2 
statistic) divided by the residual degrees of freedom (Hilbe, 2014, p. 79).  
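The dispersion statistic just described can be sketched in a few lines of Python. This is a minimal illustration rather than the P__disp implementation: the function names and the toy counts below are invented for the example, and the fitted values are simply assumed to come from some Poisson model with two estimated parameters. The second helper shows the standard quasi-Poisson adjustment, in which Poisson standard errors are multiplied by the square root of the dispersion.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Sum of squared Pearson residuals divided by residual degrees of freedom."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("no residual degrees of freedom")
    # For a Poisson model the variance equals the mean, so the squared
    # Pearson residual of one observation is (y - mu)^2 / mu.
    pearson_chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return pearson_chi2 / resid_df


def quasi_poisson_se(poisson_se, dispersion):
    """Scale a Poisson standard error by the square root of the dispersion."""
    return poisson_se * math.sqrt(dispersion)


# Invented toy data: six observed counts and their (assumed) fitted means.
observed = [12, 7, 3, 20, 9, 1]
fitted = [10.0, 8.0, 4.0, 16.0, 10.0, 2.0]

phi = pearson_dispersion(observed, fitted, n_params=2)
print(round(phi, 3))
print(round(quasi_poisson_se(0.10, 4.0), 3))
```

A value near 1 is what the Poisson assumption predicts; values well above 1 would inflate the adjusted standard errors accordingly.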
The modeling process started with fitting a series of Poisson regression models 
using the glm function in R statistical software (R Core Team, 2019). All models 
discussed below were estimated with an offset, which is used in count data modeling to 
adjust for differences in exposure, here the sizes of the corpora. Because the corpora 
included different numbers of learner texts, raw error counts would be expected to be 
higher in larger corpora. The offset was the log-transformed count of texts in each 
corpus: for example, for German A2 the offset equals the logarithm of the number of texts 
at that level, log(199). 
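The role of the offset can be made concrete with a short Python sketch. With a log link, adding log(number of texts) to the linear predictor means that the model describes a rate per text, and the expected count is that rate multiplied by the number of texts. The function name expected_count and the rate of 0.25 errors per text are assumptions made for illustration; only the figure of 199 texts comes from the text above.

```python
import math


def expected_count(n_texts, rate_per_text):
    """Expected count under a log-link model with offset log(n_texts)."""
    offset = math.log(n_texts)                # enters with a fixed coefficient of 1
    linear_predictor = math.log(rate_per_text)
    return math.exp(offset + linear_predictor)


# 199 is the number of German A2 texts; the rate of 0.25 is invented.
print(round(math.log(199), 3))
print(round(expected_count(199, 0.25), 2))

# Doubling the number of texts doubles the expected count at the same rate.
print(round(expected_count(398, 0.25) / expected_count(199, 0.25), 6))
```

This is why corpora of different sizes can be compared directly: the coefficients describe rates, and corpus size is absorbed by the offset.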
Table 12 summarizes the specifications of the models tested during this step, 
along with their respective fit and dispersion metrics. The simplest model with all three 
predictors of interest included only their main effects: Count ~ target language + 
CEFR + error type. Although all predictors were significant, the model fit poorly, as 
indicated by a dispersion statistic of over 4. Progressively adding interaction terms 
reduced the apparent overdispersion, which reached 0.97 (close to the desirable value of 
1) for the model with three pairwise interactions. A model with a three-way interaction 
between target language, proficiency, and error type was estimated but ultimately 
abandoned due to overfitting: it had no residual degrees of freedom and did not fit the 
data significantly better than model 4 (Table 12), χ² = 43.65, p = .18. 
Table 12 Summary of Poisson Model Dispersion and Model Fit Values 
Model                                                                         AIC     Chi2    Dispersion 
1. Target language + CEFR + error type                                        707.33  335.14  4.65 
2. Target language * error type                                               474.48   97.43  1.62 
3. Target language * error type + CEFR * error type                           475.60   63.17  1.50 
4. Target language * error type + CEFR * error type + target language * CEFR  456.29   34.85  0.97 
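As a rough check on Table 12, the dispersion values can be recomputed from the Chi2 column. The residual degrees of freedom are not reported in the text, so the Python sketch below infers them under two assumptions: that the data comprise 3 x 4 x 7 = 84 cells (target language by CEFR level by error type), and that each model estimates the number of parameters implied by treatment coding, with every model including all three main effects. The parameter counts in the code are therefore assumptions, not reported values.

```python
N_CELLS = 3 * 4 * 7  # target language x CEFR level x error type (assumed)

# (label, Pearson Chi2 from Table 12, ASSUMED number of estimated parameters)
# 12 = intercept + 2 (TL) + 3 (CEFR) + 6 (error type)
# 24 = 12 + 12 (TL x error type)
# 42 = 24 + 18 (CEFR x error type)
# 48 = 42 + 6  (TL x CEFR)
MODELS = [
    ("Model 1", 335.14, 12),
    ("Model 2", 97.43, 24),
    ("Model 3", 63.17, 42),
    ("Model 4", 34.85, 48),
]


def dispersion(chi2, n_params, n_cells=N_CELLS):
    """Pearson Chi2 divided by the residual degrees of freedom."""
    return chi2 / (n_cells - n_params)


for label, chi2, n_params in MODELS:
    print(label, N_CELLS - n_params, round(dispersion(chi2, n_params), 2))
```

Under these assumptions the recomputed values agree with the Dispersion column of Table 12 to two decimal places.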
 
Following the estimation of model 4, its residuals were plotted against the 
predicted values of error frequency (Figure 3). They displayed no discernible pattern in 
the distribution across the range of predicted values. 
 
 
 
 
 
 
 
Figure 3. Model residuals plotted against fitted values for the model predicting 
error counts from target language, CEFR, error type, and the interactions 
TL*error type, TL*CEFR, and CEFR*error type. 
 
The results of this step indicated that there was no significant overdispersion and, 
therefore, no need for the use of negative binomial modeling. Thus, model 4 served as the 
basis for a deeper exploration of the roles of the predictors and for pinpointing the 
sources of differences among the target languages, proficiency levels, and error types. 
5.2 Regression Model Results 
The results of the final, best-fitting model are presented in Table 13. In R, model 
coefficients for factor variables are provided for each non-baseline factor level 
separately; once exponentiated, they are interpreted as multiplicative differences 
between the respective factor level and the baseline level. In this case, the baseline 
was the combination of substitution (for error type), German (for target language), and 
A2 (for proficiency). The exponentiated regression coefficient for "Target language: 
Czech", therefore, represents the ratio of the rate of substitution errors in Czech to 
that in German at the A2 proficiency level. 
The regression analysis indicates that all three predictor variables (target 
language, error type, and proficiency) had levels that either significantly differed from 
the baseline or were part of interactions that did so. These disparate factor-level 
results were summarized into a bird's-eye picture (Table 14) using the joint_tests 
function in the emmeans package (Lenth, 2019), which tests and compiles all the 
interaction contrasts in a model. Target language and error type are significant across 
the range of their individual levels, as is their interaction with each other. 
Proficiency, however, only interacts with target language (and not with error type) and 
is not itself significant. A visual inspection of the plot of the proficiency-error type 
interaction (averaging across target languages) agrees with this conclusion: all CEFR 
levels exhibit the same pattern, with the highest rate for substitution errors, a middle 
range comprising infinitive, overuse, root, and class errors, the lowest rate for bare 
form errors, and a somewhat higher rate for phonological errors. 
Table 13 Regression model results predicting error rates in German, Italian, and Czech 
Parameter Exponentiated Estimate p value 
Intercept 0.29 < .001 
TL: Italian 0.83 .20 
TL: Czech 0.95 .80 
CEFR: A2+ 1.11  .54 
CEFR: B1 0.67 .01 
CEFR: B1+ 0.50 .002 
Error: infinitive 0.78 .16 
Error: overuse 0.54 .001 
Error: root 0.23 < .001 
Error: class 0.05 < .001 
Error: bare 0.14 < .001 
Error: phonological 0.31 < .001 
Interaction terms: target language by error type 
It * infinitive 0.06 < .001 
Cz * infinitive 0.61 .03 
It * overuse 0.38 < .001 
Cz * overuse 0.22 < .001 
It * root 0.55 .02 
Cz * root 0.33 .004 
It * class 2.75 .003 
Cz * class 6.54 < .001 
It * bare 0.12 < .001 
Cz * bare 0.48 .04 
It * phonological 0.63 .07 
Cz * phonological 2.16 .002 
Interaction terms: Error type by proficiency 
Infinitive * A2+ 0.93 .78 
Infinitive * B1 0.76 .27 
Infinitive * B1+ 1.34 .34 
Overuse * A2+ 0.92 .78 
Overuse * B1 0.88 .60 
Overuse * B1+ 1.40 .32 
Root * A2+ 0.87 .72 
Root * B1 1.53 .14 
Root * B1+ 1.89 .11 
Class * A2+ 1.61 .13 
Class * B1 1.96 .02 
Class * B1+ 1.47 .36 
Bare * A2+ 1.76 .19 
Bare * B1 2.67 .01 
Bare * B1+ 1.96 .20 
Phonological * A2+ 0.71 .22 
Phonological * B1 0.76 .29 
Phonological * B1+ 1.33 .37 
Interaction terms: target language by proficiency 
It * A2+ 0.81 .39 
Cz * A2+ 1.57 .03 
It * B1 1.90 < .001 
Cz * B1 1.29 .25 
It * B1+ 1.64 .08 
Cz * B1+ 1.34 .25 
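
To illustrate how the exponentiated estimates in Table 13 combine, the following Python sketch multiplies the terms that apply to a given cell. It assumes, in line with the offset described in Section 5.1, that the exponentiated intercept is the predicted rate per text for the baseline cell (German, A2, substitution). Because the tabled values are rounded to two decimals, the products are approximate; the function name predicted_rate is hypothetical.

```python
import math

# Exponentiated estimates copied from Table 13 (rounded to two decimals there).
EXP_COEF = {
    "Intercept": 0.29,
    "TL: Czech": 0.95,
    "Error: class": 0.05,
    "Cz * class": 6.54,
}


def predicted_rate(terms, table=EXP_COEF):
    """Multiply the exponentiated coefficients of all terms active for a cell."""
    return math.prod(table[term] for term in terms)


# All four cells are at the A2 level, so no CEFR terms apply.
german_subst = predicted_rate(["Intercept"])
czech_subst = predicted_rate(["Intercept", "TL: Czech"])
german_class = predicted_rate(["Intercept", "Error: class"])
czech_class = predicted_rate(
    ["Intercept", "TL: Czech", "Error: class", "Cz * class"]
)

print(round(czech_subst / german_subst, 2))  # Czech-to-German ratio, substitution
print(round(czech_class / german_class, 2))  # Czech-to-German ratio, verb class
```

The first ratio equals the main-effect multiplier for Czech, whereas the second also includes the interaction term, which is why the cross-linguistic difference on verb class errors is much larger than on substitution errors.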
 
Next, the interactions were examined in detail. The overall significance of the 
interaction terms was tested using the drop1 function, which compares the fit of the full 
model to the fit of reduced models from which terms are dropped one at a time. This 
procedure uses the AIC and the likelihood ratio test (LRT) as the indicators of goodness 
of fit. The likelihood ratio test statistic is the difference in model deviance between 
the model without the term and the full model. A significant difference between the two 
models reflects the overall importance of a term, whereas a non-significant difference 
suggests that the term may be omitted to achieve a more parsimonious account. 
The results of this analysis are summarized in Table 15. 
Table 14 Significance of model contrasts integrated by variable and interaction 
Model term            Df  F ratio  p 
Target language (TL)   2  25.27    <.0001 
Proficiency (CEFR)     3   2.21    0.08 
Error type             6  72.14    <.0001 
TL x CEFR              6   4.97    <.0001 
TL x Error type       12  14.41    <.0001 
CEFR x Error type     18   1.52    0.07 
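
The F ratios in Table 14 can be related to chi-square statistics with a short Python sketch. It rests on an assumption not stated in the text, namely that the joint tests are asymptotic (infinite denominator degrees of freedom), in which case F multiplied by its numerator degrees of freedom follows a chi-square distribution. For the proficiency-by-error type term, the sketch takes the 18 degrees of freedom that Table 15 reports for this term. The helper chi2_sf_even_df is hypothetical and uses the closed-form chi-square tail probability that holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def joint_test_p(f_ratio, df):
    """p-value of an F ratio with df numerator and (assumed) infinite denominator df."""
    return chi2_sf_even_df(f_ratio * df, df)


# F ratios from Table 14; 18 df for CEFR x error type as reported in Table 15.
print(round(joint_test_p(1.52, 18), 2))   # CEFR x error type
print(joint_test_p(4.97, 6) < 0.0001)     # TL x CEFR
```

Under this assumption, the implied chi-square values (about 27.4 and 29.8) are close to the deviance differences later obtained for the same terms by single-term deletion.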
 
The interaction between target language and error type, when dropped from the 
model, significantly increased model deviance by 234.58 (p < .0001), suggesting that 
different types of errors were characteristic of the different target languages (across 
proficiency levels). Similarly, dropping the interaction between target language and 
proficiency level increased model deviance by 31.31 (p < .0001), indicating that the 
effects of proficiency were not uniform for all target languages across the range of 
error types. However, omitting the proficiency-by-error type interaction resulted in a 
28.51 increase in model deviance, which was not statistically significant (p = .05). This 
suggests that, for reasons of parsimony, one may assume that different types of errors 
behaved similarly along the proficiency trajectory. 
Table 15 Contributions of interaction terms to model fit (assessed by single term 
deletions) 
Term deleted       Df  Deviance  AIC     Likelihood Ratio Test  p 
None (full model)       43.56    456.29 
TL * error type    12  278.14    666.87  234.58                 < .001 
CEFR * error type  18   72.07    448.80   28.51                 .05 
TL * CEFR           6   74.87    475.60   31.31                 < .001 
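
The likelihood ratio tests in Table 15 can be retraced with a minimal Python sketch: each statistic is the deviance of the reduced model minus the deviance of the full model, referred to a chi-square distribution with as many degrees of freedom as parameters were dropped. The deviances and degrees of freedom are taken from Table 15; the helper chi2_sf_even_df is hypothetical and relies on the closed-form tail probability available when the degrees of freedom are even, which happens to be the case for all three terms.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


FULL_DEVIANCE = 43.56  # full model (model 4), Table 15

# term deleted -> (deviance of reduced model, df dropped), Table 15
REDUCED = {
    "TL * error type": (278.14, 12),
    "CEFR * error type": (72.07, 18),
    "TL * CEFR": (74.87, 6),
}


def likelihood_ratio_test(reduced_deviance, df, full_deviance=FULL_DEVIANCE):
    """Return the LRT statistic and its chi-square p-value."""
    statistic = reduced_deviance - full_deviance
    return statistic, chi2_sf_even_df(statistic, df)


for term, (deviance, df) in REDUCED.items():
    statistic, p = likelihood_ratio_test(deviance, df)
    print(f"{term}: LRT = {statistic:.2f}, df = {df}, p = {p:.4g}")
```

The recomputed statistics match the Likelihood Ratio Test column, and the p-value for the proficiency-by-error type term falls just above .05, consistent with its borderline status.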
 
 
Figure 4. Plots of aggregated model effects. Top panel: target language by error type 
interaction; middle panel: target language by proficiency interaction; bottom panel: 
proficiency by error type interaction. 
[Figure 5 plot. Panels, one per error type: subst, infin, overuse, root, class, bare, 
phonol. X-axis: levels of CEFR (A2, A2+, B1, B1+). Y-axis: predicted rate (0.0 to 0.5). 
Lines by target language (tl): G, It, Cz.] 
Figure 5. Model-predicted error rates by type for German, Italian, and Czech at 
proficiency levels A2 through B1+. 
 
To tease apart which specific differences between the combinations of factor 
levels are driving the significance of the predictors, follow-up analyses were conducted 
using the emmeans package in R (Lenth, 2019) designed for pairwise comparisons. For 
models with a link function (in this case, Poisson with a log link), the package conducts 
comparisons on the log scale and back-transforms the results to the scale of the response 
variable. Bonferroni adjustments to p-values were made to account for multiple 
comparisons.  
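The two operations described above, a contrast computed on the log scale and back-transformed to the response scale, followed by a Bonferroni adjustment, can be sketched in Python as follows. This is not the emmeans implementation; the function names, the three rates, and the raw p-values are invented for illustration.

```python
import math


def rate_ratio(log_rate_a, log_rate_b):
    """Difference on the log scale, back-transformed to a ratio of rates."""
    return math.exp(log_rate_a - log_rate_b)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented predicted rates per text for one error type at one CEFR level.
log_rates = {
    "German": math.log(0.30),
    "Italian": math.log(0.20),
    "Czech": math.log(0.25),
}

print(round(rate_ratio(log_rates["German"], log_rates["Italian"]), 2))

# Invented raw p-values for the three pairwise comparisons.
raw_p = [0.004, 0.030, 0.400]
print([round(p, 3) for p in bonferroni(raw_p)])
```

With three comparisons, a raw p-value of .03 no longer falls below .05 after adjustment, which illustrates how the correction can turn a nominally significant contrast into a non-significant one.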
As a first step, these pairwise comparisons will be reported within each CEFR 
level, with a focus on the cross-linguistic differences in the relative frequency of error 
types, providing insight into how the TLs compare at each proficiency level. Second, 
error prevalence will be compared within each TL (for each proficiency level separately) 
to capture patterns in learners' over- and underproduction of errors of different types. 
Finally, holding error type constant, any differences across proficiency levels will be 
highlighted, despite proficiency being a less prominent variable according to the 
regression model.  
 
Figure 6. Relativized (per number of texts) observed frequencies of error types in 
German, Italian, and Czech across proficiency levels A2-B1+. 
Pairwise comparisons of error type frequencies in TLs. This section 
summarizes in narrative form the results of pairwise comparisons presented in Table 16 
and illustrated in Figures 4, 5, and 6.  
At the A2 proficiency level, the target languages did not significantly differ in the 
prevalence of substitution errors (Table 16; Figure 4, top panel; Figure 5, "ertype: 
subst"; Figure 6), indicating that using inflected forms interchangeably may be a common 
learner strategy not affected by typology. This lack of differences among TLs may cast 
doubt upon the classification of some errors in German: since the German infinitive is 
homophonous with the present-tense plural forms (first and third persons ending in -en), 
only forms that did not end in -en and were unambiguously finite were classified as 
substitution errors. This leaves open the possibility that the count of substitution 
errors in German is underestimated: if some of the forms ending in -en were, in fact, 
finite forms, the counts of infinitive errors would have been artificially boosted at the 
expense of substitution errors. In that case, however, German would be expected to differ 
in the prevalence of both error types from both Italian and Czech: on substitution errors, 
German should have shown lower rates, and on infinitive errors, higher rates, while no 
differences between Czech and Italian would need to be predicted. In reality, the 
prevalence of infinitive errors was significantly higher in German than in Italian but not 
higher than in Czech; Italian, in turn, had a lower rate of infinitival forms than Czech. 
This pattern cannot be explained by the classification of German homophonous forms alone: 
if infinitive errors had, in fact, received an artificial "boost" from this classification 
decision, then their rate in German would have to be higher than in both Italian and 
Czech. On errors of overuse of inflected forms, German had higher rates than both Italian 
and Czech, but the latter two did not significantly differ from each other. On errors 
involving the over- or non-application of root transformation processes, there were no 
significant differences among the TLs. Errors involving the application of a wrong verb 
class pattern were less prevalent in both German and Italian than in Czech but not 
significantly different between German and Italian. When it came to the use of bare forms 
with no overt inflectional marking, German had a significantly higher prevalence than 
Italian, but not Czech, and Czech and Italian, in turn, did not differ from each other. 
Finally, with respect to phonological errors, German did not significantly differ from 
either Italian or Czech, but Italian had a lower rate compared to Czech. 
Table 16 Summary of pairwise comparisons of error rates between target languages by 
type 
                                        Proficiency Level (CEFR) 
Error type    A2                       A2+                      B1                       B1+ 
Substitution  G = It = Cz              G ~ It, G ~ Cz, Cz > It  G = It = Cz              G = It = Cz 
Infinitive    G > It, G ~ Cz, Cz > It  G > It, G ~ Cz, Cz > It  G > It, G ~ Cz, Cz > It  G > It, G ~ Cz, Cz > It 
Overuse       G > It, G > Cz, Cz ~ It  G > It, G ~ Cz, Cz ~ It  G ~ It, G > Cz, Cz ~ It  G ~ It, G ~ Cz, Cz ~ It 
Root          G = It = Cz              G = It = Cz              G = It = Cz              G = It = Cz 
Class         G ~ It, G < Cz, Cz > It  G ~ It, G < Cz, Cz > It  G < It, G < Cz, Cz ~ It  G ~ It, G < Cz, Cz ~ It 
Bare          G > It, G ~ Cz, Cz ~ It  G > It, G ~ Cz, Cz > It  G > It, G ~ Cz, Cz ~ It  G > It, G ~ Cz, Cz ~ It 
Phonological  G ~ It, G ~ Cz, Cz > It  G ~ It, G < Cz, Cz > It  G ~ It, G < Cz, Cz > It  G ~ It, G < Cz, Cz ~ It 
Note. The > symbol marks significant pairwise comparisons where the language 
on the left had a higher incidence of an error type than the language on the right; the < 
symbol marks the reverse. The ~ symbol denotes pairwise contrasts that were not 
statistically significant. The equal sign (=) means that none of the possible pairwise 
comparisons was significant: G = It = Cz means that German did not significantly differ 
from Italian or from Czech, and Italian and Czech did not significantly differ either. 
At the A2+ level, substitution error rates did not differ significantly between 
German and either Italian or Czech. However, between the latter two it was Italian that 
had the lower rate. Errors of infinitive use were significantly more frequent in German 
than in Italian but did not significantly differ between German and Czech, whereas 
Italian showed a significantly lower rate of them than Czech. Overuse of inflection was 
significantly more prevalent in German than in Italian but not significantly higher in 
German than in Czech, despite trending in that direction. In turn, Italian and Czech did 
not significantly differ in the prevalence of inflection overuse. Continuing the trend 
observed at the A2 level, at the A2+ level there were no significant differences in the 
rates of root transformation errors. Errors related to verb class were, once again, not 
significantly different between German and Italian but were lower in frequency in 
German than in Czech. In a comparison between Italian and Czech, Czech had the higher 
rate. The prevalence of bare forms without inflectional marking was significantly higher 
in German than in Italian but not Czech. Of the latter two, Czech had the higher rate. 
Errors in the phonological composition of the root or ending did not significantly differ 
in rate between German and Italian, but did so between German and Czech, with German 
having the lower rate. In turn, the rate in Czech was significantly higher than that in 
Italian. 
At the B1 proficiency level, substitution errors were, once again, equally likely in 
all three TLs. Repeating the pattern observed at the lower proficiency levels, infinitive 
use errors were more prevalent in German than in Italian but did not significantly differ 
between German and Czech, whereas Czech had a higher rate than Italian. With respect 
to overuse of inflection, German did not significantly differ from Italian but showed a 
higher rate than Czech, which, in turn, was not different from Italian. Root errors did not 
differ in prevalence among the TLs, extending the trend observed since the A2 level. 
Errors of verb class were less prevalent in German than in Czech, consistent with the 
patterns at the lower levels. In contrast to the A2+ level, however, German also had a 
lower rate of verb class errors than Italian (at A2+ the two did not significantly differ), 
and Italian did not significantly differ from Czech. The use of bare forms was 
significantly higher in German than in Italian but not Czech, mirroring the patterns at 
levels A2 and A2+. Between Czech and Italian, the rate of bare forms was comparable, 
with no statistically significant difference. Finally, with respect to phonological errors, 
German and Italian did not significantly differ from each other, but each had a lower 
incidence than Czech. 
At the B1+ CEFR level, rates of substitution errors did not differ among the target 
languages, following the robust pattern that continued from the A2 level. Infinitive errors 
were significantly more prevalent in German than in Italian but did not differ measurably 
between German and Czech. However, Italian showed a lower rate of these errors than 
Czech. On overuse of inflected forms, German and Italian did not differ significantly, but 
German and Czech did, with the rate being higher in German, while Czech and Italian 
showed no significant difference. With respect to root transformation errors, the target 
languages did not differ. Errors of verb class were more prevalent in Czech than in German 
but did not significantly differ between German and Italian. In turn, Italian and Czech did 
not differ from each other either. Use of bare uninflected forms was higher in German 
than in Italian but not different between German and Czech, nor was it different 
between Italian and Czech. Finally, phonological errors occurred at similar rates in 
German and Italian but were less prevalent in German than in Czech, which had a 
higher incidence of them than Italian. 
Summary. Across proficiency levels, the most stable pattern emerging from these 
pairwise contrasts is one where German shows no considerable advantage in accuracy 
over Italian and Czech on error types related to the acquisition of rules governing the use 
of finite forms, the unacceptability of bare uninflected forms, and appropriately 
constraining the use of finite forms. While Italian outperforms German on these types, 
Czech fares at least as well as German. Compared to Italian, the disadvantage of German 
on bare form use and use of infinitives continues throughout all proficiency levels (A2 to 
B1+), whereas in the realm of inflection overuse the gap eventually closes at the B1 level, 
and German no longer differs significantly from Italian. The only areas where 
German appears to have an advantage are errors of verb class and phonological errors. On 
verb class errors, German has lower predicted rates than Czech across all proficiency 
levels and lower predicted rates than Italian at the B1 level. On phonological errors, 
German outperforms Czech (as evidenced by lower predicted error rates) starting at the 
A2+ level and continuing through to B1+. 
The contrast between German and Italian is informative with respect to the role of 
paradigm richness in learning: Italian has more distinct inflections and more verb classes, 
and it demonstrates that a rich paradigm is not necessarily detrimental to accuracy; 
it can be beneficial for learning when not to use infinitive or bare forms and not to 
overuse finite forms when non-finite forms are required. The comparisons between 
German and Czech, on the one hand, and Czech and Italian, on the other, add nuance to 
this generalization: if paradigm richness is additionally obscured by 
morphophonological alternations overlaid on top of it, the richness itself may be less 
effective in nudging the learner toward developing appropriately constrained inflectional 
morphology. This is evident in the higher incidence in Czech of errors broadly associated 
with phonology, including the application of inflectional templates associated with the 
wrong verb class and the use of nonexistent phonemic strings (both in the root and ending).  
The clustering of error types that behave similarly according to TL typology is 
itself thought-provoking. On the one hand, one observes the close association of 
phonological and verb class errors, which pattern together (lower in German and Italian, 
higher in Czech; Figures 8, 9). Their occurrence in TLs at different rates is an instance 
where the learning difficulty is in proportion to what the morphological system of the TL 
provides (Figure 8): a system that includes more verb classes and phonological processes 
inevitably presents more opportunities for learners to get them wrong. Hence, German 
shows fewer such errors and Czech more. Such proportionality would be consistent with 
the type of learning presupposed by general-cognitive accounts: essentially, serial 
accumulation of knowledge about individual lexical items. The need for such accumulation 
comes from the low salience of individual elements, which makes it impossible to learn 
any patterns from strokes of insight. 
On the other hand, infinitive, bare form, and (to some extent) inflection overuse 
errors form their own cluster (Figure 9). These errors of supplying inflection do 
not increase in lockstep with the complexity of the target system (Figure 8): the incidence 
of infinitive and bare form errors in Czech is on par with that in German, which defies 
proportionality. When choosing an inflected form appropriate in a particular context, 
learners of Czech are faced with more options presented by their TL's morphological 
system than are learners of German, which should lower the probability of supplying a 
correct one. This lack of proportion between error rates and the complexity of the system 
to be learned is consistent with accounts of grammatical development that postulate that 
the disparate inflections that learners encounter are not just learned at face value but 
also strengthen the underlying syntactic features. Thus, experiencing one inflected form 
ultimately benefits all inflected forms.  
However, this account does not explain the differences between Czech and 
Italian, which are consistently directional and favor learners of Italian. Throughout the 
entire proficiency span examined, learners of Czech show a higher incidence of infinitive 
use errors than learners of Italian, suggesting that, for morphological systems of 
equivalent complexity, there is a disadvantage brought on by phonological complexity, 
which nevertheless fails to outweigh the advantage of either of the "richer" systems over 
a "poorer" one (German).  
Moreover, whatever the modulating effects of morphophonology may be, they do 
not appear to extend to all domains: errors of bare form use, for example, did not 
differ between Italian and Czech in a significant way, with the exception of the A2+ 
proficiency level. This suggests that the presence of distinct inflected forms in the input, 
even ones that are less transparent or consistent (due to alternations), may be enough to 
facilitate the presence of any inflections in learners' productions. Whether these 
inflections will ultimately be ones whose phonological forms are misstated, ones that 
mark the infinitive, or ones that are associated with a different verb class is 
language-specific. 
Comparisons of error type prevalence within target languages. This section 
poses a slightly different question from the last. Rather than asking whether, at a 
given proficiency level, learners of one target language are more likely to make errors of 
type X, it explores the "mixture" of error types in each target language, 
characterizing the distribution of all error types relative to each other and then comparing 
these overall configurations among the languages. This analysis has the capacity to 
uncover learners' favored and dispreferred ways of handling the uncertainty over 
choosing appropriate inflected forms in their language. The rank orders of error types 
within each target language at each proficiency level are summarized in Table 17. Since 
the rank orders are remarkably well preserved across the span of proficiency levels, data 
were also averaged across CEFR for each language to facilitate comparisons and to make the 
overall patterns stand out more (presented in Figures 7, 8 and Table 18). This averaging 
is additionally warranted by the general absence (with a couple of exceptions) of 
significant proficiency-related coefficients and interaction terms in the model (Table 13; 
also see the subsequent section for a detailed account of proficiency-related changes).  
Exploring Figure 7, one notices that the row corresponding to substitution errors 
looks remarkably similar for all three TLs: substitution errors were significantly more 
likely than all other error types, with the exception of infinitive errors in German, which 
were not significantly different in frequency from substitution errors. This points to the 
tendency of learners of all three TLs to use inflected forms interchangeably. Other than 
substitution errors, German was characterized by the prevalence of infinitive errors and 
the overuse of inflected forms, each of which outnumbered four out of the remaining six 
error types: root errors, verb class errors, use of bare uninflected forms, and phonological 
errors. The lowest-ranked error type in German was verb class (significantly lower than 
any of the six remaining error types). Root, bare, and phonological errors formed the 
middle of the pack and were all equally prevalent when compared to each other, all 
outnumbered verb class errors, and all were less prevalent than substitution, infinitive, 
and inflection overuse errors. 
In Czech, substitution errors outnumbered all others and were followed in the 
"ranking" by infinitive errors, errors of verb class, and phonological errors. While the 
presence of infinitive errors among these runners-up is similar to German, the high 
ranking of phonological and verb class errors stands in contrast to the German pattern. 
All three of these types were more prevalent than overuse of inflection, root errors, and 
bare form errors. Unlike German, which had a single strongly dispreferred error type (one 
that was less prevalent than all of the remaining errors), in Czech the status of the most 
dispreferred error type was shared by three error categories: bare forms, root errors, and 
overuse of inflection. These three did not significantly differ from each other and were 
each significantly less prevalent than substitution, infinitive, verb class, and phonological 
errors. 
 
[Figure 7 matrix. Three panels (German, Italian, Czech), averaged across CEFR. Rows and 
columns in each panel: Substitution, Infinitive, Overuse, Root, Class, Bare, 
Phonological; diagonal cells marked "-".] 
Figure 7. Summary of pairwise comparisons among error type rates within each TL, 
averaged across all proficiency levels.  
Note. Comparisons are listed in rows: on the left, categories of errors are listed that serve 
as the basis of comparison in that row; symbols depict whether the row category had a 
higher (?), lower (?), or not significantly different (?) predicted incidence than the 
categories listed across the top row. For example, for errors of inflection overuse in 
German the pattern is (left to right): lower rate than substitution errors (?); not 
significantly different rate from infinitive errors (?); higher rate than root, class, bare 
form, and phonological errors (????). 
 
Table 17 Rank orders of error types by target language and proficiency level 
                 German        Italian       Czech 
A2 
1 most common    Substitution  Substitution  Substitution 
2                Infinitive    Overuse       Infinitive 
3                Overuse       Root          Phonological 
4                Root          Class         Class 
5                Phonological  Phonological  Overuse 
6                Bare          Infinitive    Root 
7 least common   Class         Bare          Bare 
A2+ 
1 most common    Substitution  Substitution  Substitution 
2                Infinitive    Overuse       Infinitive 
3                Overuse       Root          Phonological 
4                Root          Class         Class 
5                Phonological  Phonological  Overuse 
6                Bare          Infinitive    Root 
7 least common   Class         Bare          Bare 
B1 
1 most common    Substitution  Substitution  Substitution 
2                Infinitive    Overuse       Infinitive 
3                Overuse       Root          Phonological 
4                Root          Class         Class 
5                Phonological  Phonological  Overuse 
6                Bare          Infinitive    Root 
7 least common   Class         Bare          Bare 
B1+ 
1 most common    Substitution  Substitution  Substitution 
2                Infinitive    Overuse       Phonological 
3                Overuse       Root          Infinitive 
4                Root          Infinitive    Class 
5                Bare          Class         Overuse 
6                Phonological  Bare          Root 
7 least common   Class         Phonological  Bare 
Note. Error types were ranked based on the number of significant pairwise 
comparisons they participate in. For example, an error type that compared higher than 
three other error types is ranked higher than one that scored higher than two error types; 
one that tested significantly lower than another error type in one pairwise comparison 
would be ranked lower than an error type that did not show any significant differences to 
the remaining error categories. Lack of significant differences between error types is 
indicated by the merging of cells. 
  
Table 18 Rank orders of error types by target language, averaged across all proficiency 
levels 
                 German        Italian       Czech 
All proficiency levels 
1 most common    Substitution  Substitution  Substitution 
2                Infinitive    Overuse       Infinitive 
3                Overuse       Root          Class 
4                Root          Class         Phonological 
5                Bare          Phonological  Overuse 
6                Phonological  Infinitive    Root 
7 least common   Class         Bare          Bare 
 
In Italian, barring substitution errors (which dominated all other error types), the 
remaining types fell into two tiers: inflection overuse, root, class, and 
phonological errors were all equally likely amongst themselves, and each outnumbered 
the two least-preferred error categories, infinitive and bare form errors. This 
dispreference for bare form errors is something that learners of Italian share with learners 
of Czech, along with the prevalence of verb class and phonological errors in the middle 
tier of errors. Shared with learners of German is the high ranking of inflection overuse 
errors among learners of Italian, but the ordering of bare form errors as tied for 
least-preferred in Italian contrasts with the position of these errors in the middle tier 
among learners of German. Conversely, the middle-tier ranking of verb class errors in 
Italian contrasts with their lowermost status in German.  
 
Figure 8. Cross-over pattern in morphosyntactic and morpholexical errors depending on 
target-language complexity. 
The differences in the patterns of pairwise comparisons detailed above can be 
linked to how elaborate the morphological systems are (Figure 8). In German, with its 
two major classes of verbs ("strong" and "weak"), there is less opportunity to apply the
wrong inflection templates than in Italian and Czech. In Italian and Czech, with more 
distinct inflectional endings, some of which are also made up of longer strings of 
segments, there is more opportunity for learners to be uncertain about the exact phonemic 
composition of the inflectional endings and roots. In German, the higher or equal 
prevalence of bare form errors on three out of six contrasts stands out as well, in contrast 
to the dispreference for bare forms in Italian and Czech. This is consistent with the notion 
that more saturated morphological paradigms may be conducive to learning about the 
illegality of uninflected forms.  
 
Figure 9. Error rates by type and target language across CEFR proficiency levels 
 
Next, I will examine developmental nuances by zooming in on the cross-sectional 
changes in the rates of error types for each language separately.  
Role of proficiency. As the model coefficients foreshadow (Table 13), there were 
very few instances in which it was possible to capture differences between proficiency 
levels in the incidence of error types. The most meaningful comparisons in this case are 
those holding constant target language and error type, while comparing the incidence of 
each error type between adjacent proficiency levels. The follow-up analysis of pairwise 
comparisons did not reveal any differences between CEFR levels for German or Italian,
whereas in Czech substitution errors declined between levels A2+ and B1+ by a factor of
2.58 (p = .02) and, though not statistically significantly, between levels A2+ and B1 by a
factor of 2.02 (p = .09).
Despite the overall paucity of significant pairwise contrasts, the significant 
interaction terms for proficiency and error type, on the one hand, and proficiency and 
target language, on the other, merit a closer look. Examining first the interaction between 
proficiency and error type, one sees that the only significant combinations in the
model are B1 with bare and B1 with class. Each of these parameters is interpreted
as a multiplier applied to the baseline category (substitution errors in German at
the A2 level), alongside the other multipliers (exponentiated regression coefficients)
for proficiency and error type. Thus, the model-predicted rate for errors of verb class in
German at the B1 level would be calculated as: intercept (rate of substitution errors in
German at A2) * coefficient for CEFR=B1 * coefficient for Error type=class * coefficient
for interaction (B1*class).
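Written out on both scales, the calculation just described takes roughly the following form. The symbols are introduced here for illustration only and are not taken from the model output: \beta_0 stands for the intercept on the log scale, and the remaining \beta terms for the log-scale coefficients of proficiency, error type, and their interaction. Any offset term the model may include is left out of this sketch.

```latex
\begin{align*}
\log \hat{\mu}_{\text{German, B1, class}}
  &= \beta_0 + \beta_{\text{B1}} + \beta_{\text{class}} + \beta_{\text{B1} \times \text{class}} \\
\hat{\mu}_{\text{German, B1, class}}
  &= e^{\beta_0} \cdot e^{\beta_{\text{B1}}} \cdot e^{\beta_{\text{class}}} \cdot e^{\beta_{\text{B1} \times \text{class}}}
\end{align*}
```

Because German is the reference language, no target-language coefficient enters this product.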
The coefficients for the proficiency level of B1 and for the error types "bare" and "class"
were all negative in the model (on the log scale), indicating a lower expected prevalence
of substitution errors at B1 than at A2 in German, and a lower expected prevalence of
"bare" and "class" errors than of substitution errors at A2 in German. However, the
interaction terms B1*bare and B1*class were positive, running counter to the lower-level
coefficients. This suggests that both error types improved less between the levels of A2
and B1 than would be expected. From this we conclude that in German errors associated
with wrong verb class and uninflected bare forms are more persistent developmentally
than would be expected from their frequencies alone or from the declines in other error
types over the same proficiency span.
Finally, the overall significance of the interaction between proficiency and target 
language (Tables 14 and 15) appears to stem from the significant differences of the 
combinations "Czech, A2+" and "Italian, B1" from the reference level, as revealed by the
significant coefficients for these level combinations in the model (Table 13).
Again, the significance of the interaction terms is interpreted in the context of the 
coefficients being additive (on the log scale) and multiplicative on the linear scale. For 
example, the model-predicted rate for substitution errors in Czech at the A2+ level is 
pieced together from: the intercept (rate of substitution errors in German at A2) * 
coefficient for CEFR=A2+ * coefficient for Target language=Czech * coefficient for 
interaction (Czech * A2+).  
Neither Italian nor Czech had significant coefficients in the model, indicating that 
their rates of substitution errors at A2 were not significantly different from German, 
although numerically lower (0.83 and 0.95 times the German rate, respectively). Nor was 
the A2+ coefficient significant, meaning that rates of substitution errors in German did 
not noticeably change between A2 and A2+, although there was a trend for them to 
increase by a factor of 1.11. The coefficient for B1 was significant, with substitution
errors in German expected to fall to 0.67 times their A2 rate. The interaction multiplier of 1.57
for Czech A2+ means a sharper increase in substitution errors between the levels of A2 
and A2+ in Czech than would be predicted either from the pattern in German (over the 
same proficiency span) or from the pattern of non-significant Czech-German differences 
(at A2 for substitution). Similarly, the interaction multiplier of 1.90 for Italian at B1 
negates the trends towards an overall lower rate of substitution errors in Italian (at A2, 
compared to German) and towards substitution errors decreasing between A2 and B1 (for 
German). Instead, we see a model-predicted two-fold increase in their incidence over 
what would be expected from those individual trends.  
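The multiplicative reading of these coefficients can be checked with a short calculation. The sketch below uses only the multipliers quoted in the preceding paragraph; because the intercept is not restated here, rates are expressed relative to the baseline cell (substitution errors in German at A2). The function and variable names are illustrative, and treating unquoted interaction terms as 1.0 is an assumption made for the sake of the example.

```python
# Multipliers (exponentiated coefficients) quoted in the text above.
# Rates are relative to the baseline cell: substitution errors in
# German at A2, which is set to 1.0 because the intercept is not restated.
MAIN = {
    "Italian": 0.83,
    "Czech": 0.95,
    "A2+": 1.11,
    "B1": 0.67,
}
INTERACTION = {
    ("Czech", "A2+"): 1.57,
    ("Italian", "B1"): 1.90,
}


def relative_rate(language, level):
    """Predicted substitution-error rate relative to German at A2."""
    # Reference levels (German, A2) have no entry and contribute 1.0
    rate = MAIN.get(language, 1.0) * MAIN.get(level, 1.0)
    # Interaction terms not quoted in the text are treated as 1.0 (assumption)
    return rate * INTERACTION.get((language, level), 1.0)


czech_a2plus = relative_rate("Czech", "A2+")
italian_b1 = relative_rate("Italian", "B1")
german_b1 = relative_rate("German", "B1")
print(round(czech_a2plus, 2), round(italian_b1, 2), round(german_b1, 2))
```

On these figures, substitution errors in Czech at A2+ come out at roughly 1.66 times the baseline rate, and those in Italian at B1 at roughly 1.06 times, compared with 0.67 for German at B1.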
These results are hard to interpret not only due to the cross-sectional nature of the 
data but also due to the way that proficiency is operationalized in the CEFR framework. 
Not only is grammatical accuracy part of the construct of proficiency, creating potential 
for circularity, but so are lexical and pragmatic aspects of language use, which can 
potentially overshadow accuracy when a holistic rating is produced. Therefore, the 
absence of significant changes from one proficiency level to the next could reflect the 
noisiness of proficiency measurement, or the particulars of the proficiency construct 
definition in the CEFR approach, or a true state of affairs in which it is the typology of 
the target language that predisposes learners to producing errors of different types at 
different rates that may change little throughout development. 
5.3 Cross-Validation 
A separate cross-validation analysis was conducted to assess the generalizability 
of the model proposed in the previous section to previously unseen data. This analysis 
was performed using the holdout method, in which the splitting of the data and the 
evaluation of the model occur only in one run (instead of over multiple runs that are 
subsequently averaged). Data were randomly split and assigned either to the training set 
(encompassing 59 out of 84 observations) or the testing set with 25 observations. Each 
observation corresponded to an error count for a particular combination of target 
language, proficiency, and error type (as presented in Table 10). 
First, the mean of error frequency was calculated based on the training set, and the 
actual values of error frequency in the training data were compared to that mean, yielding 
a set of prediction error metrics: mean squared error (MSE), root mean squared error
(RMSE), and mean absolute error (MAE). These metrics served as the baseline against
which to compare the performance of the regression model, both when applied to
training data and when applied to testing data. Second, the Poisson regression
model described in the previous section was estimated on the training data, and MSE, 
RMSE, and MAE were calculated from comparing actual frequency values to model-
fitted ones. These values were compared to those obtained from a model with the mean 
alone. It is helpful to calculate the ratios of model-based MSE, RMSE, and MAE to their 
mean-based counterparts: values less than 1 would then indicate improved prediction 
accuracy, and the extent of their difference from 1 would quantify that improvement. 
Finally, the steps just outlined were repeated on the test data. 
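The comparison just described can be sketched as follows. The counts and predictions are hypothetical and are not the study's data; the function names are illustrative. Reusing the training-set mean as the baseline prediction for the test set is one possible reading of the procedure and is an assumption of this sketch.

```python
import math


def error_metrics(actual, predicted):
    """Return (MSE, RMSE, MAE) for paired actual and predicted values."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / len(residuals)
    mae = sum(abs(r) for r in residuals) / len(residuals)
    return mse, math.sqrt(mse), mae


def ratios_to_baseline(actual, model_pred, baseline_mean):
    """Ratio of model error to mean-based error for each metric."""
    base = error_metrics(actual, [baseline_mean] * len(actual))
    model = error_metrics(actual, model_pred)
    return tuple(m / b for m, b in zip(model, base))


# Hypothetical error counts and model predictions (not the study's data)
train_actual = [40, 12, 5, 30, 8, 2]
train_pred = [38.5, 12.6, 5.4, 31.0, 7.2, 2.3]
test_actual = [25, 9, 3, 18]
test_pred = [40.0, 2.0, 12.0, 5.0]

# Baseline: predict the training-set mean for every observation (assumption)
train_mean = sum(train_actual) / len(train_actual)
train_ratios = ratios_to_baseline(train_actual, train_pred, train_mean)
test_ratios = ratios_to_baseline(test_actual, test_pred, train_mean)

print("train ratios (MSE, RMSE, MAE):", [round(r, 3) for r in train_ratios])
print("test ratios (MSE, RMSE, MAE):", [round(r, 3) for r in test_ratios])
```

In this toy example the training ratios fall below 1 and the test ratios above 1, which is the signature of overfitting that the ratio is meant to expose.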
Examining the performance of the regression model on the training data, one sees that 
it considerably improved prediction accuracy (Table 19): depending on the metric in 
question, the prediction error associated with the model ranged between 0.3% and 6% of
the mean-based error, corresponding to a 94% to 99.7% reduction in prediction error. When
applied to the testing data, however, the model increased prediction error relative to
using the mean of error frequency: the increase varied by metric from 25% (MAE) to
81% (MSE). Due to this overfitting, successive models were tested further with
progressively more model terms removed. All of them also fared poorly and resulted in 
an increase of error when applied to test data. However, the amount of the increase in 
prediction error became lower with more terms removed (Table 19). 
 
Table 19 Prediction accuracy of regression models when tested on unseen test data
                                              Prediction Accuracy
                                              Mean Squared   Root Mean       Mean Absolute
Model                                         Error          Squared Error   Error
3 interactions: TL*error type + TL*proficiency + proficiency*error type
  Training set
    Mean of frequency                         307.46         17.53           12.16
    Model (3 interactions)                    0.82           0.91            0.72
    Ratio (model error / mean-based error)    0.003          0.05            0.06
  Testing set
    Mean of frequency                         143.85         11.99           9.37
    Model (3 interactions)                    259.84         16.12           11.76
    Ratio (model error / mean-based error)    1.81           1.34            1.25
2 interactions: TL*error type + TL*proficiency
  Training set
    Mean of frequency                         198.16         14.08           10.30
    Model (2 interactions)                    5.81           2.41            1.84
    Ratio (model error / mean-based error)    0.03           0.17            0.18
  Testing set
    Mean of frequency                         404.16         20.10           12.71
    Model (2 interactions)                    582.87         24.14           15.46
    Ratio (model error / mean-based error)    1.44           1.20            1.22
1 interaction: TL*error type
  Training set
    Mean of frequency                         184.45         13.58           9.93
    Model (1 interaction)                     10.35          3.22            2.45
    Ratio (model error / mean-based error)    0.06           0.24            0.25
  Testing set
    Mean of frequency                         438.86         20.95           2.45
    Model (1 interaction)                     604.02         24.58           14.95
    Ratio (model error / mean-based error)    1.38           1.17            1.12
  
The issue of overfitting may reflect both the adequacy of the model itself and the 
ratio of the number of observations to the number of predictors. Especially considering 
that proficiency was mostly not predictive of error rates, it may be
best operationalized as a variable with fewer levels than in the present study. Moving
forward, explorations of this topic should include more data or condense the number of 
categories investigated. This can be achieved by combining the levels of the error type 
variable and by collapsing CEFR distinctions, each of which would reduce the number of 
factors and their combinations in the model. It would have fallen outside of the scope of 
the present study to test alternative ways of leveling the variables, all of which could
potentially have consequences for the theoretical implications of the findings and the
interpretation of results. Identifying a natural "break" in the data along the proficiency
dimension would, no doubt, be enough to merit an entire study of its own. Instead, I 
opted for setting up the validation maximally close to the original regression models 
estimated in the previous section. 
  
Chapter 6: Results – Production of Verbal Inflection in
German: Phonological Environments 
Research on the production of inflectional morphology by L2 learners repeatedly 
shows difficulties with producing inflectional endings, with learners either omitting them 
altogether or modifying them in some way. Recently it has been enriched by the 
consideration of phonological factors, which influence how morphological marking is 
realized overtly. This has led to claims that at least some of the observed problems with 
producing inflectional morphology are due to phonological processes. Some of these 
factors have included the differences between L1 and L2 syllable structure (e.g., Goad, 
White, & Steele, 2003) and the application of target-language or first-language (Beebe, 
1980; Itakura, 2002; Lee, 2000; Schmidt, 1977; Yu, 2004) sociolinguistic variation 
patterns by second language learners. Considering that phonological processes are 
sensitive to grammatical constraints in native speakers (Labov, 1968, 1989; Neu, 1980; 
Wolfram, 1976), variation in learners' phonological production can indirectly expose the
status of different morphological and syntactic environments in learners' interlanguages
(Young-Scholten, 1997). Thus, addressing phonological processes, such as consonant
cluster simplification, deletion, and assimilation, can strengthen the conclusions of
studies on learners' production of morphology, especially where oral samples are
concerned, since they would be particularly prone to phonological influences.  
In the literature on suprasegmental processes in L2 learning, one distinguishes 
between universal developmental tendencies and transfer of L1 phonotactic properties 
(Major, 2001), both of which operate in learners. Among the universal influences, 
markedness (Eckman, 1991) and sonority have been proposed to describe the relative 
prevalence of syllable types across languages and the differences in their learning 
difficulty. For example, simple codas are both more prevalent in the world's languages
and preferred by learners over codas with more consonants, regardless of L1 syllable 
structure (Tarone, 1982).  
Sonority (Cairns & Feinstein, 1982), in turn, complements the notion of 
markedness by elaborating the preferred and dispreferred sequences of segments. 
Namely, a syllable's nucleus is typically formed by elements that are the highest on the
sonority hierarchy (vowels), and only in exceptional cases by nasals or liquids. Leading
up to the nucleus, segmental units are generally expected to rise in sonority, which 
captures the preference of many languages for onset clusters such as stop + nasal over 
nasal + stop. By contrast, this preference is reversed in codas, which are required to 
sequence the segments in the order of falling sonority. However, the specific ways in 
which learners deal with marked material, such as clusters, seem to be influenced by the 
L1, with some learners preferring epenthesis (e.g., Abrahamsson, 1999), others deletion, 
and yet others feature assimilation. 
Most research on the interactions between phonology and morphology has been 
conducted on L2 English. English verbs provide fertile ground for such research: 
consonant clusters are created when past-tense morphology (realized as /t, d/) is applied 
to stems that may already be consonant-heavy due to the permissiveness of English 
codas. The findings have been conflicting: in native speakers, deletion of /t, d/ is more
likely where these phonemes do not carry grammatical meaning, that is, when they are
part of a monomorphemic cluster rather than when they express past tense (Labov,
1989). By contrast, L2 learners have been reported to show the opposite pattern, being 
more likely to delete /t, d/ in past tense clusters (Bayley, 1996) and /s/ in the third person 
singular than on plural nouns (Saunders, 1987), especially when length of residence in a 
target language community has been short (Wolfram, 1985; Wolfram & Hatfield, 1984). 
This suggests that early in development deletion may coincide with lack of acquisition of 
past tense marking. However, as learners acquire patterns of sociolinguistic variation on 
top of the morphosyntax of their target language, they may also start deleting more, not 
less, converging with native speaker tendencies (Hansen, 2001, 2005).  
These divergent findings stem from differences in first languages among these 
studies, as well as from sampling from different points along proficiency trajectories, 
reflecting different levels of mastery of the grammatical features expressed by 
consonantal morphemes (e.g., tense, agreement, plural marking in English). Another 
source of variation in the findings lies in the properties of the English inflectional system, 
which expresses inflectional endings through obstruents and, therefore, conflates learner 
difficulties stemming from the presence of inflection and those stemming from the 
consonantal nature of it.  
Cross-linguistic evidence becomes indispensable in this situation: by examining 
languages that realize inflection through different phonological means, one can separate 
the effects of incomplete grammatical acquisition, universal processes in phonological 
development, and the gradual learning of sociolinguistically conditioned variation 
approaching its use by the target language community. Examining learners' success on
the same feature cross-linguistically is essentially equivalent to experimentally
manipulating those properties of morphemes that have been found to contribute to
learning difficulty. For example, if, as on Goldschneider and DeKeyser's (2001) account,
morphemes expressed as full syllables are acquired more easily, it may be fruitful to
contrast learners' accuracy on the same grammatical feature in a target language X that
expresses it syllabically with the accuracy of learners of target language Y that expresses
that feature non-syllabically. This approach complements comparisons conducted within
a single language, as when the learning difficulty of non-syllabic third-person -s is
compared to that of the syllabic the. In such comparisons, accounting for the
semantic and syntactic differences of the morphological features studied is far from 
straightforward. 
This chapter provides such cross-linguistic evidence by testing the effects of 
variables first identified based on English as a target language in the context of German. 
Focusing on the production of inflectional endings on verbs and the phonological 
environments that promote its variability, such an examination stands not only to offer
insights into the learning of German but also to enter into a dialogue with the English data.
German as a target language offers a number of advantages for clarifying the 
contributions of phonological and phonotactic factors to morphological accuracy.  In its 
present-tense paradigm German includes, among other inflectional endings, the same 
phonological marker as one of the English past-tense allomorphs, /t/. This takes away the 
confounding influence of the mastery of tense and aspect semantics from the examination 
of morphological marking using /t/. In addition, the agreement paradigm of German 
contains inflectional endings spanning opposite poles on the sonority continuum, from 
the first person -e through the plural (first and third persons) -en to the consonantal
cluster -st (second person singular). Another advantage of German is that it has no
phonological process comparable to /t, d/ deletion in English. Therefore, if the underlying
difficulty in producing inflectional endings is grammatical in nature and reflects the
gradual acquisition of agreement, these morphemes should show similar degrees of
accuracy, both relative to one another and relative to English -s and -ed. The
operation of syllable structure constraints, by contrast, would be captured in higher 
accuracy of production on the morphemes expressed as more sonorous strings (-e, -en) 
than less sonorous ones (-t, -st). 
While most of the research has been conducted on oral productions, investigating 
learner behavior through other tasks provides more ways to tease apart the effects of 
variables such as monitoring (engaged to different degrees in tasks varying in formality
and style; Adamson & Regan, 1991; Tarone, 1982), perception (McAllister, 1997; Segui,
Frauenfelder, & Hallé, 2001), and motoric output constraints (Flege, 1995; Leather &
James, 1996). While this is by no means impossible when examining oral production data 
alone, broadening the scope of this research to include written data can bring to light the 
role of familiar factors in novel combinations.  
The present chapter examines written production data by learners of German as a 
second or foreign language. The data come from essays written as part of language 
proficiency assessments. This places them even closer to the monitored end of the 
spectrum. Though not meta-linguistic, the task's written mode and, in particular, the
purpose of demonstrating proficiency in a target language, would likely involve 
considerable attention to form and tap both declarative and procedural knowledge. It is 
also likely that oral and written communication engage linguistic competence and 
performance differently by virtue of their differing time requirements and opportunities 
for revision. In addition, the written channel of communication removes some of the 
motor difficulty associated with spoken production. Thus, examining data from written 
production, in addition to the traditional spoken channel, can offer a way to study the old
"ingredient" variables (such as monitoring, degree of meta-linguistic control, style
shifts, or motor facility) combined in novel ways.
6.1 Methods 
Corpus. From the online corpus of learner productions written for proficiency 
exams (Merlin), I extracted texts rated by the Merlin project staff at the CEFR A1 level. 
This returned 55 learner texts, written by test takers of the A1 CEFR exam, with the 
exception of nine at the A2 and four at the B1 exam levels. In addition to these, 20 texts 
were randomly selected from those written for the A1 examination but rated as A2 level 
overall. This was done to increase the sample and include a broader range of learner 
ability and performance, while keeping task characteristics constant. Care was taken to 
select samples rated at A2 overall but not uniformly for all aspects of performance. For at 
least some aspects (such as linguistic range, grammatical accuracy, or vocabulary 
control) these additional texts were rated A1.  
The decision to limit the examination to mostly the A1 and a few instances of the 
A2 levels stemmed from a number of considerations. First, only limited metadata were 
available for learners in the corpus, and length of study, type of instruction, or length of 
residence in target-language communities were unknown. Since proficiency overlaps 
with time on task, focusing on the lowest proficiency levels locks in this variation in 
learner backgrounds before it fans out even further. If reaching the A1 level requires 
highly variable (and, for our participants, unknown) lengths of study, then the paths of
the learners who achieved A2 or B1 would be even more divergent. Second, this chapter 
investigates the effects of phonology indirectly, mediated by writing and orthographic 
skills, and growing literacy skills in the L2 would make this connection even more 
fragile. 
Within the texts of this subcorpus a search was conducted for all instances of the 
"sentence" annotation. Within each sentence, all verbs and participles were manually
highlighted and coded for a number of variables. Each verb constitutes one token for the 
purposes of the analyses reported below. 
Learners. The first languages of learners in the sample varied (Table 20); the L1 was
not reported for 21 out of 75 learners in the sample, accounting for 107 out of 479 tokens.
All reported L1s allow consonant clusters in the syllable coda. Unless otherwise noted, 
the analyses reported below are based on all L1s, including unknown ones. 
Table 20 First language backgrounds of learners in the sample
                              Count of speakers
L1             N of Tokens    Total   Rated A1   Rated A2
Arabic         53             8       8          -
English        88             11      4          7
French         7              1       1          -
Hungarian      3              1       1          -
Polish         14             2       -          2
Portuguese     61             10      6          4
Russian        30             7       7          -
Spanish        69             7       3          4
Turkish        47             7       4          3
not reported   107            21      21         -
Total          479            75      55         20
 
Coding and independent variables. First, each token was coded for 
correctness: one variable reflected correctness in a strict sense and considered roots and
spelling; the other variable focused on the correctness of the inflectional ending itself. 
Considering that the focus of this chapter is on phonotactic influences on the production 
of inflectional endings, it is this second outcome variable that is of primary interest. 
Unless otherwise noted, the analyses reported pertain to the correctness of the ending. 
Tokens that contained orthographic errors were considered correct on the
inflectional ending variable, as long as the identity of the lexical item could still be
determined with confidence. Such errors included using a single vs. double
consonant (hofe instead of hoffe; hoerren instead of hoeren); missing the h after an e
signifying a long vowel (get instead of geht); missing an apostrophe (wie gehts instead
of wie geht's); and using i to denote the diphthong /ay/ (ei). If the root change or
spelling error resulted in homonymy with a different word, the case was omitted. For
example, Wie gut es dir? was dropped, even though gut was most likely a misspelled
instance of geht. By contrast, a token such as Ich *brache deine Hilfe (correct: brauche)
was retained, considering that brache is not an existing word; such instances were coded
as correct on the inflectional ending variable but as incorrect on the "strict" correctness
variable.
Each token was classified with respect to the type of predicate, taking into
account its syntactic role as an auxiliary, standalone verb, or complement, but also its
belonging to the broad lexical groups of closed-class (functional) and open-class
("content", or thematic) verbs.9 There were four classifications that relied on these two
considerations in different ways. On the first classification, only lexical group
membership was taken into account, and predicates were classified as either "thematic"
or "functional". This is consistent with cognitive accounts of L2 learning emphasizing the
roles of frequency (functional verbs) and semanticity (thematic verbs). On the second
classification, syntactic function was taken into account: thematic verbs appearing alone
were classified as forming a simple predicate; functional verbs (auxiliaries used in
analytic tenses, and modal verbs) were coded as auxiliaries; and non-finite verbs were
coded as complements (participles in the perfect tense and infinitives used for the
analytic future and with modal verbs). On the third, hybrid, classification, lexical class
membership was combined with the way a verb was used syntactically in context. For
example, uses of closed-class haben and sein with a complement (e.g., as part of analytic
tense forms) were coded as auxiliaries, similar to the second classification; but their use
with their primary lexical meanings (possession and existence) was classified as thematic
("Auxiliary thematic"). Finally, the fourth classification was the most detailed: it
separated functional verbs into modal verbs, auxiliaries, and copulas, and further divided
complements into infinitives and participles. The differences are summarized in Table 21.
9 Thematic verbs are contrasted in the syntactic literature with functional verbs and sometimes
colloquially referred to as "lexical". In this chapter, "thematic" is used preferentially to avoid
implying that functional verbs somehow are not part of the lexicon or do not have lexical entries
of their own.
  
Table 21 Classification schemes for predicate type
Scheme      N of levels   Levels
Lexical     2             Functional, Thematic
Syntactic   3             Auxiliary, Simple, Complement
Hybrid      4             Auxiliary, Auxiliary thematic, Simple, Complement
Detailed    7             Modal, Auxiliary, Copula, Auxiliary thematic, Simple, Infinitive, Participle
 
 
Then, each token was coded for the inflectional ending needed in its context and 
for the ending actually supplied by the learner. This means that simple, one-verb 
predicates used without an overt subject could not be analyzed for their appropriateness. 
For example, Zu Hause gratuliert zur bestandenen Pruefung was excluded. By contrast, 
where there was enough context to establish what a target-like completion should have 
been, tokens were included. For example, a sentence such as Wie geht dir? was included 
despite missing the required expletive subject: it can be identified as a frequent 
conversation formula and includes dative case on the experiencer, which rules the 
experiencer out as a possible subject and indicates that "es" was implied but omitted.
Tokens were annotated for tense: once narrowly and a second time broadly. For 
simple one-verb predicates the two annotations were identical, with "present" and
"preterite" being the only two possibilities. However, for auxiliaries and complements,
they differed. In the narrow annotation scheme, the tenses of auxiliaries were annotated
without reference to the tense expressed through the auxiliary-complement combination.
Instead, the grammatical tense of the auxiliary was marked: have in "I have finished the
book" was annotated as "present". Participles and infinitives bear no tense, so "participle"
and "infinitive" were used as their tense annotations. The broad annotation of tense
referenced the tense encoded by the entire complex predicate, and both the auxiliary and
its complement received the same annotation: have (perfect) finished (perfect).
Finally, tokens were coded for a number of phonological variables, describing the 
tokens themselves as well as their phonological environments. The variables included 
were: the segments preceding the inflection, the segment immediately following the 
inflection, the syllabicity of the required inflection, and the syllabicity of the inflectional 
ending used by the learner. Preceding and following segments were coded with variables 
of different levels of detail. On the broadest one, there were three categories: sonorant, 
obstruent, or sentence (clause) boundary. The next classification incorporated manner of 
articulation and included vowels, fricatives, stops, affricates, nasals, and approximants. 
Approximants included glides and liquids, which were grouped together because 
individually their counts were low. 
Syllabicity was coded for the inflectional ending required and that supplied by the 
learner. For example, if the ending required by context was -st but -e was supplied, 
Syllabicity Needed was coded as "no", because the required ending would not have
created a new syllable, and Syllabicity Used was coded as "yes", because the supplied
ending introduced a new syllable.
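The coding rule for syllabicity can be illustrated with a minimal sketch. The actual coding was done manually; the vowel-based heuristic below, the function names, and the treatment of a dropped ending as non-syllabic are all assumptions made for illustration.

```python
# Heuristic (assumption): an inflectional ending adds a syllable if its
# spelling contains a vowel letter, e.g. -e and -en but not -t or -st.
VOWELS = set("aeiouäöü")


def is_syllabic(ending):
    """Return 'yes' if the ending adds a syllable, otherwise 'no'."""
    # A dropped ending is passed as "" and counts as non-syllabic (assumption)
    return "yes" if any(ch in VOWELS for ch in ending.lower()) else "no"


def code_syllabicity(needed, used):
    """Code one token for Syllabicity Needed and Syllabicity Used."""
    return {
        "syllabicity_needed": is_syllabic(needed),
        "syllabicity_used": is_syllabic(used),
    }


# The example from the text: -st was required but -e was supplied
example = code_syllabicity(needed="st", used="e")
print(example)
```

For the example from the text, this yields "no" for Syllabicity Needed and "yes" for Syllabicity Used.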
Predictions. If the difficulties with producing inflectional endings, as 
documented in learners of English, arise due to non-felicitous syllable structures, learners 
of German should exhibit similar patterns of inflection omission. In particular, their 
production of inflected forms should be more accurate when the form required is one 
bearing a syllable-forming ending, rather than one creating a consonant cluster: -e, -en >
-t, -st. Within the group of inflections realized as consonants, the one with a simpler
phonotactic structure, -t, should be produced at higher rates of accuracy than the more 
complex one, -st. Moreover, the nature of errors matters as well. Inaccuracies are
expected to go in the direction from more phonotactic complexity (in the required
inflected form) to less (in the supplied inflected form). Therefore, not only should
non-syllabic inflections be produced less accurately, the errors should either involve 
dropping the ending altogether or substituting it with a more sonorous, syllabic one, and 
not vice versa. 
Data analysis. The data were analyzed using the lme4 package for estimating 
generalized linear models, implemented in R software (R Core Team, 2013). The 
outcome variable of interest was the correctness of inflectional endings used by learners. 
This approach follows variable rule analysis, in that it investigates the factors that are 
conducive to the correct production of inflection. 
Models of different complexity were estimated and included predicate type, 
required inflection, syllabicity of required inflection, and phonological classes of 
previous and following segments. Most of the analyses involved different coding 
schemes of these variables, ranging from the most general to detailed. Unless indicated 
otherwise, separate models were run on: a. all data, involving tokens in the indicative, 
imperative, or subjunctive moods; b. data in the indicative only; c. tokens in the present 
tense of the indicative only. This was done to remove the effects of learners' mastery of 
the different verbal moods and their semantics.  
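The three nested data sets (a, b, c) can be illustrated with a short Python sketch. The token records, the field names, and the function `build_subsets` are hypothetical and serve only to show that each data set is a subset of the previous one; the analyses themselves were run in R.

```python
# Hypothetical token records; the field names and values are illustrative only.
tokens = [
    {"id": 1, "mood": "indicative", "tense": "present", "correct": 1},
    {"id": 2, "mood": "indicative", "tense": "past", "correct": 1},
    {"id": 3, "mood": "imperative", "tense": None, "correct": 0},
    {"id": 4, "mood": "subjunctive", "tense": None, "correct": 1},
]


def build_subsets(records):
    """Return the three data sets (a, b, c) on which separate models are run."""
    # a. all data: indicative, imperative, or subjunctive tokens
    all_data = [r for r in records
                if r["mood"] in {"indicative", "imperative", "subjunctive"}]
    # b. indicative tokens only (present and past tense)
    indicative = [r for r in all_data if r["mood"] == "indicative"]
    # c. present-tense indicative tokens only
    present_indicative = [r for r in indicative if r["tense"] == "present"]
    return {
        "a_all": all_data,
        "b_indicative": indicative,
        "c_present_indicative": present_indicative,
    }


subsets = build_subsets(tokens)
sizes = {name: len(rows) for name, rows in subsets.items()}
```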
6.2 Results 
Predicate type. According to the most general classification, which separated 
thematic from functional verbs, there was no observable difference in accuracy. This held 
true regardless of which data were included: with or without the subjunctive, with or 
without imperative forms, restricted to the present tense or including the preterite (Table 
22).  
The model with a more detailed, syntactic classification, however, revealed a 
significant overall effect of predicate type in the dataset with all moods, as indicated by a 
Wald test: χ²(3) = 10.6, p = .01. When restricted to the indicative (χ²(2) = 3.2, p = .2) or 
the present tense only (χ²(1) = 2.7, p = .1), this effect did not hold. The effect of predicate 
was driven by the contrast between imperatives and simple predicates, with imperatives 
being less likely to be produced accurately than simple predicates (χ²(1) = 9.4, p = .002) 
or auxiliaries (as indicated by the negative regression coefficient in the model). 
The hybrid formulation of predicate type did not change the results: overall 
effects of predicate were significant in the model run on data in all moods (χ²(4) = 12, p 
= .017), but not on the subsets of indicative (χ²(3) = 3.6, p = .31) or present-tense 
tokens (χ²(2) = 3, p = .22). The baseline in this model was the category of thematic uses 
of functional verbs, and auxiliary uses of functional verbs and imperatives were both 
significantly less likely to be supplied accurately than the baseline. Simple thematic 
predicates trended in the direction of less accuracy than thematic functional verbs. 
However, pairwise Wald tests did not reveal any significant difference among the non-
baseline categories: auxiliaries were not significantly different from simple thematic 
predicates or complements, and complements did not differ measurably from simple 
thematic predicates. 
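For readers who wish to check the Wald tests reported above, the p-value follows from the test statistic and its degrees of freedom. The sketch below is written in Python rather than the R used for the analyses, and it assumes that the reported statistics are referred to a chi-square distribution. The function name `chi2_sf` is hypothetical. Because the published statistics are rounded, recomputed p-values can be expected to agree with the reported ones only approximately.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        total = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Statistics and degrees of freedom as reported in the text above.
p_syntactic_all = chi2_sf(10.6, 3)   # syntactic scheme, all moods
p_imperative = chi2_sf(9.4, 1)       # imperatives vs simple predicates
p_hybrid_all = chi2_sf(12, 4)        # hybrid scheme, all moods
```

The closed forms are used here only to keep the sketch free of external dependencies; a statistics library would normally be the more convenient choice.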
 
Table 22 Effects of predicate type (four coding schemes) on inflection accuracy
                          Coding Scheme for Predicate Type
                          Lexical          Syntactic        Hybrid             Detailed
All moods and tenses
Intercept (= baseline)    1.92***          1.53***          3.18***            3.87***
Baseline category         Closed-class     Auxiliary        Aux thematic use   Aux thematic use
Coefficient (p): all other categories relative to the baseline
  Aux                     -                -                -1.51 (< .05)      -2.44 (.03)
  Copula                  -                -                -                  -2.047 (.07)
  Modal                   -                -                -                  -2.262 (.03)
  Complement              -                0.42 (.29)       -1.14 (.15)        -
  Infinitive              -                -                -                  -1.74 (.11)
  Participle              -                -                -                  -2.08 (.07)
  Imperative              -                -0.89 (.06)      -2.54 (.002)       -3.23 (.003)
  Thematic                -0.27 (.33)      -                -                  -
  Simple                  -                0.53 (.10)       -1.41 (.06)        -2.10 (.04)
Null deviance (df)        386.04 (461)     376.77 (454)     372.9 (453)        372.59 (452)
Residual deviance (df)    385.07 (460)     367.01 (451)     359.5 (449)        355.24 (445)
AIC                       389.07           375.01           369.5              371.24
Indicative only (incl. present, past tense)
Intercept (= baseline)    1.91***          1.44***          2.97***            3.66***
Coefficient (p): all other categories relative to the baseline
  Aux                     -                -                -1.35 (.08)        -2.24 (.04)
  Copula                  -                -                -                  -1.84 (.10)
  Modal                   -                -                -                  -2.19 (.04)
  Complement              -                0.59 (.16)       -0.94 (.24)        -
  Infinitive              -                -                -                  -1.53 (.16)
  Participle              -                -                -                  -1.87 (.10)
  Thematic                -0.05 (.87)      -                -                  -
  Simple                  -                0.57 (.10)       -1.20 (.11)        -1.89 (.07)
Null deviance (df)        312.89 (399)     311.75 (395)     311.75 (395)       311.46 (394)
Residual deviance (df)    312.86 (398)     308.73 (393)     307.15 (392)       303.13 (388)
AIC                       316.86           314.73           315.15             317.13
Present tense only
Intercept (= baseline)    1.83***          1.43***          2.92***            3.61***
Coefficient (p): all other categories relative to the baseline
  Aux                     -                -                -1.33 (.08)        -2.18 (< .05)
  Copula                  -                -                -                  -1.85 (.10)
  Modal                   -                -                -                  -2.16 (< .05)
  Thematic                -0.05 (.88)      -                -                  -
  Simple                  -                0.56 (.10)       -1.15 (.13)        -1.84 (.08)
Null deviance (df)        242.37 (297)     241.76 (295)     241.76 (295)       241.76 (295)
Residual deviance (df)    242.35 (296)     239.18 (294)     237.74 (293)       234.39 (291)
AIC                       246.35           243.18           243.74             244.39
Note. A dash marks a cell with no estimated coefficient. In the Detailed classification, 
"copula" refers to uses of the verb sein ("be") with nominal or adjectival complements 
(e.g., "I am tired", "His eyes are blue"), whereas the "auxiliary" category includes 
analytical forms with verbal complements (e.g., "I have traveled"). 
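The estimates in Table 22 can be made more tangible by converting them to probabilities. The Python sketch below assumes that the models are logistic regressions, so that the intercept is the log-odds of a correct ending in the baseline category and each coefficient is a difference in log-odds from that baseline. The function name `inv_logit` is hypothetical, and the resulting probabilities are model-implied approximations rather than observed accuracy rates.

```python
import math


def inv_logit(log_odds: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hybrid coding scheme, all moods and tenses (values copied from Table 22).
baseline_log_odds = 3.18   # thematic uses of functional verbs
imperative_coef = -2.54    # imperatives, relative to the baseline

p_baseline = inv_logit(baseline_log_odds)
p_imperative = inv_logit(baseline_log_odds + imperative_coef)
```

On this reading, the baseline category is supplied correctly about 96% of the time and imperatives about 65% of the time, which illustrates the size of the imperative disadvantage on the probability scale.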
Finally, the set of models run with the most detailed classification of the predicate 
revealed differences among the factor levels (of predicate type) but no overall significant 
effect of the variable. In the data set that included all moods and tenses, there was a trend 
approaching significance (χ²(7) = 13, p = .07). In the data set of indicative tokens only, 
there was no measurable effect of predicate type (χ²(6) = 5.4, p = .49). Nor was it 
present in the data set of present-tense tokens (χ²(4) = 4.4, p = .35).  
However, individual levels of predicate type did sometimes differ from each 
other. In the data set with tokens in all moods, thematic uses of functional verbs were 
significantly more likely to be used correctly than non-thematic, auxiliary uses, as well as 
modal verbs, imperatives, and simple predicates with one thematic verb (Table 22). In the 
data set with indicative tokens only this effect extended to auxiliaries (used as such) and 
modals being produced less accurately than thematically used auxiliaries, whereas simple 
thematic predicates only trended in that direction. These differences still held when the 
examination was restricted to present-tense tokens. 
Thus, one of the most robust findings in this set of analyses was the advantage of 
predicates formed by functional verbs used with their primary lexical meanings over 
auxiliary uses of the same verbs and over imperatives; such predicates also tended, less 
reliably, to be more accurate than simple one-verb predicates. This may reflect the 
beneficial role of semantic load in learners' production of grammatical elements and its 
corollary, redundancy. Even though more accurate production of auxiliaries is typically 
interpreted as reflecting the overlearning of closed-class elements, it was noteworthy that 
performance on this group of verbs varied depending on how they were used in context. 
The advantages of the auxiliaries' high frequency were only apparent when they also 
carried some semantic function in the sentence, and disappeared in the presence of a 
semantically superior complement. 
Syllabicity of ending. Inflectional endings were not more likely to be produced 
correctly in environments requiring a syllabic ending than in the environments requiring a 
non-syllabic ending (Table 23). Thus, the endings that form a new syllable (-e and -en) 
were as likely to be supplied accurately as those that are expressed as consonants or 
consonant clusters (-st, -t). The reverse also held: when syllabic and non-syllabic endings 
were supplied by learners, they were equally likely to be the correct ones (Table 23). 
Table 23 Effects of syllabicity on accuracy of production
                              All moods and tenses   Indicative mood only   Present tense only
Syllabicity of Required Ending (baseline: non-syllabic ending required)
Intercept (= baseline)        1.95***                1.85***                1.85***
Coefficient (p): all other categories relative to baseline
  Syllabic needed             -0.17 (.53)            -0.03 (.92)            -0.15 (.64)
Null deviance (df)            351.68 (443)           320.32 (399)           245.96 (297)
Residual deviance (df)        351.29 (442)           320.31 (398)           245.75 (296)
AIC                           355.29                 324.31                 249.75
Syllabicity of Supplied Ending (baseline: non-syllabic ending supplied)
Intercept (= baseline)        1.69***                1.73***                1.80***
Coefficient (p): all other categories relative to baseline
  Syllabic supplied           0.13 (.61)             0.19 (.5)              -0.06 (.86)
Null deviance (df)            385.73 (460)           320.61 (400)           245.96 (297)
Residual deviance (df)        385.46 (459)           320.16 (399)           245.93 (296)
AIC                           389.46                 324.16                 249.93
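The fit statistics in Table 23 are related in a simple way that can serve as a consistency check. The Python sketch below assumes ungrouped binary outcomes, for which the residual deviance equals minus twice the log-likelihood, so that AIC is the residual deviance plus twice the number of estimated parameters. The function names are hypothetical, and the parameter count is inferred from the reported degrees of freedom.

```python
def n_parameters(null_df: int, residual_df: int) -> int:
    """Intercept plus one parameter per degree of freedom used by the predictors."""
    return (null_df - residual_df) + 1


def aic_from_deviance(residual_deviance: float, k: int) -> float:
    """AIC for a binary-outcome model, assuming deviance = -2 * log-likelihood."""
    return residual_deviance + 2 * k


# (label, null df, residual df, residual deviance, reported AIC), copied from Table 23.
table_23 = [
    ("required, all moods", 443, 442, 351.29, 355.29),
    ("required, indicative", 399, 398, 320.31, 324.31),
    ("required, present", 297, 296, 245.75, 249.75),
    ("supplied, all moods", 460, 459, 385.46, 389.46),
    ("supplied, indicative", 400, 399, 320.16, 324.16),
    ("supplied, present", 297, 296, 245.93, 249.93),
]

# True where the recomputed AIC matches the reported one to two decimals.
checks = {
    label: abs(aic_from_deviance(dev, n_parameters(ndf, rdf)) - aic) < 0.005
    for label, ndf, rdf, dev, aic in table_23
}
```

Since each model in Table 23 has the same two parameters (an intercept and one syllabicity coefficient), AIC differences between models fitted to the same data reduce to differences in residual deviance.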
 
Previous phonological segment. When examining the role of the previous 
segment, we excluded instances of irregular verbs. Irregular verbs form their different 
person-number combinations not by deriving them from the "dictionary" entry of the 
verb but through suppletion. Without a clear "inflection", it cannot be clearly determined 
what "precedes" it. We also conducted separate analyses on the data, with forms of the 
subjunctive included or excluded. The subjunctive presents a curious case for testing the 
effects of syllabicity and sonority: in German it is formed with the suffix -te, followed by 
the same endings as in the indicative, except for the first and third persons singular, 
which are not followed by an ending. Thus, phonological considerations would predict 
higher accuracy on the subjunctive than indicative forms. 
The broadest classification of segments preceding an inflection included the factor 
levels "obstruent" and "sonorant", and "subjunctive" as a separate class. There was no 
effect of preceding segment on inflection accuracy: with subjunctive included, the 
coefficients for obstruents and sonorants were not significant, and neither was a Wald test 
for the overall effect of previous segment (χ²(2) = 4, p = .13). No effect of preceding 
segment was observed when only indicative and imperative tokens were included, nor 
when the analysis was restricted to the present tense (Table 24). 
Table 24 Effects of previous segment class on accuracy of inflectional ending: Obstruents 
versus sonorants
                              All moods and tenses   Indicative mood only   Present tense only
                              (except irregular)     (except irregular)     (except irregular)
Obstruents vs Sonorants vs Subjunctive
Intercept (= baseline)        3.04**                 1.75***                1.62***
Baseline category             Subjunctive            Obstruent_Ending       Obstruent_Ending
Coefficient (p): all other categories relative to baseline
  Obstruent                   -1.55 (.13)            -                      -
  Sonorant                    -1.12 (.28)            0.19 (.54)             0.33 (.37)
Null deviance (df)            348.02 (408)           279.52 (348)           205.77 (248)
Residual deviance (df)        343.09 (406)           279.15 (347)           204.95 (247)
AIC                           349.09                 283.15                 208.95
Subjunctive removed: Obstruents vs Sonorants
Intercept (= baseline)        1.49***                -                      -
Baseline category             Obstruent_Ending       -                      -
Coefficient (p): all other categories relative to baseline
  Sonorant                    0.43 (.13)             -                      -
Null deviance (df)            337.24 (386)           -                      -
Residual deviance (df)        334.95 (385)           -                      -
AIC                           338.95                 -                      -
 
A more detailed classification of preceding segments separated vowels and 
consonants, while further subdividing the consonants according to manner of articulation. 
There was only one token where the inflectional ending was preceded by an affricate, 
which was excluded. The baseline category for these comparisons was approximants. 
None of the models showed an overall effect of the manner of articulation of the 
preceding segment (Table 25). However, there were significant differences between 
factor levels in the analyses of present-tense tokens: the regression coefficient for 
fricatives differed significantly from that of vowels (χ²(1) = 4.5, p = .03), and 
approached a significant difference from stops (χ²(1) = 3.7, p = .056), whereas vowels 
and stops did not differ. 
Table 25 Effect of previous segment on inflection accuracy: Manner of articulation
                              All moods and tenses   Indicative mood only   Present tense only
                              (regular)              (regular)              (regular)
Baseline: approximant
Intercept                     1.61*                  2.30*                  2.30*
Coefficient (p)
  Fricative                   -0.18 (.82)            -0.80 (.45)            -1.08 (.32)
  Nasal                       0.13 (.88)             -0.63 (.56)            -0.85 (.44)
  Stop                        -0.05 (.95)            -0.15 (.89)            -0.02 (.99)
  Vowel                       0.49 (.55)             -0.19 (.86)            -0.03 (.98)
Null deviance (df)            336.89 (385)           279.23 (348)           205.45 (247)
Residual deviance (df)        333.67 (381)           275.58 (347)           198.19 (243)
AIC                           343.67                 285.58                 208.19
Wald test (overall effect)    χ²(4) = 3, p = .55     χ²(4) = 3.6, p = .46   χ²(4) = 7, p = .14
 
Following segment. Following segments were coded at two levels of detail: the 
most general classification involved obstruents, sonorants, and sentence or clause 
boundary; the detailed classification included a division into vowels and consonants, 
which were further subdivided based on manner of articulation. Approximants were 
excluded due to their low number (three tokens). 
In the most inclusive data set with tokens of all moods, no effect was observed for 
the class of following phonological segment: this was reflected in the regression 
coefficients for factor levels "obstruent" and "sonorant", neither of which was 
statistically significant. This was mirrored by the results of the Wald test, which did not 
reveal an overall effect for this variable. The detailed classification based on manner of 
articulation did not yield a statistically significant effect. No measurable effect was found 
in the datasets of indicative tokens or present-tense tokens either (Table 26). 
Table 26 Effect of following segment on inflection accuracy
                              All moods and tenses    Indicative mood only    Present tense only
Obstruents vs Sonorants (baseline: sentence (clause) boundary)
Intercept (= baseline)        2.18***                 2.44***                 2.20 (.04)
Coefficient (p): all other categories relative to baseline
  Obstruent                   -0.39 (.35)             -0.55 (.25)             -0.37 (.73)
  Sonorant                    -0.65 (.12)             -0.82 (.09)             -0.42 (.69)
Null deviance (df)            384.76 (457)            311.75 (395)            237.24 (292)
Residual deviance (df)        382.10 (455)            308.48 (393)            237.06 (290)
AIC                           388.1                   314.48                  243.06
Wald test (overall effect)    χ²(2) = 2.5, p = .28    χ²(2) = 3, p = .23      χ²(2) = 0.16, p = .92
Manner of articulation (baseline: affricate)
Intercept (= baseline)        1.25 (p > .1)           1.79 (p < .10)          1.79 (p < .10)
Coefficient (p): all other categories relative to baseline
  Boundary                    0.93 (.29)              0.65 (.57)              0.40 (.79)
  Fricative                   0.50 (.57)              -0.02 (.98)             -0.21 (.85)
  Nasal                       -0.04 (.97)             0.15 (.90)              0.51 (.70)
  Stop                        0.60 (.48)              0.15 (.89)              0.14 (.90)
  Vowel                       0.37 (.65)              -0.22 (.84)             -0.11 (.92)
Null deviance (df)            384.76 (457)            311.75 (395)            237.24 (292)
Residual deviance (df)        380.87 (452)            308 (390)               235.88 (287)
AIC                           392.87                  320                     247.88
Wald test (overall effect)    χ²(5) = 3.9, p = .56    χ²(5) = 3.5, p = .63    χ²(5) = 1.3, p = .94
 
Therefore, in the present data inflectional endings were not more or less likely to 
be produced correctly depending on the phonological segment that followed them. 
Composite models. The predictors were also tested in multiple-predictor models, 
to account for the possibility that some of the effects are only apparent in the presence of 
the others. Because the predictors are factors, the number of their levels included in these 
models was kept to a minimum to create only broad classifications, to prevent data 
sparseness. Irregular verbs that employ suppletion in their paradigms were excluded. The 
models were estimated based on tokens in the indicative and imperative moods and 
excluded forms of the subjunctive (see section ?Previous Phonological Segment? above 
for rationale). 
Table 27 Joint effects of phonological environment on inflection accuracy
Parameter                                         Predictor Level              Estimate (p)*   Wald test
Additive effects of preceding and following segment
Intercept (= baseline)                            Obstruent_Ending_#           2.36***
Previous Segment                                  Sonorant_Ending_#            0.19 (.55)
Following Segment (baseline: sentence boundary)   Obstruent_Ending_Obstruent   -0.52 (.30)     χ²(2) = 3.5, p = .18
                                                  Obstruent_Ending_Sonorant    -0.87 (.07)
Interacting effect: Preceding x following segment
Intercept (= baseline)                            Obstruent_Ending_#           2.99***
Previous Segment (baseline: obstruent)            Sonorant_Ending_#            -1.01 (.26)
Following Segment (baseline: sentence boundary)   Obstruent_Ending_Obstruent   -1.18 (.14)     χ²(2) = 4.6, p = .1
                                                  Obstruent_Ending_Sonorant    -1.63 (.03)
Interaction                                       Sonorant_Ending_Obstruent    1.26 (.22)
                                                  Sonorant_Ending_Sonorant     1.52 (.14)
Model comparison: χ²(2) = 2.37, p = .30

 
The interaction between the nature of the following and previous segments did not 
reach statistical significance, but with this interaction term included, the coefficient for a 
following sonorant became statistically significant (Table 27; Figure 10). Despite this, in 
neither model was the overall effect of following segment class significant. The model 
with the interaction did not fit the data better than the additive one (Table 27). 
 
Figure 10. Interaction between class of following phonological segment (x axis) and 
previous phonological segment (y axis) in affecting inflection accuracy. 
  
Another set of models combined phonological environment with the syllabicity of 
required ending. The effects of these three predictors (syllabicity, previous segment, 
following segment) were tested with and without an interaction between the previous and 
following segment (Table 28). 
Table 28 Combined effects of syllabicity of ending and phonological environment on 
inflection accuracy
Parameter                                         Predictor Level                      Estimate (p)*   Wald test
Syllabicity of Required Ending + Environment (Syllabicity + Preceding Segment + Following Segment)
Additive model
Intercept                                         Obstruent_Ending [Non-syllabic]_#    2.25***
Syllabicity of Inflection (baseline: non-syllabic) Syllabic                            0.10 (.79)
Previous Segment (baseline: obstruent)            Sonorant                             0.24 (.51)
Following Segment (baseline: sentence boundary)   Obstruent                            -0.50 (.33)     χ²(2) = 3.4, p = .18
                                                  Sonorant                             -0.86 (.08)
With two-way interaction (Syllabicity + Previous x Following Segment)
Intercept                                         Obstruent_Ending [Non-syllabic]_#    2.81***
Syllabicity of Inflection (baseline: non-syllabic) Syllabic                            0.19 (.62)
Previous Segment (baseline: obstruent)            Sonorant                             -0.97 (.28)
Following Segment (baseline: sentence boundary)   Obstruent                            -1.14 (.16)     χ²(2) = 4.7, p = .09
                                                  Sonorant                             -1.63 (.04)
Interaction                                       Sonorant_E_Obstruent                 1.29 (.22)
                                                  Sonorant_E_Sonorant                  1.61 (.12)
Model comparison: χ²(2) = 2.59, p = .27

 
Looking at the exact nature of learners' departures from expected inflections, the 
morphemes expressed by segments that arguably create phonotactic difficulties, -t and 
-st, were used correctly in 23/27 and 16/16 cases, respectively. By contrast, -e and -en, 
both salient by virtue of forming a separate syllable and by including vowels, as well as 
easy to articulate, were supplied less accurately: 26/37 and 35/44 cases, respectively. 
When one examines what forms were used in their stead, one sees that -e and -en were 
sometimes used interchangeably, and substitutions of one for the other are the most 
frequent error class for both morphemes: for -e, four out of nine errors were uses of -en; 
for -en, six out of nine errors were uses of -e. Both -e and -en were also omitted (2/34 
and 2/42, or 2/9 errors on -e and 2/9 on -en), in contrast to -st, which was always 
produced correctly (13/13), and -t, which was omitted in three instances out of 27 against 
an overall highly accurate backdrop with only five total errors. Admittedly, a 
salience-based account of learning can accommodate these findings if one specifies that 
learning may be reflected in the suppliance versus omission of inflection and not only in 
the suppliance of a correct inflectional ending. 
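The comparison across endings can be laid out as a small calculation. The Python sketch below uses only the correct/total counts given in the first two sentences of the preceding paragraph (23/27, 16/16, 26/37, 35/44); the variable names are hypothetical, and the pooling into syllabic and non-syllabic endings is an illustrative summary rather than an analysis reported in the study.

```python
# (correct, total) per required ending, as given at the start of the paragraph above.
counts = {"-t": (23, 27), "-st": (16, 16), "-e": (26, 37), "-en": (35, 44)}

# Per-ending accuracy as a proportion.
accuracy = {ending: correct / total for ending, (correct, total) in counts.items()}

# Endings ordered from most to least accurate.
ranked = sorted(accuracy, key=accuracy.get, reverse=True)


def pooled(endings):
    """Pooled accuracy over a group of endings."""
    correct = sum(counts[e][0] for e in endings)
    total = sum(counts[e][1] for e in endings)
    return correct / total


non_syllabic_accuracy = pooled(["-t", "-st"])   # 39 / 43
syllabic_accuracy = pooled(["-e", "-en"])       # 61 / 81
```

On these counts the non-syllabic endings are, descriptively, supplied more accurately than the syllabic ones, which is the opposite of the ordering -e, -en > -t, -st predicted by a phonotactic account.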
6.3 Conclusions 
In contrast to English, all of whose overt tense-marking endings would be 
considered infelicitous, German possesses a range of endings in its inflectional paradigm 
that vary in their phonological properties. The analysis of both syntactic and phonological 
environments allowed me to rule out phonological processes as an exhaustive explanation 
of learners' difficulties with morphology, at least as far as written production is 
concerned. 
Overall, learners did not appear to incorrectly produce the inflectional endings 
considered phonotactically disfavored at disproportionate rates. There were no effects of 
the syllabicity of required ending on production accuracy, and the following or preceding 
segments did not appear to influence accuracy. Of interest was the finding that functional 
verbs, which belong to a closed class, are produced at a higher accuracy, an observation 
that mirrors prior research. However, this general pattern was refined in the present study, 
where it only applied if the closed-class elements were used thematically. This likely 
illustrates the benefits of semantic emphasis, which is standard for general-cognitive 
proposals (e.g., DeKeyser, 2001, 2005; VanPatten, 2005), compounded, potentially, by 
the higher probability of rote learning of these verbs (due to their being closed-class). On 
an account featuring semanticity only, there should have been no difference between 
thematically used auxiliaries and open-class thematic verbs. On an account that attributes 
the learning benefits to the closed-class verbs' higher overall frequency, no difference 
should be expected between auxiliary and thematic uses of these closed-class verbs. 
Furthermore, imperatives were less likely to be produced correctly, despite being formed 
by means considered to be felicitous: bare verb stems or, for some verbs ending in 
consonant clusters, the ending -e.  
The variable rule approach has been applied in the past to data from learners of 
English, but its applicability to other target languages, such as, in this case, German, is 
less straightforward. The data from learners of German analyzed in the present chapter 
imply that the variation in learners' production of inflectional morphology may be more 
nuanced than what variable rule analysis presupposes. In this sample, inaccurate 
productions were hardly a simple matter of suppliance or omission. They included not 
only bare and infinitival forms, but also finite forms substituted for one another, 
inaccurate application of root alternations or inflectional patterns associated with a 
different verb class. To capture even a fraction of these options in learner behavior, I had 
to create multiple outcome variables that described varying degrees of accuracy, while 
focusing selectively on the endings, to the exclusion of omitted main verbs and 
complements, as well as overapplication of root processes, or selection errors.  
These results may mean that phonological processes do not account for the 
full scope of learners' inaccuracies in the use of inflectional morphology. At the same 
time, it is entirely plausible that phonological competence may impose a ceiling on what 
can be articulated in production, or, to a more limited extent, expressed in writing. The 
written channel of the task enabled an analysis of samples obtained in a high-stakes 
context, in which learners can be expected to recruit monitoring processes and 
demonstrate maximum possible accuracy. However, a disadvantage of this lies in the fact 
that the phonology of learners' interlanguages was not studied directly but through the 
lens of writing. In addition to monitoring, writing also provides more opportunities for 
revision and multiple checks of one's production output, in contrast to spontaneous oral 
speech. Therefore, oral speech data could be expected to show the lower bounds of 
learners' potential accuracy. Nevertheless, the differences in the demands of oral and 
written production do not fully explain why syntactic environments would and 
phonological environments would not affect production accuracy. 
  
Chapter 7: Discussion and Conclusions 
This chapter will recapitulate key research aims and findings, concentrating above 
all on the takeaways they offer for theories of L2 learning and their place within other 
insights into L2 morphological development accumulated up to this point. I will then 
discuss the limitations of this research and its contributions to the field.  
7.1 Key research aims and findings 
In this dissertation, I set out to accomplish two major goals. The first one was to 
provide a set of facts of morphological learning that would be simultaneously cross-
linguistic and rooted in learner productions. The second aim was to use these cross-
linguistic data to test the predictions and assumptions extended from theories of L2 
learning, including the learning mechanisms that are and are not compatible with them 
and the effect that the complexity of the linguistic system to be learned might have on the 
rate of learning. 
The empirical findings (reported in Chapters 5 and 6) are recapitulated below in 
condensed form. Their interpretation in light of current theoretical models of learning is 
the subject of the following section. 
Learners of Czech were as successful as, and learners of Italian more 
successful than, learners of German at using inflected, finite forms when those were 
required by context. 
Learners of all TLs were equally likely to substitute inflected forms for one another, but 
learners of German were more prone to use uninflected and non-finite forms in place of 
finite ones, compared to learners of Italian, and were as likely to use them as were 
learners of Czech. Between Czech and Italian, learners of Czech were more likely to use 
infinitival forms (when finite were required) and equally likely to use bare uninflected 
forms. This pattern is surprising in that learners of Czech do not perform significantly 
worse, compared to learners of German, despite the demonstrably higher number of 
morphological contrasts expressed through distinct morphemes in Czech. Learners of 
Italian had an edge over learners of Czech, likely due to the slightly higher number of 
distinct inflections in Czech and the less predictable system of verb classes. Overall, the 
lower than expected prevalence of these errors in learners of Czech and Italian would 
support an account of richer paradigms pushing learners to acquire the principle of 
needing some form of agreement marking. 
Learners of Italian and Czech were less likely to overuse inflection than 
learners of German. Learners of German were also less able to constrain inflection 
appropriately, overusing it in contexts where non-finite forms were required at a higher 
rate than learners of Italian and Czech. Learners of Italian and Czech were equally likely 
to make these errors. 
Learners of German made fewer verb class errors than learners of Czech 
and were at least as good as learners of Italian. In a contrast with errors of infinitive 
and bare form use, errors pertaining to verb classes did show evidence of increasing 
hand-in-hand with the complexity of the TL's verb class system. For learners of German, 
a language with a simpler, two-way class system provided an advantage, compared to the 
learners of Czech, which has the most complex verb-class system of all TLs. Learners of 
Italian were, for the most part, as good as learners of German on this attribute (with the 
exception of one CEFR level, B1, where they were outperformed by the learners of 
German). 
Learners of German made fewer phonological errors than learners of Czech 
and performed similarly to learners of Italian. Phonological errors involved 
incorrectly supplied segments of the root or inflectional ending that resulted in a non-
existent form. By contrast, an incorrectly supplied ending that resulted in an existing 
form would be classified as a substitution error, whereas an incorrectly supplied root 
segment that resulted in an existing form would be classified as a root process error. 
These errors possibly signify a failure to retrieve the correct inflected form or inflectional 
ending by learners of Czech and, less so, Italian. This, in turn, may reflect the processing 
costs associated with storing a more diverse set of inflected forms in the mental lexicon. 
In German, there was no accuracy advantage for syllabic, sonorous 
inflectional endings (compared to non-syllabic, obstruent endings). While 
phonological factors are undoubtedly at play during the learning of morphology and its 
production, Chapter 6 showed on the basis of German data that phonological 
explanations alone are insufficient: the nature of phonological segments preceding and 
following inflectional endings in German did not appear to influence the accuracy of 
inflection use. Nor was there a measurable difference in accuracy between endings that 
are expressed as full syllables (-e, -en) and those expressed as obstruents and clusters of 
obstruents (-t, -st). 
In German, non-syllabic inflections were not replaced with syllabic ones. 
Learners' errors did not follow the patterns predicted by accounts positing phonological 
simplification as the cause of morphological errors. In particular, marked endings were 
not omitted or substituted with less marked ones. Most of the substitution errors involved 
substitutions between -e and -en in both directions, whereas -t and -st were supplied 
correctly (see footnote 10). 
In German, learners were more accurate on functional (closed-class) verbs 
used in their primary lexical meaning than on auxiliaries, copulas, and modals. 
Going beyond a simple distinction between closed- and open-class elements, accurate 
production was facilitated by the syntactic function of closed-class elements. When they 
were used as thematic verbs (i.e., in their primary lexical meanings, cf. I have a dog vs. I 
have bought a dog), accuracy was higher. 
The findings related to developmental patterns and their differences among the 
TLs were the least clear. The proficiency variable did not meaningfully participate in 
interactions with TL and error type, with a couple of exceptions among the many 
contrasts explored (see footnote 11). On the one hand, this may imply that the cross-linguistic differences 
described (Chapter 5) are a steady influence throughout development. On the other, this 
absence of differences may be a type II error stemming from the cross-sectional nature of 
the data (see Limitations below). It may also be the case that the construct of proficiency 
adopted in the CEFR framework was too broad to correlate reliably with just one aspect 
of grammatical performance. 
10 It is possible that the overall distributional properties of the TL desensitize learners to marked 
phonological material, and consonantal clusters in inflectional endings lose their difficulty as 
learners become more familiar with clusters across the board. However, this is unlikely in this 
sample of learners who were at the extreme low end of the proficiency spectrum.  
11 As detailed in Chapter 5, the analyses incorporated corrections for multiple comparisons. 
7.2 Theoretical implications and takeaways 
Overall, different types of errors formed two broad clusters with respect to 
differences among TLs: errors of substitution, infinitive use, bare form use, and inflection 
overuse patterned in one cluster, and errors of verb class and phonological processes 
formed the other. This implies that the errors within each cluster might rely on shared 
learning mechanisms. The first group engages rule-like compositional properties that 
could be more amenable to facilitation at an abstract level. The errors in the second group 
are all grounded in morpholexical learning, which is item-specific and, therefore, closer to 
the lexical end of the continuum and less prone to benefitting from abstract generalization.  
For the first, morphosyntactic, group of processes, German did not show a 
learnability advantage over Czech, and both were disadvantaged compared to Italian. 
Therefore, higher paradigm complexity is at least not detrimental to learning and may 
even facilitate learning that agreement must be marked. For the second, 
morpholexical, group of processes, learners of German were at an advantage over 
learners of Czech and at no disadvantage compared to learners of Italian. It may also be 
the case that this study did not have sufficient power to capture a difference between 
German and Italian, which only differ by one verb class. 
Thus, on morphosyntactic dimensions of inflectional morphology use learners of 
TLs with more complex paradigms were at an advantage, and their errors occurred at a 
rate that was out of proportion to the TL complexity. In other words, their error rates 
were lower than what would be expected simply from extrapolating error rates from L2 
English data. On the morpholexical dimensions, learners' errors increased accordingly if 
their TL had a more complex system in this respect (morphophonological alternations, 
verb classes).  
The patterning of errors in this way suggests that the types of knowledge 
underpinning them may require different amounts and kinds of evidence from the input, 
while also being differentially responsive to this evidence or even instruction. Even 
though syntactic accounts do not explicitly posit such a split, this pattern is broadly 
consistent with some of the ideas inherent to them. The lack of a noticeable handicap for 
learners of Italian and Czech (relative to German) can be taken as a learnability benefit 
resulting from those TLs' higher number of distinct (non-homophonous) morphemes, 
which provide unambiguous evidence every time agreement marking is encountered in 
the input. By contrast, learners of German, while also receiving input with overt 
agreement marking (on all person-number combinations), are presented with forms that 
are homophonous with the infinitive for one-third of the paradigm. While this 
homophony is still situated in a paradigm where all person-number combinations are 
overtly marked (in contrast to English, for example), homophony with the infinitive in 
particular may be especially damaging to grammar building. 
Syntactic accounts do posit the necessity for morpholexical learning, however, 
and the disadvantage on errors of verb class and phonological errors for learners of Czech 
is broadly consistent with this notion. Since such learning is often deemed to be outside 
of the scope of syntax, theories of L2 syntactic learning do not go any further in 
specifying just how it occurs.  
In explaining the German data (Chapter 6) on the role of syntactic environments 
in the production of inflection, syntactic accounts are less successful. The distinction 
between free and bound morphology that they employ translates in this study into the 
categories of "auxiliary" and "simple predicate", which did not sufficiently explain 
learners' accuracy. Instead, a finer-grained classification proved predictive of learners' 
performance, in which closed-class freestanding morphemes ("auxiliaries") were treated 
differently depending on their syntactic function (as thematic, or "main", verbs versus 
true auxiliaries). 
With respect to general-cognitive views of learning difficulty, this dissertation's 
results run counter to these views where morphosyntactic learning is concerned, that is, 
the use of bare and infinitival forms and overuse of inflection. The results concerning 
morpholexical learning, however, are broadly consistent with them: a higher number of 
verb classes in a TL coincided with more learning difficulty and higher error rates.  
In addition, the data on the influence of syntactic environments on the production 
of inflection in L2 German (Chapter 6) have the strongest affinity with the GC account 
among all the findings in this dissertation. Verbs belonging to the so-called "closed class" 
(that is, auxiliaries and copulas) were more likely to be inflected correctly when they 
were used in their primary lexical meanings and not as auxiliaries or copulas. They were 
also used more accurately than lexical verbs that do not belong to the closed class (so-
called thematic verbs). Viewed through a syntactic lens, this is highly surprising: after all, 
on a few syntactic accounts free morphemes (i.e., auxiliaries) are learned before bound 
morphemes (i.e., inflections on thematic verbs). Since it is computing syntactic 
relationships that drives difficulty, according to these accounts, any verb used 
thematically should be more, not less, difficult to produce. Conversely, if one were to 
argue that these closed-class verbs still benefit from their "free" nature over time, then 
there should be no difference depending on the manner of their use in any given sentence.  
Alternatively, this discrepancy may also reflect layered influences of processing and 
longer-term learnability during the production of any given token of a closed-class verb. 
For instance, long-term learnability may, in fact, benefit from the salience associated with 
their being free-standing. Yet, at the same time, the correct production in instances of 
thematic use may be additionally facilitated by learners' attention directed towards them 
when their semantic meaning is essential to the utterance. 
On the GC account, by contrast, auxiliaries are considered more salient by virtue 
of their free-standing nature, similarly to their treatment in syntactic accounts (even 
though it is not clear whether the inflectional endings on auxiliaries benefit from this fact 
as well). Additionally, the GC model allows for auxiliaries to benefit from their overall 
high frequency, which is combined over both types of use: as true auxiliaries and in 
their primary lexical meanings. Finally, the semantic dimension of salience would be 
facilitative as well, due to the lower redundancy of these elements once they are used in 
primary lexical meanings, compared to uses where the carrier of lexical meaning is the 
complement (e.g., infinitive or participle). 
Finally, the phonological data from L2 German run counter to some of the 
predictions of the general cognitive account, on which morphological elements that are 
syllabic are more phonologically salient and, thus, learnable. In this study, endings 
realized syllabically were not produced any more accurately by learners of German than 
were non-syllabic ones, at least in the written channel. This also serves as a counterpoint 
to data from L2 English, which have been explained through the lens of salience (in 
particular, with respect to the production of third-person singular -s). Learners of German 
show that, in principle, there is nothing impossible about producing a non-syllabic 
ending, which is just as redundant in German as it is in English. 
Phonological factors have also been considered by theoretical approaches other 
than the general-cognitive account of grammatical learning, including the literature on 
interlanguage phonology (and its interactions with morphology). These have included 
consonantal cluster simplification and examinations of segments preceding and following 
inflectional endings. Again, the German data examined in Chapter 6 did not find 
evidence consistent with these explanations. In particular, learners did not substitute 
inflections in the predicted pattern (more sonorous or syllabic endings for less sonorous, 
non-syllabic ones). 
On balance, the lack of a "slowdown" in learning observed among learners of 
Italian and, to some degree, Czech, compared to learners of German strongly suggests 
that it is premature to discard syntactic explanations of morphological learning. The 
current pattern of findings can be most economically explained if one posits some degree 
of abstract knowledge at the level of syntactic features or at the level of presence/absence 
of inflection. While the general-cognitive approach can account for the findings related to 
morpholexical errors, its current formulation as difficulty scores assigned to individual 
morphemes does not accommodate any abstract facilitation, nor does it explain the 
learning of zero-marked forms. That is not to say that a general-cognitive account of 
these cross-linguistic data cannot be formulated in principle; only that to put forward 
such an account one would need to address the non-linear patterns in learning and to 
explain the acquisition of contrasts between marked and zero-marked forms, and the 
acquisition of constraints that prevent the overuse of inflected forms. 
7.3 Limitations to consider in future research 
The first group of limitations concerns the methodological tradeoffs that were 
made inevitable by time and resource constraints. The second group of limitations 
pertains to conceptual issues, such as the choices that were made with respect to 
operationalizing key constructs and the limits that these choices place on the interpretation of the 
results. 
First among the methodological tradeoffs was the use of existing learner corpus 
data in the service of the cross-linguistic focus of the study. This use of existing corpora 
entailed a reliance on cross-sectional data as a stand-in for longitudinal data. Learner 
proficiency was used as a proxy for points along the development trajectory. A better 
way, of course, would be to use length of residence or amount of instruction in the target 
language. The approach taken in the present study poses at least two problems. First, 
using proficiency as a proxy for time on task, length of learning, or amount of linguistic 
input creates a circularity: grammatical accuracy, of which the mastery of inflection is a 
part, is one of the elements in the construct definition of proficiency. Therefore, learners 
who have received the same "quantities" of input or instruction and show differences in 
accuracy would be rated at different proficiency levels, which would then erase any 
differences in error rates between the proficiency levels. Second, the role of proficiency 
in the analysis as an ordinal variable, rather than a ratio or even an interval variable, limits the 
interpretations that can be drawn from the data. Any regression coefficients associated 
with proficiency levels would only indicate that accuracy is "different", without showing 
a stepwise increase in accuracy going along with a stepwise increase in proficiency. 
Finally, there is the issue of proficiency being based on raters' judgments and 
their interpretation of what it means to be an "A2" speaker of their language. This opens 
the door to the possibility that raters in different target languages treat inaccurate 
production of inflection differently, depending on a learner's use of other linguistic 
resources and the learner's success at communicating propositional and even pragmatic 
meaning, achieved through whatever means. 
The use of previously collected corpus data also meant having limited data about 
learner language backgrounds. More information related to learner L1s and the contexts 
in which they had learned the TLs, length of residence or instruction, and other languages 
spoken would all be highly desirable. With respect to the diversity of L1s, this limitation 
was somewhat mitigated by the regularities found in the distributions of L1s in the three 
TL groups (Chapter 4). On the one hand, the three groups of learners differed with 
respect to L1s and, more importantly, the linguistic distance between the L1 and the 
target language. For example, among learners of Czech there was a sizeable group of 
native speakers of Slavic languages, and among learners of Italian, of Romance 
languages, whereas among learners of German there was no similarly situated group.   
Despite these imbalances in the composition of the learner groups, the facilitation 
predicted by simple transfer was not traceable in the results. For learners of Italian and 
Czech (each with a group of speakers of closely related languages), one would predict 
higher familiarity with the roots, inflectional endings, and, more generally, the 
phonological properties of the target languages, resulting in fewer phonological errors. 
Conversely, learners of German (who are speakers of L1s not closely related to it) 
would be expected to make more phonological errors, due to the lack of such familiarity. 
However, it was the learners of Italian and Czech whose productions contained a higher 
number of phonological errors.  
On the other hand, despite the imbalances in the specific learner L1s or L1 
groups, the majority of those L1s belonged to what would be typically considered 
inflectionally "moderately rich" to "rich" languages. In previous research (cf. Murakami 
& Alexopoulou, 2015) the extent to which L1 properties were considered as potential 
influences on L2 morphological production was rather basic: limited to the presence or 
absence of the feature of interest, such as "Tense", in the L1. Judged against this baseline, 
this study's distribution of L1s, although not ideal, can be considered adequate, since the 
vast majority of L1s cleared the bar of having inflected forms to mark agreement on 
verbs. 
I viewed these methodological decisions as part of a tradeoff between scale and 
descriptive detail, resolving it in the end in favor of achieving scale and capturing a high 
number of productions and individual learners. While scale took priority over rich 
description of learning contexts and histories in this study, future research on this topic 
might set different priorities. 
The use of learner production data (as opposed to controlled elicitation 
techniques) makes it possible for learners to avoid using any features of the TL they 
have not fully acquired. This may be doubly true of productions captured in the context 
of a language proficiency examination. Even though the examinations in question were 
not high-stakes, one can presume that they still resulted in a fair amount of monitoring 
and attention to the formal aspects of TL use by learners. In this respect, it would be 
informative to test the findings of the present dissertation against learner performance on 
tasks of differing monitoring demands. 
Finally, the cross-sectional nature of the data also entails a certain (and 
unknowable) amount of self-selection by learners into the TL groups. While I prioritized 
ecological validity when developing the data collection strategy, the possibility of 
experimentally controlling assignment to the target languages had to be given up. Future 
research may resolve this tension differently and test the hypotheses that this 
dissertation's findings have spawned experimentally, rather than cross-sectionally. For 
example, artificial and semi-artificial language learning paradigms may prove 
instrumental to isolating some of the various influences on morphological learning that 
the present research has identified. 
Other limitations were more conceptual in nature. With the exception of a few 
error types that have appeared in the literature, error categories were developed bottom-
up from the data. A principled taxonomy that would draw on mental operations proposed 
under multiple approaches would merit a separate study of its own. In a perfect world, the 
error taxonomy would have been independently validated with respect to the mental 
processes it claims to reflect.  
In particular, one should consider how psycholinguistic theories would fit into this 
space and what error types could be mapped onto some of the mental processes they 
posit. Even though I make a number of assertions as to what errors originate from faulty 
processing, I have not demonstrated a connection to processing data. Additional research 
could show whether, in fact, the errors that I have conceptually linked to processing 
warrant this interpretation. This would, however, amount to a multiple-experiment 
research program that would have well exceeded the feasibility of any single dissertation. 
Whereas this study concentrated on only three target languages, they were 
conceived as representing continuous variation in paradigm richness but treated as levels 
of a factor variable for the purposes of the analyses. Future research might usefully 
approach this variation from the angle of quantitative typology: instead of treating TLs as 
exemplars of "rich" and "poor" languages, degrees of richness/poverty would be related 
as continuous predictors to observed learning difficulty. Such treatment of typological 
differences as continuous has been successfully applied in studies on child language 
acquisition (Chapter 1) and allowed for more robust conclusions to be drawn.  
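The contrast between the two treatments can be pictured with a minimal sketch in Python. It is an illustration only: the labels TL_A, TL_B, and TL_C and the richness scores are hypothetical placeholders, not values measured in this study, and the helper names factor_coding and continuous_coding are invented for the example.

```python
# Hypothetical target languages and richness scores (placeholders, not study data).
LEVELS = ["TL_A", "TL_B", "TL_C"]
RICHNESS = {"TL_A": 4, "TL_B": 6, "TL_C": 8}


def factor_coding(tl, levels):
    """Treatment (dummy) coding: one indicator per non-reference level."""
    return [1.0 if tl == level else 0.0 for level in levels[1:]]


def continuous_coding(tl, richness):
    """One numeric predictor: the TL's richness score, centered on the mean."""
    mean = sum(richness.values()) / len(richness)
    return [richness[tl] - mean]


for tl in LEVELS:
    print(tl, factor_coding(tl, LEVELS), continuous_coding(tl, RICHNESS))
```

Under the factor coding, each TL receives its own coefficient and no ordering among the TLs is assumed. Under the continuous coding, a single slope expresses how learning difficulty changes per unit of richness, which is the relationship that a quantitative-typology approach would estimate.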
This graded view of complexity should ideally apply both at the paradigmatic 
level, i.e. the number of distinct inflectional endings a language has, and to the number of 
verb classes in a language. It is possible that it is not only verb groupings designated as 
classes (by descriptive grammars) that meaningfully relate to learning, but also any other 
sub-regularities that are no longer productive in a language, such as, for example, the 
quasi-rules of the German ablaut (cf. English catch – caught, sweep – swept, ride – rode – 
ridden). Any number of patterned regularities among lexical items could be subject to 
statistical learning, and these need not be limited to those recognized in linguistic 
descriptions. While the current study took a simplified view of these matters, such 
nuances might be pursued in future research. This would amount to comparing the 
predictive power of different operationalizations of a TL's complexity. Applied to the 
realm of verb classes, for instance, the traditional partition in German that posits a 
two-way grouping ("strong" and "weak") would be pitted against one that recognizes further 
distinctions within the "strong" verbs based on the no longer productive ablaut system. 
Extending this reasoning a step further, one may also conceive of complexity as a 
property of all of a language's inflectional paradigms, and not merely the complexity of 
the domain related to a particular part of speech, such as, in the case of this study, verbal 
inflection. Thus, learning verbal morphology could be facilitated not only by the richness 
of the verbal paradigm itself but could also be affected by the variety in the adjectival and 
nominal paradigms (e.g., gender, number, case agreement marking on nouns and 
adjectives). Sweeping examinations of this kind have not been pursued, to my 
knowledge. Limiting the study of morphological learning to just one part-of-speech 
domain has been more a matter of following established research procedures than an 
independently justified methodological choice. There is only one obvious reason to 
restrict an examination of morphological learning in this way: namely, if the purpose of a 
study is to focus strictly on abstract syntactic features, the domain has to be limited to one 
part of speech, or else the forms studied will have no features in common (e.g., I see the 
neighbor [case] vs. He jogs [tense, person, number] in the park every day). There is no 
conceptual reason to presume that morphological knowledge or processing is modular 
in this way. This issue, however, was not pursued in this dissertation and will have to be 
left to future research. 
Another obvious difficulty with cross-linguistic comparisons lies in the 
differences in the ways that languages realize morphological contrasts. The phonological 
factors influencing the production of inflectional morphology are the subject of a whole 
research literature. This study attempted to do it justice by including a separate 
investigation into the role of phonological learner processes in the production of L2 
inflectional morphology in German (Chapter 6). Even though it revealed that the patterns 
of learner errors in German cannot be reduced to the phonological processes posited for 
the interlanguage, similar analyses were not carried out for the other target languages. To 
mitigate this limitation, an effort was made to describe the phonological properties of the 
TLs as they relate to permissible syllable structures and consonant clusters, along with 
the properties of learners' L1s represented in the sample (Chapter 3).  
The results of this research cannot speak to some of the debates within the 
research literature. Due to the focus being squarely on inflectional morphology, this study 
does not speak to the issue of its relative developmental sequencing vis-à-vis the word 
order operations that are associated with morphology (Chapter 2). Similarly, the different 
dimensions of salience proposed under the general cognitive account have largely fallen 
outside the scope of this study, which concentrated, instead, on extending the GC account 
cross-linguistically. 
Furthermore, it is impossible to know with any certainty just what the status of 
any inflected forms used by learners might be in their grammars and mental lexicons. 
While some psycholinguistic studies and data were noted briefly, these approaches were 
only marginally incorporated into the present study. Thus, any references made to the 
retrieval of inflected forms could apply in equal measure to the retrieval of inflectional 
elements (roots and endings), and this study is emphatically agnostic on this point. 
Another limitation of this study is the lack of directly comparable English data, 
even though it was the literature on morphological L2 learning in English that motivated 
it and has remained a red thread and point of comparison. Considering how deeply rooted 
in English data much of theory building is when it comes to L2 morphological learning, 
the inclusion of English data would lend more force to this dissertation's arguments 
centering around the role of paradigm complexity in learning.  
7.4 Contributions 
This dissertation combined a cross-linguistic approach with a focus on learner 
errors to yield data that allowed me to test two theoretical approaches simultaneously 
(syntactic, general-cognitive). In doing so, it supplied data from less commonly 
researched and taught languages (Italian, Czech), which provided points of comparison to 
the more widely known patterns in L2 English. German, while not formally recognized as 
a "less commonly taught" language, is also not widely represented in the research on L2 
and foreign language learning, especially in studies utilizing production data. Despite not 
being the central focus of the study, the fine-grained error data on these languages can 
inform future research and instructional materials design. More importantly, however, 
this study showcases the vital role that cross-linguistic comparisons have to play in the 
building of SLA theories meant to uncover the forces behind the learning of any 
language, as long as the TLs are carefully selected for their specific properties. 
If the findings of this study stand the test of replication, they may inform foreign 
language instruction and the development of instructional materials. Exposing learners to 
a diverse range of inflected forms, or presenting only a subset of the diverse forms 
offered by the target language, or, conversely, inflating the diversity of forms learners 
encounter in input (beyond what would be natural in the language at large): these are only a 
few of the choices that this research opens up. Even though the present research did not 
manipulate input directly (as it was experienced by learners), its results speak to the 
possible superiority of input that is richer in inflected forms, at least as far as suppliance 
of inflection is concerned.  
However, optimal courses of action may differ for different domains of 
knowledge: it may be that the input that promotes suppliance of any inflection (as 
opposed to omitting it or using an infinitive) is not the same kind of input that is 
beneficial for learning the intricacies of morpholexical alternations. If the two are indeed 
at odds, then their diverging developmental timelines will also need to be respected in 
instruction. For example, early exposure to the full set of person-number combinations 
would be balanced against a curated approach to introducing lexical items belonging to 
different classes, or vice versa. 
Further implications for instruction may arise from exploring whether the 
relationship between complexity and learnability is linear or whether any benefits of 
complexity taper off after a certain point. It is also possible that this point is specific to 
each learner and is determined not only by the properties of the target language but also 
by learner-internal factors and processing capacity. 
In its data analysis, this dissertation employed not only Poisson regression 
modeling but also out-of-sample validation, which involved testing the models against 
data that had been withheld during model building. It is my hope that out-of-sample 
validation of statistical models will be adopted in future research on this topic and that 
this dissertation will offer a blueprint for doing so. The availability of learner corpus data 
may allow researchers to take advantage of this approach, in addition to the ongoing push 
in the field to step up replication efforts. 
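The logic of out-of-sample validation can be sketched in a few lines of Python. The example below is not the analysis code used in this dissertation: the counts are made up for illustration, the single binary predictor x is a stand-in for any predictor of interest, and the functions fit_poisson and poisson_loglik are hypothetical helpers. It fits a Poisson model with an exposure offset to training rows only and then compares its log-likelihood on withheld rows to that of an intercept-only model.

```python
import math


def fit_poisson(xs, ys, exposures, n_iter=50):
    """Fit log(mu) = log(exposure) + b0 + b1 * x by Newton-Raphson."""
    b0 = math.log(sum(ys) / sum(exposures))  # start from the pooled error rate
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, e in zip(xs, ys, exposures):
            mu = e * math.exp(b0 + b1 * x)
            g0 += y - mu            # gradient of the log-likelihood
            g1 += (y - mu) * x
            h00 += mu               # entries of the negative Hessian
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def poisson_loglik(params, xs, ys, exposures):
    """Poisson log-likelihood of the observed counts under the given parameters."""
    b0, b1 = params
    total = 0.0
    for x, y, e in zip(xs, ys, exposures):
        mu = e * math.exp(b0 + b1 * x)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


def columns(rows):
    """Split (x, errors, tokens) rows into three parallel tuples."""
    xs, ys, es = zip(*rows)
    return xs, ys, es


# Made-up rows for illustration only: (x, error count, number of verb tokens).
train = [(0, 12, 100), (0, 9, 80), (0, 15, 110),
         (1, 5, 100), (1, 6, 120), (1, 4, 90)]
held_out = [(0, 11, 95), (0, 13, 100), (1, 5, 105), (1, 3, 85)]

# Both models are estimated on the training rows only.
full = fit_poisson(*columns(train))
null = (math.log(sum(r[1] for r in train) / sum(r[2] for r in train)), 0.0)

# Both models are then scored on rows they have never seen.
ll_full = poisson_loglik(full, *columns(held_out))
ll_null = poisson_loglik(null, *columns(held_out))
print("held-out log-likelihood, with predictor:", round(ll_full, 3))
print("held-out log-likelihood, intercept only:", round(ll_null, 3))
```

A predictor that improves the fit only on the training rows but not on the withheld rows would be a sign of overfitting. In practice an established statistics package would replace the hand-written fitting routine, which is spelled out here only to keep the sketch self-contained.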
Conceptually, the present study adds to the literature on morphological learning 
its characterization of the "product to be learned" as the morphological paradigm, or 
system of oppositions among morphemes, and not a set of self-sufficient morphemes. 
This aspiration dictated the choice to track learners' overall accuracy on inflection and 
not to limit the study to a particular morphological feature. This approach is more closely 
aligned with structuralist views of language. 
This study also follows the longstanding tradition of drawing inspiration from 
theoretical work on child language acquisition. The recent work on grammar competition 
originating in that field has proven quite stimulating and applicable to L2 learning. 
Nevertheless, in the process of developing its extensions to L2 learning, I have 
discovered important modifications that it needs in order to account for L2 data, as well 
as its limits. The modifications are necessary due to the sheer variety of learner errors in 
L2: rather than focusing on suppliance versus omission, I examined the full gamut of 
departures from target-likeness. Some parts of this spectrum were better served by the 
general-cognitive model, as detailed above (Theoretical implications and takeaways), 
such as the gradual, item-specific learning of morpholexical regularities and any 
alternations as they affect specific lexical items. However, other parts of non-targetlike 
production, such as overuse of inflection, are not easily explained by general-cognitive 
principles. If all inflections have positive difficulty "scores", then supplying one should 
be more difficult than supplying a non-finite form. Similarly, it is not clear why learners 
would supply "bare" forms instead of providing a non-finite form (if it is the default), 
considering that their experiential basis with bare forms may be zero, if such forms are 
not allowed in their target language. 
As one of the main contributions of the study, I want to highlight its taxonomy of 
learner errors that I attempted to connect to different manifestations of learner 
knowledge. In doing so, I was forced to confront the layered meanings behind what it 
means to have "learned" inflectional morphology and to acknowledge the many facets of 
knowledge. While certain error types were deemed to be irrelevant to the purposes of this 
dissertation, it is nevertheless useful to be cognizant of the full range of learner behaviors 
in the face of morphological complexity, and other researchers may direct their attention 
to those parts of it that were outside of the scope of this study.  
Despite the limitations (outlined above) related to the process of developing the 
error taxonomy, I count among this dissertation's contributions the attempt to examine 
different aspects of learner performance on the same task, rather than gathering evidence 
from multiple tasks. Whether the typology of errors is validated in the exact form it was 
proposed in this dissertation or undergoes refinement, the overall approach of testing the 
predictions originating from different theories on the same data merits further application. 
After all, in the words of George E. P. Box, all models are wrong, but some are useful. 
  
References 
Aksu-Koç, A., Ketrez, F. N., Laalo, K., & Pfeifer, B. (2007). Agglutinating languages: 
Turkish, Finnish, and Yucatec Maya. In S. Laaha & S. Gillis (Eds.), Typological 
perspectives on the acquisition of noun and verb morphology: Antwerp Papers in 
Linguistics 112 (pp. 47–57). Antwerp: University of Antwerp. 
Amaral, L., & Roeper, T. (2014). Multiple grammars and second language representation. 
Second Language Research, 30(1), 3-36. 
Anderson, J. (1987). The markedness differential hypothesis and syllable structure 
difficulty. In G. Ioup & S. Weinberger (Eds.), Interlanguage phonology: The 
acquisition of a second language sound system. New York, NY: Harper and Row. 
279-291. 
Bates, E., & MacWhinney, B. (1982). Functionalist approaches to grammar. In E. 
Wanner & L. Gleitman (Eds.), Language acquisition: The state of the art. New 
York: Cambridge University Press. 
Bayley, R. (1996). Competing constraints on variation in the speech of adult Chinese 
learners of English. In R. Bayley & D. R. Preston (Eds.), Second Language 
Acquisition and Linguistic Variation. Amsterdam: John Benjamins. pp. 97-120. 
Bley-Vroman, R. (1997, October). Features and patterns in foreign language learning. 
Plenary talk presented at the Second Language Research Forum. Michigan State 
University. 
Bonilla, C. L. (2015). From number agreement to the subjunctive: Evidence for 
Processability Theory in L2 Spanish. Second Language Research, 31(1), 53-74. 
Bowden, H. W., Gelfand, M. P., Sanz, C., & Ullman, M. T. (2010). Verbal inflectional 
morphology in L1 and L2 Spanish: A frequency effects study examining storage 
versus composition. Language Learning, 60(1), 44-87. doi: 10.1111/j.1467-
9922.2009.00551.x 
Bruhn de Garavito, J. (2003). The (dis)association between morphology and syntax: The 
case of L2 Spanish. In S. Montrul & F. Ordóñez (Eds.), Linguistic Theory and 
Language Development in Hispanic Languages (pp. 398-417). Cambridge, MA: 
Cascadilla Press. 
Bybee, J. L., & Slobin, D. I. (1982). Rules and schemas in the development and use of 
the English past tense. Language, 58, 265-289. 
Chew, P. (2003). A computational phonology of Russian. Universal-Publishers. 
Clahsen, H. (1988). Parameterized grammatical theory and language acquisition. In S. 
Flynn & W. O'Neil (Eds.), Linguistic theory in second language acquisition (pp. 
47-75). Dordrecht: Kluwer. 
Clahsen, H., & Felser, C. (2006). Grammatical processing in language learners. Applied 
Psycholinguistics, 27, 3-42. doi:10.1017/S0142716406060024.  
Coughlin, C. E., & Tremblay, A. (2015). Morphological decomposition in native and 
non-native French speakers. Bilingualism: Language and Cognition, 18(3), 524-
542. 
Davidson, L., & Roon, K. (2008). Durational correlates for differentiating consonant 
sequences in Russian. Journal of the International Phonetic Association, 38(2), 
137-165. 
DeKeyser, R. M. (2000). The robustness of critical period effects in second language 
acquisition. Studies in Second Language Acquisition, 22(04), 499-533. 
DeKeyser, R. M., Alfi-Shabtay, I., Ravid, D., & Shi, M. (2017). The role of salience in 
the acquisition of Hebrew as a second language: interaction with age of 
acquisition. In S. Gass, P. Spinner, & J. Behney (Eds.), Salience and SLA (pp. 
131-146). London: Routledge. 
DeKeyser, R., Alfi-Shabtay, I., & Ravid, D. (2010). Cross-linguistic evidence for the 
nature of age effects in second language acquisition. Applied 
Psycholinguistics, 31(3), 413-438. 
Di Biase, B., & Kawaguchi, S. (2002). Exploring the typological plausibility of 
Processability Theory: Language development in Italian second language and 
Japanese second language. Second Language Research, 18(3), 274-302. 
Dressler, W. U., Stephany, U., Aksu-Koç, A., & Gillis, S. (2007). Discussion and 
conclusion. Typological Perspectives on the Acquisition of Noun and Verb 
Morphology. Antwerp Papers in Linguistics, 112, 67-71. 
Dryer, M.S. (2013). Prefixing vs. suffixing in inflectional morphology. In M.S. Dryer & 
M. Haspelmath (Eds.), The World Atlas of Language Structures Online. Leipzig: 
Max Planck Institute for Evolutionary Anthropology. (Available online at 
http://wals.info/chapter/26, Accessed on 2017-04-26.) 
Dyson, B. (2009). Processability Theory and the role of morphology in English as a 
second language development: a longitudinal study. Second Language 
Research, 25(3), 355-376. 
Eckman, F. (1986). The reduction of word-final consonant clusters in interlanguage. In J. 
Leather & A. James (Eds.), Sound patterns in second language acquisition, (pp. 
143-162). Dordrecht: Foris. 
Ellis, R. (2015). Researching acquisition sequences: Idealization and de-idealization in 
SLA. Language Learning, 65(1), 181-209. 
Eubank, L. (1994). Optionality and the initial state in L2 development. In T. Hoekstra & 
B. Schwartz (Eds.), Language acquisition studies in generative grammar (pp. 
369-388). Amsterdam; Philadelphia: John Benjamins. 
Fedzechkina, M., Jaeger, T. F., & Newport, E. L. (2012). Language learners restructure 
their input to facilitate efficient communication. Proceedings of the National 
Academy of Sciences, 109(44), 17897-17902. 
Fodor, J. D. (1998). Parsing to learn. Journal of Psycholinguistic Research, 27(3), 339-
374. 
Foote, R. (2015). The storage and processing of morphologically complex words in L2 
Spanish. Studies in Second Language Acquisition, 1-33. 
http://dx.doi.org/10.1017/S0272263115000376 
Franceschina, F. (2001). Morphological or syntactic deficits in near-native speakers? An 
assessment of some current proposals. Second Language Research, 17(3), 213-
247. 
Frank, V. (2000). Impact of in-country study on language ability: National Security 
Education Program undergraduate scholarship and graduate fellowship recipients. 
Technical Report, The National Foreign Language Center. 
Gagliardi, A., & Lidz, J. (2014). Statistical insensitivity in the acquisition of Tsez noun 
classes. Language, 90(1), 58-89. doi: 10.1353/lan.2014.0013  
Geyken, A. (2007). The DWDS corpus: A reference corpus for the German language of 
the 20th century. In Fellbaum, Ch. (Ed.), Collocations and Idioms: Linguistic, 
lexicographic, and computational aspects. London, UK. pp. 23-41. 
Goldschneider, J. M., & DeKeyser, R. M. (2001). Explaining the "natural order of L2 
morpheme acquisition" in English: A meta-analysis of multiple 
determinants. Language Learning, 51(1), 1-50. 
Gor, K., & Chernigovskaya, T. (2005). Formal instruction and the acquisition of verbal 
morphology. In A. Housen & M. Pierrard (Eds.), Investigations in instructed 
second language acquisition (pp. 131-164). Berlin: Mouton De Gruyter.  
Gor, K., & Jackson, S. (2013). Morphological decomposition and lexical access in a 
native and second language: A nesting doll effect. Language and Cognitive 
Processes, 28(7), 1065-1091. 
Gregg, K. (1996). The logical and developmental problems of second language 
acquisition. In W.C. Ritchie & T.K. Bhatia (Eds.), Handbook of second language 
acquisition. London: Academic Press. 
Gregová, R. (2015). The CVX theory of syllable: The analysis of word-final rhymes in 
English and in Slovak. In Belgrade English language & literature studies, Vol. III 
(pp. 111-127).  
Grijzenhout, J., & Joppen, S. (1998). First steps in the acquisition of German phonology: 
A case study. In Theorie des Lexikons: Arbeiten des Sonderforschungsbereichs 282, 
Nr. 110.  
Grodzinsky, Y. (1984). The syntactic characterization of agrammatism. Cognition, 16(2), 
99-120. 
Halle, M. (1959). The sound pattern of Russian: a linguistic and acoustical investigation. 
's-Gravenhage: Mouton. 
Hamdi, R., Ghazali, S., & Barkat-Defradas, M. (2005). Syllable structure in spoken 
Arabic: A comparative investigation. In INTERSPEECH-2005 (pp. 2245-2248). 
Hawkins, R. (2001). Second language syntax: A generative introduction. Wiley-
Blackwell. 
Haznedar, B., & Schwartz, B. D. (1997). Are there optional infinitives in child L2 
acquisition? In Proceedings of the 21st Annual Boston University Conference on 
Language Development. Somerville, MA: Cascadilla Press (pp. 291-302). 
Herschensohn, J. (2001). Missing inflection in second language French: Accidental 
infinitives and other verbal deficits. Second Language Research, 17(3), 273-305. 
Hilbe, J.M. (2016). COUNT: Functions, Data and Code for Count Data. R package 
version 1.3.4. https://CRAN.R-project.org/package=COUNT  
Hoover, J. R., Storkel, H. L., & Rice, M. L. (2012). The interface between neighborhood 
density and optional infinitives: Normal development and specific language 
impairment. Journal of Child Language, 39(4), 835-862. 
Institut für Deutsche Sprache (2017). Deutsches Referenzkorpus / Archiv der Korpora 
geschriebener Gegenwartssprache 2017-I (Release vom 08.03.2017). Mannheim: 
Institut für Deutsche Sprache. PID: 10932/00-0373-23CD-C58F-FF01-3.   
Ionin, T., & Wexler, K. (2002). Why is 'is' easier than '-s'?: Acquisition of 
tense/agreement morphology by child second language learners of English. 
Second Language Research, 18(2), 95-136. 
Janda, L., & Townsend, C. (2002). Czech. Slavic and East European Language Research 
Center (SEELRC), Duke University. 
Jia, G., & Fuse, A. (2007). Acquisition of English grammatical morphology by native 
Mandarin-speaking children and adolescents: Age-related differences. Journal of 
Speech, Language, and Hearing Research, 50(5), 1280-1299. 
Johnson, J. S., & Newport, E. L. (1989). Critical period effects in second language 
learning: The influence of maturational state on the acquisition of English as a 
second language. Cognitive Psychology, 21(1), 60-99. 
Kehayia, E., Jarema, G., & Kądzielawa, D. (1990). Cross-linguistic study of 
morphological errors in aphasia: Evidence from English, Greek, and Polish. In 
Morphology, phonology, and aphasia. New York, NY: Springer. 
Kempe, V., & MacWhinney, B. (1998). The acquisition of case marking by adult learners 
of Russian and German. Studies in Second Language Acquisition, 20(4), 543-587. 
Kirkici, B., & Clahsen, H. (2013). Inflection and derivation in native and non-native 
language processing: Masked priming experiments on Turkish. Bilingualism: 
Language and Cognition, 16(4), 776-791. 
Kopřivová, M., Lukeš, D., Komrsková, Z., Poukarová, P., Waclawičová, M., Benešová, 
L., & Křen, M. (2017). ORAL: korpus neformální mluvené češtiny, verze 1 z 2. 6. 
Ústav Českého národního korpusu FF UK: Prague, Czech Republic. Accessed 
online at http://www.korpus.cz. 
Kornfilt, J. (2013). Turkish. NY: Routledge. 
Krause, T., & Zeldes, A. (2016). ANNIS3: A new architecture for generic corpus query 
and visualization. Digital Scholarship in the Humanities, 31. Retrieved from 
http://dsh.oxfordjournals.org/content/31/1/118 
Křen, M., Cvrček, V., Čapka, T., Čermáková, A., Hnátková, M., Chlumská, L., Jelínek, 
T., Kováříková, D., Petkevič, V., Procházka, P., Skoumalová, H., Škrabal, M., 
Truneček, P., Vondřička, P., & Zasina, A. (2016). SYN2015: Representative corpus 
of contemporary written Czech. In Proceedings of the Tenth International 
Conference on Language Resources and Evaluation (LREC'16). Portorož: ELRA. 
pp. 2522-2528. 
Laaha, S., & Gillis, S. (Eds.) (2007). Typological perspectives on the acquisition of noun and 
verb morphology. Antwerp Papers in Linguistics 112. Antwerp, 
Belgium: University of Antwerp. 
Laaha, S., Gillis, S., Kilani-Schoch, M., Korecky-Kröll, K., Xanthos, A., & Dressler, W. 
U. (2007). Weakly inflecting languages: French, Dutch, and German. Typological 
perspectives on the acquisition of noun and verb morphology. Antwerp Papers in 
Linguistics, 112, 21-33. 
Lardiere, D. (1998). Case and tense in the 'fossilized' steady state. Second Language 
Research, 14(1), 1-26. 
Lardiere, D. (1998). Dissociating syntax from morphology in a divergent L2 end-state 
grammar. Second Language Research, 14(4), 359-375. 
Legate, J. A., & Yang, C. (2007). Morphosyntactic learning and the development of 
tense. Language Acquisition, 14(3), 315-344. 
Lehtonen, M., & Laine, M. (2003). How word frequency affects morphological 
processing in monolinguals and bilinguals. Bilingualism: Language and 
Cognition, 6(3), 213-225. 
Lehtonen, M., Niska, H., Wande, E., Niemi, J., & Laine, M. (2006). Recognition of 
inflected words in a morphologically limited language: Frequency effects in 
monolinguals and bilinguals. Journal of Psycholinguistic Research, 35(2), 121-
146. 
Leonard, L. B., Camarata, S. M., Brown, B., & Camarata, M. N. (2004). Tense and 
agreement in the speech of children with Specific Language Impairment: Patterns 
of generalization through intervention. Journal of Speech, Language, and Hearing 
Research, 47(6), 1363-1379. 
Lenth, R. (2019). emmeans: Estimated Marginal Means, aka Least-Squares Means. R 
package version 1.3.3. https://CRAN.R-project.org/package=emmeans 
Long, M. H. (1990). The least a second language acquisition theory needs to explain. 
TESOL Quarterly, 24(4), 649-666. 
Long, M. H. (1991). Focus on form: A design feature in language teaching methodology. 
Foreign language research in cross-cultural perspective, 2(1), 39-52. 
Luk, Z. P. S., & Shirai, Y. (2009). Is the acquisition order of grammatical morphemes 
impervious to L1 knowledge? Evidence from the acquisition of plural -s, articles, 
and possessive 's. Language Learning, 59(4), 721-754. 
Lupker, S. J. (1982). The role of phonetic and orthographic similarity in picture-word 
interference. Canadian Journal of Psychology/Revue canadienne de 
psychologie, 36(3), 349-367. 
Lyding, V., Stemle, E., Borghetti, C., Brunello, M., Castagnoli, S., Dell'Orletta, F., 
Dittmann, H., Lenci, A., & Pirrelli, V. (2014). The PAISÀ corpus of Italian web 
texts. In Proceedings of the 9th Web as Corpus Workshop (WaC-9), Association 
for Computational Linguistics. Gothenburg, Sweden.  
MacWhinney, B. (1987). Applying the competition model to bilingualism. Applied 
Psycholinguistics, 8, 315-327. 
MacWhinney, B., Bates, E., & Kliegl, R. (1984). Cue validity and sentence interpretation 
in English, German, and Italian. Journal of Verbal Learning and Verbal Behavior, 
23, 127-150. doi: 10.1016/S0022-5371(84)90093-8 
Maddieson, I. (2013). Syllable structure. In M. S. Dryer & M. Haspelmath (Eds.), The 
World Atlas of Language Structures Online. Leipzig: Max Planck Institute for 
Evolutionary Anthropology. (Available online at http://wals.info/chapter/12, 
Accessed on 2017-04-26.) 
Marslen-Wilson, W., & Tyler, L. K. (1980). The temporal structure of spoken language 
understanding. Cognition, 8(1), 1-71. 
Marslen-Wilson, W., & Zwitserlood, P. (1989). Accessing spoken words: The importance 
of word onsets. Journal of Experimental Psychology: Human Perception and 
Performance, 15(3), 576-585. 
Meisel, J. M., Clahsen, H., & Pienemann, M. (1981). On determining developmental 
stages in natural second language acquisition. Studies in Second Language 
Acquisition, 3(2), 109-135. 
Meisel, J. M. (1997). The acquisition of the syntax of negation in French and German: 
Contrasting first and second language development. Second Language 
Research, 13(3), 227-263. 
Mezzano, G. G. (2003). The development of Spanish verbal inflection in early stages of 
L2 acquisition. Undergraduate honors thesis. University of Illinois, Urbana-
Champaign. 
Meyer, A. S., & Schriefers, H. (1991). Phonological facilitation in picture-word 
interference experiments: effects of stimulus onset asynchrony and types of 
interfering stimuli. Journal of Experimental Psychology: Learning, Memory, and 
Cognition, 17(6), 1146-1160. 
Miceli, G., Mazzucchi, A., Menn, L., & Goodglass, H. (1983). Contrasting cases of 
Italian agrammatic aphasia without comprehension disorder. Brain and Language, 
19(1), 65-97. 
Mimouni, Z., & Jarema, G. (1997). Agrammatic aphasia in Arabic. Aphasiology, 11(2), 
125-144. 
Montrul, S., Foote, R., & Perpiñán, S. (2008). Gender agreement in adult second 
language learners and Spanish heritage speakers: The effects of age and context of 
acquisition. Language Learning, 58(3), 503-553. 
https://doi.org/10.1111/j.1467-9922.2008.00449.x 
Morales, A. (2014). Production and comprehension of verb agreement morphology in 
Spanish and English child L2 learners: Evidence for the effects of morphological 
structure (Unpublished doctoral dissertation). University of Illinois at Urbana-
Champaign. 
Murakami, A., & Alexopoulou, T. (2015). L1 influence on the acquisition order of English 
grammatical morphemes. A learner corpus study. Studies in Second Language 
Acquisition. doi:10.1017/S0272263115000352 
Niemi, J., Laine, M., Hänninen, R., & Koivuselkä-Sallinen, P. (1990). Agrammatism in 
Finnish: Two case studies. Agrammatic aphasia: A cross-language narrative 
sourcebook, 2, 1013-1085. 
Omaki, A., & Lidz, J. (2015). Linking parser development to acquisition of syntactic 
knowledge. Language Acquisition, 22, 158-192. 
doi:10.1080/10489223.2014.943903 
Paradis, J., Rice, M. L., Crago, M., & Marquis, J. (2008). The acquisition of tense in 
English: Distinguishing child second language from first language and specific 
language impairment. Applied Psycholinguistics, 29(4), 689-722. 
Paradis, M. (2004). A neurolinguistic theory of bilingualism. Amsterdam: John 
Benjamins. 
Phillips, C. (1995). Syntax at age two: Cross-linguistic differences. MIT Working Papers 
in Linguistics, 26, 325-382. 
Phillips, C., & Ehrenhofer, L. (2015). The role of language processing in language 
acquisition. Linguistic Approaches to Bilingualism, 5(4), 409-453. 
Pienemann, M. (2003). Language processing capacity. In C. J. Doughty & M. H. Long 
(Eds.), The Handbook of Second Language Acquisition. Oxford, UK: Blackwell 
Publishing Ltd. doi: 10.1002/9780470756492.ch20 
Pienemann, M. (2015). An outline of Processability Theory and its relationship to other 
approaches to SLA. Language Learning, 65(1), 123-151. 
DOI:10.1111/lang.12095 
Pienemann, M. (Ed.). (2005). Cross-linguistic aspects of Processability Theory (Vol. 30). 
John Benjamins Publishing. 
Pinker, S. (1984). Language learnability and language learning. Cambridge, MA: 
Harvard University Press. 
Portin, M., Lehtonen, M., & Laine, M. (2007). Processing of inflected nouns in late 
bilinguals. Applied Psycholinguistics, 28(1), 135-156. 
Pouplier, M., & Beňuš, Š. (2011). On the phonetic status of syllabic consonants: 
Evidence from Slovak. Laboratory Phonology, 2(2), 243-273. 
Presson, N., Sagarra, N., MacWhinney, B., & Kowalski, J. (2013). Compositional 
production in Spanish second language conjugation. Bilingualism: Language and 
Cognition, 16(4), 808-828. 
Prévost, P., & White, L. (2000). Missing surface inflection or impairment in second 
language acquisition? Evidence from tense and agreement. Second Language 
Research, 16, 103-133. 
R Core Team (2019). R: A language and environment for statistical computing. R 
Foundation for Statistical Computing, Vienna, Austria. URL 
https://www.R-project.org/ 
Rayson, P. (2008). From key words to key semantic domains. International Journal of 
Corpus Linguistics, 13(4), 519-549. DOI: 10.1075/ijcl.13.4.06ray 
Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling. In 
Proceedings of the Workshop on Comparing Corpora, held in conjunction with the 
38th annual meeting of the Association for Computational Linguistics (ACL 
2000). 1-8 October 2000, Hong Kong, 1-6. 
Rispoli, M., & Hadley, P. (2012). Input effects on the acquisition of finiteness. GALANA 
Rizzi, L. (1993). Some notes on linguistic theory and language development: The case of 
root infinitives. Language Acquisition, 3(4), 371-393. 
Rossini Favretti, R., Tamburini, F., & De Santis, C. (2002). CORIS/CODIS: A corpus of 
written Italian based on a defined and a dynamic model. In A. Wilson, P. Rayson, 
& T. McEnery (Eds.), A Rainbow of Corpora: Corpus Linguistics and the 
Languages of the World, pp. 27-38. Munich: Lincom-Europa. 
Sadowska, I. (2012). Polish: a comprehensive grammar. Routledge. 
Sánchez, L., Camacho, J., & Ulloa, J. E. (2010). Shipibo-Spanish: Differences in residual 
transfer at the syntax-morphology and the syntax-pragmatics interfaces. Second 
Language Research, 26(3), 329-354. 
Sato, C. J. (1984). Phonological processes in second language acquisition: Another look 
at interlanguage syllable structure. Language Learning, 34(4), 43-58. 
Saussure, F. de (1966). In Bally, C., & Sechehaye, A. (Eds.) Course in general 
linguistics. New York: McGraw-Hill. 
Šimáčková, Š., Podlipský, V. J., & Chládková, K. (2012). Czech spoken in Bohemia and 
Moravia. Journal of the International Phonetic Association, 42(2), 225-232. doi: 
10.1017/S0025100312000102  
Slobin, D. I. (Ed.). (1985). The crosslinguistic study of language acquisition: Theoretical 
issues (Vol. 2). Hillsdale, NJ: Lawrence Erlbaum Associates. 
Sorace, A., & Filiaci, F. (2006). Anaphora resolution in near-native speakers of 
Italian. Second Language Research, 22(3), 339-368. 
Sorace, A. (2011). Pinning down the concept of "interface" in bilingualism. Linguistic 
Approaches to Bilingualism, 1(1), 1-33. 
Stephany, U., Voeikova, M., Christofidou, A., Gagarina, N., Kovačević, M., Palmović, 
M., & Hržica, G. (2007). Early development of nominal and verbal morphology 
from a typological perspective: Strongly inflecting languages: Russian, Croatian, 
and Greek. Antwerp Papers in Linguistics, 112, 35-47. 
Thompson, S. P., & Newport, E. L. (2007). Statistical learning of syntax: The role of 
transitional probability. Language Learning and Development, 3(1), 1-42. 
Timmermans, M., Schriefers, H., Dijkstra, T., & Haverkort, M. (2004). Disagreement on 
agreement: person agreement between coordinated subjects and verbs in Dutch 
and German. Linguistics, 42(5), 905-930. 
Tkachenko, E., & Chernigovskaya, T. (2010). Input frequencies in processing of verbal 
morphology in L1 and L2: Evidence from Russian. In Gronn, A., & Marijanovic, 
I. (Eds.), Russian in Contrast: Lexicon. Oslo Studies in Language 2(2), 281-318. 
Törkenczy, M. (2004). The Phonotactics of Hungarian. Doctoral dissertation. Budapest: 
Hungarian Academy of Sciences. 
Tsapkini, K., Jarema, G., & Kehayia, E. (2001). Manifestations of morphological 
impairments in Greek aphasia: A case study. Journal of Neurolinguistics, 14(2), 
281-296. 
Tsapkini, K., Jarema, G., & Kehayia, E. (2002). A morphological processing deficit in 
verbs but not in nouns: A case study in a highly inflected language. Journal of 
Neurolinguistics, 15(3), 265-288. 
Ullman, M. T. (2004). Contributions of memory circuits to language: The 
declarative/procedural model. Cognition, 92(1), 231-270. 
doi:10.1016/j.cognition.2003.10.008 
Vainikka, A., & Young-Scholten, M. (1996). Gradual development of L2 phrase 
structure. Second Language Research, 12(1), 7-39. 
VanPatten, B. (2004). Input processing in SLA. In B. VanPatten (Ed.), Processing 
instruction: Theory, research, and commentary. Mahwah, NJ: Lawrence Erlbaum 
Associates, Publishers. 
Wei, T., & Simko, V. (2017). R package "corrplot": Visualization of a correlation matrix 
(Version 0.84). Available from https://github.com/taiyun/corrplot  
Wexler, K. (1994). Optional infinitives, head movement and the economy of derivations. 
In D. Lightfoot & N. Hornstein (Eds.), Verb movement (pp. 305-350). Cambridge, 
UK: Cambridge University Press. 
White, L. (1991). Adverb placement in second language acquisition: Some effects of 
positive and negative evidence in the classroom. Second Language Research, 7(2), 
133-161. 
White, L. (2003). Second language acquisition and Universal Grammar. Cambridge, 
UK: Cambridge University Press. 
Wisniewski, K., Schöne, K., Nicolas, L., Vettori, C., Boyd, A., Meurers, D., Abel, A., & 
Hana, J. (2013). MERLIN: An online trilingual learner corpus empirically 
grounding the European Reference levels in authentic learner data. In ICT for 
Language Learning, Conference Proceedings. Libreriauniversitaria.it Edizioni. 
Retrieved from 
http://conference.pixel-online.net/ICT4LL2013/common/download/Paper_pdf/322-CEF03-FP-Wisniewski-ICT2013.pdf 
Yang, C. D. (2002). Knowledge and learning in natural language. Oxford, UK: Oxford 
University Press. 
Yuan, B. (2001). The status of thematic verbs in the second language acquisition of 
Chinese: against inevitability of thematic-verb raising in second language 
acquisition. Second Language Research, 17(3), 248-272. 
Zobl, H., & Liceras, J. (1994). Functional categories and acquisition orders. Language 
Learning, 44(1), 159-180.