ABSTRACT

Title of Dissertation: INVESTIGATIONS INTO THE NEURAL BASIS OF STRUCTURED REPRESENTATIONS

Carol Susan Whitney, Doctor of Philosophy, 2004

Dissertation directed by: Professor Amy Weinberg, Departments of Computer Science and Linguistics

The problem of how the brain encodes structural representations is investigated via the formulation of computational theories constrained from the bottom up by neurobiological factors, and from the top down by behavioral data. This approach is used to construct models of letter-position encoding in visual word recognition, and of hierarchical representations in sentence parsing.

The problem of letter-position encoding entails the specification of how the retinotopic representation of a stimulus (a printed word) is progressively converted into an abstract representation of letter order. Consideration of the architecture of the visual system, letter perceptibility studies, and form-priming experiments led to the SERIOL model, which comprises five layers: (1) a (retinotopic) edge layer, in which letter activations are determined by the acuity gradient; (2) a (retinotopic) feature layer, in which letter activations conform to a monotonically decreasing activation gradient, dubbed the locational gradient; (3) an abstract letter layer, in which letter order is encoded sequentially; (4) a bigram layer, in which contextual units encode letter pairs that fire in a particular order; (5) a word layer. Because the acuity and locational gradients are congruent to each other in one hemisphere but not the other, formation of the locational gradient requires hemisphere-specific processing. It is proposed that this processing underlies visual-field asymmetries associated with word length and orthographic-neighborhood size. Hemifield lexical-decision experiments in which contrast manipulations were used to modify activation patterns confirmed this account.
In contrast to the linear relationships between letters, a parse of a sentence requires hierarchical representations. Consideration of a fixed-connectivity constraint, brain imaging studies, sentence-complexity phenomena, and insights from the SERIOL model led to the TPARRSE model, in which hierarchical relationships are represented by a predefined distributed encoding. This encoding is constructed with the support of working memory, which encodes relationships between phrases via two synchronized sequential representations. The model explains complexity phenomena based on specific proposals as to how information is represented and manipulated in syntactic working memory. In contrast to capacity-based metrics, the TPARRSE model provides a more comprehensive account of these phenomena.

INVESTIGATIONS INTO THE NEURAL BASIS OF STRUCTURED REPRESENTATIONS

by Carol Susan Whitney

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2004

Advisory Committee: Professor Amy Weinberg, Chairman/Advisor; Professor Christopher Cherniak; Professor Don Perlis; Professor Colin Phillips; Professor James Reggia

© Copyright by Carol Susan Whitney 2004

DEDICATION

To my husband.

ACKNOWLEDGEMENTS

First, I would of course like to thank my committee members. Amy Weinberg, my advisor, was a perfect fit in terms of computational, cognitive, and psycholinguistic interests. Her down-to-earth attitude and sense of humor have always been a pleasure. Colin Phillips has provided encouragement and detailed comments on my research. Amy and he have both helped me to present my research, identify important issues and implicit assumptions, and to position my work in the big picture. As these matters are not my strong point, hopefully their valuable instruction has rubbed off onto me somewhat. Jim Reggia was my M.S. adviser and has stuck with me on my Ph.D. committee.
David Poeppel served as a committee member through my Ph.D. studies, but was not able to attend my defense. Don Perlis agreed to fill in, and Christopher Cherniak took on the role of Dean's Representative. Corey Washington initially sponsored my application to the Neural and Cognitive Sciences program, and was my adviser for the early part of my graduate work. My thanks to all these professors for contributing their time and expertise to my Ph.D. studies.

I am also grateful to my European colleagues for their interest in my work in visual word recognition. Michal Lavidor re-ignited my interest in that work by inviting me to participate in a symposium. Thanks to her willingness to investigate my crazy ideas, I have been able to obtain experimental results to support my computational model. Michal and Tatjana Nazir also arranged a workshop which provided valuable interaction with the organizers themselves and with other researchers such as Marc Brysbaert, Andrew Ellis, Vincent Walsh, Laurent Cohen, Padraic Monaghan, and Richard Shillcock. During that trip, Jonathan Grainger also invited me for an interesting visit to his lab. Recently, I have also had the pleasure of corresponding with Piers Cornelissen, who is propelling me to consider new avenues of research, such as reading acquisition and dyslexia.

I would also like to thank Rita Berndt, for whom I worked prior to starting my Ph.D. studies. She gave me the freedom to investigate some interesting data, which launched me on the present path.

Finally, my appreciation to my husband, Udaya Shankar, whose support has sustained me in myriad ways.

Contents

List of Tables . . . xi
List of Figures . . . xiii

1 Introduction . . . 1
1.1 Overview . . . 1

2 Introduction to The Problem of Letter-Position Encoding . . . 5
2.1 Definition of LPE . . . 5
2.2 Why Study LPE? . . .
7

3 Neurobiological Constraints on LPE . . . 10
3.1 Terminology and Overview of the Visual System . . . 10
3.2 Retina to V1 . . . 11
3.3 Higher Cortical Areas . . . 14
3.4 Summary . . . 18

4 Behavioral Results on LPE . . . 19
4.1 Word-Level Studies . . . 20
4.1.1 Masked Form Priming . . . 20
4.1.2 Positional Patterns . . . 25
4.1.3 Seriality . . . 26
4.1.4 Summary . . . 30
4.2 Letter-Level Experiments . . . 31
4.2.1 Fixation at String Center . . . 32
4.2.2 Non-central Fixation within a String . . . 34
4.2.3 Unilateral Presentation . . . 36
4.2.4 Summary . . . 40

5 Models of LPE . . . 41
5.1 Desiderata for an LPE Model . . . 41
5.2 Review of Modeling Basics . . . 43
5.3 Models of LPE . . . 46
5.3.1 Interactive Activation Model . . . 46
5.3.2 Print-to-Sound Models Trained by Back-Propagation . . . 47
5.3.3 BLIRNET . . . 48
5.3.4 A Split Fovea Model Trained by Back-Propagation . . . 50
5.3.5 SOLAR . . . 52
5.3.6 LEX . . . 56
5.4 Summary . . . 57

6 The SERIOL Model of LPE . . . 59
6.1 Overview . . .
59
6.1.1 Highest Prelexical Orthographic Representation . . . 59
6.1.2 Nature of Pre-Bigram representation . . . 60
6.1.3 Induction of Serial Encoding . . . 61
6.1.4 Creation of the Locational Gradient . . . 63
6.1.5 Summary . . . 65
6.2 SERIOL model . . . 66
6.2.1 Edge Layer to Feature Layer . . . 67
6.2.2 Feature Layer to Letter Layer . . . 70
6.2.3 Letter Layer to Bigram Layer . . . 71
6.2.4 Bigram Layer to Letter Layer . . . 73
6.3 Summary . . . 73

7 Account and Simulations of LPE Behavioral Results . . . 76
7.1 Word Level . . . 76
7.1.1 Bigrams . . . 76
7.1.2 Letters . . . 86
7.2 Letter Perceptibility Patterns . . . 95
7.2.1 Mathematical Model . . . 98
7.2.2 Short Strings . . . 103
7.3 Summary and Discussion . . . 109

8 Asymmetry of the Length Effect . . . 112
8.1 Experimental Data . . . 112
8.2 SERIOL Account of the Length Effect . . . 116
8.3 Length Investigation . . . 117
8.4 Discussion . . . 124

9 Asymmetry of the N Effect . . . 127
9.1 The N effect . . . 127
9.2 The SERIOL Account of the N effect . . . 129
9.3 Predictions . . .
133
9.4 N-effect Investigation 1 . . . 137
9.5 Further Predictions . . . 143
9.6 N-effect Investigation 2 . . . 144
9.7 Implications . . . 147
9.8 General Discussion . . . 153

10 SERIOL Speculations . . . 156
10.1 Innate versus Learned Aspects of the SERIOL Model . . . 156
10.2 Object Recognition . . . 158
10.3 Feature-Level Processing and Dyslexia . . . 161
10.3.1 Simulation of Learning to Form the Locational Gradient . . . 162
10.3.2 Dyslexia . . . 163
10.3.3 Magnocellular Deficit . . . 165
10.3.4 Possible Experimental Tests of these Proposals . . . 167
10.3.5 Summary . . . 169

11 The Parsing Problem . . . 171
11.1 Specification of the Problem . . . 171
11.2 Computational Constraints . . . 173
11.3 Neurobiological Constraints . . . 178

12 Behavioral Results on Parsing . . . 181
12.1 Complexity Phenomena . . . 182
12.1.1 Center-Embedding versus Crossed-Serial Dependencies . . . 182
12.1.2 Different types of English doubly center-embedded clauses . . . 183
12.1.3 Interference in Working Memory . . . 184
12.1.4 NP-type effects . . . 186
12.1.5 The RC/RC V2-drop effect . . . 188
12.1.6 V2-drop x N3-type Interaction . . . 189
12.1.7 Summary . . . 190
12.2 Accounts . . .
190
12.2.1 Vosse & Kempen [Vos00] . . . 190
12.2.2 Interference in Working Memory . . . 191
12.2.3 Dependency Locality Theory . . . 192
12.2.4 Summary . . . 196

13 Parsing Models . . . 197
13.1 Representation of the Thematic Tree on a Computer . . . 198
13.1.1 How . . . 198
13.1.2 Difference from Neural Networks . . . 201
13.2 Possible Neural Network Representations of the Thematic Tree . . . 202
13.2.1 Production of a New Pattern . . . 202
13.2.2 Temporal Encoding . . . 207
13.2.3 Summary and Conclusions . . . 210
13.3 Parsing Models . . . 211
13.3.1 SRNs . . . 211
13.3.2 LSTMs . . . 215
13.3.3 Pulvermüller . . . 217
13.3.4 Summary . . . 219

14 The TPARRSE Model . . . 220
14.1 RR encoding . . . 222
14.1.1 Primitives . . . 222
14.1.2 Representation of the Thematic Tree . . . 223
14.1.3 Generating the RR encoding . . . 228
14.2 Temporal Working Memory . . . 233
14.2.1 Primitives . . . 235
14.2.2 Representation of Syntactic Information . . . 240
14.3 Processing Center-embedded Clauses . . . 243
14.4 Partial Deletion . . . 246
14.5 Arbitrary Hierarchical Structure . . .
248

15 Computational Demonstrations . . . 250
15.1 Decoding an RR encoding . . . 250
15.2 Temporal WM . . . 253
15.3 Parsing Algorithm . . . 255
15.3.1 Implementation . . . 255
15.3.2 Stimuli . . . 256

16 Complexity . . . 265
16.1 Center Embedding . . . 266
16.1.1 RC/RC . . . 266
16.1.2 Noun Complements . . . 269
16.1.3 Summary . . . 271
16.2 Crossed-Serial Dependencies . . . 272
16.3 Interference in Working Memory . . . 274

17 Conclusion . . . 276
17.1 Future TPARRSE Research . . . 276
17.2 Conclusion . . . 278

Bibliography . . . 280

List of Tables

4.1 Results from Exp. 1a-1c of [Hum90]. Each group of rows represents a sub-experiment. Fac = (accuracy for prime − accuracy for control), where 0 denotes no significant facilitation. Stimuli with the same facilitation were not statistically different from each other; the given value reflects their average. . . . 21

4.2 Results from experiments 4 through 6 from [Hum90]. Each group of rows represents the results from a single experiment. Fac = (accuracy for prime − accuracy for control), where 0 denotes no significant facilitation. Stimuli with the same facilitation were not statistically different from each other; the given value reflects their average. . . . 23

7.1 Simulated and experimental results for priming conditions from [Gra04a]. Act denotes the activation of the target node in the simulation for the given prime.
Fac denotes the facilitation for that prime in the experimental results (difference between reaction times for the control condition (dddd or ddddd) and the prime condition), where * denotes that facilitation is statistically significant. The top group is five-letter targets; the middle group is seven-letter targets; and the bottom group is nine-letter targets. . . . 81

8.1 Results for word targets. . . . 122

8.2 Results for non-word targets. . . . 123

9.1 Stimuli for N-effect investigations. . . . 138

9.2 Results for N-effect investigation 1. . . . 141

9.3 Results for N-effect investigation 2. In the dimmed condition, the outer two letters were dimmed for RVF and LVF presentation, while only the first letter was dimmed for CVF presentation. . . . 146

14.1 WM variables after each item x is processed from sentence 39. The relative pronoun that introduces the predicate C and starts a new clause, giving TotRR = sue + likes@(the + vase). It also causes its referent, the + vase, to be stored, so that it can be accessed when a gap is encountered. During processing of the relative clause, the parser determines that the object of bought is a gap, corresponding to the referent of the relative pronoun. At the end of the sentence, chunking is invoked, yielding the final value of TotRR given in the text. . . . 232

List of Figures

4.1 Results from [Wol74], with LVF/RH on left and RVF/LH on right. Each line represents a fixed retinal location. As string position is increased (i.e., more letters occur to the left), performance decreases. The pattern of decrease varies with visual field. . . . 37

4.2 Results from [Est76], for the 2400 ms exposure duration. . . . 38

5.1 Basic components of an implemented model. Each node has an activation value (shown in the center of the node).
At the lowest level of the model, activation values are clamped to particular values. Each connection has an associated weight. The input to a node . . . 44

6.1 Interaction of input level and timing of firing for a cell undergoing a sub-threshold oscillation of excitability. When a relatively high level of input (top curving line) is added to the base oscillation, the cell crosses threshold at time 1 (action potential not illustrated). If less input were received, the cell would cross threshold later in the cycle, such as at time 2. . . . 62

6.2 Architecture of the letter, bigram, and word levels of the SERIOL model, with example of encoding the word CART. At the letter level, simultaneous graded inputs are converted into serial firing, as indicated by the timing of firing displayed under the letter nodes. Bigram nodes recognize temporally ordered pairs of letters (connections shown for a single bigram). Bigram activations (shown above the nodes) decrease with increasing temporal separation of the constituent letters. Activation of word nodes is based on the conventional dot-product model. . . . 64

6.3 Formation of the locational gradient at the feature layer, for the centrally fixated stimulus CASTLE. The horizontal axis represents retinal location, while the vertical axis represents activation level. The bold-face letters represent bottom-up input levels, which are higher in the RH than the LH. In each hemisphere, activation decreases as eccentricity increases, due to the acuity gradient. The italicized letters represent the effect of left-to-right inhibition within the RH, and RH-to-LH inhibition. In the RH, C inhibits A, and C and A inhibit S, creating a decreasing gradient. The RH inhibits each letter in the LH by the same amount, bringing the activation of T lower than that of S.
As a result, activation monotonically decreases from left to right. . . . 69

7.1 Comparison of simulated score and amount of facilitation using data from Table 7.1 (r = .87; p < .0001). . . . 82

7.2 Experimental [Whi99] and simulated results for the aphasic error pattern. The percent retained refers to the percentage of erroneous trials in which the letter in the ith position in the target occurred in the ith position of the response (n = 201 for experiment; n = 363 for simulation). Data are collapsed over target lengths of three to six. (In both the experimental data and the simulation, there was also a decreasing pattern within each target length.) . . . 84

7.3 Simulation results under backward scoring, and no inhibition. In backward scoring, the target and response are aligned at the final letter, and scored from right to left. In this case, position 1 corresponds to the final letter, 2 corresponds to the next-to-last letter, etc. The backward results are from the same simulation run as Figure 7.2. For the no-inhibition condition, a new simulation was run with Cinh = 0, and scored in the forward manner. Because backward scoring yielded a relatively flat pattern, and no inhibition yielded a V-shaped pattern, this shows that the decreasing pattern in Figure 7.2 was not merely an artifact of the scoring method. . . . 85

7.4 Experimental reaction times (in milliseconds) for the rotated-string lexical-decision task. Each line represents one angle of rotation, where the lower lines correspond to 0° through 80°, and the upper lines correspond to 100° to 180°. . . . 87

7.5 Simulated reaction times for the rotated-string, lexical-decision task. Notation is the same as Figure 7.4. . . . 94

7.6 Schematic of locational gradients for the stimulus CART at three different presentation locations.
The vertical axis represents activation, while the horizontal axis represents retinal location. For central presentation, the gradient is smoothly and rapidly decreasing. For RVF presentation, the gradient is shallower because the acuity gradient is shallower. For LVF presentation, the initial letter strongly inhibits nearby letters, but the gradient flattens out as acuity increases. . . . 97

7.7 Experimental (top) and modeled (bottom) results of [Wol74], with LVF presentation on the left and RVF on the right. Each graph shows the effect of string position on perceptibility at a given retinal location (specified in R units of letter width). . . . 99

7.8 Experimental results from [Est76] for a four-letter string embedded in $'s, occurring at two different retinal locations in each visual field. Exposure duration was 2400 ms. (Subjects were trained to maintain central fixation, and their gaze was monitored.) . . . 104

7.9 Locational gradient and resulting firing pattern for LVF/RH presentation (normal font) and RVF/LH presentation (bold italics). Top: Comparison of locational gradient for string CDFG under RVF/LH presentation and LVF/RH presentation. Bottom: Cartoon of resulting firing pattern at the letter level. The point in the oscillatory cycle at which the down phase prevents further firing is marked *. In the LVF/RH, the first letter fires faster and longer than the other letters, because it receives a much higher level of input. The variations in the amount of bottom-up input create decreasing activation across the string. The final letter starts firing late in the cycle, and is soon cut off by the end of the oscillatory cycle, giving no final-letter advantage. In the RVF/LH, each letter rapidly cuts off firing of the previous letter, allowing the final letter to fire a long time. As a result, activation is flat across the string and rises for the final letter.
These firing patterns account for the perceptibility patterns at the larger eccentricities in Figure 7.8. . . . 106

7.10 Results from Experiment 2 of [Leg01] for the two largest eccentricities, grouped by exposure duration, with 95% confidence intervals. . . . 108

8.1 Example of proposed LVF/RH locational gradient for normal presentation (bold face) and under contrast manipulation (italics, shifted to the right for clarity) for a six-letter word. Horizontal axis represents retinal location, while vertical axis represents activation level at the feature layer. For normal presentation, the locational gradient is not smooth, becoming quite flat near fixation. Increasing the contrast of the second and third letters raises their activation levels, and decreases the activation levels of the fourth and fifth letters due to increased left-to-right inhibition. Decreasing the contrast of the sixth letter decreases its activation level. As a result, the locational gradient is more smoothly decreasing. . . . 118

8.2 Results for word targets. . . . 124

9.1 Outer dimming in the LVF/RH. The normal locational gradient is shown in bold-face. The results of outer dimming are shown in italics (shifted to the right for clarity). Reducing the contrast of the first letter reduces its activation level, and decreases inhibition to the second and third letters, increasing their activation levels. As a result, the locational gradient is shallower across the first three letters. Reducing the contrast of the fourth letter reduces its activation level. As a result, the locational gradient is smoother across the last three letters. . . . 134

9.2 Predicted pattern for Experiment 2. . . . 136

9.3 Results for N-effect investigation 1. . . . 142

9.4 Results for N-effect investigation 2. . . . 146

11.1 Examples of finite state machines (FSMs).
Each recognizer consists of a start state, S, an accept state, A, and intermediate (numbered) states. Transitions occur between states for specific input tokens, where e represents the end-of-string token. The top FSM accepts strings of the form a^n b^m, for n ≥ 1 and m ≥ 1. For example, the string a1 b1 b2 b3 would activate the following sequence of states: S, 1, 2, 2, 2, A. The bottom FSM accepts strings of the form (ab)^n, for n ≥ 1. For example, the string a1 b1 a2 b2 would activate the following sequence of states: S, 1, 2, 1, 2, A. . . . 175

11.2 Example of using a stack to recognize strings of the form a^n b^n. A stack S provides the push(S,x) operation, which puts x on the top of S; the pop(S) operation, which removes the top item from S and returns it; and the empty(S) operation, which is true only if there are no items on S. The string a^n b^n can be recognized using the following algorithm for token x: . . . 176

13.1 Example of encoding Mary knows that Ted likes Sue in computer memory. The left column represents memory addresses, which systematically increase. The right column represents registers. The programmer would declare a record having Agent, Verb, and Theme variables. For each instance of this record the compiler would map these variables onto specific consecutive addresses. Here the record Main starts at 1200 and the record Sub starts at 1392. The value of Main's Theme variable is a pointer to Sub. Mary, knows, Ted, etc. correspond to numbers that have been associated with each token. (For simplicity, the problem of how to determine whether a register's value should be interpreted as a memory address is ignored.) . . . 200

13.2 Example of network that learns to form an RR encoding. Each box represents a group of nodes of the same size, and each arrow represents full interconnectivity between two groups of nodes.
For each training item, the input and output layers are set to the same value. Using the back-propagation training algorithm, the network learns to recreate the input on the output layer. As a result, the hidden layer (in conjunction with the learned weights) forms a condensed representation of the input. This condensed representation could then be used as one of the values on the input layer. For example, in the Mary knows Ted likes Sue example, the patterns for Ted, likes, and Sue would first be activated over the corresponding sets of input nodes. The resulting pattern on the hidden layer constitutes an RR encoding of this information. Then the input layer is set to Agent = Mary, Verb = knows, and Theme = the hidden-layer pattern. The new hidden-layer pattern then represents the encoding of the entire sentence. Such an encoding is decoded by activating the pattern on the hidden layer to get the component values on the output layer. An output item that is itself an RR encoding can then be fed back to the hidden layer again to be decoded. . . . 204

13.3 Example of bind and merge operations. . . . 205

13.4 Example of temporal encoding of Ted = Agent and Sue = Theme. The lines to the right of each node represent the firing pattern for that node. For simplicity, each word and role is represented here as a single node. However, the same type of encoding could be used for a distributed representation of each item. . . . 208

13.5 Architecture of a recurrent network. The hidden units connect into the context units, which feed back to the hidden units. Thus the hidden units' previous activations can affect their subsequent activations. . . . 212

13.6 Example of detector S which recognizes sequence A B, from [Pul03]. . . . 218

14.1 Basic algorithm for generating the RR encoding of a sentence having only right-branching clauses.
. . . 234

14.2 Illustration of timing of firing of list elements A, B, and C. Each new element is activated at the peak of the oscillatory cycle. Previously activated items move forward with respect to the cycle, due to the ADP. Over time, A, B, and C come to fire successively within a single cycle. . . . 236

14.3 Proposed architecture for a WM list, illustrated for positions N to N+2. In this example, 100, 110, and 001 are encoded across those positions on successive oscillatory subcycles. Each large circle represents a bank of nodes coding for the same value and position. A subset of those nodes is shown by the small circles. Each column represents a vector position. The top row encodes 0's, while the bottom row encodes 1's. The number in each node reflects the oscillatory subcycle in which it fires. Fast connections coordinate firing within a subcycle, while slower inhibitory connections separate subcycles. . . . 239

14.4 Proposed architecture of deletion network. The tag field is comprised of syntactic features F1, F2, F3 ... Fn, with multiple instances of each feature (two instances shown here). Each feature has inhibitory connections to the corresponding feature in Dtag, and each feature in Dtag inhibits the node which drives the deletion process. When the tag-field features inhibit all of the Dtag features, the perform-deletion node is activated and deletion is initiated. Deletion is sustained via the self-excitatory connection. The gating node becomes activated only if it receives excitation from both the perform-deletion node and the list node. In that case, the list node is inhibited. Thus inhibition only applies to active list nodes, and does not affect list nodes that fired prior to the initiation of deletion. (Only a single list node is shown. A similar circuit is required for each list node.) . . .
247

15.1 Chunking and branching procedures for the full RR encoding algorithm. . . . 257

15.2 Full RR encoding algorithm, using Chunk and Branch operations specified in Figure 3. . . . 258

Chapter 1

Introduction

1.1 Overview

The ultimate goal of computational neuroscience is to specify how cognition arises from neural activity. This requires understanding how neurons represent information about the world. One of the more challenging aspects of such an investigation is the question of how structured information is represented. That is, how are the sub-parts of an entity encoded? It is not sufficient to simply encode their identities. Rather, the relationships between sub-parts must be represented. My overarching interest is to investigate the nature of such structured representations. I will first address this question in a limited visual domain (visual word recognition) and then in a linguistic domain (creation and representation of a syntactic tree).

In this work, I will distinguish between implemented and theoretical models. An implemented model refers to a simulation or a mathematical demonstration. In contrast, a theoretical model is a framework specifying the nature of the computations that are carried out in the brain. All or part of a theoretical model can be implemented to demonstrate the validity of related claims, essentially by offering an existence proof. This requires choosing specific functions and parameters for the implementation. Thus an implemented model is but a single instantiation of a more general theoretical framework. Reasoning about a theoretical model can often be more fruitful than building an implemented model. I am primarily interested in formulating theoretical models, using implementations of portions of the resulting models as a proof of concept.
Throughout this work, model will refer to either a theoretical model or an implemented model when there is no ambiguity as to which is meant. Otherwise, the type of model will be specified.

In particular, I seek to understand what representations and transformations are used by the brain once a task has been learned. Formulation of such a theoretical model requires in-depth consideration of all available sources of information. Behavioral data indicate what algorithms the brain is using. Neurobiological and anatomical data constrain how these algorithms are realized in tissue. The goal is to create a model that explains the behavioral data and can be mapped onto a neural network. Ideally, such a theory should lead to novel, experimentally verifiable predictions.

I have used this approach in two domains. The first is the question of how the brain encodes letter position in a string during visual word recognition. This model specifies how a retinotopic representation is progressively converted into an abstract encoding of letter order. Location-invariance is achieved by creating a temporal (serial) representation of letter order. The model is consistent with the neuroanatomy of the lower levels of the visual system, and explains a wide range of letter-perceptibility and form-priming data. Moreover, the model has generated precise predictions concerning the source of visual-field asymmetries, which have been experimentally confirmed. As we will see, there are many novel aspects to this model:

- The effect of visual acuity is explicitly considered.
- The retinotopic representation is initially split (across the hemispheres).
- Hemisphere-specific processing is proposed at the feature level.
- A location-invariant representation is created by mapping space onto time.
- Representational units based on ordered letter pairs are proposed.
- The model has provided new insights into the source of visual-field asymmetries at both the letter and word levels.
The second domain that I have investigated is the question of how the brain creates the representation of a parse of a sentence. The letter-position model has informed this parsing model; there is a serial representation of phrases stored in working memory. In addition to this serial representation, there is also a distributed representation of sentence structure. The interaction of these two types of representations allows a comprehensive account of phenomena related to sentence complexity. Novel aspects of the model include:

- The proposal that working memory uses dual, synchronized sequences to encode syntactic information.
- Specifications of a parsing algorithm and a hierarchical representation which are based on the computational properties of a predefined distributed representation.
- An account of complexity phenomena that is not based on storage limitations, but rather arises from the way in which syntactic information is encoded in working memory.

These computational theories demonstrate the feasibility of bridging the neural and cognitive levels via the close integration of modeling and experimental work.

A similar method of presentation will be used for both models. First, the computational problem is introduced and specified. Then the anatomical and neurobiological constraints are addressed, followed by a review of the relevant behavioral data. Previous models are presented, and their ability to meet these constraints is discussed. My model is then overviewed at a high level, and then given in detail. Implementations of portions of the model are presented, followed by experimental results (for the letter-position encoding model). A description of future work concludes the discussion of each model.

Chapter 2

Introduction to the Problem of Letter-Position Encoding

In this chapter, I define the problem of letter-position encoding, and discuss why it is an excellent problem for investigating structured neural representations.
2.1 Definition of LPE

Letter-position encoding (LPE) is required during visual word recognition due to the existence of anagrams. That is, letter identities are not sufficient to uniquely identify a word, because there may be several words comprised of the same letters. For example, the letters A, I, R, L can be used to form the words LAIR, LIAR, RAIL (and others). Thus there must also be some encoding of the position or order in which the letters occurred. This encoding of the input is then compared against stored representations in order to recognize the word.

Therefore, at the highest level, the problem of LPE is the question of what type of sublexical orthographic encoding maps onto the lexical level during visual word recognition. For example, such an encoding might be position-specific, with separate representations for each letter in each position. Under this scheme, the input liar is represented by activating units L1, I2, A3 and R4, whereas the input rail is represented by activating R1, A2, I3 and L4. Here L1, L2, L3, etc., are encoded by different sets of neurons. In contrast, there may be position-independent letter units which can dynamically represent positional information in some way. Alternatively, a letter's position may not be represented explicitly, but rather its context may be encoded. For example, the input liar could activate *LI, LIA, IAR and AR* units, where * denotes a word boundary. Thus, this representation specifies relationships between letters (i.e., each unit specifies the letters immediately to the right and left of the central letter).

However, in understanding how the brain represents structured information, it is not sufficient to merely address the nature of the high-level representations. It is also necessary to understand how those representations are formed.
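The candidate schemes above can be made concrete with a small sketch (purely illustrative; the function names and the choice of trigram units for context coding are mine, not part of any model under discussion):

```python
def slot_code(word):
    """Position-specific coding: one unit per (letter, position) pair,
    so L1 and L4 are distinct units."""
    return {(letter, pos + 1) for pos, letter in enumerate(word)}

def context_code(word):
    """Context coding: each unit names a letter together with its
    immediate left and right neighbors; '*' marks a word boundary."""
    padded = "*" + word + "*"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

# The anagrams LIAR and RAIL share all letter identities but no units:
print(sorted(slot_code("LIAR")))     # [('A', 3), ('I', 2), ('L', 1), ('R', 4)]
print(sorted(context_code("LIAR")))  # ['*LI', 'AR*', 'IAR', 'LIA']
```

Both schemes distinguish anagrams; the behavioral data reviewed in Chapter 4 bear on which, if either, the brain actually uses.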
Thus, taking a comprehensive approach to LPE, I also investigate the question of how such an abstract sublexical representation is created from an early, location-specific representation. It is well known that the earliest cortical visual levels are retinotopically organized. That is, each letter occurs at a specific spatial location on the retina, and this spatial organization is maintained into the cortex. How is an abstract representation of letter position created from input that is tied to specific retinal locations? In the following, position will refer to a letter's position within a string, while location will refer to a letter's location in the visual field and hence on the retina.

One aspect of processing not included in the question of LPE is how letters are recognized. Rather, the question is more abstract. Given the ability to recognize letters, how is string-centered positional information calculated, and combined with letter-identity information?

This work assumes an experienced reader, under the assumption that brains solve the problem in a standard way, as discussed below. The details of how the relevant transformations and representations are learned during reading acquisition are left for future work, although some speculations on this topic are included in Chapter 10.

Note also that a full model of visual word recognition is not being sought. For example, phonological and frequency effects will not be considered. Rather, the focus is on an orthographic route to the lexicon. However, I do note that the encoding of letter order must also subserve the learning of grapheme-phoneme correspondences, and thus it must also be suitable for this task. This topic is also briefly addressed in Chapter 10.

A viable theoretical model of LPE should be consistent with relevant neurobiological and behavioral data. The lowest level of the model should employ a retinotopic representation, and the highest should employ a lexical representation.
The model should specify the nature of the representations at in-between levels, and the transformations between levels. An important criterion is that the transformations should be biologically plausible. That is, they should employ known neural encoding mechanisms, or be compatible with the type of local, numeric computations that can be carried out by a network of simple, neuron-like units.

2.2 Why Study LPE?

LPE is an ideal arena for studying structured neural representations, because it is complex enough to be interesting, but simple enough to be tractable. Moreover, the processing must tap into basic neural mechanisms, because there can be no specific adaptation for reading due to its recent appearance on the cognitive scene.

One central, outstanding issue in cognitive science is the binding problem. How are separate features combined within a single object? For example, consider color and shape. It is known that these attributes of an object are processed in separate areas of the brain. When you see a red square and a yellow triangle, how are the correct associations between color and shape encoded (so that you don't perceive a yellow square and a red triangle)? A similar problem exists in LPE. How are a letter's identity and its position bound together?

Another important issue is how location invariance is obtained during object recognition. How are we able to recognize the same object at different locations and sizes? Given that the input to the visual system is retinotopic, either a recognizer must be duplicated over and over for differing retinal locations, or there must be a mechanism to abstract away from retinal location before input reaches the recognizer. It is clear that the replication method is used by the visual system for low-level features, such as edges. Such replication is highly inefficient for complex objects, since so many different objects are possible. Thus it is unlikely that it is employed for high-level object recognition.
However, it has been claimed that a single detector could recognize an object in a location-invariant way without an explicit abstraction capability, as receptive-field size and featural complexity increase through the processing hierarchy. This approach has been implemented for recognition of simple objects, where the identity of low-level features (i.e., types of line intersections) is sufficient to recognize an object [Fuk88, Ore00]. It has also been used in an implemented model of visual word recognition [Moz91], but as discussed in section 5.3.3, this model relied on an unrealistic jump in receptive-field size. As discussed by Hummel and Biederman, a recognizer that is based on feature conjunctions has the inherent limitation that it is susceptible to illusory recognition, wherein recognition is erroneously triggered by a set of jumbled features that have the correct identities, but not the correct relationships to each other [Hum92]. This difficulty occurs because relationships between sub-parts are not explicitly represented. In contrast, an abstraction mechanism that specifically maintains relational information while removing locational information would not have this problem. As discussed above, a similar problem arises in LPE. How is a representation that is initially tied to retinal location converted to an abstract letter-order encoding that can be matched against a stored representation?

Thus the question of LPE involves key problems in cognitive science. At the same time, it is a very circumscribed problem, allowing ease of investigation. It involves a small number of known, basic units (i.e., letters) which can be organized along a single dimension (i.e., string position). Bottom-up aspects of stimuli can be experimentally varied by manipulating retinal location, letter order, and contrast levels. Top-down factors can also be selected for, such as lexicality, frequency, length, reading direction, etc.
Thus, the problem is easily investigated experimentally.

Recent brain imaging studies have identified a left-hemisphere, inferotemporal cortical area that seems to be involved in the abstract encoding of letter order, dubbed the Visual Word Form Area (VWFA) [McC03]. Interestingly, there is very little variation across subjects in the location of the VWFA [Coh02]. This suggests that brains solve the problem of LPE in a standard way [McC03]. However, this solution must rely on general representational mechanisms, due to the recency of reading on an evolutionary timescale. Thus, understanding how the brain solves the problem of LPE should reveal binding and abstraction mechanisms that are relevant to other domains.

Chapter 3

Neurobiological Constraints on LPE

The architecture of the visual system from the retinas to early cortical areas determines the characteristics of the input into the functional LPE network. I first discuss these constraints, and then review brain-imaging and neurological studies on higher cortical areas implicated in visual word recognition.

3.1 Terminology and Overview of the Visual System

The cortex is divided into the right and left hemispheres, and the fibers which connect the hemispheres are known as the corpus callosum. Each hemisphere is comprised of four lobes. Occipital cortex lies at the back of the head. Parietal cortex lies above the occipital area, while temporal cortex lies forward of the occipital area. Frontal cortex lies in front of the parietal and temporal areas.

The visual image is initially projected onto the retinas. This visual information is processed through several layers of cells, and leaves the retina via ganglion cells. Ganglion cell axons extend through the optic tracts to the lateral geniculate nucleus (LGN), the visual area of the thalamus. LGN cell axons then extend to the cortex.

In the retina, there are two major classes of ganglion cells, magnocells and parvocells.
The larger magnocells process information more quickly than the smaller parvocells. Magnocells are sensitive to motion and low spatial frequencies (i.e., overall shape), while parvocells are sensitive to color and high spatial frequencies (i.e., fine detail). Separate magnocellular and parvocellular pathways are maintained through the LGN into the cortex. The first cortical area to receive visual inputs lies in occipital cortex and is known as V1. V1 connects to V2, and then the visual pathway splits into two streams. The ventral stream extends through region V4 into lower temporal (inferotemporal) cortex. The ventral stream handles object recognition, and receives inputs from both the parvocellular and magnocellular pathways [Fer92]. The dorsal stream extends through region V5 into parietal cortex. The dorsal stream handles motion processing, spatial localization, and attention, and receives inputs primarily from the magnocellular pathway [Mau90].

We next consider the connectivity from the retina to the cortex in more detail, because the architecture of the early part of the visual system has ramifications for visual word recognition.

3.2 Retina to V1

Light coming into the eye is focused onto the retina, where it is transduced by photoreceptor cells (rods and cones) into electrical signals. Cones provide the high spatial resolution which is necessary for letter identification during reading. The center of the retina, the fovea, only contains cones and is free of blood vessels. Therefore, this area provides the highest acuity. It corresponds to about 1.5° of visual angle. (For reference, 4 or 5 letters occupy about 1° under normal reading conditions.) Cone density (and therefore visual acuity) is highest at the very center of the fovea (corresponding to the fixation point), and rapidly falls off away from the center. For example, at an eccentricity of 0.17° from fixation, cone density is decreased by 25% [Wes87].
The rate of decrease in cone density is highest closest to fixation, and falls off as eccentricity increases [Wes87].¹ Resolution remains elevated into the parafovea, the retinal region surrounding the fovea, corresponding to a diameter of about 5°.

Each cone cell projects to about three ganglion cells [Was95]. Ganglion cell axons from both eyes converge in the optic chiasm. There, the fibers from each eye split. Imagine a vertical line dividing each retina in half through the center of the fovea. Those fibers originating from the nasal side of this line cross the optic chiasm to enter the contralateral (opposite) optic tract, while those originating from the outer side of this line remain in the ipsilateral (same) optic tract. Therefore, after the optic chiasm, information is split by visual field, not by eye. Information from the left half of the visual field (LVF) is carried in the right optic tract, and information from the right half of the visual field (RVF) is carried in the left optic tract.

The spatial relationships between cells are maintained from the retina through the LGN and V1. Thus V1 is retinotopically organized, with nearby cells representing nearby points in space. Due to the routing of fibers at the optic chiasm, each visual field is projected onto the contralateral cortical hemisphere. That is, the LVF projects to the right hemisphere (RH) portion of V1, while the RVF projects to the left hemisphere (LH) portion of V1. The pattern of spatial resolution is magnified into V1.

¹This acuity pattern is commonly misrepresented as "acuity falls off rapidly outside the fovea", implying that acuity is uniformly high across the fovea and then falls off. This is not the case. Rather, acuity falls off rapidly within the fovea, so that acuity is substantially reduced by the fovea/parafovea boundary (but still remains higher than outside the parafovea). The rate of decrease in acuity is actually sharper across the fovea than the parafovea.
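The qualitative shape of this gradient can be sketched numerically. The functional form and constant below are hypothetical, chosen only to reproduce the properties stated in the text (monotonic decrease, with the steepest decline nearest fixation):

```python
def relative_acuity(eccentricity_deg, k=4.0):
    """Toy acuity gradient: normalized to 1.0 at fixation and
    monotonically decreasing. The 1/(1 + k*e) form and k = 4.0 are
    hypothetical; only the qualitative shape follows the text."""
    return 1.0 / (1.0 + k * eccentricity_deg)

# The drop is sharpest within the fovea and flattens out with eccentricity:
for e in (0.0, 0.25, 0.75, 2.5, 5.0):
    print(f"{e:5.2f} deg -> {relative_acuity(e):.2f}")
```

Any function with this shape would serve equally well for illustration; no specific parametric form is claimed by the text.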
The number of cells representing a fixed amount of visual space is highest at fixation and decreases as eccentricity increases. As a result, a disproportionate amount of V1 is devoted to representing the fovea and the parafovea [Ino09, Bri68].

There has been some controversy regarding whether information is precisely split by visual half-field in humans, primarily due to the phenomenon of macular sparing. Often brain damage to one hemisphere of V1 obliterates vision in the contralateral visual field except for the foveal area. This suggests that the entire fovea may be bilaterally represented. That is, foveal vision may be spared because it is also represented in the undamaged hemisphere [Ino09]. Alternatively, foveal vision may be spared because damage to the lesioned hemisphere is incomplete, due to the large cortical area devoted to representing the fovea. This issue is highly relevant to visual word recognition, because if the visual fields do not overlap, the representation of a fixated string is initially split across the cerebral hemispheres, requiring downstream integration of the representations of the two halves of the string [Bry04]. (In fact, a special edition of Brain and Language was devoted to this topic [Bra04].)

Mounting evidence from several lines of investigation indicates that the representation of the fovea is indeed initially split across the hemispheres [Lav04b]. Behavioral experiments have shown that length and orthographic-neighborhood effects, which occur in the LVF but not the RVF under unilateral presentation, are also specific to the LVF portion of the string under central presentation [Bry94, Bry96, Lav01a, Lav04a]. Transcranial magnetic stimulation was used to disrupt neural function over either left or right V1 during processing of centrally presented strings. Unilateral disruption caused effects specific to the half of the word presented in the contralateral visual field, as would be expected under a split fovea [Lav03].
Leff [Lef04] discusses several arguments against bilateral representation. There is no evidence in humans for the white matter pathways that would be required for such a representation. Also, "there has been no direct demonstration of this extra representation of ipsilateral central vision in human visual cortex, which, given the resolution of modern non-invasive techniques and the amount of cortex these regions must occupy if they are to support high acuity vision, is damning" ([Lef04], p. 276). Furthermore, about 30% of hemianopia victims do not experience macular sparing, and so suffer from complete obliteration of a visual half-field [Lef04]. If the fovea were truly bilaterally represented, such a deficit pattern should not occur under unilateral damage. Thus, there is no positive evidence for bilateral representation of the fovea. The most likely source of macular sparing is incomplete damage to the affected hemisphere, due to the extensive cortical area devoted to representing the fovea [Lef04, Lav04b].

Therefore, available evidence indicates that the representation of the fovea is initially split across the hemispheres. This information must then be integrated into a unitary representation of a letter string. Recent studies indicate that specific cortical areas become specialized for this task.

3.3 Higher Cortical Areas

Neuroimaging studies have provided converging evidence that areas of left occipital and inferotemporal cortex play a special role in reading. In an EEG study, normal readers showed a LH-specific increase in theta-band power (5 to 10 Hz) at occipital sites during reading, while dyslexics showed reduced, bilateral theta-band activity [Kli01]. An MEG study [Tar99] has identified an early string-specific response at approximately 150 ms post-stimulus in the posterior region of occipitotemporal cortex, where activation was stronger for letter strings than for strings of symbols.
Response strength and latency in this area correlated with the speed with which subjects were able to read words aloud. For dyslexic subjects, this area did not show preferential activation for letter strings [Hel99]. EEG and fMRI studies have revealed a more anterior string-specific response in the LH beginning at about 180 ms post-stimulus. This activity has been localized to specific cortical coordinates (x = -43, y = -54, z = -12 mm: to the left of, posterior to, and below the anterior commissure, respectively) [Coh02]. This area corresponds to an activation peak in about 90% of subject scans, with a standard deviation of 5 mm [McC03]. Thus the location of this response is remarkably uniform across subjects. This area has been dubbed the Visual Word Form Area (VWFA) [McC03]. Although there is some debate on whether this area should be so labeled, since it also responds to other types of stimuli and other areas also respond to letter strings [Pri03], there is strong evidence that this area becomes preferentially tuned to processing letter strings [McC03].

The VWFA responds preferentially to letter strings (as compared to arrays of pseudo-letters) [Nob94], but is insensitive to surface features of letters, such as font and case [Pol02, Deh04], and to their string position and retinal location [Deh04]. Activation is also insensitive to lexical features, such as frequency [Fie02], and to whether an orthographically legal string is a real word or a pseudoword [Deh02]. However, activation is reduced in response to strings consisting only of consonants [Coh02]. VWFA activation is modality-specific, showing no response to passive listening to spoken words [Deh02]. The activation of the VWFA is independent of the location of the stimulus. For unilaterally presented strings, fMRI showed contralateral activation up to an area probably corresponding to V4. Then, starting at the VWFA, activation was lateralized to the LH, independently of stimulus location [Coh00].
Damage to the region of the VWFA is associated with pure alexia, wherein lexical access via orthography is selectively impaired [Bin92, Bro72, Ges95, War80]. This impairment often does not cause a total inability to read, but rather causes slowed reading that is abnormally sensitive to word length (dubbed letter-by-letter reading). The abilities to write and to recognize orally spelled words are preserved. Lesions that are limited to the callosal connections between RH visual areas and the VWFA result in pure alexia that is specific to LVF stimuli [Coh00].

Thus the VWFA seems to convert a visually presented letter string into a location-invariant representation based on abstract letter identities. The results of [Coh00, Deh04] indicate that this prelexical representation is assembled in the LH, and that lexical access occurs in the LH. It is thought that letter-by-letter readers perform lexical access by representing a letter sequence in verbal working memory, rather than by the more efficient, direct route usually provided by the VWFA [Coh03]. Because writing and lexical access via indirect routes are preserved, it seems that the VWFA does not actually encode how words are spelled.

A pattern of acquired dyslexia (i.e., resulting from brain damage) observed in two Hebrew subjects suffering from left occipitoparietal lesions suggests that the encoding of letter identity can be separated from the encoding of position. These subjects made reading errors that were characterized by migration errors within a word; that is, errors were predominately anagrams of the target word [Fri01]. Such a dyslexia has not been encountered in more commonly studied languages, such as English. However, Hebrew orthography is particularly conducive to revealing a deficit of this sort, since vowels are not explicitly represented.
Therefore, if letter order is misperceived, there is a high probability that a word corresponding to the erroneous ordering exists for some combination of vowels. Thus, lexical constraints are reduced, allowing a pure deficit in position encoding to be revealed.

The lesions in the above subjects occurred along the dorsal route of the visual system. A role for the dorsal pathway in encoding letter order is consistent with a study showing that the ability to detect coherent motion in two-dimensional arrays of moving dots was correlated with accuracy in letter-position encoding (for lexical decision involving nonwords formed by transposing two letters of actual words) [Cor98]. This result is also consistent with evidence that developmental dyslexia is associated with subtly impaired magnocellular function [Lov93, Ste97]. However, it remains unclear whether such visual impairment is a causal factor in developmental dyslexia.

In contrast to these patterns of dyslexia, damage to the left angular gyrus results in complete illiteracy (global alexia). Such patients cannot read or write, or even name letters [Bro72, Ges95]. The angular gyrus is located at the junction of the occipital, temporal, and parietal cortices. Thus it seems to be a multi-modal association area. In the case of reading, the left angular gyrus is thought to subserve the translation of the orthographic encoding of a word into its phonological and semantic representations [Dem92, Ges95]. Therefore, this area seems responsible for encoding how words are spelled, and may provide the orthographic lexicon.

3.4 Summary

Letters to the right of fixation are initially projected to the LH, and letters to the left are projected to the RH. The representation in V1 is location-specific; each cell represents a stimulus occurring at a specific retinal location.
The number of cortical cells representing a letter depends on the letter's eccentricity, following an acuity gradient which originates in the density of cones in the retina. Acuity is highest near fixation, and falls off as eccentricity increases. The rate of decrease in acuity itself decreases as eccentricity increases. At about 150 ms post-stimulus, cortical activation becomes left-lateralized in response to letter strings (in normal readers). Areas of occipitotemporal and occipitoparietal cortex encode an abstract representation of letter order, which may contact lexical representations via the angular gyrus.

Thus, the location-specific representation of a string, which is initially split across the hemispheres in V1, is integrated into a location-invariant, letter-based encoding in the LH. However, neurological investigations and brain-imaging techniques cannot reveal how this transformation is performed. For clues to the answer to this question, I turn to the results of behavioral experiments.

Chapter 4

Behavioral Results on LPE

In this chapter, I review experimental evidence from behavioral studies. I first consider those studies relevant to the issue of what type of prelexical representation contacts the word level. In these studies, the target stimuli were words. I then consider studies in which targets were random letter strings. Such studies can reveal patterns at the letter level under reduced lexical influences. As we will see, the studies indicate the following:

- The relative order of letters is important in word recognition.
- There are position-independent letter units. That is, there are abstract letter representations that are not specific to string position or retinal location.
- Letter perceptibility varies with string position and retinal location, and these patterns differ from those of non-letter symbols.
- The presence or absence of a length effect on RTs cannot reliably indicate whether lexical access proceeds serially or in parallel.
- There is a serial readout of the visual image.

4.1 Word-Level Studies

4.1.1 Masked Form Priming

The most informative experiments on the nature of the prelexical encoding have used the masked-priming procedure, wherein a mask (visual noise) is displayed, then a briefly presented lower-case prime (for 40 ms or less), then a mask, and then an upper-case target word [Eve81]. Such brief prime exposures lead to orthographic priming, but not semantic priming. Thus such experiments are ideal for investigating the nature of orthographic encoding.

In the description of such experiments, the following notation is used for describing the relationship of the prime to the target. A target of length n is represented by 123...n, where 1 denotes the first letter, 2 the second letter, etc., and each letter is unique. The prime is specified in terms of these numbers, with "d" representing a letter not in the target. For example, the prime "rqgdzn" for the target GARDEN is denoted 3d14d6. This means that the first letter of the prime is the third letter of the target, the second letter of the prime is not in the target, etc.

Humphreys, Evett and Quinlan carried out an extensive series of masked form-priming experiments where the task was perceptual identification [Hum90]. The target word was briefly presented (for approximately 40 ms), and performance was measured in terms of accuracy in identifying the word, where responses were typed. In Experiment 1, absolute-position effects were investigated. All targets and primes were four letters. Facilitation was measured with respect to a dddd prime. Primes with 1, 2, and 3 matching letters in differing positions were used. The significant effects are given in Table 4.1. In summary, when 1 letter matched, priming was only observed when the match occurred in the first position.
When 2 letters matched, priming was strongest when they were the first and fourth letters; matches in other positions gave reduced, equivalent levels of priming. When 3 letters matched, priming was independent of position.

  Prime   Fac (% points)
  1ddd    8
  d2dd    0
  dd3d    0
  ddd4    0

  12dd    6
  d23d    6
  dd34    6
  1dd4    15

  123d    20
  1d34    20
  12d4    20
  123d    20

Table 4.1: Results from Exp. 1a-1c of [Hum90]. Each group of rows represents a sub-experiment. Fac = (accuracy for prime − accuracy for control), where 0 denotes no significant facilitation. Stimuli with the same facilitation were not statistically different from each other; the given value reflects their average.

In Experiment 2, the effects of scrambled letters were investigated. Primes in which order was completely violated (e.g., 3142) produced no facilitation. Primes of the form 1324 and 1dd4 produced equivalent levels of facilitation.

An analysis of the errors on the dddd trials from Experiments 1 and 2 showed that letters in positions 1 and 2 of the target were more likely to be correctly retained than letters in positions 3 and 4. Thus, while there was an external-letter advantage for primes matching on two letters (in Experiment 1), this pattern was not replicated in the error data, where the final letter had no advantage.

Experiments 4 through 6 employed primes and targets of differing length, in order to investigate the effect of maintaining letter order, but not absolute position. For example, a prime of the form 1245 includes the fourth and fifth letters in the correct order, but in the incorrect positions. The results are displayed in Table 4.2. In summary, priming was greatest when the first and final letters remained in those positions and order was maintained among the internal letters. Primes matching on two contiguous letters gave equivalent levels of priming, while a prime matching on two non-contiguous internal letters did not produce priming. Letters did not have to match on absolute position in order for priming to occur.
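The prime notation used above can be computed mechanically. The sketch below is illustrative only; the function name is mine, and targets are assumed to have unique letters, as in the stimuli described:

```python
def prime_notation(prime, target):
    """Express a prime relative to a target: each prime letter becomes
    its 1-based position in the target, or 'd' if it does not occur in
    the target at all. Assumes the target's letters are unique."""
    target = target.upper()
    return "".join(
        str(target.index(ch) + 1) if ch in target else "d"
        for ch in prime.upper()
    )

print(prime_notation("rqgdzn", "GARDEN"))  # 3d14d6, as in the text
```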
These results are considered evidence for a relative-position encoding, where the first and last letters are encoded as such, and the order of the internal letters is represented.

Prime   Fac (% points)
1245    14
1425     7
1dd5     7
d24d     0

12dd    11
d23d    11
d34d    11
dd45    11

1ddd5    9
d1d5d    0

Table 4.2: Results from Experiments 4 through 6 of [Hum90]. Each group of rows represents the results from a single experiment. Fac = (accuracy for prime - accuracy for control), where 0 denotes no significant facilitation. Stimuli with the same facilitation were not statistically different from each other; the given value reflects their average.

Peressotti and Grainger [Per98] investigated the properties of relative-position priming further. They used the lexical-decision task, wherein the subject specifies whether or not the target string is a word. Priming was measured in terms of decreased reaction times (RTs). This task is now generally preferred to the perceptual task because it is taken to isolate processing at the lexical level. That is, in the perceptual task, priming may occur at the letter level because letters are difficult to perceive due to the short exposure duration. This is not a factor in lexical decision, where target exposure duration is on the order of 200 ms. Thus any effects that do occur are more likely to arise at a higher level. For six-letter (word) targets, they found that a prime of the form 1346 sped RTs as compared to a dddd control prime, whereas primes of the form 1436 and 6341 did not yield facilitation. Thus, unlike the results from Experiment 4 of [Hum90], where 1425 yielded some facilitation for five-letter words, 1436 did not yield facilitation for six-letter words. This may be a result of using different tasks, or may reflect the larger percentage of retained letters for five-letter targets.
To test whether maintaining absolute position yields any advantage, they compared primes of the form 1346 with primes which included the "-" character in positions 2 and 5 (i.e., 1-34-6). There was no difference in the amount of facilitation provided by these two types of primes. Thus priming only occurred when relative position was respected, and absolute-position information did not increase the facilitation.

In further investigations, Granier and Grainger explored positional effects in longer targets (seven- and nine-letter words) [Gra04a]. Primes consisting of the initial or final four or five letters of the target all produced facilitation with respect to dddd or ddddd primes. Across five experiments, a small numerical advantage for initial primes over final primes always occurred (ranging from 3 ms to 8 ms), but this difference was not statistically significant. They also performed a series of experiments with five-letter primes and seven-letter targets in which primes matched on the first and last letters and the positions of the missing letters were varied. Those primes having no more than one positional gap within the three central letters (3, 4, and 5) induced priming; those primes which included more than one such gap did not. For example, 12457 (gap at position 3) and 13467 (gap at 5) produced facilitation, while 12467 (gaps at 3 and 5) and 12367 (gaps at 4 and 5) did not. Thus the proximity of the internal letters to each other seems to be important. This is in line with the finding that d23d and d34d primed five-letter words, while d24d did not (Experiment 4 of [Hum90]).

In other experiments, the effects of transposing letters were investigated. For five-letter targets, primes of the form 12435 produced facilitation with respect to 12dd5 primes [Per03]. This is in contrast to the results of [Hum90], where 1324 and 1dd4 were equivalent for four-letter targets, and 1425 and 1dd5 were equivalent for five-letter targets.
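The gap rule for five-letter primes of seven-letter targets can be stated compactly in code. A minimal sketch (the helper names are mine; the rule itself is the empirical generalization from [Gra04a]):

```python
def central_gaps(prime):
    """Count positional gaps among the central letters (positions 3-5 of a
    seven-letter target) skipped by a five-letter subset prime, where the
    prime is given in the numeric notation (e.g., "12457")."""
    kept = {int(c) for c in prime}
    return sum(1 for pos in (3, 4, 5) if pos not in kept)

def predicts_priming(prime):
    """Gap rule: priming is predicted when at most one central gap occurs."""
    return central_gaps(prime) <= 1

for p in ("12457", "13467", "12467", "12367"):
    print(p, central_gaps(p), predicts_priming(p))
```

This reproduces the reported contrast: 12457 and 13467 (one gap each) prime, while 12467 and 12367 (two gaps each) do not.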
However, primes of the form 12354 did not produce priming, indicating a special status for the final letter (and presumably the initial letter, but 21345 was not actually tested) [Per03]. Transposition of non-contiguous letters can also produce priming [Per04]. For six-letter targets, 125436 provided facilitation, while 12d4d6 did not. However, this result only held when 3 and 5 were both consonants. It is not clear if this specificity for consonants is due to a qualitative difference in processing consonants and vowels, or to statistical differences stemming from the fact that there are only 6 possible vowels.

Overall, these results show that it is unlikely that the brain uses a prelexical encoding based on absolute position. Rather, the encoding represents the relationships between letters. There does seem to be some positional influence, with greater priming when external letters are matched (as compared to internal letters) and an advantage for the first letter over the final letter when only one letter matches the target. In contrast to the priming data, the error data show a retention advantage for the first and second letters over the third and fourth letters.

4.1.2 Positional Patterns

Another potential source of information about how letter position is encoded is the error pattern in a perceptual task. The probability of retaining a target letter in the response may be related to how the position of that letter is encoded. We have already seen that this probability was higher for the first and second letters than for the third and fourth letters in Experiments 1 and 2 of [Hum90]. In an experiment where words were presented very briefly (33 ms) without a prime, retention probability decreased monotonically across the string for five- and six-letter words [Mon98]. For longer words, there was an advantage for the final letter. Thus for four- to six-letter words, retention probability decreases across the string, showing no advantage for the final letter.
This pattern has also been observed in aphasic patients suffering from acquired dyslexia. An analysis of their reading errors showed that retention probability decreased with increasing letter position [Whi99]. This pattern was robust under a number of different scoring measures, and did not obtain when the response and target were aligned at the final letter and scored from right to left. Because this pattern is similar to normals' performance under very brief presentations, it likely reflects some aspect of normal processing, rather than being a result of altered processing due to brain damage.

4.1.3 Seriality

A key question is whether lexical access proceeds serially (letter by letter) or in parallel (all letters activating the word level at the same time). It has generally been assumed that this issue can be decided via the presence or absence of a length effect. That is, if RTs were to increase with word length, this would indicate serial access; if RTs were independent of word length, this would indicate parallel access. Before discussing the experimental results, I wish to point out that this assumption is not necessarily warranted. Length may contribute multiple, even opposing, influences to RTs. For example, serial access could yield constant RTs with word length if the increased time that it takes for the final letter to fire for longer words is canceled out by decreased settling time at the word level. That is, for longer words, it may take less time for the lexical network to settle (following activation by the final letter) than for shorter words, possibly due to an increased amount of bottom-up input from more letters. Conversely, if a length effect were observed, it could be a result of parallel access in conjunction with some other factor. For example, the reduced acuity of the outer letters in longer words could lead to increased RTs despite parallel access.
Thus, the presence or absence of a length effect cannot definitively inform us as to whether lexical access proceeds letter-by-letter or in parallel.

For centrally presented words of three to six letters, it was found that string length has no effect on lexical-decision RTs [Fre76]. This finding, in conjunction with the popularity of parallel-processing models (e.g., the Interactive Activation Model [McC81]), has led to the general assumption that lexical access proceeds in parallel. However, a recent study has yielded a more complicated picture. New et al. [New04] undertook an investigation of the length effect based on the English Lexicon Project [Bal94], which is an on-line database of lexical-decision RTs for over 40,000 words. Once the effects of frequency, number of syllables, and orthographic-neighborhood size [Col77] were factored out, they found that RTs actually decrease with increasing string length for words of three to five letters(1), are constant with string length for words of five to eight letters, and increase with string length for words of eight or more letters. Thus string length has differing effects over different lengths. It is highly unlikely that these differing effects reflect differences in the method of lexical access.

(1) It is likely that the reason that this facilitatory effect of word length has not been previously observed is that the effect of orthographic-neighborhood size (N) was not factored out. N is the number of words that can be formed by changing one letter of the target to another letter [Col77]. High N is actually facilitatory [And97, New04] for words in lexical decision. Because longer words generally have lower N values than shorter words, the lack of N facilitation for longer words may have masked the facilitatory effect of more letters. The N effect is discussed in more detail in Chapter 9.
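The neighborhood metric N defined in the footnote can be computed directly. A minimal sketch with a toy lexicon (the function name and the example word set are mine, for illustration only):

```python
import string

def neighborhood_size(word, lexicon):
    """Coltheart's N: the number of words in the lexicon that can be formed
    by changing exactly one letter of `word` (same length, one substitution)."""
    word = word.lower()
    n = 0
    for i in range(len(word)):
        for c in string.ascii_lowercase:
            if c != word[i] and (word[:i] + c + word[i+1:]) in lexicon:
                n += 1
    return n

lexicon = {"cat", "bat", "hat", "cot", "car", "dog"}
print(neighborhood_size("cat", lexicon))  # bat, hat, cot, car -> 4
```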
Rather, these results most likely indicate that the effect of length is the sum of opposing forces, where the relative strength of the opposing forces varies with length. For example, increased bottom-up input (from more letters) may contribute a facilitatory effect, which reaches a ceiling level beyond a certain word length. Serial access and/or decreased acuity may contribute an inhibitory effect, which dominates at longer lengths.

The results of an EEG study [Hau04] are also consistent with the notion that there are differing components in the effect of length. In occipital regions, longer words gave increased amplitudes (as compared to shorter words) during the 80-125 ms period. After about 150 ms, this pattern reversed, with shorter words giving larger amplitudes than longer words. Effects of word frequency were seen at about 150 ms, indicating that lexical access had begun by that point. This suggests that there may be differential effects of string length for prelexical versus lexical processing.

The influence of length can also be varied by presenting stimuli in non-canonical formats. An inhibitory length effect occurs when the letters of a word are not horizontally aligned, or when they are presented in MiXeD cAsE [Lav02c]. A length effect can also be induced by rotating the stimuli. This phenomenon was examined in a Hebrew lexical-decision experiment where two- to five-letter strings were centrally presented and rotated as a whole in increments of 20°, from 0° (normal presentation) to 180° (upside-down) [Kor85]. For angles of 60° or less, RTs did not increase with string length. For 80°, RTs were similar for two- to four-letter words, but increased for five-letter words. For 100°, two- and three-letter words had similar RTs, with increasing RTs for four- and five-letter words. For angles of 120° to 180°, RT varied approximately linearly with word length, with each additional letter adding about 200 ms.
The non-word data showed a similar pattern with rotation angle, but with larger length effects. Due to the size of the per-letter increment at the larger angles, it is likely that this increment does actually reflect serial processing. However, the data cannot be explained by supposing that processing switches from parallel to serial at some rotation angle, due to the intermediate region (80° and 100°) where RTs are neither constant nor linear with rotation angle. Note that it cannot be supposed that such a switch occurs at differing angles for differing lengths. If that were the case, the RT for an n-letter word should either be close to that of the smaller angles or the larger angles, but not in between(2). In fact, the authors state, "it is difficult to propose an interpretation of the results in terms of one unitary principle" (p. 504).

For canonical presentation conditions, the best way to investigate the issue of seriality is to use time directly. Harcum and Nice [Har75] used this approach in a clever experiment in which pairs of eight-letter compound words were very briefly presented in sequence. The pairs were selected to allow meaningful blends. For example, the words headache and backrest could be recombined to give headrest or backache. When fixating on the center of the string, subjects tended to report the first half of the first word, and the second half of the second word (e.g., for headache then backrest, headrest was reported). This result unambiguously shows sequential readout. The first half of the first word was processed first. By the time that the second half of the stimulus was reached, the stimulus had changed and the second half of the second word was processed.

(2) This assumes a unimodal distribution of RTs; if RTs were bimodally distributed between these two extremes, their average would fall in between the two values. Although the authors do not explicitly state that RTs were unimodal for 80° and 100°, the nature of their discussion implies that they were.
They also included trials where fixation fell within the first half or the second half of the stimulus. For fixation within the second half, the same response pattern was observed as for central presentation. However, for fixation within the first half, the pattern reversed (e.g., backache tended to be reported instead of headrest). The authors took these results as evidence for left-to-right processing for central fixation, and peripheral-to-central processing for non-central fixation.

However, there is a more parsimonious explanation, based entirely on left-to-right processing. As we discuss in more detail in section 8.1, fixation within the first half of a word provides the Optimal Viewing Position (OVP) and the fastest processing, as compared to other fixation locations [Ore84]. When fixation was at the OVP, there may have been time to process the first word in its entirety. Then the second word would have been processed starting at the beginning, overwriting the representation of the first word. The second word was presented more quickly than the first, so there may only have been enough time to process its first half. Therefore, the response comprised the first half of the second word, and the second half of the first word.

Thus RT patterns cannot faithfully inform us whether lexical access occurs serially or in parallel. In contrast, the Harcum and Nice experiment provides direct evidence of serial processing.

4.1.4 Summary

These studies indicate that the highest prelexical representation encodes relationships between letters, rather than the absolute position of individual letters. Priming showed an advantage for external letters, but error patterns showed monotonically decreasing letter retention from left to right (for strings of six or fewer letters). By using a stimulus that varied over time, it was directly shown that readout of the visual image occurred in a left-to-right manner for central presentation.
Conflicting results for non-central fixation can be explained by faster processing at the OVP, allowing the spatial extent of the stimulus to be processed 1 1/2 times. In contrast, length effects are not suitable for diagnosing serial versus parallel processing, as the effect of length varies with length, while it is unlikely that the type of processing varies with length.

4.2 Letter-Level Experiments

Next I consider results from experiments which involved letter identification in briefly presented strings that were not orthographically legal. Such experiments should reflect bottom-up processing to the letter level in the absence of top-down lexical and phonological influences. Although some have argued that patterns evoked by the processing of non-word strings are not relevant to word recognition (Grainger, pers. comm.), I argue in the following review that the observed patterns must be a result of processing specific to visual word recognition.

First I focus on studies in which the target string was fixated in the center, and then move to studies in which the target appeared in a single visual field. I will use the following notation to specify retinal location. A location is specified in units of letter widths (with fixation at 0), where the LVF has negative values. The absolute value of a location gives the eccentricity (distance from fixation). A string's location will be given by the locations of the first letter and the last letter, separated by a double colon. For example, if fixation falls on the final letter of a five-letter string, the string is at -4::0. If fixation falls on the first letter, the string is at 0::4.

4.2.1 Fixation at String Center

In a priming study [Per95], Peressotti and Grainger investigated whether letter units are position-specific or position-independent by testing whether there was priming for the same letter across different string positions.
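The location notation can be derived from string length and fixation position. A minimal sketch (the helper name string_location is mine):

```python
def string_location(length, fixated_index):
    """Return the a::b retinal-location notation for a string of the given
    length when fixation falls on letter `fixated_index` (0-based).
    Locations are in letter widths, with fixation at 0 and the LVF negative."""
    first = -fixated_index
    last = length - 1 - fixated_index
    return f"{first}::{last}"

print(string_location(5, 4))  # fixation on the final letter -> -4::0
print(string_location(5, 0))  # fixation on the first letter -> 0::4
```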
Subjects were asked to perform an alphabetic decision task, in which they determined whether or not strings of three characters consisted solely of letters (e.g., "TBR" versus "TB$"). Critical primes were also trigrams, consisting of characters from the target string, either in the same order (same-position primes) or in a different order where no letter stayed in the same position (scrambled primes). The dependent variable was RT. In order to assure that facilitation did not result from physical overlap, prime strings and test strings were presented in fonts of different sizes. To examine temporal effects, prime exposure duration was varied.

The results varied with this duration. For durations of 33 ms, scrambled primes yielded no facilitation, while same-position primes sped RTs by 22 ms. For exposures of 50 ms and 67 ms, cross-position priming did occur, with facilitations of 9 ms and 14 ms, respectively, for the scrambled primes, while the level of same-position priming stayed roughly the same. Thus priming was observed for the scrambled primes at the longer durations. This is in contrast to word targets, where no priming is observed for completely scrambled primes. Thus it seems that priming can occur at the letter level when relative position is not respected, but not at the word level. Because priming occurred at the shortest duration for same-position but not scrambled primes, the authors took this as evidence for activation of position-specific letter detectors followed by position-independent letter units. However, the assumption of position-specific letter units may not be warranted. I have shown that these position-specific results could be accounted for by location-specific units with overlapping receptive fields [Whi01b]. Alternatively, the results might reflect an advantage for maintaining the relative, not absolute, position of the letters.
In contrast, the cross-position priming results provide strong evidence for position-independent letter units; it is difficult to see how such priming could occur in the absence of such units.

Next I consider error patterns in tasks where letters in briefly presented strings are to be identified. Studies have generally shown that letter perceptibility decreases as string position increases, with the exception of the final letter (and possibly the fixated letter) [Wol74, Lef78, Ham82, Mas82]. The observed final-letter advantage is often taken as arising from reduced lateral inhibition at a low level of processing. That is, because the final letter is not masked by a letter to its right, it is more easily perceived. However, this account is in conflict with data from studies of non-letter symbols [Lef78, Mas82]. When a string of such symbols is fixated at its center, perceptibility is lowest at the first and last symbols, as would be expected from the effect of acuity. Thus, there is no advantage from reduced lateral masking. Therefore, reduced lateral masking cannot account for the final-letter advantage. Strings of numbers also show an advantage for the external numbers [Mas82]. Thus, characters that commonly occur in strings display an external-character advantage, but other symbols do not. This indicates that the advantage arises from the way that strings are processed and encoded.

However, this pattern differs from the error patterns observed for degraded word identification (from aphasics and very brief presentations to normals), where there is no final-letter advantage [Hum90, Mon98, Whi99]. One potential source of this difference is exposure duration. The letter-identification tasks have used durations of 80 ms or more, while the word tasks for normals have employed durations of around 40 ms. (This difference reflects the fact that letter identification in non-words is more difficult than word recognition.)
It may be the case that a final-letter advantage only emerges at the longer durations. Alternatively, it may be the case that the nature of the stimuli themselves (words versus letters) underlies the difference.

Letter-level experiments have also given evidence for serial, left-to-right processing. Using the same paradigm as [Har75] (discussed in section 4.1.3), Nice and Harcum [Nic76] performed a letter-based experiment, where two six-letter strings were very briefly presented in sequence. Subjects tended to report the first letter of the first string, and the second to sixth letters of the second string. (The position of this cross-over point varied with subject.) This provides unequivocal evidence of serial processing; there was only time to process the initial letters of the first string before the stimulus changed to the second string.

4.2.2 Non-central Fixation within a String

For unilateral presentation, it is well known that there is an RVF advantage in visual word recognition. Thus, the advantage for early string positions (those falling in the LVF) over the later string positions (those falling in the RVF) under central fixation contrasts with the generally poorer performance observed for words presented entirely within the LVF. This pattern is also in evidence at the letter level. One study used strings consisting of a target letter embedded in a string of X's, where the task was to identify the non-X letter [Naz04a]. For five-letter strings at -4::0 versus 0::4, there was an RVF advantage. However, a nine-letter string at -4::4 produced an LVF advantage; the letters were in the same locations as in the two five-letter conditions, yet the VF advantage reversed. This suggests that the first half of the string affects the second half under central presentation, perhaps reflecting integration of the two halves. If so, there should be an RVF advantage for central presentation (-4::4) in a right-to-left language. A similar experiment in Hebrew confirmed this [Naz04a].
Also in contrast to English, performance did not differ for the two locations of the five-letter strings. The task used in these experiments did not require encoding of letter position; the single target letter could pop out from among the background letters. Despite this, differences did emerge across languages, which therefore probably reflect highly automatic processing. This processing must be related to word recognition because it is sensitive to reading direction.

Stevens and Grainger performed an experiment that used the target-letter-in-X's task for five- and seven-letter strings, where fixation location was systematically varied across all string positions [Ste03]. Although the average recognition probability across the string was symmetric with respect to visual field, the Position x Location curves showed an asymmetry. An external letter in the LVF (which was necessarily an initial letter) was better perceived than an external letter in the RVF (which was necessarily a final letter). Internal letters were better perceived at -1 and -2 than at 1 and 2, respectively. Thus, like [Naz04a], there was an LVF advantage when the string straddled both visual fields.

An earlier study investigated the interaction of retinal location and string position for a wider range of locations and positions [Wol74]. The stimuli were nine-letter consonant strings presented for 200 ms, and the task was to report as many letters as possible in a left-to-right report order. The location of the first letter of the string was systematically varied from -12 to 5. This yielded separate retinal-location curves for all string positions, and separate string-position curves for retinal locations -4 to 4. An analysis of the data showed a significant interaction of string position with visual field. That is, for a given string position and distance from fixation, the result varied with visual field.
These experimental data are displayed in Figure 4.1, in terms of perceptibility at a given retinal location as position varies. To summarize, in the LVF, perceptibility initially drops off quickly with increasing string position, and then levels off. In the RVF, however, perceptibility decreases more slowly and smoothly. For example, when a letter at -3 is in the first position, accuracy is 100%, but accuracy is only 35% when that letter is in the third position. Perceptibility decreases to 20% for position 4, but stays roughly constant as position increases from 4 to 7. In contrast, at the analogous location in the RVF/LH (3), perceptibility drops from 95% for position 1 to 55% for position 3; a smaller drop than in the LVF/RH. Perceptibility drops to 30% for position 4, and continues to decrease to 5% for position 7 (rather than stabilizing as in the LVF/RH). Thus the effect of increasing the number of letters to the left of a given eccentricity varies with VF.

Figure 4.1: Results from [Wol74], with LVF/RH on left and RVF/LH on right. Each line represents a fixed retinal location (R = -2 to -5 in the LVF, R = +2 to +5 in the RVF), plotting percent correct against string position. As string position is increased (i.e., more letters occur to the left), performance decreases. The pattern of decrease varies with visual field.

4.2.3 Unilateral Presentation

This positional interaction with visual field is also in evidence for short strings presented within a single VF. In one study, four-letter strings were embedded in a masking array [Est76]. The array was comprised of 9 $'s, an x, and 9 more $'s, and was presented so that the x appeared at fixation. A consonant string replaced the $'s in one of four locations: -8::-5, -5::-2, 2::5, or 5::8. Thus a letter at -5 or 5 could be either an initial or a final letter. Exposure duration was either 150 ms or 2400 ms. In the longer duration, eye position was monitored, and the trial was terminated if fixation strayed more than 0.33° from the central fixation marker.

For both durations, the data displayed similar patterns (see Figure 4.2). For -5::-2 and 2::5, performance on the external letters was better than on the internal letters. This could not have been a result of a lack of lateral masking, because the external letters were always surrounded by $'s. For -8::-5, accuracy decreased with string position. For 5::8, accuracy was flat across the first three positions, and rose for the final position. Thus, for the larger eccentricity, the letters farthest from fixation were the best perceived in both VFs. This striking pattern has also been observed in studies in which strings were not presented within a masking array [Bou73, Leg01].

This asymmetry between position and VF was also apparent within a single location. At -5, accuracy was much higher when it was an initial letter. At 5, accuracy was much higher when it was a final letter. There seems to be a general advantage for initial letters in the LVF and final letters in the RVF.

Figure 4.2: Results from [Est76], for the 2400 ms exposure duration; percent correct plotted against retinal location (-8 to 8).

These patterns must result from how a letter string is encoded; they are considerably different from what would be expected on the basis of acuity. It cannot be the case that this occurs simply because initial letters fall in the LVF and final letters in the RVF for fixated words, because short words are often processed without being directly fixated, in which case the first letter falls in the RVF.

However, Jordan and colleagues have claimed that there is no such asymmetry [Jor03]. Using stringent fixation control, they obtained the same deep, U-shaped patterns (i.e., better accuracy at the external letters) for various locations in both visual fields.
Without fixation control, they obtained a pattern in line with the above experiment: decreasing accuracy with increasing position in the LVF, and shallow U-shaped patterns in the RVF. Therefore, they claim that any observed asymmetry is an artifact of erroneous fixations falling outside the required fixation point. However, this does not explain the observed asymmetry. If there were no asymmetry, the same pattern should emerge no matter where fixation falls. Note that the above study [Est76] controlled fixation, and yet yielded strong positional asymmetry with VF even when the subjects had 2.4 seconds to examine the stimulus. However, the tolerance in the Jordan experiments was much smaller. In order for the stimulus to be initially displayed, fixation could not deviate from center by more than 0.125° for 1 second. Movements this small are on the order of microsaccades, which occur reflexively at intervals of less than a second in order to keep an object stabilized on the retinas. As Nazir [Naz03] points out, maintaining fixation under this constraint is a demanding task, requiring focused attention prior to presentation of the stimuli. Therefore, performance of this additional fixation task is likely the source of the differing results, rather than uncontrolled fixation errors.

In other studies, consonant-vowel-consonant trigrams were presented vertically [Hel95, Hle97]. For LVF presentation, subjects made many more errors involving the last letter than the first letter of the string. For RVF presentation, this finding was greatly attenuated: there were more errors on the first letter, and fewer errors on the last letter (relative to LVF presentation), resulting in a more even distribution of errors across the string. Correct recognition of the entire string was better in the RVF/LH than the LVF/RH. These patterns were taken to be additional evidence of parallel processing of strings by specialized linguistic modules in the LH, and less efficient, serial processing in the RH.
However, a counterintuitive result arose when the input was directed to both hemispheres simultaneously. Under bilateral presentation, the error pattern is more similar to the LVF/RH pattern than the RVF/LH pattern [Hle97]. Thus, even though the LH was more effective than the RH at performing the task, the RH's mode of processing (i.e., error pattern) dominated when the stimuli were presented to both hemispheres simultaneously. Under a dual-processing-modes account, it is unclear why this should be the case.

Similar experiments in languages read in other directions have also cast doubt on a dual-modes account of these data. For Hebrew readers, the pattern reversed [Evi99]. Final-letter errors were more likely in the RVF/LH than the LVF/RH, and the bilateral pattern was the same as the RVF/LH. A study of Japanese kana, for which the vertical orientation is normal, showed no differences between LVF/RH, RVF/LH, and bilateral presentation patterns [Hell99]. Thus, the patterns vary with reading direction, indicating that they are not a result of hemispheric dominance.

4.2.4 Summary

Priming experiments have shown that facilitation at the letter level can occur across string positions. By using a temporally varying stimulus, it was demonstrated that letters are processed from left to right (in English speakers).

We have seen that letter perceptibility varies in a way that is contrary to acuity. Under central fixation, there is an external-letter advantage, where perceptibility decreases from left to right, and rises for the final letter. (This rise at the final letter contrasts with the lack of a final-letter advantage observed in perceptual word-recognition tasks.) Under LVF presentation, the initial letter is perceived the best. Under RVF presentation, the final letter is perceived the best. Letter perceptibility patterns are also sensitive to reading direction. These factors indicate that letter tasks are influenced by processing specific to visual word recognition.
Thus a comprehensive model of letter-position encoding should explain these effects.

Chapter 5

Models of LPE

In this chapter, I first summarize the desired properties of an LPE model, as constrained by the neurobiological and behavioral data reviewed in the previous two chapters. I then give a brief overview of how artificial neural networks are usually implemented. In the next section, I present other researchers' models, which are all implemented models. These models are evaluated with respect to how well they fulfill the desired properties.

5.1 Desiderata for an LPE Model

Neurobiological plausibility. The lowest level of the model should represent V1, while the highest level should represent the lexical level. The lowest level should be characterized by a retinotopic mapping, where activation levels correspond to visual acuity. At the lexical level, the word representation corresponding to the input should become the most active. Transformations within and between levels should involve functionality that could be carried out by real groups of neurons.

Lowest level is split across hemispheres. Available evidence indicates that the fovea is not bilaterally represented. Therefore the lowest level of the model should incorporate a split representation of the input, and the model should integrate this information into a unified LPE.

Convert a retinotopic encoding to a location-invariant encoding. The model should solve the problem of how a location-specific representation at the lowest level is transformed into a location-invariant LPE.

Incorporate position-independent letter units. The observed cross-position priming for non-word strings indicates that letter units exist which can encode a letter in any position. This is consistent with the behavior of letter-position dyslexics, who get the identities of letters correct, but not their positions. The model should explain how positional information is dynamically bound to such units.
Explain basic positional patterns. Under central presentation, letter perceptibility generally decreases from the beginning of the string to the end of the string, but rises for the final letter. The model should explain why this pattern differs from the acuity pattern, and why the final-letter advantage is not evident for degraded word recognition.

Explain relative-position and transposition priming. The highest prelexical representation should be consistent with the fact that word-level priming only occurs when the target's letter order is preserved for the most part in the prime. This priming does not depend on absolute string position.

Explain evidence for serial processing. Tasks in which two strings are briefly sequentially presented show that the beginning of the first string and the end of the second string are processed. This shows a serial readout of the visual image.

Explain visual field differences. Positional patterns vary with VF, reading direction, and hemispheric dominance. In particular, there is an initial-letter primacy in the LVF, and a final-letter primacy in the RVF.

5.2 Review of Modeling Basics

A neuron (or group of neurons) is modeled as a node which receives activation on incoming connections from other nodes, and sends activation on its outgoing connections to other nodes. Activation is integer-valued or real-valued. Nodes are usually grouped into layers. Each node in a layer has the same pattern of connections to nodes within that layer (lateral connections), and to nodes in other layers.

Activations at the lowest layer of the model (input layer) are set by the modeler. Otherwise, the activation is an (often non-linear) function of the input reaching the node on its incoming connections. Associated with each connection is a weight, which models the efficacy of the synaptic transmission between the two nodes.
The amount of input arriving along a connection is usually taken as the product of the activation of the sending node and the connection weight. Thus a lower connection weight allows less transfer of activation. The input to a node is the sum of these individual inputs. Weights can be set by the modeler, or developed via a learning algorithm. A node only carries out local computations, limited to functions of its internal state and the incoming activations. Such functions operate on numbers, not abstract symbols. See Figure 5.1.

The connection weights into a node can be considered a vector; the activations of sending nodes can also be considered a vector. The input to a node is then the dot-product of the weight vector and the activation vector.

[Figure 5.1: Basic components of an implemented model. Each node has an activation value (shown in the center of the node). At the lowest level of the model, activation values are clamped to particular values. Each connection has an associated weight. The input to a node is the dot-product of the activation vector and the weight vector (in the depicted example, sending activations .4, .8, .3 and weights .1, .2, .1 give input = .1*.4 + .2*.8 + .1*.3 = .23, and the node's activation is f(.23)). The activation of a node is a function of this input. A node sends its activation along outgoing connections.]

For activation vectors of fixed Euclidean length, the dot-product is maximized when the activation vector is parallel to the weight vector (i.e., the angle between the two vectors is 0). Thus activation level generally reflects how closely the incoming activation vector is aligned with the node's weight vector.

Activation level can model different aspects of neural function. One possibility is for the activation value to represent membrane potential. In this case, outgoing connections carry information about individual neural spikes (or sets of spikes if the node represents a set of neurons with the same dynamics). This requires simulation at a millisecond time scale.
The alternative is that the activation value represents firing rate, or the total number of spikes over some time period. This is a more abstract level of modeling.

There are three basic types of learning algorithms used to modify connection weights: supervised, reinforcement, and unsupervised learning. In all cases, the input units are clamped, and activation flows through the network to the output layer.

In supervised learning, the activations on the output layer are compared to the target activations desired by the modeler (for that particular input). The differences between the actual and target activations (the errors) are used to modify connection weights so that the errors decrease in magnitude. The most well-known algorithm of this type is back-propagation. In this way, the network learns to perform a particular task (i.e., an input-output mapping).

In reinforcement learning, the modeler provides feedback that is limited to a reward signal; this is essentially trial-and-error learning. Thus there is some randomness associated with the activation functions. When an output pattern receives a positive reward, connection weights are modified to increase the probability of repeating that output pattern. When a negative reward is generated, connection weights are changed to decrease the probability of repeating that output pattern, and to increase the probability of creating a different output pattern (which could potentially generate a good response).

In unsupervised learning, there is no feedback from the modeler. Rather, the goal is for the network to self-organize to reflect regularities in the input. One example of this approach is Hebbian learning, where connection weights are strengthened between two nodes that are both activated at the same time (or at successive time steps).
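The node-input computation of Figure 5.1 and a Hebbian weight update can be sketched as follows. This is a minimal illustration; the learning rate and the specific update form are illustrative assumptions, not taken from any of the reviewed models:

```python
def node_input(activations, weights):
    # Input to a node: dot-product of sending activations and connection weights.
    return sum(a * w for a, w in zip(activations, weights))

def hebbian_update(activations, weights, post, rate=0.1):
    # Unsupervised Hebbian rule: strengthen each weight in proportion to the
    # co-activation of the sending node and the receiving node.
    return [w + rate * a * post for a, w in zip(activations, weights)]

# The example from Figure 5.1: activations (.4, .8, .3), weights (.1, .2, .1).
acts, wts = [0.4, 0.8, 0.3], [0.1, 0.2, 0.1]
print(round(node_input(acts, wts), 2))  # 0.23

# One Hebbian step with an active receiving node strengthens every weight
# whose sending node was active.
print([round(w, 3) for w in hebbian_update(acts, wts, post=1.0)])  # [0.14, 0.28, 0.13]
```

Note that both computations are purely local, as required above: each uses only the node's incoming activations and its own weights.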
5.3 Models of LPE

5.3.1 Interactive Activation Model

The most well-known model of visual word recognition, the Interactive Activation (IA) model [McC81], used a position-specific encoding. That is, there was a separate node representing each letter in each position. Letter nodes connected forward to nodes representing individual words, and these word nodes connected back to letter nodes. Connection weights were binary, with a weight of 1 if the letter was in that position in the word, and 0 otherwise. The model was based on four-letter words, so differing lengths were not an issue. The primary goal of this model was to illustrate the effects of top-down activation from the word level back to the letter level. The position-specific encoding was likely implemented as an expediency, rather than as an instantiation of a theoretical model of LPE. As we have seen, such a position-specific encoding is not compatible with behavioral studies.

5.3.2 Print-to-Sound Models Trained by Back-Propagation

In a model of learning to read aloud [Sei89], orthographic input was represented by letter trigrams. (An example of a trigram encoding is given in section 2.1.) Orthographic units were fully interconnected with a hidden layer, which was fully interconnected with the output layer representing phonemes. The connection weights were learned via the back-propagation algorithm. The goal of the modeling was to show that a single network could learn to pronounce both regular and exception words. (An exception word is a word that does not follow the usual rules of pronunciation, such as pint.) However, this model produced poor generalization on reading pseudowords (pronounceable letter strings that are not words). This was likely due to the choice of input representation; representations of the same letter in different contexts bore no relationship to each other, making generalization difficult. This is known as the dispersion problem.
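The overlap behavior of a contiguous-trigram encoding is easy to check directly. The sketch below assumes `*` padding so that edge positions get their own trigrams (matching the `**1` / `6**` units discussed in this section), and computes the trigrams shared by a prime and a target:

```python
def trigrams(s):
    # All contiguous letter triples of the string, padded with '*' on both
    # sides so that the first and last letters appear in edge trigrams.
    padded = "**" + s + "**"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

prime, target = "1346", "123456"
print(sorted(trigrams(prime) & trigrams(target)))  # ['**1', '6**']
```

Only the two edge trigrams survive; a reordered prime such as 1436 yields exactly the same overlap, so the encoding carries no information about the order of the interior letters.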
Changing the input encoding to a slot-based one which encoded letters and graphemes in onset, vowel, and coda positions allowed better performance on pseudoword reading [Plsa93]. Again, the focus of these models was not LPE, and the choice of orthographic encoding was merely a means to an end.

Nevertheless, it is instructive to evaluate trigrams as a potential prelexical encoding. They do not rely on absolute-position encoding, consistent with experimental evidence. However, they do not offer sufficient flexibility to account for the relative-position results. Consider the prime 1346 and the target word 123456. The only trigrams the prime shares with the target are **1 and 6**, so there is no basis for an influence of the order of the interior letters.

The difficulties in achieving good pseudoword performance using trigrams point to an important issue. LPE not only serves a direct, orthographic route to lexical semantics, it also provides input to a process which learns to map orthography to phonology. These results demonstrate that context units, such as trigrams, make that task difficult due to the dispersion problem. This suggests that an encoding based on position-independent letter units should exist at some level of processing, in order to provide dispersion-free input to the phonological system. This is consistent with the above experimental evidence for such representational units [Per95, Fri01]. However, such an encoding cannot readily explain the relative-position priming data [Hum90, Per99, Per03], suggesting that position-independent letter units may not directly contact the lexical level.

5.3.3 BLIRNET

Unlike the preceding models, BLIRNET [Moz91] specifically focused on the issue of how a retinotopic representation of a letter string could be transformed into a location-invariant LPE.
At layer 1, the retina was represented as a 6 x 36 array of nodes, with each location comprised of 5 nodes which represented different features (corresponding to four different line orientations, and a node encoding whether the line terminated in that location). Each letter was represented by these features over a 3 x 3 region. Above this layer were 5 more layers in which the size of the array progressively decreased, but the number of features at each location increased. For example, at layer 2, the array was 3 x 12, with 45 different types of feature nodes at each location. Each node in layer 2 received inputs in a topographic manner from a 4 x 6 region of layer 1. By layer 6, the array was reduced to 1 x 1, with 720 different features; thus, this layer did not encode any information about stimulus location.

The features detected by these higher layers were hard-coded in the connection weights. Weights were equal to each other on all connections joining a given feature type at layer n−1 to a given feature type at layer n. This provided a degree of location invariance. The connection weight between two feature types was randomly chosen (under some distributional constraints). Thus the detected "features" did not correspond to psychologically motivated units.

The resulting pattern in layer 6 was mapped to a trigram representation via supervised learning. Trigrams also included units of the form A_BC, meaning that a single letter occurred between A and BC. Because the total number of possible trigrams is huge, the 540 most common were chosen. Thus there was a trigram layer of 540 units above layer 6, with initially random connection weights. The network was trained by encoding various words at various locations in layer 1. For each trial, the resulting trigram activation pattern was compared to the desired trigram activation pattern for that word, and the weights were modified to bring the actual activation pattern closer to the desired pattern.
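This supervised mapping from layer 6 to the trigram layer can be sketched with a generic delta rule. This is an assumption for illustration: the text does not specify BLIRNET's exact update rule, the linear output is a simplification, and the feature vector is a random stand-in for a layer-6 pattern.

```python
import random

def delta_rule_step(x, w, target, rate=0.05):
    # One supervised update for a single trigram unit: compute the unit's
    # (linear) output, then nudge each weight to reduce the output error.
    y = sum(xi * wi for xi, wi in zip(x, w))
    err = target - y
    return [wi + rate * err * xi for xi, wi in zip(x, w)], err

random.seed(0)
x = [random.random() for _ in range(8)]             # stand-in layer-6 pattern
w = [random.uniform(-0.1, 0.1) for _ in range(8)]   # initially random weights
for _ in range(200):
    w, err = delta_rule_step(x, w, target=1.0)      # desired trigram activation
print(abs(err) < 0.01)  # True: the actual activation approaches the desired one
```

Repeating such updates over many word/location pairs is what drives the trigram layer's activation pattern toward the desired pattern for each word.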
At the end of training, the trigram layer produced a noisy, somewhat location-invariant representation of the letters encoded on layer 1. This representation was cleaned up via lateral excitation between consistent trigrams (e.g., BON and B_ND) and lateral inhibition between inconsistent trigrams (e.g., BON and OBN), while unrelated trigrams (e.g., BON and DER) did not affect each other. The trigram layer was connected to a lexical layer with top-down connections back to the trigram layer. This feedback also helped to clean up the trigram representations. This allowed translation-invariant recognition of words encoded in layer 1. Thus, at layer 6, there was sufficient information to reliably extract trigram identities when additional consistency and lexical information were added.

While this model addresses the difficult problem of location invariance, there are some difficulties with the proposed solution. The only way to represent letter position in this model is via a contextual encoding, since all spatial information is factored out. Thus a model that achieves location invariance in this way fundamentally rules out a position-independent, letter-based representation. It could learn to detect individual letters, but could not represent their positions independent of their context in a general way. The model did include some trigrams corresponding to positional letter detectors, of the forms **A, *_A, A_*, and A** (encoding first, second, next-to-last, and last letters). However, this representation cannot be extended to the interior positions. This is in conflict with the above arguments for the existence of a letter-based representation and for position-independent letter units. Furthermore, this location invariance is achieved via an unrealistic jump in receptive-field size; the representation at the first layer corresponds to features, while the representation at the second layer corresponds to adjacent letter pairs.
Nevertheless, it is instructive to evaluate the proposed trigram encoding. It includes the wildcard "_" and therefore offers a more flexible encoding than one that only includes contiguous trigrams. Under this scheme, 1346 shares trigrams *1_3, 1_34, 34_6, and 4_6* with the target 123456, while 1436 does not share these trigrams. This is consistent with relative-position priming results. However, the representation is still not flexible enough. Consider the stimuli 12dd56 and 124356 for the target 123456. The two stimuli share an equal number of trigrams with the target, which is inconsistent with transposition-priming results.

5.3.4 A Split Fovea Model Trained by Back-Propagation

The goal of this model is to explain visual field differences based on a split cortical representation of the fovea [Mon04]. The input was in the form of location-specific letter units, with four locations in each "visual field", corresponding to foveal vision. The left four units connected directly to one bank of hidden units representing the RH, while the right four units connected directly to a different bank representing the LH. The two banks of hidden units were fully interconnected with each other. The hidden units connected to an output layer representing a lexical encoding, which could be phonological or semantic depending on the simulation. The network was trained on four-letter words. The size of the input layer allowed a four-letter word to occur in five different locations. The network was trained via back-propagation by representing each word at all five possible locations on the input layer. Thus the network learned to produce a location-invariant representation at the output layer.

Measurements of performance based on mean squared error showed similar patterns to some human phenomena, such as the location of the Optimal Viewing Position, positional effects of grapheme-phoneme irregularities, and VF differences in semantic processing.
These results are explained as arising from an interaction between positional letter statistics and the distribution of positions within a "visual field". In English, there is more variation in letter identities at the beginning of words than at the ends of words. However, when contiguous letter pairs are considered, there is more variation at the ends of words than at the beginnings. Under the model's input conditions, initial letters fell more frequently in the "LVF" and final letters in the "RVF". For example, of the five input locations, only one yields the first letter in the "RVF". Thus the "RH" becomes tuned to the statistics of beginnings of words, while the "LH" becomes tuned to the ends of words. Because of the differing information granularities within words, coarseness of representation varies with "hemisphere". This leads to the positional and VF effects in the model.

This model has the advantage that it deals with the issue of a split representation of the fovea. However, I suggest that the underlying assumptions are unrealistic. The model's results arise because the hidden-layer representation of a string varies with its location on the input layer. However, this means that there is no location-invariant prelexical representation. For example, there is no abstract encoding that the letters cart spell the word CART. Rather, the relationship between the letters cart and the word CART has to be relearned for every possible stimulus location.1 This is inefficient and contrary to all other models and theories of LPE, which assume that an abstract representation of letter order contacts the lexicon. This is also inconsistent with imaging evidence that processing becomes left-lateralized at a prelexical level [Coh00, Deh04].

Furthermore, there are also problems with the assumption underlying the positional frequencies. The relationship between letter position and visual fields arose from the use of symmetric visual fields in the model.
However, to achieve normal speed in text reading, 4 LVF letters and 12 RVF letters must be visible [Ray75]. Thus, the visual fields are effectively highly asymmetric in reading. Short words falling in the right parafovea are frequently processed without being directly fixated, about 50% of the time for four-letter words [Ray76]. Thus it is not actually the case that the initial letters of a word are considerably more likely to fall in the LVF than the RVF.

1 This is only strictly true when the output is not predictable from the input, such as for a semantic encoding. The model could potentially give the correct phonological response for a regular word at an untrained location, based on generalization from other words presented at that location.

5.3.5 SOLAR

The SOLAR model was developed to illustrate how a word recognition module could self-organize to recognize strings of varying lengths [Dav99]. It is based on the SONNET model [Nig93]. The highest prelexical encoding employed position-independent letter units, where position was represented by an activation gradient. That is, a letter unit could represent that letter in any position, and its activation level encoded position. Activation decreased from left to right. For example, to represent the input CART, letter node C would have the highest activation, A the next highest, R the next, and T the lowest. (Multiple instances of a letter were encoded by different instances of a letter node.) This activation pattern was taken to arise from a serial readout of the visual image, i.e., letter node C is activated, then A, then R, then T. Earlier letters accumulate higher activation values because they fire longer. Therefore, the final letter of a string is taken to have a set minimum activation level, and activation increases as position decreases. The serial readout was taken to correspond to a covert attentional scan of the visual image.
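SOLAR's gradient encoding, and the word-level matching computations examined next, can be sketched numerically. The ratio of 1.2 between successive positions and the constant K = 1.5 follow the worked examples in the text; the implementation details are otherwise my own simplification (e.g., duplicate letters are ignored):

```python
def encode(s):
    # Activation gradient: position i (from the left) gets 1.2**(len(s)-1-i),
    # so the final letter has the minimum level (1.0); the vector is then
    # normalized to unit Euclidean length.
    raw = {c: 1.2 ** (len(s) - 1 - i) for i, c in enumerate(s)}
    norm = sum(v * v for v in raw.values()) ** 0.5
    return {c: v / norm for c, v in raw.items()}

def dot(inp, word):
    # Bottom-up input as a plain dot-product, where a word node's weights
    # equal its own normalized activation gradient.
    return sum(a * word.get(c, 0.0) for c, a in inp.items())

def I(inp, word, K=1.5):
    # SONNET/SOLAR's extra component: for each letter of the word, take
    # min(activation/weight, 1), multiply by K, add 2 - K, and multiply the
    # per-letter values together (maximum 2 per letter, so 2**L overall).
    prod = 1.0
    for c, w in word.items():
        prod *= K * min(inp.get(c, 0.0) / w, 1.0) + (2 - K)
    return prod

print(round(dot(encode("car"), encode("cart")), 3))     # 0.931: car now favors CAR (1.0)
print(round(dot(encode("1234"), encode("12345")), 2))   # 0.96 (vs. 1.0 for node 1234)
print(round(dot(encode("12354"), encode("12345")), 3))  # 0.997: transposition barely penalized
print(I(encode("1234"), encode("1234")))                # 16.0
print(I(encode("1234"), encode("12345")))               # 8.0: the shorter word wins
print(round(I(encode("12354"), encode("12345"))))       # 28, vs. 32 for an exact match
```

The dot-products reproduce the normalization behavior discussed below, and the I values reproduce the worked K = 1.5 examples: misorderings are amplified relative to the dot-product alone.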
After learning, connection weights into a word node became equal to the letter activation pattern resulting from that word. However, a simple activation function based on the dot-product of the weight and activation vectors causes difficulties. For example, the input car would activate CART more than CAR, since the activations of C, A, and R are higher for cart than for car, and these differences are learned on the connection weights. One approach to solving this problem is to normalize the length of the activation vector to 1. This is accomplished by squaring each activation, summing these squares, and dividing each activation by the square root of this sum. Thus the activations for C, A, and R for cart become smaller than for car, because the activations for CART are divided by a larger quantity (which includes the square of the activation for T). These differences are reflected on the respective learned connection weights. Therefore the input car activates CAR more than CART. However, this approach becomes less and less effective for longer words, because the weights at later positions become quite small, and they do not have much influence. Therefore this solution is not robust in the presence of noise, because differences in activation values at the word level can be quite small. For example, taking the ratio between positional activations to be 1.2, an input of the form 1234 activates a word node encoding 12345 to 0.96 (versus 1.0 for word node 1234). Another problem is that this encoding of position is not robust, especially toward the ends of words where activations and weights are low. For example, the sequence 12354 would activate word node 12345 to 0.99.

To solve these problems, another component, I, contributed to the bottom-up input in the SONNET and SOLAR models. I operates on normalized activation and weight vectors, as follows. For word node W, the ratio of each letter unit's activation and its connection weight was taken.
This ratio was capped at a maximum of 1.0 for each letter. Each ratio was multiplied by a constant K, and the result was added to 2 − K. I_W was set to the product of these values, giving a maximum possible value of 2^L, where L is the number of letters having non-zero connection weights into W. I_W was multiplied with the dot-product of the activation and weight vectors to comprise the bottom-up input to word node W. Thus this new component compared the weight and activation vectors directly to each other, producing a penalty at letters where the activation value was less than the connection weight. For example, taking K = 1.5, input of the form 1234 gives I = 16 for word node 1234, and I = 8 for word node 12345, since the ratio of 5's activation to its weight is 0, giving a value of 0.5 for that connection. Thus the shorter word has an advantage. Input of the form 12345 gives I = 32 for word node 12345, while 12354 gives I = 28, which is 87.5% of the maximal possible value of I, as opposed to 99% of the maximal possible value for the dot-product. Thus misorderings are amplified.

This model has the advantage that it is consistent with the evidence for serial readout and position-independent letter detectors. However, there are difficulties concerning the way that letter nodes activate the lexical level. While the computation of I does indeed solve the above problems, it is not biologically plausible. It is unclear how an activation level and a weight could be directly compared. A weight reflects properties of synaptic transmission. It modulates the efficiency of the interaction between nodes. This value cannot be extracted from synapses and used in other calculations. Thus, I is a computational convenience and does not illuminate how ordering information is actually compared in the brain. Moreover, it does not give the correct results in some cases. Consider the inputs 1346 and 1436 for the word node 123456.
For all positions in both of these stimuli, the letter activation is higher than the weight; thus all the ratios are maxed out at 1.0, giving I = 16 for both stimuli. Therefore, this measure is insensitive to ordering for a stimulus that is shorter than the target, contrary to experimental evidence.

Another problem is the proposed activation pattern across the letters. The final letter has the lowest activation level. This is inconsistent with the broad range of evidence for a final-letter advantage. While it is argued that increased performance could arise at the last letter due to a recency effect from the serial activation of letters, this implies that activation level and performance are independent of each other. It is generally assumed that performance reflects activation level. That is, a recency effect occurs because the activation of the final letter remains higher than previous letters due to less decay. This is inconsistent with the assumption that the final letter has the lowest activation level.

5.3.6 LEX

In this model, position-independent letter units connected to word units, which included a phonological encoding of the word [Kwa99a]. Letter order was represented serially. The first letter fires and activates all words having that first letter. The second letter fires and further activates only those words matching on that letter in that position, and the more highly activated words inhibit those words that do not match. This process continues for each letter until there is only one active word remaining. After each letter fires, the model generates a phonological output which is a combination of the pronunciations of all active words. The goal of the model was to demonstrate that this type of processing could account for naming and lexical-decision phenomena in the absence of learned grapheme-phoneme mapping rules.

For the present purpose, the serial encoding of letter order is of the most interest.
However, no details are given as to how the serial input is matched to the stored lexical encoding. That is, how does the nth letter only further activate those words matching that letter in the nth position, without explicitly encoding letter positions within a word node? It seems unlikely that this match was implemented in a biologically plausible manner (i.e., based on numerical computations on weighted connections). However, even if it were, these activation dynamics are problematic, because they are equivalent to a position-specific encoding. That is, although the letter units are position-independent and dynamically represent position by firing order, they activate words in a position-specific way. This is incompatible with relative-position priming.

5.4 Summary

I conclude this section by reviewing how the requirements of an LPE model are satisfied (or not) by the above models.

Neurobiological Plausibility: The LEX model did not specify the activation function for the word level. The SOLAR model used an unrealistic one in which activations and weights were directly compared. Other models were plausible in that they relied on standard activation functions.

Split Fovea, and Visual Field Differences: The only model that addressed these issues was the Split Fovea model. However, the split representation was not integrated into a location-invariant LPE, and the VF differences arose from unrealistic assumptions.

Retinotopic to Location-Invariant Encoding: Only the Split Fovea and BLIRNET models addressed this problem. Neither model incorporated position-independent letter units, which is in conflict with the next requirement.

Position-Independent Letter Units and Serial Encoding: The SOLAR and LEX models included such letter units under a serial readout of the visual image. In the SOLAR model, position was dynamically represented by activation level, and activation level was driven by serial processing.
However, the word-level activation function was neurobiologically implausible. In the LEX model, order was represented temporally. However, it is unclear exactly how this encoding activated words, and the proposed activation dynamics were position-specific. Thus neither model really solves the problem of how the letter-based, temporal encoding could be decoded at the word level.

Relative and Transposition Priming: The SOLAR model came the closest to fulfilling these requirements. However, this achievement is based on an implausible word-activation function, which doesn't give the proper relative-position results for primes and targets of differing lengths. As for contextual units, the trigrams used in the present models (print-to-sound1 and BLIRNET) are not sufficiently flexible. The IA, print-to-sound2, and LEX models activated words in a position-specific way. It is unlikely that the location-specific encodings developed by the Split Fovea model could replicate these phenomena.

Serial Processing: The LEX and SOLAR models include a serial readout of letters.

Positional Patterns: SOLAR is the only model producing varying letter activation levels, which arise from serial activation of letter nodes. The proposed activation gradient is consistent with behavioral evidence for non-final positions, but not with the final-letter advantage. Furthermore, the activation gradient does not explain the interaction of perceptibility patterns with visual field under unilateral presentation.

In the following chapter, I present the SERIOL model, which satisfies all of these requirements.

Chapter 6

The SERIOL Model of LPE

My theoretical model of LPE is dubbed the SERIOL model (Sequential Encoding Regulated by Inputs to Oscillations within Letter units) [Whi01a]. The model is best motivated in a top-down manner. I first give an overview, starting at the word level and working down. I then specify the model in more detail, which is best done in a bottom-up manner.
6.1 Overview

6.1.1 Highest Prelexical Orthographic Representation

The relative-position and transposition priming results [Hum90, Per99, Per03, Gra04a] place strong constraints on the nature of the highest prelexical representation [Gra04b]. Contextual units are a natural type of unit to represent order in the non-position-specific manner that seems to be required. As we have seen, trigram units do not offer sufficient flexibility. Adding a wild-card character ("_" in BLIRNET [Moz91]) brings the representation closer to explaining the priming results. Maximum flexibility is achieved by using bigrams instead of trigrams, and allowing letters to occur between the two letters of the bigram. Following my original specification of such bigram units [Whi99, Whi01a], Grainger later also endorsed such units, dubbing them open bigrams [Gra04b, Sch04].

Bigram activation level is a decreasing function of the distance between the two letters. Thus, bigrams triggered by contiguous letters are more highly activated than those representing separated letters. This leads naturally to a maximal allowable separation. Priming data suggest 2 is the maximum [Sch04]. A new assumption is that the external letters are anchored by edge bigrams.1 For example, the stimulus chart activates bigrams *C, CH, HA, AR, RT, T* and CA, HR, AT and CR, HT, where the first group of bigrams has the highest activation level, the next group has a lower activation level, and the last pair of bigrams has the lowest activation level. Bigrams contact the word level via weighted connections, where the weights are proportional to the bigram activation pattern for each word.

6.1.2 Nature of Pre-Bigram Representation

So then how are bigrams activated? Consistent with evidence for position-independent letter units [Per95], I assume that such units comprise the next lowest level. This requires that position be dynamically represented.
Two possibilities are that position is represented by an activation pattern, or by firing order. As discussed above in section 5.3.5, a monotonically decreasing activation gradient is inconsistent with the final-letter advantage. Therefore, in line with evidence for left-to-right string processing [Har75, Nic76], letter order is taken to be represented serially. Thus in our example, C fires, then A, then R, then T. A bigram is activated when its constituent letters fire in the correct order, and bigram activation level falls off as the interval between letters increases.

¹The original specifications of the model [Whi99, Whi01a] did not include edge bigrams.

Because bigram units intervene between the letter and word levels, this solves the problem of how a temporal letter encoding can activate the word level. (This problem is discussed in section 5.4.) However, recall that the encoding of letter order also subserves a phonological route to the lexicon, as discussed in section 5.3.2. While open bigrams are suitable for lexical access along an orthographic route, they are not suitable for phonological access because they do not provide phonologically meaningful units, thereby introducing the dispersion problem. Therefore, I assume that processing branches after the sequential encoding. Along the orthographic route, letters activate open bigrams, and then words. Along the phonological route, letters activate phonemes and then words (perhaps via some intermediate syllable-based encoding). The point is that bigrams are not activated along the phonological route. The serial nature of the letter-based encoding maps well onto the serial nature of phonology.

6.1.3 Induction of Serial Encoding

How is this temporal firing pattern induced at the letter level? Hopfield [Hop95], and Lisman and Idiart [Lis95] have proposed related mechanisms for precisely controlling the timing of firing. This is accomplished via a node which undergoes sub-threshold oscillations of excitability.
For convenience, I designate the trough of the cycle to be the "start" of the cycle. Input level then determines how early in the cycle such a node is able to cross threshold and fire. (See Figure 6.1.) Near the beginning of the cycle, excitability is low, so only a node receiving a high level of input can cross threshold and fire. Excitability increases over time, allowing nodes receiving less and less input to progressively fire. Thus serial firing at the letter level can be accomplished via letter nodes which oscillate in synchrony and take input in the form of an activation gradient.

Figure 6.1: Interaction of input level and timing of firing for a cell undergoing a sub-threshold oscillation of excitability. When a relatively high level of input (top curving line) is added to the base oscillation, the cell crosses threshold at time 1 (action potential not illustrated). If less input were received, the cell would cross threshold later in the cycle, such as at time 2.

In our example, C would get the most input, A the next, R the next, and T the least. So C can fire the earliest, A next, R next, and finally T. Note that these letter nodes are not tied to location or position. The same letter node can represent a letter occurring at any position, based on its timing of firing.

Thus there must be an activation gradient across the next lower level of the model, to provide input to the letter level. Because this gradient decreases from left to right, these lower-level units must be tuned to retinal location. I have assumed that the input to the letter level comes from feature units [Whi01a]. However, the assumption of feature units is not crucial. Input to position-independent letter units could just as well come from location-specific letter units.
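The threshold-crossing mechanism of Figure 6.1 can be sketched numerically. The cosine waveform, threshold, and cycle length below are illustrative assumptions (the model only requires a sub-threshold oscillation in the theta range); the point demonstrated is that a higher input level yields an earlier crossing time within the cycle.

```python
import math

def firing_time(input_level, threshold=1.0, cycle_ms=150.0):
    """Return the time (ms) within one oscillatory cycle at which a node
    fires, given a constant input level added to a sub-threshold base
    oscillation. Waveform and parameters are illustrative.

    The base oscillation starts at its trough (t = 0) and rises toward,
    but never reaches, threshold; higher input -> earlier crossing.
    """
    for t in range(int(cycle_ms)):
        # excitability rises from the trough toward 90% of threshold
        base = 0.9 * threshold * (1 - math.cos(2 * math.pi * t / cycle_ms)) / 2
        if base + input_level >= threshold:
            return float(t)
    return None  # insufficient input: the node stays silent this cycle
```

With graded inputs for C > A > R > T (say 0.9, 0.7, 0.5, 0.3), the firing times come out strictly increasing, reproducing the serial order C, A, R, T.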
The important point is that an activation gradient across units tuned to retinal location interacts with synchronously oscillating letter nodes which are location- and position-independent; a retinotopic representation is converted into a serial representation. The resulting serial representation is a location-invariant encoding. Thus location invariance is achieved by mapping space onto time. This location-invariant encoding is presumed to occur in the LH. For convenience, I assume that this locational gradient occurs across feature units.

The induction of serial firing also results in varying activations at the letter level. Letters that receive more input also fire faster and achieve higher activations. Therefore positional activations at the letter level are similar to those at the feature level, with the exception of the final letter. The final letter is not inhibited by the firing of a subsequent letter; thus it can fire longer than non-final letters. Although the final letter receives a lower level of input than the other letters, it can reach a higher activation level (where activation is based on the total number of spikes). This is consistent with the final-letter advantage. See Figure 6.2 for a schematic of the letter through word layers.

6.1.4 Creation of the Locational Gradient

How is the activation gradient induced at the feature level? Recall that at the lowest level of the model (dubbed the edge layer) there is a different activation pattern, one based on acuity. For a fixated word, the acuity pattern across the letters in the RVF is the same as required for the locational gradient (i.e., decreasing from left to right). Thus the acuity gradient can serve as the locational gradient for those letters. However, in the LVF, the acuity gradient increases from left to right; its slope is in the opposite direction from that required for the locational gradient.
Therefore, when the edge level activates the feature level, the acuity gradient must be inverted in the LVF/RH, while it can be maintained for the RVF/LH. Details of this processing are presented below. For now, it is sufficient to note that such hemisphere-specific processing could potentially be a source of VF differences.

Figure 6.2: Architecture of the letter, bigram, and word levels of the SERIOL model, with an example of encoding the word CART. At the letter level, simultaneous graded inputs are converted into serial firing, as indicated by the timing of firing displayed under the letter nodes. Bigram nodes recognize temporally ordered pairs of letters (connections shown for a single bigram). Bigram activations (shown above the nodes) decrease with increasing temporal separation of the constituent letters. Activation of word nodes is based on the conventional dot-product model.

6.1.5 Summary

There are five levels of representation: edge, feature, letter, bigram, and word. The acuity gradient at the edge level is converted via hemisphere-specific processing into a monotonically decreasing locational gradient at the feature level. This gradient interacts with oscillatory letter nodes, yielding serial firing and creating a location-invariant representation. A positional activation pattern also results at the letter level. The letter level feeds into separate orthographic and phonological routes. Along the orthographic route, open-bigram nodes respond to letter pairs that fire in a particular order. Bigram activation depends on the time lag between the firing of the constituent letters. The bigrams contact the lexical level via weighted connections.

I conclude this overview by briefly specifying how this model satisfies the requirements for an LPE model.
Split fovea and VF differences: A split fovea is assumed at the edge level. Formation of the locational gradient integrates the two halves of the string. Due to the acuity gradient, this requires hemisphere-specific processing, potentially accounting for VF differences.

Position-independent letter units and serial processing: Position is dynamically represented by firing order. Activation of words via a bigram layer provides a mechanism for decoding the temporal representation.

Retinotopic to location-invariant encoding: This is achieved via the interaction of the locational gradient and oscillatory letter nodes.

Relative-position and transposition priming: These phenomena are explained by the open-bigram units.

Positional patterns: The locational gradient overrides the acuity pattern. This gradient creates varying activations at the letter level. Lack of inhibition of the final letter by a subsequent letter creates a final-letter advantage.

Neurobiological plausibility: Most interactions occur along standard weighted connections. As for the proposed temporal encoding, Lisman and Idiart discuss empirical support for the underlying assumptions [Lis95]. In line with the proposed precision of spike timing, recent studies have shown that single spikes encode significant amounts of information [Rie97], and that spike timing is reproducible at a millisecond time scale [Ber97, Vic96]. In line with the proposed oscillatory cells, slice preparations have shown sub-threshold, theta-band oscillations in cortical pyramidal cells [Fel01, Buc04]. Furthermore, a role for theta-band oscillations has been implicated in visual word recognition [Kli01]. As for the bigram nodes, others have proposed neural mechanisms by which temporally ordered pairs could be recognized, via transition of receptor conformations [Deh87], or activation decay under specific connectivity patterns [Pul03].

6.2 The SERIOL Model

Having given an overview of the model, I now present it in more detail.
As discussed in the Introduction, this is a theoretical framework. The model is specified by describing the representation and the activation pattern at each layer, and the transformations between layers.

In the following, the term activation denotes the total amount of neural activity induced by a letter (within a given processing layer) over some fixed time period. Thus, activation increases with the number of cells firing, their firing rate, and the duration of firing (if firing duration is less than the time period being considered).

6.2.1 Edge Layer to Feature Layer

At the edge level, the activation pattern results from the acuity gradient. That is, the total amount of neural activity representing a letter decreases as distance from fixation increases. At the feature level, this pattern must be converted into the locational gradient, wherein activation decreases from left to right. Obviously, a high level of activation for the leftmost letter cannot be achieved by increasing the number of cells representing that letter. Rather, the locational gradient is created via modification of firing rates. It is assumed that the following transformations are learned during reading acquisition, most likely in response to a top-down attentional gradient.

Recall that the acuity gradient can serve as the locational gradient in the RVF/LH, but not the LVF/RH. In the LVF/RH, the acuity gradient is inverted as the feature level is activated, via a combination of excitation and lateral inhibition. This process is displayed in Figure 6.3. It is proposed that letter features in the LVF/RH become more highly activated by edge-level inputs than those in the RVF/LH. This allows the first letter to reach a high level of activation. This could occur either via higher bottom-up connection weights from the edge level, or by stronger self-excitatory connections. Within the RH feature level, there is strong left-to-right lateral inhibition. That is, a feature node inhibits nodes to its right.
As a result, letter features corresponding to the first letter receive no lateral inhibition, and inhibition increases as letter position increases. Thus, the features comprising the first letter attain the highest activation level (due to strong excitation and lack of lateral inhibition), and activation decreases toward fixation (due to sharply increasing lateral inhibition, from more and more letters). In the RVF/LH, the acuity gradient serves as the locational gradient. Overall excitation is weaker than in the LVF/RH. Left-to-right inhibition is not necessary, although some weak inhibition of this sort may steepen the slope of the gradient.

The two hemispheric gradients are "spliced" together via functional cross-hemispheric inhibition. The RH features inhibit the LH features, bringing the activation of the LH features lower than the activation of the least activated RH features. As a result, an activation gradient that is strictly decreasing from left to right is created. This cross-hemispheric inhibition explains the LVF advantage for letter perceptibility in strings that straddle both visual fields [Ste03, Naz04a].

Next I consider the nature of this proposed cross-hemispheric inhibition. One possibility is that RH features directly inhibit LH features across the corpus callosum. Another possibility is that the RH feature-layer representation activates a corresponding feature-level representation in the LH, and that the inhibition occurs within the LH. It is a matter of debate whether callosal connections are primarily excitatory or inhibitory (see [Reg01] for a discussion). Computational models have shown that inhibitory cross-hemispheric connections are required to produce strong hemispheric lateralization, while predominately excitatory connections are necessary to model the reduced neural activity observed contralateral to a cortical lesion [Lev00, Reg01].
A single model demonstrated that inhibition at the sub-cortical level and excitation at the cortical level could account for both phenomena, suggesting that callosal connections may be predominately excitatory [Reg01]. However, even if callosal connections are predominately excitatory, the existence of inhibitory connections is not ruled out. Indeed, in the cat, stimulation of transcallosal neurons resulted in both excitatory and inhibitory post-synaptic potentials in contralateral receptive cells [Cis03]. It might be possible to selectively strengthen such inhibitory connections, allowing unidirectional, transcallosal inhibition.

Figure 6.3: Formation of the locational gradient at the feature layer, for the centrally fixated stimulus CASTLE. The horizontal axis represents retinal location, while the vertical axis represents activation level. The bold-face letters represent bottom-up input levels, which are higher in the RH than the LH. In each hemisphere, activation decreases as eccentricity increases, due to the acuity gradient. The italicized letters represent the effect of left-to-right inhibition within the RH, and of RH-to-LH inhibition. In the RH, C inhibits A, and C and A inhibit S, creating a decreasing gradient. The RH inhibits each letter in the LH by the same amount, bringing the activation of T lower than that of S. As a result, activation monotonically decreases from left to right.

I do assume that callosal transfer of the RH information to the LH occurs prior to the letter level, which is taken to correspond to the LH's VWFA. So the RH features may transcallosally inhibit the LH features and excite the LH letter representations. Alternatively, RH features may activate feature-level "copies" within the LH, and such inhibition and excitation would then occur entirely within the LH.
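The gradient-formation scheme of Figure 6.3 (stronger RH excitation, left-to-right lateral inhibition within the RH, and uniform RH-to-LH inhibition) can be sketched as follows. The gain and inhibition values are illustrative assumptions, not model parameters; the point is that the combination inverts the LVF acuity gradient and splices the two halves into one strictly decreasing gradient.

```python
def locational_gradient(acuity_lvf, acuity_rvf, rh_gain=2.0, rh_inhib=0.8):
    """Splice split acuity gradients into a single left-to-right
    decreasing locational gradient (illustrative parameter values).

    acuity_lvf / acuity_rvf: per-letter activations left/right of fixation,
    listed left to right (acuity decreases with eccentricity).
    """
    # LVF/RH: stronger excitation, then each feature inhibits nodes to its right
    rh = []
    for a in acuity_lvf:
        inhibition = rh_inhib * sum(rh)   # total inhibition from letters to the left
        rh.append(max(rh_gain * a - inhibition, 0.0))
    # RVF/LH: the acuity gradient already decreases left to right
    lh = list(acuity_rvf)
    # Cross-hemispheric inhibition: subtract a uniform amount from the LH
    # so its strongest feature falls just below the weakest RH feature.
    cross = max(lh) - min(rh) + 0.1
    lh = [max(a - cross, 0.0) for a in lh]
    return rh + lh
```

For centrally fixated CASTLE, with acuity rising toward fixation in the LVF (C, A, S) and falling away from it in the RVF (T, L, E), the output decreases monotonically from C to E, as in the figure.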
Thus, I leave as open questions the nature of callosal transfer and the substrate of the proposed cross-hemispheric inhibition.

6.2.2 Feature Layer to Letter Layer

The locational gradient of the feature level induces a temporal firing pattern across letter nodes wherein position is represented by the precise timing of firing relative to other letter nodes. All letter nodes are assumed to undergo synchronous, periodic oscillations of excitability. Following Lisman and Idiart [Lis95], this oscillation is taken to fall in the theta range (5 - 8 Hz; cycle length = 125 to 200 ms). Due to the locational gradient, letter nodes fire serially. An activated letter node inhibits other letter nodes. As a letter node continues to fire, its firing rate slows, reducing lateral inhibition to the other nodes. This allows a new letter node to start firing. When an active letter node receives lateral inhibition, it then becomes strongly inhibited, so that it will not fire again for the remainder of the oscillatory cycle.² Thus the graded input levels and lateral inhibition create serial firing at the letter level.

²This raises the question of how repeated letters are handled. I assume that there are multiple copies of each letter node, and a different node becomes activated for each instance.

This process also creates varying activations at the letter level. I assume that a higher input level leads to faster firing.³ The activation of a letter node depends on both its firing rate and duration. Firing duration is determined by when the next letter starts to fire, which is determined by the input level to that node. Thus the activation of a letter depends both on its own input level, and on the input level to the next letter. Assuming a fairly constant firing duration across letters, this gives a decreasing activation gradient at the letter level. The firing duration of each letter is taken to be on the order of 10 - 20 ms. However, the final letter is not inhibited by a subsequent letter.
It can continue firing until the end (down-phase) of the oscillatory cycle.⁴ Therefore, the final letter could potentially fire longer than the other letters, and reach a higher level of activation than the internal letters even though it receives less input.

³This is consistent with experimental results in which a hippocampal CA1 neuron was driven by oscillatory current injected into the cell body, coupled with stimulation to the dendrites. Increasing the amplitude of the dendritic current caused the cell to fire earlier with respect to the somatic oscillatory cycle, and to fire more quickly, generating more action potentials [Mag01].

⁴This assumes that a single word is being processed, as in experimental studies. Under natural reading conditions, multiple short words could be represented in a single oscillatory cycle.

6.2.3 Letter Layer to Bigram Layer

A bigram node XY becomes activated when letter node X fires, and then letter node Y fires within a certain time period. Thus letter node X primes, or gates, node XY, allowing it to fire when input from letter node Y is received. If the node is not initially primed by input from letter X, it cannot fire. A bigram node responds with a burst of firing, and then is quiet. The number of spikes in this burst decreases as the time between the firing of X and Y increases. Thus, the activation of XY indexes the separation of letters X and Y in the string.

In previous articles on the SERIOL model, I assumed that bigram activations were influenced by letter activations [Whi99, Whi01a, Whi04a, Whi04c]. However, this assumption is inconsistent with emerging evidence on the weak positional effects of priming at the word level [Gra04a]. Therefore, I now take bigram activation levels to be affected only by the separation of the constituent letters.
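The serial letter-to-bigram mapping just described can be sketched as a firing schedule: as each letter node fires in turn, every bigram whose first letter has already fired (within the maximum separation) fires with an activation determined only by the separation of its two letters. The separation-based activations below reuse the CS1/CS2-style constants from the word-level simulation reported later in this chapter, and should be read as illustrative.

```python
def bigram_firing_schedule(word, max_sep=2, sep_act=(1.0, 0.8, 0.2)):
    """Time course of open-bigram firing as letter nodes fire serially.

    Returns (time_step, bigram, activation) triples. When the letter in
    position t fires, every bigram whose first letter fired earlier (with
    at most max_sep intervening letters) fires too; activation depends
    only on the separation of the constituent letters (sep_act values
    are illustrative).
    """
    w = word.upper()
    events = [(1, "*" + w[0], 1.0)]           # edge bigram fires with the first letter
    for t in range(1, len(w)):                # letter w[t] fires at step t + 1
        for s in range(max(0, t - max_sep - 1), t):
            events.append((t + 1, w[s] + w[t], sep_act[t - s - 1]))
    events.append((len(w), w[-1] + "*", 1.0)) # edge bigram fires with the last letter
    return events
```

For the input cart, this reproduces the sequence given in the text: *C, then CA, then AR and CR, then RT, AT, and CT, and then T*.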
Following the evidence for a special role for external letters, the string is anchored to these endpoints via edge bigrams.⁵ That is, bigram *X is activated when letter X is preceded by a space, and bigram Y* is activated when letter Y is followed by a space. In contrast to other bigrams, an edge bigram cannot become partially activated (i.e., by the second or next-to-last letter). Thus I assume a special mechanism for the activation of edge bigrams, which operates somewhat differently than for bigrams detecting a pair of letters. The details of this edge detection are left for future work.

Because letters are activated sequentially, bigram activations occur sequentially. For example, the input cart first activates bigram node *C (when letter node C fires), then CA (when A fires), then AR and CR (when R fires), then RT, AT, and CT (when T fires), and then T*.

⁵This is a new assumption. The importance of the external letters was formerly captured via high activations of bigrams containing those letters. However, now that bigram activation levels do not reflect letter activation levels, edge bigrams are assumed instead.

6.2.4 Bigram Layer to Word Layer

Bigram nodes connect to word nodes via weighted connections. The weight on a bigram-word connection is proportional to the activation level of that bigram when that word is presented as input (as would result from Hebbian learning). As is usual in neural network models, the weight vector is normalized, so that bigrams making up shorter words have higher connection weights than bigrams making up longer words. For example, this allows the string tee to activate the word node TEE more than TEETHE.⁶ The input to a word node is the dot product of the weight vector and the input vector. The input vector changes over time, because bigram activations occur serially, as indicated above. The activation of a word node at time t is a function of its input at time t and its activation at time t − 1.
Lateral inhibition within the word layer also operates over time.

6.3 Summary

In the following, I summarize the important assumptions of the model.

Edge Layer
- Retinotopic.
- Activation levels based on the acuity gradient.
- Representation of the fovea split across hemispheres.

Feature Layer (for a left-to-right language)
- Retinotopic; representation still split across hemispheres.
- Locational gradient: activation decreases from left to right.
- Locational gradient formed by hemisphere-specific processing: stronger excitation to the RH than the LH; left-to-right lateral inhibition within a hemisphere, much stronger in the RH; the RH inhibits the LH.

Letter Layer
- Location- and position-independent letter nodes, located in the LH.
- Letter nodes undergo sub-threshold oscillations in synchrony.
- Lateral inhibition between letter nodes.
- Interaction of oscillations, lateral inhibition, and locational-gradient input gives serial firing.
- Letter node activation depends on: firing rate, determined by input level; and firing duration, determined by when the next letter starts to fire, which is in turn determined by the input level to that letter.

Bigram Layer
- Bigram XY activated when letter X fires and then letter Y fires.
- Activation of bigram XY decreases with the amount of time between the firing of letter X and letter Y.
- Edge bigrams also activated.

Word Layer
- Receives weighted connections from the bigram layer.
- Weight vectors are normalized to give an advantage to shorter words.
- Lateral inhibition operates as bigrams sequentially activate word nodes.

⁶Normalization is another new assumption. Information concerning the length of the string was formerly carried on the activations of bigrams representing the final letter.

Chapter 7

Account and Simulations of LPE Behavioral Results

Having specified the SERIOL model, and motivated the different processing layers, I next discuss in more detail how the model accounts for the behavioral results, with the use of implemented models in some cases.
The topics are presented in roughly the same order as the review of the experimental results.

7.1 Word Level

7.1.1 Bigrams

I start with a simulation of the bigram and word layers, based on a database of over 3,500 monosyllabic words. The most fundamental requirement is that the bigram representation of a letter string should activate the corresponding word node more highly than any other word node. Thus one goal of the simulation is to show that the bigram representation does indeed allow correct word recognition. In addition to demonstrating the viability of the bigrams, another goal is to reconcile some conflicting results concerning positional effects at the word level.

Recent priming data on long words have demonstrated that facilitation is rather insensitive to the position of matched letters in the target. That is, when a prime is 4 or 5 letters, and a target is 7 or 9 letters, there is little difference in facilitation between primes matching on the first letters versus the final letters [Gra04a]. Yet aphasic [Whi99] and perceptual data [Hum90, Mon98] indicate an advantage for the initial letters over the final letters.

In a previous implementation of the bigram and word levels, I simulated the aphasic data using bigram activations that were sensitive to letter activations (and therefore to string position) [Whi99]. However, this assumption that letter position influences bigram activations is inconsistent with the lack of positional effects in priming. Another problem is that the original simulation required an additional assumption: that input to the letter level was reduced in aphasics, thereby pushing the firing of the final letter near the end of the oscillatory cycle, yielding a low activation level for the final letter (as opposed to the usual final-letter advantage). This was necessary to simulate the finding that the final letter is the least likely to be preserved in an erroneous response.
Therefore, I sought to implement an improved bigram-to-word simulation that demonstrates both a weak positional priming effect, and the strong positional error pattern in the aphasic data (ideally without requiring additional assumptions about activation patterns at the letter level). In the original simulation, the temporal aspect of bigram and word activations was not considered, nor was lateral inhibition within the word layer. Rather, a bigram vector activated the word layer in a single time step in a purely bottom-up manner. However, a more realistic simulation which includes these factors may allow the above goals to be met. It may be the case that the aphasic error pattern arises from a temporal activation pattern, rather than a positional one. That is, bigrams that are matched early in the word-activation process could have an advantage over those that are matched later (due to ongoing lateral inhibition within the word layer), even though bigram activations do not vary with position. Based on these ideas, I implemented the following simulation, which met three goals: (1) correct recognition of all words in the database; (2) replication of the aphasic error pattern under noise; (3) lack of positional effects in target-node activations.

I first give a brief overview of the simulation. The input layer was comprised of bigram nodes, and the output layer consisted of word nodes representing all words in a database of 3650 single-syllable English words. The input layer connected directly to the output layer. Bigram-to-word weights were set according to the principles in section 6.2.4. Bigram activations were clamped sequentially, as discussed in section 6.2.3. Lateral inhibition within the word layer occurred after each set of bigram activations. Lateral inhibition was included to show that the temporal development of word-level activations could account for the aphasic error pattern. It was not used to simulate settling (reaction) time.
Thus the word node having the highest activation following presentation of the final bigram was simply selected as the response. Aphasic performance was simulated by adding noise to the word level. Priming was simulated by noting target-node activation under partial input.

Next, the simulation is specified in more detail. The functions implementing normalization and lateral inhibition were chosen on the basis of convenience and computational efficiency, rather than biological plausibility. In the following, C denotes a parameter.

Let Bxy denote a bigram node representing the letter x followed by the letter y. Its activation, A, for a string S is a function of the number of letters separating x and y, denoted Sep:

A(Bxy, S) = 1.0 if Sep = 0; CS1 if Sep = 1; CS2 if Sep = 2; and 0 otherwise.

Let WdS represent a word node encoding string S. The weight from a bigram node to a word node is given by:

W(Bxy, WdS) = (Cnrm / (Len(S) + Cnrm)) · A(Bxy, S)

where Len(S) gives the length of the string. This scaling of the bigram's activation value provides normalization by decreasing the weights for longer words, via division by a quantity that grows with Len(S). The constant Cnrm modulates this normalization; the higher its value, the smaller the effect. (If a bigram receives two different activation levels for a word, the larger value of A(Bxy, S) is taken.)

A string S is presented over Len(S) + 1 time steps. At each time step t, the bigrams are clamped to the values that would arise from the activation of the letter in position t. Word-level activations are then updated in two stages. (1) For each word node, the incoming activation is simply added to the current activation. The incoming activation is given by the dot product of the bigram vector and the word node's weight vector. (2) The effects of lateral inhibition are simulated by updating each word node's activation as follows:

A(WdS, t) = Cinh · (A(WdS, t) / MaxA(t)) · A(WdS, t) + (1.0 − Cinh) · A(WdS, t)

where MaxA(t) is the activation of the word node having the highest activation.
The constant Cinh (which takes values from 0 to 1.0) determines the overall contribution of inhibition. When Cinh is 0.0, the activation remains unchanged; when Cinh is 1.0, the activation is weighted by the ratio of the activation to the maximum activation. (Thus, the lower the activation value is with respect to the maximum activation, the more the activation is reduced, thereby simulating the effect of lateral inhibition.)

The parameters were hand-tuned to meet the above three goals. These goals are often at cross purposes. Goal (1) requires normalization of the weight vector. Yet if shorter words have too much of an advantage, they excessively inhibit longer words, under the inhibition required for goal (2). Goal (2) requires strong positional effects, while goal (3) requires weak positional effects.

A range of parameter values near the following values yielded reasonable results; the results for these particular values are presented: CS1 = 0.8, CS2 = 0.2, Cnrm = 50, Cinh = 0.5.

All words in the database were recognized correctly, under the requirement that the difference between the activation of the target word and the next highest word be at least 0.2. The most challenging task was to distinguish between TEE, THEE, TEETH, and TEETHE.

Priming was simulated by including the strings CDFGHKLMN, LMNPQRS, and STVWX in the database, and calculating their activation when a partial match was used as input. For example, to simulate a prime corresponding to the final five letters of a nine-letter word (56789), the activation of the CDFGHKLMN node is calculated for the input HKLMN. Table 7.1 gives the results. For seven- and nine-letter targets, there was a very weak advantage (of about 0.15) for initial versus final primes, which is numerically consistent with the experimental results [Gra04a]. Primes which did not experimentally produce facilitation all yielded simulated activation levels (< 3.9) that were lower than the activations of all primes that did produce facilitation (> 4.2).
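The activation, weight, and inhibition functions just specified, with the hand-tuned constants, can be sketched directly. The weight function assumes the Cnrm / (Len(S) + Cnrm) reading of the normalization formula described above; this is a reconstruction, not verbatim from the original implementation.

```python
CS1, CS2, CNRM, CINH = 0.8, 0.2, 50.0, 0.5   # hand-tuned values from the text

def bigram_act(sep):
    """A(Bxy, S): bigram activation as a function of the number of
    letters separating its two constituent letters."""
    return {0: 1.0, 1: CS1, 2: CS2}.get(sep, 0.0)

def weight(sep, word_len):
    """W(Bxy, WdS): length-normalized bigram-to-word weight, assuming
    the Cnrm / (Len(S) + Cnrm) reading of the normalization formula."""
    return CNRM / (word_len + CNRM) * bigram_act(sep)

def inhibit(acts):
    """Word-layer lateral inhibition applied after each input step:
    each activation is pulled down in proportion to its shortfall
    relative to the current maximum activation."""
    max_a = max(acts)
    return [CINH * (a / max_a) * a + (1.0 - CINH) * a for a in acts]
```

With Cinh = 0.5, a word node at half the maximum activation is reduced from 0.5 to 0.375 of the leader's value on a single step, so nodes that fall behind early are progressively suppressed as bigrams arrive.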
There was a strong correlation between the amount of facilitation and the simulated score, as shown in Figure 7.1. The large difference in the values of CS1 and CS2 was required to allow 13459 to give a considerably lower score than 6789 (for a 9-letter target), in accordance with the finding that only the latter produced priming. However, CS2 had to remain non-zero in order to be consistent with the finding that 125436 primes a six-letter target, while 12d4d6 does not [Per04]. That is, if CS2 is 0, bigrams 25 and 36 have weights of 0, erroneously giving no difference between these two types of primes.

Table 7.1: Simulated and experimental results for priming conditions from [Gra04a]. Act denotes the activation of the target node in the simulation for the given prime. Fac denotes the facilitation for that prime in the experimental results (the difference between reaction times for the control condition (dddd or ddddd) and the prime condition), where * denotes that the facilitation is statistically significant. The top group is five-letter targets; the middle group is seven-letter targets; the bottom group is nine-letter targets.

Prime   Act    Fac (ms)
1234    4.71   36*
2345    4.73   32*
1245    4.50   31*

12345   6.01   45*
34567   5.84   37*
13457   5.58   29*
1234    4.56   36*
4567    4.41   32*
1357    3.71   12
15437   2.69    0
73451   2.30    7

12345   5.75   30*
56789   5.62   26*
1234    4.35   23*
6789    4.22   19*
14569   3.86   12
1469    2.20    8

Figure 7.1: Comparison of simulated score and amount of facilitation using data from Table 7.1 (r = .87; p < .0001).

A lesion was simulated by adding normally distributed noise to each word node at each time step (prior to the inhibition). Noise with mean 0.3 and standard deviation 0.35 yielded good results, shown in Figure 7.2. As is evident, the probability of retaining a letter decreased with its position.
This is not merely an artifact of the scoring method (in which a retained letter had to be in the correct absolute position), as scoring from right to left did not yield this pattern. Furthermore, this decreasing pattern was not present when the simulation was run without lateral inhibition. (See Figure 7.3.) Thus, under inhibition, words that are highly activated early come to dominate. Therefore, final letters have less influence than the initial letters, even though their bigrams are activated to the same level. The results of the lesioned simulation also showed other similarities to the experimental data. Aphasic subjects tended to preserve word length in their erroneous responses. Average response lengths to targets of lengths 3-6 were 4.0, 4.2, 4.9, and 5.9, respectively [Whi99]. The simulated data also showed sensitivity to target length, giving 4.2, 4.8, 5.1, and 5.8. Retention level at a given position tended to increase with target length for both the aphasics and the simulation. For example, for position 3, experimental retention rates were 40%, 55%, 65%, and 55% for target lengths 3-6, respectively. The simulated data exaggerated this effect, giving 36%, 48%, 81%, and 92%.

Thus the simulation accomplished the stated goals. There was a weak positional effect for priming, but a strong positional effect in the presence of noise. In the priming simulation, the target node's activation was primarily influenced by the number and separation of the prime's bigrams, while the temporal nature of the inhibition only had a small effect. In the lesion simulation, potential erroneous responses that were not highly activated initially became inhibited and remained at a disadvantage. Therefore, retention level was highest for early string positions and decreased across the string, giving a strong positional effect.

Figure 7.2: Experimental [Whi99] and simulated results for the aphasic error pattern. The percent retained refers to the percentage of erroneous trials in which the letter in the ith position in the target occurred in the ith position of the response (n = 201 for experiment; n = 363 for simulation). Data are collapsed over target lengths of three to six. (In both the experimental data and the simulation, there was also a decreasing pattern within each target length.)

Figure 7.3: Simulation results under backward scoring, and under no inhibition. In backward scoring, the target and response are aligned at the final letter, and scored from right to left. In this case, position 1 corresponds to the final letter, 2 corresponds to the next-to-last letter, etc. The backward results are from the same simulation run as Figure 7.2. For the no-inhibition condition, a new simulation was run with C_inh = 0, and scored in the forward manner. Because backward scoring yielded a relatively flat pattern, and no inhibition yielded a V-shaped pattern, this shows that the decreasing pattern in Figure 7.2 was not merely an artifact of the scoring method.

The principles implemented in this simulation are also consistent with other priming and perceptual data. The left-to-right activation of bigrams accounts for the initial-letter advantage when only a single letter is matched in the prime [Hum90], and the perceptual error pattern, in which letter retention decreases across the string (like the aphasic error pattern) [Hum90, Mon04]. The anchoring of the external letters (via edge bigrams) accounts for their positional specificity, wherein priming does not occur when an external letter is moved to an internal position [Hum90, Per03]. For simplicity in the simulation, internal bigrams were weighted as highly as edge bigrams.
However, edge bigrams may actually be weighted higher, which would account for the finding that matching the external letters produces more facilitation than matching any other two letters of a four-letter target [Hum90]. When three out of four letters are matched in a prime, there is no positional specificity [Hum90], consistent with the weak positional effects in the priming simulations. Next I discuss a word-level effect that is explained at the letter level.

7.1.2 Letters

Sequential activation at the letter level explains the observed interaction between word length and rotation angle in the lexical-decision experiment in which the stimuli were rotated [Kor85] (as discussed in section 8.1). Recall that length had no effect for small angles, while each additional letter delayed RTs by about 200 ms for large angles. For intermediate angles, RTs were neither flat nor linear. These data are redisplayed in Figure 7.4. The authors concluded that it was not possible to explain these data under a single unitary principle. However, the SERIOL model does allow such an explanation.

Figure 7.4: Experimental reaction times (in milliseconds) for the rotated-string lexical-decision task. Each line represents one angle of rotation, where the lower lines correspond to 0° through 80°, and the upper lines correspond to 100° to 180°.

As discussed in section 8.1, the presence or absence of a length effect does not necessarily diagnose whether lexical access occurs serially or in parallel. There could be no length effect under serial access if earlier firing of the final letter for shorter words is offset by a longer settling time at the word level.
However, length effects may arise under conditions of degraded presentation, when input levels to letter nodes are reduced such that it takes multiple oscillatory cycles to represent a sequence of letters that is normally represented in a single cycle. I suggest that such a phenomenon underlies the RT results from the rotated-word experiment. This analysis implies that such length effects should depend on the time scale of the oscillatory cycle. Recall that for the largest rotation angles, each additional letter increased RTs by approximately 200 ms, which is on the order of the length of the proposed oscillatory cycle. Thus I propose that a unitary principle can explain these data - namely, that letter position is encoded temporally via an oscillatory carrier wave. When the input is degraded (by rotating the letter string), the underlying temporal nature of the encoding is exposed. The feasibility of explaining these data under the SERIOL model is next demonstrated via a simulation, which was first presented in [Whi02].

Simulation

I assume that subjects performed the lexical-decision task by mentally rotating the string to the canonical horizontal orientation, and then processing the string as usual. This assumption is consistent with the fact that RTs for two-letter words increased smoothly with rotation angle. It is also assumed that the act of mental rotation decreases the amount of input reaching the letter nodes, and that this degradation increases with the amount of rotation. These assumptions, in conjunction with the SERIOL model, provide a natural explanation for the experimental data. Up to a certain amount of rotation, there is still sufficient input to activate all the letters within a single oscillatory cycle (i.e., up to 60°). After that point, there is sufficient input to activate all of the letters in shorter words, while longer strings require an additional cycle (i.e., for 80° and 100°).
This accounts for the intermediate region where RTs are neither constant nor linear. With further degradation, only two-letter words can be represented in a single cycle; each additional letter requires an additional cycle (i.e., 120° to 180°). It is assumed that once the mental image of a letter has activated a letter node, that image is inhibited. This allows a determination of whether all letters have been processed. However, bigram activation depends on the ordered firing of letter nodes within a single oscillatory cycle. If severely degraded input causes each letter node to fire on a separate cycle, how then could the bigram nodes become activated? It is assumed that letters which have already fired can fire again on successive cycles. However, this refiring can't be triggered by bottom-up input, since the mental image is inhibited once it activates a letter node. How then could a previously activated letter node refire? It has been proposed that an after-depolarization (ADP), which has been observed in cortical pyramidal cells following spiking, can maintain short-term memory across oscillatory cycles [Lis95]. The ADP is a slow, steady increase in excitability, peaking at approximately 200 ms post-spike. The temporal gradient of the ADP can maintain the firing order of elements across oscillatory cycles, in the absence of bottom-up input, as demonstrated in a simulation [Lis95]. Thus, this mechanism could maintain the firing order of letter nodes that have been previously activated. I have implemented a simulation of the RT for the rotated-word experiment based on these ideas. The interaction between the underlying oscillatory cycle, external input levels, lateral inhibition, and the ADP was modeled in order to arrive at a firing time for the final letter of the string. This firing time, combined with other quantities, gives the modeled RT. Next the details of the implemented model are presented.
Instantiating the theoretical framework in a simulation entails the specification of quite a few parameters. Most of these parameters are related to the neuronal dynamics (ADP, oscillations, and inhibition) and are set to physiologically plausible values, similar to those used in [Lis95]. In fitting the computational model to the experimental data, the primary focus of optimization was the input function. This function was hand-tuned.

Reaction-Time Equation

The modeled RT, R, is given by:

R(θ, l) = C_BR + H(θ) + W(θ, l)

where θ denotes the angle of rotation (given in degrees) and l denotes the string length. C_BR denotes a base RT, set to 730 ms. H denotes the time required to mentally rotate the string; it is a linearly increasing function of θ. Fitting to the RTs for two-letter words gives:

H(θ) = 1.5θ

W denotes the time required to activate all the letter nodes corresponding to the string; that is, W is the first time at which the final letter node fires. The functions which determine W are the instantiation of the SERIOL framework. These functions specify the activation of the letter nodes.

Letter-node Equations

Following Lisman and Idiart [Lis95], letter nodes are modeled as units that undergo a sub-threshold oscillatory drive, exhibit an increase in excitability after firing (ADP), and send inhibitory inputs to each other. The membrane potential, V, of a letter node is given by:

V(θ, i, t, c) = O(t) + A(i, t) - I(i, t) + E(θ, i, c)

where i denotes the letter node representing the ith letter of the word, t denotes time (ranging from 0 to the length of the oscillatory cycle), and c denotes the number of completed oscillatory cycles. O gives the oscillatory drive, A gives the ADP, I gives the inhibitory input, and E gives the excitatory external input (originating from the feature level). A node fires when V exceeds a threshold, C_TH, which is specified relative to resting potential, and set to 10 mV.
Firing causes the node's ADP component to be reset and inhibition to be sent to the other nodes. E is permanently set to 0 the first time that a node fires. The oscillatory function O has a cycle length of 200 ms; it linearly increases from -5 mV to 5 mV during the first half of the cycle, and decreases back to -5 mV during the second half. The ADP and inhibition are modeled by functions of the form:

F(t; M, T) = M (t/T)^1.5 exp(1 - t/T)

which increases to a maximal value (controlled by parameter M) and then decreases (on a time scale controlled by parameter T). The ADP is given by:

A(i, t) = F(t_i; M_A, T_A)

where t_i denotes the amount of time since node i last fired. (A is 0 if the node has not yet fired.) The inhibition is the sum of the inhibitory inputs from all letter nodes, given by:

I(i, t) = Σ_{j=1..l} F(t_j; M_I, T_I).

These parameters were hand-tuned (in conjunction with E) to give the desired firing pattern. The following values were used: T_A = 200 ms, M_A = 11 mV, T_I = 3 ms, M_I = 3 mV. The external input E is a decreasing function of position i; this corresponds to the locational gradient at the feature level. In the following, E is specified for a node that has not yet fired; if node i has already fired, E(θ, i, c) = 0. First we consider the initial oscillatory cycle, for an unrotated string. The following function was used:

E(0°, i, 0) = 10.6 - 0.5i

Mental rotation degrades the external input, so E decreases as θ increases:

E(θ + 20°, i, 0) = E(θ, i, 0) - 0.65 sin(θ + 20°)

External input builds up over time, with E increasing after each oscillatory cycle:

E(θ, i, c + 1) = E(θ, i, c) + 0.2

Results

A simulation for each combination of l and θ was run, starting at time t = 0 and using a time step of 1 ms. At each time step, each letter node's potential was calculated using the equation for V. For all rotation angles and string lengths, all active letters of the string fired in the correct sequence on each cycle.
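To make the dynamics concrete, the equations above can be transcribed into a short program. This is a sketch, not the original code: the 1 ms threshold-crossing update, the left-to-right evaluation order, the exclusion of self-inhibition, and the helper names (`osc`, `alpha`, `ext_input`, `simulate`) are my assumptions, so individual firing times will differ slightly from those reported in the text.

```python
import math

CYCLE = 200              # oscillatory cycle length (ms)
C_TH = 10.0              # firing threshold relative to rest (mV)
M_A, T_A = 11.0, 200.0   # ADP parameters (mV, ms)
M_I, T_I = 3.0, 3.0      # inhibition parameters (mV, ms)

def osc(t):
    """Oscillatory drive O(t): -5 mV up to +5 mV and back over 200 ms."""
    half = CYCLE / 2
    return -5 + 10 * t / half if t < half else 5 - 10 * (t - half) / half

def alpha(since, M, T):
    """F(t; M, T) = M (t/T)^1.5 exp(1 - t/T); zero if the node never fired."""
    if since is None or since == 0:
        return 0.0
    return M * (since / T) ** 1.5 * math.exp(1 - since / T)

def ext_input(angle, i, c):
    """E: positional gradient, degraded in 20-degree rotation steps,
    building back up by 0.2 mV per completed cycle."""
    e = 10.6 - 0.5 * i
    for a in range(20, angle + 1, 20):
        e -= 0.65 * math.sin(math.radians(a))
    return e + 0.2 * c

def simulate(angle, length, max_cycles=10):
    """Return (t_final, c_final): time and cycle of the last letter's
    first spike, or None if it never fires within max_cycles."""
    fired = [False] * (length + 1)    # 1-based letter indices
    since = [None] * (length + 1)     # ms since each node last fired
    for c in range(max_cycles):
        for t in range(CYCLE):
            for i in range(1, length + 1):
                inhib = sum(alpha(since[j], M_I, T_I)
                            for j in range(1, length + 1) if j != i)
                E = 0.0 if fired[i] else ext_input(angle, i, c)
                V = osc(t) + alpha(since[i], M_A, T_A) - inhib + E
                if V > C_TH:
                    if i == length:
                        return t, c
                    fired[i] = True
                    since[i] = 0      # reset ADP; inhibition starts rising
            since = [None if s is None else s + 1 for s in since]
    return None

def reaction_time(angle, length):
    """R = C_BR + H + W, with C_BR = 730 ms and H(angle) = 1.5 * angle."""
    res = simulate(angle, length)
    assert res is not None, "final letter never fired"
    t_f, c_f = res
    return 730 + 1.5 * angle + t_f + 200 * c_f
```

Running this sketch reproduces the qualitative regimes: at 0° all four letters of a four-letter string fire within the first cycle, while at 180° the final letter is pushed into a later cycle, so each additional letter costs on the order of one extra 200 ms cycle.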
The value of W(θ, l) was set to t_final + 200·c_final, where t_final and c_final are the first t, c at which V(θ, l, t, c) > C_TH. For example, for θ = 0° and l = 4, nodes 1, 2, 3, and 4 fired at t = 49, 63, 74, and 84, respectively, during the first cycle, giving W(0°, 4) = 84. For θ = 180° and l = 4, nodes 1 and 2 fired at t = 86 and 100 in the first cycle. In the second cycle, nodes 1, 2, and 3 fired at t = 52, 65, and 94. In the third cycle, nodes 1, 2, 3, and 4 fired at t = 43, 55, 66, and 97, giving W(180°, 4) = 97 + 200·2 = 497. Each node refired earlier in successive cycles due to the ADP. This earlier firing, in conjunction with increasing external input, allowed more letters to fire on each cycle. The slowly increasing ramp of the ADP, in conjunction with lateral inhibition, maintained the proper firing sequence across cycles.

The RT was then calculated using the equation for R. The results are given in Figure 7.5. The simulation reproduced the experimental pattern of relatively flat RTs for small angles, and rapidly increasing RTs for large angles, with a mixture of the two patterns for intermediate angles. In the experimental data, there was also a pervasive disadvantage for two-letter words, which is not captured by the simulation.¹ While this simulation may seem complex, it is merely an instantiation of the previously specified dynamics for induction of the serial firing pattern (with the addition of the ADP), coupled with the assumption that bottom-up input decreases with rotation angle and increases over time. It illustrates the simple idea that the interaction between string length and rotation angle arose because multiple oscillatory cycles were progressively required to represent all of the letters of the input string. Applying the SERIOL model to this experiment yields a natural explanation of the data. It accounts for the finding that there is an intermediate region of rotation angles where processing seems neither fully parallel nor fully serial, which is difficult to explain otherwise. It also predicts the finding that the increase in RT per letter for large rotation angles is on the order of 200 ms (i.e., an oscillatory cycle in the theta range).

¹The simulated results can be made to look more like the experimental results by simply adding 100 ms to all two-letter reaction times. It is unclear what the source of this disadvantage is. It may be related to the fact that vowels are normally not explicitly expressed in Hebrew, leading to ambiguity for very short strings.

Figure 7.5: Simulated reaction times for the rotated-string lexical-decision task. Notation is the same as in Figure 7.4.

7.2 Letter Perceptibility Patterns

As discussed in section 6.2.2, the induction of the serial encoding leads to differing activations at the letter level. These activation patterns depend on the interaction of the locational gradient and the oscillatory cycle. Such dynamics explain observed patterns of letter perceptibility, which vary with string position and visual field, as follows. For a centrally fixated string, the initial-letter advantage and final-letter advantage arise for different reasons. The initial letter has an advantage because it receives the highest level of bottom-up input, allowing it to fire the fastest. It receives the most input because it is not inhibited from the left at the feature level. The final letter has an advantage because it is not inhibited by a subsequent letter during the induction of serial firing. That is, it is not inhibited from the right at the letter level. Thus, like others, I also attribute the advantage for the external letters to a lack of lateral inhibition. But this arises because of string-specific processing, and not from a lack of masking at a very low level (as is generally assumed).
This proposal is consistent with the finding that there is no external-symbol advantage for strings of symbols that are not letters or numbers [Mas82]. For such centrally fixated symbol strings, the external symbol is the least well perceived, as would be expected on the basis of acuity. Strings of letters and numbers show an external-symbol advantage because of string-specific processing. This predicts that it should be possible to differentially affect the initial- and final-letter advantages. The initial-letter advantage should disappear if the amount of bottom-up input to the initial letter is not significantly higher than to the other letters. The final-letter advantage should disappear if the firing of the final letter is pushed late into the oscillatory cycle. As we shall see, this is exactly what happens for brief, lateralized presentation of short strings. First, however, a more in-depth consideration of activation patterns at the feature level is required. Recall that locational-gradient formation requires different processing across the hemispheres. In the RVF/LH, the acuity gradient serves as the locational gradient. In the LVF/RH, the acuity gradient is inverted via strong bottom-up excitation and left-to-right lateral inhibition. Because the locational gradient is formed by different mechanisms in each hemisphere, the shape of the resulting gradient may vary with hemisphere, especially at large eccentricities. Recall that acuity falls off fastest near fixation, and falls off more slowly as eccentricity increases. That is, the slope of the acuity gradient is steepest near fixation, and becomes shallower as eccentricity increases. Since the RVF/LH locational gradient is based on the acuity gradient, this implies that the RVF/LH locational gradient becomes shallower as eccentricity increases. (See the right half of Figure 7.6.) In the LVF/RH, formation of the locational gradient depends on left-to-right lateral inhibition.
This processing is optimized to create the locational gradient for a small number of letters near fixation. For long strings at large eccentricities, inhibition may be too strong at early string positions (due to their relatively low level of activation), but may become too weak at later string positions (due to their increasing acuity). (See the left half of Figure 7.6.) Thus the prediction is that the locational gradient should vary with visual field. Assuming that letter perceptibility directly indexes letter activation levels, which depend on feature activation levels, this suggests that letter perceptibility patterns should vary with visual field. As discussed in section 4.2, they do indeed. Recall that letter perceptibility uniformly drops off with increasing string position in the RVF/LH. In contrast, in the LVF/RH, perceptibility drops off sharply across early string positions, and then flattens out for later string positions [Wol74]. These data are re-presented in the top panels of Figure 7.7. The proposed hemisphere-specific shapes of the locational gradient explain these data², as shown by the results of a mathematical model (bottom panels of Figure 7.7), described next.

Figure 7.6: Schematic of locational gradients for the stimulus CART at three different presentation locations. The vertical axis represents activation, while the horizontal axis represents retinal location. For central presentation, the gradient is smoothly and rapidly decreasing. For RVF presentation, the gradient is shallower because the acuity gradient is shallower. For LVF presentation, the initial letter strongly inhibits nearby letters, but the gradient flattens out as acuity increases.

7.2.1 Mathematical Model

The data are modeled by calculating a feature-level activation, which is converted into a letter-level activation, which determines an accuracy score.
At the feature level, the stronger excitation and left-to-right inhibition in the RH are modeled, as well as the cross-hemispheric inhibition from the LH to the RH. The general form of the equations is presented first; then the specific instantiations of those equations are specified.

²I'd like to note that the theory of locational-gradient formation was not formulated to explain these data. Rather, the theory was constructed to explain how a monotonically decreasing gradient could be formed, starting with the assumption of the acuity pattern. Once the theory was formulated, it predicted these activation patterns. Only then did I actually seek out relevant experimental data. The fact that existing data showed the predicted pattern convinced me that I was on the right track.

Figure 7.7: Experimental (top) and modeled (bottom) results of [Wol74], with LVF presentation on the left and RVF on the right. Each graph shows the effect of string position on perceptibility at a given retinal location R (specified in units of letter width).

All feature nodes comprising a single letter are assumed to reach a similar level of activation, which is determined by the retinal location, R, and string position, P, of that letter. For simplicity, fixation (R = 0) is assigned to a single hemisphere, namely the RVF/LH. The activation of a feature node is denoted F. I first consider feature activations in the absence of hemispheric interaction, denoted F_h. F_h is determined by the combination of bottom-up excitatory input, E, and lateral inhibitory input, I, and is restricted to a maximal value, c_M.
That is,

F_h(R, P) = min(c_M, E(R) - I(R, P)).

In the following specification of F_h, "letter" will refer to a letter's feature nodes.

Bottom-Up Excitation

Excitatory input is a function of acuity, denoted C, and visual field (which is determined by R):

E(R) = C(R) if R ≥ 0;  c_E · C(R) if R < 0

where c_E ≥ 1, reflecting the assumption of stronger excitatory input to the LVF/RH. E decreases as |R| increases, reflecting the acuity gradient.

Lateral Inhibition

Lateral inhibitory input is the sum of inhibitory inputs from letters to the left of R. This quantity increases with the number of such letters, their activation levels, and the strength of the inhibitory connections. Rather than directly modeling the feedback processes underlying such lateral inhibition, the amount of inhibitory input is approximated as the activation of the leftmost letter's features weighted by a function of the number of letters sending inhibition. The leftmost letter refers to the letter which lies farthest to the left within the same visual field as R. Its retinal location is denoted R_l. To determine the inhibitory input, the leftmost letter's activation is multiplied by a weighting function, W. W increases with the number of letters lying between R_l and R. W also depends on hemisphere; inhibitory connections are stronger in the LVF/RH than in the RVF/LH (as is necessary to invert the acuity gradient). Thus, we have

I(R, P) = F_h(R_l, P_l) · W(|R_l - R|, R)

where the first term on the right-hand side gives the activation of the leftmost letter, and W is a non-decreasing function of |R - R_l|, which is larger for R < 0 than for R > 0. If R = R_l, W = 0 (because the features of the leftmost letter do not receive inhibition).

Cross-hemispheric Lateral Inhibition

The individual hemispheric gradients are "spliced together" via inhibition of the RVF/LH's letters by an amount proportional to the number of letters in the LVF/RH.
That is,

F(R, P) = F_h(R, P) - c_F · (P - R - 1) if R ≥ 0;  F_h(R, P) if R < 0

where c_F is a positive constant. This yields a decreasing gradient such that F(R, P) > F(R + 1, P + 1).

Specification of Feature-Level Parameters and Functions

To instantiate the feature level, values for the constants c_M, c_E, and c_F, and definitions of the functions C and W must be supplied. The following allowed a good fit to the data:

c_M = 1.0, c_E = 1.8, c_F = 0.2

The acuity function is defined recursively:

C(0) = 1.1
C(|R| + 1) = C(|R|) - C_dif(|R| + 1)

where

C_dif(r) = 0.10 if 1 ≤ r ≤ 3;  0.07 if 3 < r ≤ 6;  0.05 if 6 < r ≤ 9;  0.04 if 9 < r.

C_dif decreases as r increases, reflecting the decrease in the slope of the acuity gradient with increasing eccentricity. The definition of the inhibitory weighting function, W, is best displayed in tabular form:

|R - R_l|    W (R ≥ 0)    W (R < 0)
0            0.00         0.00
1            0.15         0.80
2            0.25         1.10
3            0.30         1.25
4            0.50         1.35
5            0.50         1.45
6            0.50         1.65

The Letter Level

For simplicity, feature-level activations were directly converted to letter-level activations (rather than modeling the oscillatory cycle), as follows:

L(R, P) = F(R, P) + 0.2 if P = 1;  F(R, P) if P > 1

Thus, letter-level activations are equivalent to feature-level activations, except at the first position. This was necessary to provide a good fit to the data, and corresponds to a non-linearity in the interaction of the oscillatory function and the locational gradient near the trough of the cycle for high levels of input. The letter activation was converted to the modeled accuracy by multiplying by 100 and bounding the result between 0 and 100. The results are given in Figure 7.7. In the LVF/RH, increasing string position from 1 to 2 or from 2 to 3 has a strong effect because of the high level of inhibition. However, as string position continues to increase, there is less and less effect because the leftmost letter becomes less and less activated. Thus the perceptibility function flattens out.
This effect is most pronounced at larger eccentricities, where feature-level activations are lower. In the RVF/LH, increasing string position leads to an increase in the activation of the leftmost letter or to increased cross-hemispheric inhibition. Coupled with the weak inhibition, this leads to a slow, steady decrease in perceptibility as string position increases.

7.2.2 Short Strings

Next I consider perceptibility patterns for short strings (three or four letters) at large eccentricities, as discussed in section 4.2.3. In the following, primacy will signify that a letter is perceived better than all other letters, whereas advantage will mean that an external letter is perceived better than the internal letters. Recall that in the LVF/RH, there is an initial-letter primacy, with little or no advantage for the final letter. In the RVF/LH, there is little or no advantage for the initial letter, and there is a final-letter primacy. Thus, in each visual field, the letter farthest from fixation is the best perceived, and the advantage for the other external letter is reduced. In particular, we will consider the results of [Est76], re-presented in Figure 7.8.

Figure 7.8: Experimental results from [Est76] for a four-letter string embedded in $'s, occurring at two different retinal locations in each visual field. Exposure duration was 2400 ms. (Subjects were trained to maintain central fixation, and their gaze was monitored.)

The proposed hemisphere-specific locational gradients, coupled with their interaction with the oscillatory cycle, explain these patterns. In the LVF/RH, at the feature level, the initial letter is strongly excited, and strongly inhibits the letters to its right. This leads to an initial-letter primacy, while the firing of the final letter is pushed late into the oscillatory cycle, providing little advantage. In the RVF/LH, overall bottom-up excitation is weaker.
Therefore, the activation of the initial letter's features is not boosted to a high level. Furthermore, there is weak left-to-right inhibition, while the acuity/locational gradient is quite shallow. Therefore the activation of the second letter's features is quite close to that of the first letter. As a result, at the letter level, the firing of the first letter is rapidly cut off by the second letter. Each successive letter quickly inhibits the preceding letter (due to the shallow locational gradient), allowing the final letter to start firing early in the oscillatory cycle. Therefore the final letter can fire longer than the other letters, creating a final-letter primacy. The proposed activation patterns are displayed in Figure 7.9. This explains the perceptibility patterns for locations -8 to -5 and 5 to 8. This account also explains the initial/final difference at a single retinal location (at -5 and 5 in Figure 7.8). In the LVF/RH, the left-to-right inhibition creates a disadvantage for a final letter. In the RVF/LH, the shallow locational gradient creates a disadvantage for an initial letter because its firing is rapidly inhibited by the second letter. In contrast to the asymmetric patterns at the larger eccentricity, the perceptibility function is U-shaped for both -5 to -2 and 2 to 5. Due to higher acuity, bottom-up input is higher overall. In the LVF/RH, this allows the final letter to start firing earlier in the cycle, creating a final-letter advantage. Along with the usual initial-letter advantage, this gives the U-shaped pattern. In the RVF/LH, the acuity/locational gradient is steeper than for the larger eccentricity, so the difference in input to the first and second letters is larger, creating an initial-letter advantage and giving an overall U-shape.
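The feature-level and letter-level equations of the mathematical model above are simple enough to transcribe directly. The sketch below uses the reported constants; the function names, the clamping of the weighting table at |R - R_l| = 6, and the recursive handling of the leftmost letter are my own choices:

```python
C_M, C_E, C_F = 1.0, 1.8, 0.2   # max activation, LVF boost, cross-hemi inhibition

def acuity(r):
    """Recursive acuity function C(r), r = |R|; steps shrink with eccentricity."""
    if r == 0:
        return 1.1
    step = 0.10 if r <= 3 else 0.07 if r <= 6 else 0.05 if r <= 9 else 0.04
    return acuity(r - 1) - step

def excitation(R):
    """Bottom-up input E: acuity, boosted by C_E in the LVF/RH (R < 0)."""
    return acuity(abs(R)) * (C_E if R < 0 else 1.0)

# Inhibitory weighting W by |R - R_l|; the LVF/RH column is much stronger.
W_RVF = [0.00, 0.15, 0.25, 0.30, 0.50, 0.50, 0.50]
W_LVF = [0.00, 0.80, 1.10, 1.25, 1.35, 1.45, 1.65]

def weight(d, R):
    table = W_LVF if R < 0 else W_RVF
    return table[min(d, 6)]   # clamp beyond the published table (assumption)

def feature_act(R, P, R_left):
    """F_h = min(C_M, E - I); I is the leftmost same-field letter's
    activation times the weighting W (zero for the leftmost letter itself)."""
    if R == R_left:
        return min(C_M, excitation(R))
    I = feature_act(R_left, 1, R_left) * weight(abs(R - R_left), R)
    return min(C_M, excitation(R) - I)

def accuracy(R, P, R_left):
    """Cross-hemispheric inhibition, first-position boost, 0-100 clamp."""
    F = feature_act(R, P, R_left)
    if R >= 0:
        F -= C_F * (P - R - 1)   # inhibition from letters in the LVF/RH
    L = F + (0.2 if P == 1 else 0.0)
    return max(0.0, min(100.0, 100.0 * L))
```

For example, a first letter at fixation saturates at the accuracy ceiling, while a second-position letter deep in the LVF is held well below its RVF counterpart by the strong left-to-right inhibition, which is the qualitative shape of Figure 7.7.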
Figure 7.9: Locational gradient and resulting firing pattern for LVF/RH presentation (normal font) and RVF/LH presentation (bold italics). Top: Comparison of the locational gradient for the string CDFG under RVF/LH presentation and LVF/RH presentation. Bottom: Cartoon of the resulting firing pattern at the letter level. The point in the oscillatory cycle at which the down phase prevents further firing is marked *. In the LVF/RH, the first letter fires faster and longer than the other letters, because it receives a much higher level of input. The variations in the amount of bottom-up input create decreasing activation across the string. The final letter starts firing late in the cycle, and is soon cut off by the end of the oscillatory cycle, giving no final-letter advantage. In the RVF/LH, each letter rapidly cuts off firing of the previous letter, allowing the final letter to fire a long time. As a result, activation is flat across the string and rises for the final letter. These firing patterns account for the perceptibility patterns at the larger eccentricities in Figure 7.8.

Next we consider the implications of this account for the effects of exposure duration for presentation at large eccentricities. Under the assumption that a longer exposure duration increases the overall level of bottom-up input, the above analysis suggests that the LVF initial-letter primacy and the RVF final-letter primacy should be differentially affected by variations in exposure duration. In the RVF, we would not expect to see a final-letter primacy at very brief exposures, because the very low level of input pushes the firing of the final letter late into the oscillatory cycle. As exposure duration increases, the firing of all the letters is shifted earlier and earlier into the cycle, allowing the final letter to fire longer and longer.
In contrast, the activation of a non-final letter shouldn't change much, because its firing is still quickly cut off by the subsequent letter. Thus, a final-letter primacy should emerge as exposure duration increases. However, in the LVF, the initial-letter primacy should be present at very brief durations, because strong left-to-right inhibition at the feature level does not depend on temporality, so it is always present. As exposure duration increases, the initial letter should be the primary beneficiary, because, at the feature level, the increased bottom-up input to the non-initial letters is canceled by increased lateral inhibition from the first letter.

To summarize, in the RVF, the final-letter primacy should not be present at very brief exposures. Increasing exposure duration should primarily benefit the final letter, creating a final-letter primacy. In the LVF, the initial-letter primacy should be present at very brief exposures. Increasing exposure duration should primarily benefit the initial letter, increasing its primacy.

A search of the literature revealed that a relevant experiment had already been performed, in which retinal location and exposure duration were varied in a trigram identification task [Leg01]. However, the published data were not presented in a way that would allow evaluation of the above predictions. So I requested the raw data from the authors, who kindly provided it. The data were analyzed for the largest two eccentricities (-12::-10 and -11::-9 versus 9::11 and 10::12) for very brief exposures (50 ms and 80 ms) versus longer exposures (125 ms and 200 ms). This analysis did indeed reveal the predicted patterns, as shown in Figure 7.10.

Figure 7.10: Results from Experiment 2 of [Leg01] for the two largest eccentricities, grouped by exposure duration, with 95% confidence intervals.
This account also explains the error patterns observed for unilaterally presented vertical strings in a left-to-right language ([Hel95, Hle97], discussed in section 4.2.3), under the assumption that the string is first mentally projected to the horizontal and then the locational gradient is formed as usual. The preponderance of the LVF pattern under bilateral presentation may reflect the cross-hemispheric inhibition necessary for locational-gradient formation. For right-to-left languages, acuity-gradient inversion would occur in the RVF/LH and cross-hemispheric inhibition would apply from the LH to the RH. This explains the observed reversal of the error patterns for Hebrew [Evi99]. For a vertical language, locational-gradient formation would not occur along the horizontal axis, so there should be no left/right asymmetry. This explains the observed error patterns for vertical Japanese kana [Hell99].

7.3 Summary and Discussion

We have seen that bigrams allow correct word identification, and that the proposed mechanism of bigram activation (i.e., formation of the locational gradient; interaction of the locational gradient with the oscillatory cycle) has allowed a cohesive explanation of aphasic error patterns, priming data, perceptibility patterns, and the reaction-time pattern for rotated strings.

The central (and most controversial) proposal of the SERIOL model is the temporal encoding of letter order. While the above accounts do not directly prove a serial encoding, these phenomena are otherwise difficult to explain. The fact that the letters farthest from fixation are the best perceived is in complete contradiction to their acuity, and no underlying mechanism had previously been proposed. However, this counterintuitive pattern arises naturally from the serial-encoding mechanisms (based on the principles of feature-level left-to-right inhibition and word-level right-to-left inhibition).
Furthermore, the temporal development of this pattern (as exposure duration is increased) is exactly as predicted. The reaction-time pattern for rotated strings also defied explanation prior to the SERIOL model. Again, the model allows a natural explanation of this data (based on the number of oscillatory cycles required to represent the string). I note that the model was not designed to explain these phenomena. Rather, these explanations fell out of the already-formulated model.

Normal subjects shown very briefly presented strings and aphasics both show a strong positional error pattern, with retention rate falling off with position. However, there is little or no effect of position when primes possess more than two of the target's letters. The temporal nature of word activation explains the strong positional effect in the error patterns. In the presence of noise, small differences that occur early are exaggerated by lateral inhibition. Therefore, the initial letters have more influence than the final letters on the relative levels of word activations, even in the absence of position-specific activation patterns at the bigram level. In contrast, priming reveals activation of the target word in particular. In this case, the temporal development of competition is not very important; the size of the effect is dominated by the number of bigrams shared by the prime and target. Thus the serial encoding allows an explanation of the contrast between the error data and the priming data.

The model was originally designed to explain the aphasic error pattern. However, the explanation of this pattern has evolved, while the model has remained basically the same. Originally, the positional error pattern was explained by varying letter activations (induced by the serial firing) that were passed on to the bigram level. However, this error pattern is now explained by directly considering the temporal aspect of bigram- and word-level activations.
This required only a minor revision to the model: bigram activations are no longer sensitive to the positions of the constituent letters. Thus, a temporal encoding allows these disparate phenomena to be explained, and provides an answer to the question of how a position-independent letter node could be dynamically bound to a string position. Furthermore, experiments in which two strings were briefly sequentially presented have provided direct evidence of a serial readout [Har75, Nic76].

The serial encoding depends on the formation of the locational gradient, which requires hemisphere-specific processing, leading to hemisphere-specific activation patterns. Such activation patterns could also potentially explain the visual-field asymmetries at the lexical level, as discussed in the following two chapters.

Chapter 8
Asymmetry of the Length Effect

We have seen how the SERIOL model explains VF-specific letter perceptibility patterns. Word-level VF asymmetries have also been observed. Such asymmetries have generally been taken to reflect hemisphere-specific modes of lexical access. However, such an assumption conflicts with brain-imaging evidence for left-lateralized lexical access [Coh00, Deh04] (and with the SERIOL model, which assumes a single mode of lexical access). Yet, how could asymmetries at the lexical level arise under a single mode of lexical access? This chapter and the next provide an answer to this question. One of the most studied asymmetries involves the effect of string length. I will concentrate on this asymmetry in this chapter. I first review the relevant experimental data. I then present the SERIOL account of the asymmetry, and an experiment testing this account.

8.1 Experimental Data

It has long been recognized that string length has differing effects across the visual fields [Mel57, Bou73]. Ellis and Young performed an extensive series of experiments elucidating this interesting phenomenon.
For lexical decision on words of three to six letters, presentation to the RVF yields no length effect, while presentation to the LVF causes RTs to increase by approximately 20 ms for each additional letter [You85, Ell88]. This pattern is present even if the location of the initial letter is held fixed as string length is increased [You85], indicating that the asymmetry is not related to the acuity of the initial letter. For short stimuli, there is a small RVF advantage, and this advantage increases with word length, because RVF RTs do not increase, while LVF RTs do.

These results have been taken as evidence for dual modes of lexical access, where the LH uses an efficient, parallel method of access, while the RH uses a less efficient, non-parallel mode of access [You85, Ell88]. However, there are difficulties with this proposal. As discussed in section 3.3, imaging evidence indicates that processing becomes left-lateralized at a prelexical level. Thus there can be only one mode of lexical access because lexical access is always routed through the LH. Furthermore, a split representation of the fovea means that the left half of a centrally fixated word is projected to the RH, and the right half to the LH. Dual modes of lexical access would then lead to the unlikely scenario that each half of a word is accessed by a separate mechanism. Indeed, the processing of fixated words is influenced by the number of letters in the LVF, but not the RVF, as would be expected under a split fovea [Bry96, Lav01a].

This is related to the phenomenon of the Optimal Viewing Position (OVP) [Ore84]. Word recognition is optimal when fixation falls between the first and middle letters of a word. The cost of moving away from this OVP varies with direction. When fixation falls at the first letter, there is a small decrement in performance that varies little with the length of the word (i.e., the number of letters falling in the RVF).
However, when fixation falls at the final letter, there is a larger decrement in performance that increases with word length (i.e., the number of letters in the LVF). Brysbaert and colleagues [Bry96] investigated this phenomenon by varying the location of words of different lengths. Locations ranged from entirely within the LVF, to various fixation points within the word, to entirely within the RVF. There was a smooth pattern of performance as location was systematically changed. This indicates that the OVP and the LVF length effect arise from the same underlying factors.

What factors are involved has been a subject of some debate. One possibility is that the initial letters of a word are more informative, so there is an advantage for fixating near them [Ore84]. While it was demonstrated that informational content does affect the OVP, this cannot be the whole story. The advantage of fixating near the end of words in which the final letters are the most informative is much reduced compared to the advantage of fixating near the beginning of words in which the initial letters are the most informative [Ore84]. Therefore, the OVP probably also depends on hemispheric specificity. The more efficient processing in the RVF may arise from more direct access to the dominant hemisphere [Bry94, Bry96]. Alternatively, it could be due to effects of reading direction. In a left-to-right language, the to-be-processed text occurs in the RVF. Thus the LH may become specialized through perceptual learning to process upcoming text, giving a RVF advantage [Naz03, Naz04a].

One way of evaluating these possibilities is to investigate right-to-left languages. A study of Arabic showed that the OVP falls in the center of the word; it is not shifted to the left as in left-to-right languages [Far96]. Hemifield studies of the length effect in Hebrew have shown a length effect in both VFs [Lav01b, Lav02c, Naz04b].
Thus right-to-left languages show a different pattern than left-to-right languages, although not an exact reversal. These results suggest that hemispheric specialization provides a constant RVF/LH advantage, while reading direction provides a RVF/LH advantage for left-to-right languages versus a LVF/RH advantage for right-to-left languages. The sum of these two factors provides a strong RVF/LH advantage for left-to-right languages, and more balanced performance across the VFs for right-to-left languages.

Another way of investigating these issues is to vary hemispheric specialization. Brysbaert [Bry94] identified a group of subjects who did not display the usual LH dominance for language. LH-dominant and non-LH-dominant readers read words of varying lengths, where fixation could fall either on the first or last letter of the word. For the LH-dominant readers, there was a strong RVF (initial letter) advantage, and a strong LVF (final letter) length effect. For the non-LH-dominant readers, the RVF advantage and the LVF length effect were reduced. This indicates that the cost of callosal transfer contributes to the RVF advantage. However, reading direction probably also plays a role, as the results did not completely reverse for the non-LH-dominant readers.

In summary, in left-to-right languages there is a strong asymmetry in the length effect. It is likely that reading direction and hemispheric dominance both play a part in this asymmetry. It is unlikely that differing modes of lexical access contribute to this asymmetry, because we have seen that length effects are not indicative of serial versus parallel processing [New04], the asymmetry is present for fixated words [Ore84, Bry96], and brain imaging indicates that lexical access is left-lateralized independently of presentation location [Coh00, Deh04]. Next I discuss how the SERIOL model explains the asymmetry of the length effect.
8.2 SERIOL Account of the Length Effect

Recall that there is an asymmetry of activation patterns across the feature level. For RVF/LH presentation, the locational gradient is smoothly decreasing. For LVF/RH presentation, the second and third letters are strongly inhibited, while letters closer to fixation may not be inhibited enough. Thus the locational gradient is initially steeply decreasing, and then flattens out. For longer words, the locational gradient may not even be monotonically decreasing. (See bold-faced characters in Figure 8.1.) A smoothly decreasing locational gradient is necessary for the optimal encoding of letter order. The LVF/RH locational gradient becomes more and more non-smooth (non-optimal) as string length increases. This increasingly degraded LVF/RH activation pattern then provides an increasingly degraded representation of letter order. A degraded representation of letter order would increase settling time at the lexical level, as activation would be less focused on the target word. Thus an LVF/RH length effect may emerge because letter-position encoding becomes less and less accurate, and settling time increases more and more.

This analysis describes the hypothesized contribution of reading direction to the length effect. In a right-to-left language, acuity-gradient inversion would occur in the opposite hemisphere, and thus there would be a non-optimal gradient for the RVF/LH. What then is the contribution of hemispheric dominance? It may be the case that a non-optimal locational gradient in a left-to-right language is further degraded by callosal transfer, increasing its effect. For right-to-left languages, the effect of a non-optimal RVF/LH locational gradient may be reduced. This issue is further discussed in section 8.4. In the following, we will concentrate on left-to-right languages.

The above analysis implies that the length effect should disappear if a smoothly decreasing locational gradient could be created in the LVF.
It should be possible to create a smooth gradient via an increase of bottom-up input to the second and third letters (to compensate for lateral inhibition from the first letter). Increasing bottom-up input to those letters should also decrease the activations of the features of the fourth and fifth letters, due to increased left-to-right lateral inhibition. Additionally, for words of more than five letters, a reduction of bottom-up input is probably required at the final letters in order to compensate for their increasing acuity (i.e., to bring their activation levels low enough to make a smooth gradient). (See italic characters in Figure 8.1.)

These adjustments could be accomplished under experimental conditions by increasing contrast at the second and third positions, and reducing contrast at the sixth and higher positions. This leads to the prediction that such a manipulation should cancel the length effect in the LVF/RH via facilitation for the longer strings. That is, for four- to six-letter strings, mean RTs to five- and six-letter strings under this contrast manipulation should be as fast as the mean RT to four-letter strings under normal presentation. Conversely, application of the same pattern in the RVF/LH should create a length effect due to disruption of a previously smooth locational gradient. We tested these predictions in the following experiment [Whi04c].

8.3 Length Investigation

This experiment was designed by me, but run by my colleague Michal Lavidor at the University of Hull, U.K. I specified the overall contrast pattern, while she developed the particular presentation conditions (i.e., background color and letter colors).

Figure 8.1: Example of the proposed LVF/RH locational gradient for normal presentation (bold face) and under contrast manipulation (italics, shifted to the right for clarity) for a six-letter word. Horizontal axis represents retinal location, while vertical axis represents activation level at the feature layer. For normal presentation, the locational gradient is not smooth, becoming quite flat near fixation. Increasing the contrast of the second and third letters raises their activation levels, and decreases the activation levels of the fourth and fifth letters due to increased left-to-right inhibition. Decreasing the contrast of the sixth letter decreases its activation level. As a result, the locational gradient is more smoothly decreasing.

Participants

Twenty-three right-handed, native English speakers served as subjects for a lexical decision experiment (mean age 19.7). Ten were males, and 13 were females. All gave their informed consent to participate in the study.

Stimuli

Ninety-six English content words and 96 nonwords were used, with equal numbers of 4-, 5-, and 6-letter words (32 of each). These three word sets were matched for written word frequency, orthographic neighborhood size, and imageability. Ninety-six nonwords were generated from another word pool by changing one letter, such that the nonwords were legal and pronounceable. Nonwords were also made of 4, 5 and 6 letters in equal proportion. All stimuli were presented in 14-point Helvetica lower-case font on a dark gray background of 3 cd/m2. Letters were displayed at three contrast levels: high-contrast (c=0.64) white letters, medium-contrast light-gray letters (high contrast reduced by 40%), and low-contrast darker-gray letters (high contrast reduced by 60%). In the control condition, letters at all positions were presented in medium contrast. In the adjust condition, for all string lengths, the first and fourth positions were presented in medium contrast, and the second and third positions in high contrast. For 5- and 6-letter targets, the fifth position was presented in medium contrast. For 6-letter targets, the sixth position was presented in low contrast.
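The contrast scheme just described can be written out as a small helper (a sketch for exposition; the function and the three-level coding are mine, not the experiment's actual materials):

```python
# Sketch of the per-position contrast assignment described above.
# HIGH/MEDIUM/LOW correspond to the white, light-gray, and darker-gray
# letter levels; the helper itself is illustrative.

HIGH, MEDIUM, LOW = "high", "medium", "low"

def letter_contrasts(length, condition):
    """Per-position contrast for a 4-, 5-, or 6-letter target."""
    if condition == "control":
        return [MEDIUM] * length           # all positions medium contrast
    assert condition == "adjust"
    levels = [MEDIUM, HIGH, HIGH, MEDIUM]  # positions 1-4
    if length >= 5:
        levels.append(MEDIUM)              # position 5
    if length == 6:
        levels.append(LOW)                 # position 6 darkened
    return levels
```

Note that for four- and five-letter targets the two conditions differ only at the second and third positions; only six-letter targets additionally have a darkened final position.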
Thus, relative to the control condition, the second and third letters were brightened, and the sixth letter (if present) was darkened, while the other positions were presented at the same contrast level.

Design

Each subject was assigned to one of the two versions of the experiment. The different versions rotated the word sets across the presentation conditions (control and adjust conditions in a Latin-square design). Each session began with 70 practice trials to introduce the task. Every target stimulus was presented twice, once in each visual field, giving 384 trials for each subject. Stimuli were presented in a random order with the restriction that no more than 3 successive words or nonwords, or 3 successive LVF or RVF trials, occurred together. The within-subject factors were lexical status (word or nonword), length (4, 5, or 6 letters), visual field (RVF, LVF), and presentation condition (control or adjust).

Procedure

Each trial began with a fixation cross (+) appearing in the center of the screen for 400 ms, which then disappeared when the target string was presented. Targets were briefly presented for 180 ms at a displacement of 2.5° from the fixation point to the center of the string. The subject's task was to decide, as quickly and as accurately as possible, whether the stimulus was a legal English word or a nonword. Participants were informed that central fixation was important, and a chinrest together with a head strap were used to ensure stable head position at a distance of 50 cm from screen center. Participants' eye movements were monitored by an infra-red eye tracker, and were recorded for the first 700 ms of each trial.

Results

Trials in which gaze did not remain stable on the fixation cross were discarded (3% of word trials; 5.1% of nonword trials). RTs of less than 200 ms and more than 1100 ms were also discarded, either as anticipatory or excessively lengthy (discarded trials occurred infrequently, less than 3% of the total).
Mean RTs and error rates are given in Tables 8.1 and 8.2. Repeated-measures ANOVAs on RTs (separately for words and nonwords) revealed that visual field had a significant effect (F(1,22)=47.3, p<0.00001 for words; ns for nonwords), with RVF words (mean RT = 502 ms) responded to faster than LVF words (mean RT = 545 ms). String length was significant (words: F(2,44)=19.8, p<0.001; nonwords: F(2,44)=3.53, p<0.05), with longer latencies to longer strings. The main effect of presentation condition was not significant.

Presentation condition and visual field interacted (F(1,22)=22.7, p<0.001 for words; F(1,22)=8.0, p<0.05 for nonwords). This interaction was analyzed using a simple main-effects analysis. For LVF stimuli, the adjust condition was faster than the control condition (F(1,22)=6.76, p<0.05); for RVF stimuli, the opposite pattern was found (F(1,22)=5.33, p<0.05). No interaction was found for presentation condition and length, nor for visual field and length. The interaction between presentation condition, visual field, and word length was significant for word stimuli (F(2,44)=16.84, p<0.001; ns for nonwords). The triple interaction was analyzed using a simple main-effects analysis. For LVF words, a length effect occurred only under the control condition (F(2,44)=7.91, p<0.01). For RVF words, a length effect occurred only under the adjust condition (F(2,44)=8.14, p<0.01). This pattern is clearly shown in Figure 8.2. The pattern for nonwords was similar, but the three-way interaction did not reach significance. Average error rate was 12%, and no significant effects of visual field, length, or presentation condition were found.

              LVF Con.  LVF Adj.  RVF Con.  RVF Adj.
Four  Mean RT    527       536       487       474
      S.D.        71        69        68        66
      % error     15        18        12         7
Five  Mean RT    563       518       477       536
      S.D.        70        67        70        72
      % error     13        10        12         7
Six   Mean RT    594       535       490       548
      S.D.        80        71        77        71
      % error     14        14        10        11

Table 8.1: Results for word targets.

              LVF Con.  LVF Adj.  RVF Con.  RVF Adj.
Four  Mean RT    561       560       573       587
      S.D.        88        82        82        89
      % error     11        11        11        21
Five  Mean RT    613       582       572       617
      S.D.       100        87        83        79
      % error     13        11        10        14
Six   Mean RT    653       561       596       630
      S.D.        88        93        84        89
      % error      9        10        18        14

Table 8.2: Results for non-word targets.

Figure 8.2: Results for word targets.

8.4 Discussion

As predicted, the LVF/RH length effect was eliminated under the adjust condition. It cannot be argued that the effect was still present, though masked. Five- and six-letter words under the adjust condition were processed as quickly as four-letter words under the control condition, demonstrating that the length effect was completely neutralized. This conclusively demonstrates that a length effect is not an inherent feature of RH processing, for if it were, it would not be possible to eliminate it via a visual manipulation. Therefore, the LVF length effect does not arise from an RH-specific mode of lexical access, disproving the dual-modes theory [Ell88].

Since we were able to abolish the length effect via an activation-pattern correction, this indicates that the LVF activation pattern is a contributing factor to the length effect. The appropriate contrast manipulations to neutralize the length effect were precisely predicted from the theory of locational-gradient formation, providing strong support for this aspect of the SERIOL model. We suggest that locational-gradient formation provides a mechanistic account of the perceptual learning espoused by Nazir [Naz03, Naz04a].

Also in line with our predictions, a length effect was created in the RVF/LH. While it may not be surprising that increased RTs were associated with the degradation of the sixth letter in the RVF adjust condition (since it was far from fixation), we note that most of this increase was present for five-letter strings.
For these strings, the only change from the control condition was positional contrast enhancement at the second and third letters. Yet, this enhancement was inhibitory in the RVF. It is unlikely that the inhibition arose solely because this enhancement reduced the visibility of nearby letters, because this manipulation had no effect on error rates or on RT to four-letter words, although the possibility that the low-acuity fourth letter was affected only when it was not the last letter cannot be ruled out. Nevertheless, the RVF adjust-condition results are consistent with our predictions.

The adjust condition had no effect on four-letter words, relative to their respective control conditions. However, it might be expected that RVF RT should increase due to a degraded locational gradient, and LVF RT should decrease due to an improved locational gradient. So why did the contrast manipulation have no effect on four-letter words? It may be the case that settling time is relatively insensitive to small differences in activation patterns for shorter words, due to the large number of competitors.

Further investigations into the length effect will involve languages read from right to left, such as Hebrew. For such languages, the locational gradient should decrease from right to left. Thus, the consistency of the acuity gradient with respect to the locational gradient is reversed. That is, the acuity gradient matches the locational gradient in the LVF/RH, not the RVF/LH. This suggests that the length effect should reverse. However, experimental studies have given conflicting results. One has shown a length effect for both visual fields [Lav01b]. One has shown the predicted reversal [Naz04b], while another has shown the same pattern as left-to-right languages [Lav02c]. Overall, these results suggest that the robust asymmetry observed for left-to-right languages is not present for Hebrew, where a length effect seems to occur in both visual fields.
Based on these findings, I proposed that callosal transfer to the dominant hemisphere also contributes to the length effect by preferentially degrading more lowly activated letter features [Whi04a]. In the case of a left-to-right language, this further reduces the feature-level activations of the second and third letters. In the case of a right-to-left language, this reduces feature-level activations of the final letters, thereby delaying their firing at the letter layer, creating a length effect. Thus, it should be possible to cancel the Hebrew LVF/RH length effect by using a different experimental manipulation than in a left-to-right language, namely, by increasing bottom-up input in proportion to distance from fixation. In contrast, the same type of manipulation as in English should cancel the Hebrew RVF/LH length effect.

Chapter 9
Asymmetry of the N Effect

The effect of another lexical property, orthographic neighborhood size (N), also interacts with VF. N is the number of words that can be formed by replacing one letter of the target word [Col77]. For example, CARE has a large neighborhood: BARE, DARE, CORE, CURE, CAME, CAGE, CART, CARD, etc. First I present experimental data on the N effect. Then I discuss the SERIOL account of the N effect, and present two experiments testing this account.

9.1 The N Effect

Under central presentation in a lexical-decision task, low-frequency words with large neighborhoods are responded to more quickly than those with small neighborhoods [And89, And97]. It is surprising that the N effect manifests as facilitation, because lateral inhibition within the word level is commonly assumed. Therefore, increased similarity to other words should increase inhibition to the target and slow down RTs, rather than speed them. Thus, there must be some facilitatory effect that arises in spite of lateral inhibition. Several explanations for the locus of this unexpected facilitation have been proposed.
It could arise at the letter level, as in the Interactive Activation model [McC81]. In this scenario, excitation from the word level feeds back to the letter level, and then forward to the word level. Thus increased similarity to words increases the amount of excitatory feedback to the letter level, which then allows the target word to reach response criterion more quickly. Alternatively, the facilitation could arise solely within the word level. In their multiple read-out model [Gra96], Grainger and Jacobs have proposed that increased activation across the word level speeds a task that does not require the unique identification of a single word, such as lexical decision.

Another possible locus is the phonological level, either through general feedback to the target word or specifically via word bodies [Zie98]. The word-body hypothesis was tested in a series of lexical-decision experiments [Zie98]. In one set of words, N was held constant while the number of words matching the target's body (body neighbors, denoted BN) was varied. (A body neighbor does not have to be of the same length as the target.) In another set of words, BN was held constant while N was varied. In the BN manipulation, high BN was facilitatory (as compared to low BN). However, in the N manipulation, high N had no effect. Thus facilitation depended on a large number of body neighbors, not N-metric neighbors. Since BN and N are usually highly correlated, these results suggest that the standard N effect arises from body neighbors.

The same manipulations were also performed for non-words. For such targets, high N has an inhibitory effect, as increased similarity to real words makes it more difficult to reject a target. However, the BN manipulation did not produce an inhibitory effect for high-BN targets. In contrast, the N manipulation did produce the standard inhibitory N effect.
These results cast doubt on the phonological interpretation of the facilitatory effect of BN on word targets, because increased phonological similarity of non-words to real words (for high-BN targets) should have slowed RTs.

Investigation into the N effect has recently been extended to lateralized presentation. These experiments demonstrated that the N effect is present for LVF, but not RVF, presentation [Lav02a, Lav02b]. Thus, for the N effect, central presentation patterns with the LVF, not the RVF. Therefore, it cannot be the case that the LVF/RVF difference occurs simply because LVF stimuli are less efficiently processed than RVF stimuli, because the N effect occurs for central presentation, where stimuli are the most efficiently processed. (This pattern has been shown within a single set of stimuli [Lav02b].)

9.2 The SERIOL Account of the N Effect

This asymmetry makes it unlikely that the facilitatory N effect is due to phonological influences or to total word-level activation, as it is unclear why those factors would vary with visual field. A more likely candidate is word-to-letter feedback, as we have already discussed that letter-level activation patterns vary with VF. Although the SERIOL model focuses on the bottom-up processing stream, I do not mean to rule out top-down activation from the word level back to lower levels. The oscillatory cycle driving the letter level is taken to fall in the theta band (5-8 Hz) [Lis95, Kli96, Kli01]. Thus, an individual cycle would take 125 to 200 ms, allowing more than one cycle to occur during lexical decision. Input to the letter level is necessarily bottom-up during the first oscillatory cycle. On subsequent cycles, input to letter nodes could arise from both bottom-up and top-down sources. It is assumed that top-down input from the word to the letter level is also in the form of a gradient, where the first letter receives the most input, the second letter the next most, etc.
Such a gradient would be instrumental in serial output of letters when spelling. I propose that the hemispheric asymmetry of the N effect arises from the formation of the locational gradient, coupled with the processing which converts the locational gradient into a serial firing pattern. Due to these dynamics, top-down input to the letter level (from high N) has a facilitatory effect for LVF/RH presentation, but not for RVF/LH presentation. First I focus on the dynamics of the conversion of the spatial gradient to serial firing at the letter level. The point at which a non-initial letter node can start to fire is limited both by lateral inhibition from the prior letter, and by its own level of excitatory input. When the firing rate of the currently active letter node exceeds a certain level, no other letter node can fire, due to the constant lateral inhibition. At some point, the current letter's firing rate and the resulting lateral inhibition will decrease to a level which would allow the next letter to fire. If the next letter currently receives enough excitatory input to cross threshold at this point, it can fire. In this case, lateral inhibition from the active letter was the limiting factor on when the next letter could start to fire. However, if the next letter does not receive enough excitatory input to fire immediately, its activation is delayed until its excitability increases enough (via the oscillatory cycle) for it to cross firing threshold. In this case, the limiting factor was the amount of excitatory input. In the following, I will focus on four-letter stimuli, as most N experiments are performed on stimuli of that length. Recall that in the feature level of the LVF/RH, the second and third letters receive strong lateral inhibition, whereas the second and third letters in the RVF/LH do not. For central presentation, the second letter receives strong inhibition in the LVF/RH, and the third letter receives strong cross-hemispheric inhibition.
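The two limiting factors can be captured in a toy timing rule. The sketch below is my own illustrative simplification, not part of the SERIOL implementation: a non-initial letter node fires at the later of (a) the time when inhibition from the prior letter has decayed enough to permit firing, and (b) the time when its bottom-up input plus a rising excitability (standing in for the up-phase of the oscillatory cycle) crosses threshold. All units and parameter values are arbitrary.

```python
def firing_time(bottom_up, t_release, threshold=0.5, ramp=0.05):
    """Toy model: when can a non-initial letter node start to fire?

    bottom_up -- strength of bottom-up excitatory input (arbitrary units)
    t_release -- when lateral inhibition from the prior letter has decayed
                 enough to permit firing (the inhibition-limited bound)
    ramp      -- stand-in for the rising excitability of the oscillatory
                 cycle's up-phase
    """
    # Input-limited bound: first t with bottom_up + ramp * t >= threshold.
    t_input = max(0.0, (threshold - bottom_up) / ramp)
    # The node fires at whichever bound is reached later.
    return max(t_release, t_input)
```

With strong bottom-up input (as for RVF/LH internal letters), the input-limited bound is reached early, so firing is pinned at the release time and a small extra boost changes nothing; with weak input (as for LVF/RH internal letters), the input-limited bound dominates, so a small boost from word-level feedback advances firing. This is the asymmetry proposed above.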
Based on this difference in activation patterns and the above dynamics, I propose that the lower level of bottom-up input (to the letter level) to the second and third letters is the primary locus of the N effect, as follows. For LVF and central presentation, the activations of the second and third letters are limited by their level of excitatory input. Therefore, a slight increase in excitation (due to feedback from the word level from high N) allows those letter nodes to cross threshold and fire sooner. In contrast, for RVF presentation, those letter nodes receive a relatively higher level of bottom-up input. Their firing is limited by lateral inhibition, rather than excitatory input. Thus, the second and third letter nodes already fire as early as possible, and a slight increase in excitatory input has no effect. So top-down excitation allows the internal letter nodes to fire earlier for LVF/RH and central presentation, but not RVF/LH presentation. When the second and third letter nodes fire earlier, the corresponding bigrams are activated earlier. This then allows activation to begin to be focused on the target word node earlier, reducing lateral inhibition from other word nodes. For example, consider the stimulus bore. When *B fires, two-letter words starting with B are the most highly activated (due to the higher connection weights for shorter words). These word nodes inhibit less highly activated word nodes (including the target BORE). When BO fires, three-letter words starting with BO are the most highly activated, and inhibit the other word nodes. When OR and BR fire, four-letter word nodes starting with BOR are the most highly activated. Finally, the target BORE is no longer inhibited by other more highly activated word nodes (although if BOR were itself a word, there would still be a more active word node). Thus the sooner that the activation becomes focused on the target, the less lateral inhibition there is from other word nodes.
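The bigram sequence in this example can be enumerated mechanically. The sketch below assumes a particular open-bigram inventory (contiguous and one-apart letter pairs, plus edge bigrams marked with "*"); the model's exact inventory may differ, so treat the window size as an illustrative assumption.

```python
def open_bigrams(word, max_sep=2):
    """Open bigrams for a word, ordered by when each bigram can fire.

    A bigram node fires once its second constituent letter has fired, so
    bigrams are sorted by the position of their second letter. max_sep=2
    allows contiguous and one-apart pairs (an assumed window). '*' marks
    the word edges.
    """
    w = word.lower()
    pairs = [(j, i, w[i] + w[j])
             for i in range(len(w))
             for j in range(i + 1, min(i + max_sep + 1, len(w)))]
    # Initial edge bigram first, then pairs in second-letter firing order.
    return ["*" + w[0]] + [g for j, i, g in sorted(pairs)] + [w[-1] + "*"]
```

For "bore" this yields *B first, then BO, then BR and OR together (once R fires), matching the activation sequence walked through above.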
This decreased lateral inhibition over the course of the oscillatory cycle will allow the target word node to reach response criterion sooner, decreasing RT.¹ Thus top-down excitation from high N decreases RTs for LVF and central presentation. For RVF presentation, the second and third letter nodes already fire as early as possible, so there is no N effect. For LVF presentation, another factor may also be at work. Recall that the activation level of non-contiguous bigrams is determined by the amount of time between the firing of the first constituent letter and the firing of the second constituent letter. This time lag is determined by the relative levels of feature-layer inputs to those letter nodes, which are determined by the locational gradient. Bigram-to-word connection weights are based on the bigram activation pattern resulting from a smoothly decreasing locational gradient. When the locational gradient is not smoothly decreasing (as in the LVF/RH), a somewhat different bigram activation pattern results. Thus, there is a mismatch between the bigram activation vector and the learned weight vector, making activation less focused on the target word. Top-down input from high N may compensate for the lack of smoothness of the locational gradient, bringing the bigram activation vector nearer the learned weight vector. This could also contribute to the N effect for LVF presentation.

¹ This account is revised from the original account, which focused on increased activation levels for the second and third letters, which were passed on to the bigram and word levels. Given the new assumption that bigram activations do not reflect letter activations, this account has been modified to focus on timing of firing. However, the underlying idea remains the same. There is an N effect for central and LVF/RH presentation because the locational gradient is steeper than for RVF/LH presentation.
9.3 Predictions

In this experiment, we concentrated on the asymmetry of the N effect under lateralized presentation. Because the proposal is that differences in bottom-up activation patterns underlie this asymmetry, changes to these activation patterns should modulate the N effect. If the LVF/RH activation pattern could be created in the RVF/LH, the N effect should appear in the RVF/LH. Conversely, if the RVF/LH activation pattern could be created in the LVF/RH, the N effect should disappear in the LVF/RH. It should be possible to adjust activation patterns by manipulating contrast levels at specific string positions. The RVF/LH's feature-level activation pattern could be replicated in the LVF/RH by slightly dimming the external letters. Dimming the first letter should decrease lateral inhibition from that letter, mimicking the weaker left-to-right inhibition in the LH. Dimming the final letter should compensate for the increasing acuity, creating a more smoothly decreasing gradient. As a result, the locational gradient should be smoother and shallower, mimicking the usual activation pattern in the RVF/LH. (See Figure 9.1.) This should negate the N effect. Conversely, the LVF/RH's activation pattern could be mimicked in the RVF/LH by slightly dimming the internal letters. This should induce the N effect in the RVF. To test these predictions, we performed a lateralized lexical-decision experiment of low-N versus high-N words, with two different patterns of dimmed input, in addition to the control (undimmed) condition [Whi04b]. All stimuli were four-letter words. In the inner-dimmed condition, the contrast of the second and third letters was reduced. In the outer-dimmed condition, the contrast of the first and fourth letters was reduced. The analysis of the N effect allows precise predictions concerning the expected effects of these manipulations.

Figure 9.1: Outer dimming in the LVF/RH. The normal locational gradient is shown in bold-face.
The results of outer dimming are shown in italics (shifted to the right for clarity). Reducing the contrast of the first letter reduces its activation level, and decreases inhibition to the second and third letters, increasing their activation levels. As a result, the locational gradient is shallower across the first three letters. Reducing the contrast of the fourth letter reduces its activation level. As a result, the locational gradient is smoother across the last three letters.

Let R be the RT for the control / RVF / low-N condition, L be the additional time cost of presentation to the non-dominant hemisphere, and Z be the cost of low input to the second and third letters. Thus, the expected RTs for the other control conditions are:

control/RVF/high-N = R
control/LVF/high-N = R + L
control/LVF/low-N = R + L + Z

First, outer dimming is considered. There is little direct cost for reducing input to the external letters, because their activations remain relatively high. Therefore, in the RVF, outer dimming should have little effect, giving:

outer/RVF/high-N = R
outer/RVF/low-N = R

In the LVF, outer dimming should compensate for the normal cost of low input to the second and third letters. Therefore:

outer/LVF/high-N = R + L
outer/LVF/low-N = R + L

Note the counterintuitive prediction that such stimulus degradation should produce facilitation for low-N (relative to the undimmed control). As a result, there should be no N effect. Next, inner dimming is considered. In the RVF, this should induce a cost at the internal letters for low-N. However, top-down activation from high-N should compensate for this decreased bottom-up input. Thus:

inner/RVF/high-N = R
inner/RVF/low-N = R + Z

Therefore, an N effect should be created. In the LVF, inner dimming should not change the overall bottom-up activation pattern, although it could potentially increase the cost of low activation at the internal letters. We would expect the size of the N effect to stay the same or get larger.
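These additive predictions can be tabulated directly. In the sketch below, the numeric values (R = 566 ms, L = Z = 30 ms) are illustrative assumptions, loosely based on the roughly 30 ms magnitudes reported in [Lav02a], not fitted parameters; the inner-dimmed / RVF / low-N cell is written with cost Z, since Z is defined as the cost of low input to the second and third letters.

```python
R, L, Z = 566, 30, 30  # illustrative values in ms (assumed, not fitted)

# Predicted RT for each (condition, visual field, N size) cell.
predicted = {
    ("control", "RVF", "low"):  R,
    ("control", "RVF", "high"): R,
    ("control", "LVF", "high"): R + L,
    ("control", "LVF", "low"):  R + L + Z,
    ("outer",   "RVF", "high"): R,
    ("outer",   "RVF", "low"):  R,
    ("outer",   "LVF", "high"): R + L,      # outer dimming removes the Z cost,
    ("outer",   "LVF", "low"):  R + L,      # so the LVF N effect vanishes
    ("inner",   "RVF", "high"): R,          # top-down input compensates,
    ("inner",   "RVF", "low"):  R + Z,      # creating an RVF N effect
    ("inner",   "LVF", "high"): R + L,      # simplest assumption: no extra
    ("inner",   "LVF", "low"):  R + L + Z,  # cost of inner dimming in LVF
}

def n_effect(condition, vf):
    """Predicted N effect: low-N RT minus high-N RT, in ms."""
    return predicted[(condition, vf, "low")] - predicted[(condition, vf, "high")]
```

Under these assumptions, the N effect is Z for the control/LVF and inner/RVF cells and zero elsewhere, which is exactly the qualitative pattern plotted in Figure 9.2.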
Figure 9.2: Predicted pattern for Experiment 2.

In summary, the predictions are that outer dimming should decrease RTs for the LVF / low-N condition (giving no N effect), and that inner dimming should increase RTs for the RVF / high-N condition (giving an N effect). Inner dimming might also increase RTs for the LVF conditions. Other manipulations should have little effect. See Figure 9.2 for a graphical presentation of these predictions, under the simplest assumptions - that inner dimming incurs no additional cost in the LVF, and that L and Z are of the same magnitude. The latter assumption is consistent with the results of [Lav02a], in which both were on the order of 30 ms.

9.4 N-Effect Investigation 1

This experiment was designed by me, but run by my colleague Michal Lavidor at the University of Hull, U.K. I specified the overall contrast patterns, while she developed the particular presentation conditions (i.e. background color and letter colors).

Participants

Nineteen native English speakers participated in the experiment. All had normal or corrected-to-normal vision and were aged 18-26 (mean age 19.4, s.d. 1.6). Each participant received either a course credit or $2. All participants were right-handed and scored at least 80 on the Edinburgh test. Nine were males, 10 females.

Design and materials

Stimuli. The word stimuli were 78 4-letter, English content words (nouns and verbs). Half of the words had fewer than 10 orthographic neighbors (mean no. of neighbors 6.2). These words formed the low-N group. The remaining words all had more than 12 neighbors (mean 17.0). These formed the high-N group. The low-N and high-N groups were matched on written frequency, imageability, and age of acquisition. Each group was divided into 3 sets, to allow rotation through the 3 different presentation conditions (control, inner-dimmed, or outer-dimmed).
These 6 sets were also matched for written word frequency, imageability, and age of acquisition. The stimuli are given in Table 9.1.

Low 1   Low 2   Low 3   High 1  High 2  High 3
beau    arch    babe    bush    beam    bite
cube    aunt    coal    cage    bolt    boot
earl    chop    crab    cone    deer    cake
germ    disc    gasp    dent    duck    cart
gulf    duel    grip    dusk    dump    dock
heap    fork    jerk    hank    gang    hail
howl    lamb    lens    herd    gore    hint
newt    menu    liar    hind    hose    hush
oath    omen    oven    hump    lime    joke
palm    plug    raid    mall    maze    leak
shed    prey    riot    mule    pump    mist
soap    roar    sand    nail    rent    port
swim    suds    sigh    rust    rope    rake

Table 9.1: Stimuli for N-effect investigations.

Since the model we tested focuses on words, the non-words were created such that they would amplify the N effect for words (based on [Sik02]). The non-words were generated from a different pool of 4-letter words by altering one or two letters, usually replacing the vowels with consonants (however, bigrams were always orthographically legal). There was no special effort to match the N size of the non-words, as they served mainly as the context for the words; however, to keep chance performance at the 50% level, we presented the non-words under the same illumination conditions as the real words. All stimuli were presented in 14-point Helvetica lower-case font, appearing as high-contrast (c=0.72) white letters on a gray background of 4 cd/m2. In the inner-dimmed condition, light-gray patches were projected on the 2nd and 3rd letters of the presented target, so the contrast between the letter and the background color was decreased by 33%; thus these letters were dimmer than the rest of the word. Similarly, two light-gray patches dimmed the 1st and 4th letters in the outer-dimmed condition. In the control condition, no letters were dimmed. The stimuli were presented for 180 ms, at a displacement of 2.5° from the fixation point to the center of the stimulus. The displacement was to the left or to the right of a central focus point (LVF and RVF, respectively).

Design.
Each subject was assigned to one of the 3 versions of the experiment. The different versions rotated the word sets across the experimental conditions (high- and low-N words in control, inner-dimmed, and outer-dimmed conditions). Each target stimulus was presented once to each visual field. The within-subject factors for words were N size (high, low), visual field (RVF, LVF) and presentation condition (control, inner-dimmed or outer-dimmed). Each combination of the within-subject variables was repeated 13 times.

Procedure

Stimulus presentation was controlled by an IBM Pentium computer with a 17" SVGA display. The participants sat at a viewing distance of 50 cm, with the head positioned in a chin rest. The experiment was designed using Super-Lab version 2. Each session began with 10 practice trials to introduce the task, followed by 24 additional practice trials of centrally presented letter strings, where the task was to perform lexical decision. Thirty-six additional practice trials presented words and non-words either to the left or to the right of the fixation point. Each trial began with a "+" appearing in the center of the screen for 400 ms. For the first trial, the "+" remained for 2000 ms, and disappeared when the target word was presented. The "+" would again reappear to allow projection of the next target word. Targets were briefly presented for 180 ms (either a word or a non-word), to the left or to the right of the focus point. The participant's task was to decide, as quickly and as accurately as possible, whether the stimulus was a legal English word or a non-word. Participants responded by pressing one of two available response keys, labeled "word" and "non-word" on a standard QWERTY keyboard. For half of the participants, the response "word" was made by pressing the "N" key, and "non-word" by pressing the "V" key. For the other half, the response keys were reversed. The participants were randomly assigned to one of the two response options.
Results

Since the main manipulation of orthographic neighborhood was designed for the word stimuli, the repeated-measures analysis with N (high, low), visual field (right, left) and presentation condition (control, inner-dimmed or outer-dimmed) as the within-subjects variables was conducted only for words. RTs of less than 150 ms or more than 1400 ms were discarded as anticipatory or excessively lengthy, respectively (discarded trials occurred infrequently, less than 2% of the total). Mean RTs for correct responses are summarized in Table 9.2, and presented graphically in Figure 9.3.

            L-low   L-high  R-low   R-high
control
  mean RT    620     595     569     566
  S.D.        72      70      67      66
  % error     19      15      18      18
inner-dim
  mean RT    611     590     590     569
  S.D.        72      69      73      69
  % error     20      17      18      15
outer-dim
  mean RT    592     598     555     558
  S.D.        70      84      80      75
  % error     14      20      11      15

Table 9.2: Results for N-effect investigation 1.

Reaction times. Visual field had a significant effect [F1(1,18)=7.2, p<0.05; F2(1,24)=6.4, p<0.05], with RVF words (mean RT = 567 ms) responded to faster than LVF words (mean RT = 601 ms). Presentation type and neighborhood size interacted [F1(2,36)=4.18, p<0.05; F2 not significant]. We examined the simple effects of N for each visual condition separately and found that the N effect was significant both in the control condition [F(1,18)=5.9, p<0.05] and the inner-dimmed condition [F(1,18)=8.2, p<0.05], but not the outer-dimmed condition. The interaction between presentation type, visual field, and orthographic neighborhood size was also significant [F1(2,36)=6.3, p<0.01; F2(2,48)=6.0, p<0.01]. Post hoc Bonferroni (p<0.05) comparisons yielded that for LVF words, the N effect occurred under both the control and inner-dimmed conditions, but not the outer-dimmed condition. For RVF words, the N effect emerged only under the inner-dimmed condition.

Figure 9.3: Results for N-effect investigation 1.

Error rates.
Average error rate was 16%, and the patterns were similar to the RT data. However, no significant effects of visual field, N size, or presentation condition were found (see mean error rates in Table 9.2).

Discussion

The hemispheric specificity of the N effect was replicated for the control conditions, with faster RTs to high-N than low-N words in the LVF/RH, but not the RVF/LH. In the LVF/RH, dimming the outer letters negated the N effect, via facilitation (relative to the control condition) for low-N, but not high-N. In the RVF/LH, outer dimming had no effect. In the RVF/LH, dimming the inner letters created the N effect via inhibition for low-N, but not high-N. In the LVF/RH, inner dimming had no effect. A comparison of Figures 9.2 and 9.3 shows that the experimental results closely match the predicted pattern.

9.5 Further Predictions

The previous experiment showed the predicted patterns for lateralized presentation. It should be possible to also negate the N effect for central (CVF) presentation via a contrast manipulation. However, a different manipulation may be required, due to the differing shapes of the locational gradient in the CVF and the LVF. In the LVF, input to the fourth letter may be too high with respect to the third letter (due to incomplete inversion of the acuity gradient). However, this would not be the case for the CVF, where the locational gradient across the third and fourth letters is determined by the steeply decreasing RVF/LH acuity gradient. In the CVF, input to the fourth letter would not be too high, and dimming that letter may not be beneficial. Therefore, we initially ran a pilot study to determine what manipulation would negate the CVF N effect. This study indicated that dimming both outer letters did not remove the effect, while dimming only the first letter did. This is consistent with the proposal that input to the fourth letter is relatively too high for the LVF, but not the CVF.
In the following experiment, we sought to negate the N effect for both LVF and CVF presentation within a single study. In the dimmed condition, the outer two letters were dimmed for LVF and RVF presentation, while only the first letter was dimmed for CVF presentation. The respective control conditions remained the same as in the previous experiment. For the dimmed condition, we expected to negate the CVF N effect (by facilitating responses to low-N words), and to replicate the results from the outer-dimmed conditions in the previous experiment.

9.6 N-Effect Investigation 2

This experiment was designed by me, but run by my colleague Michal Lavidor at the University of Hull, U.K. It used the particular presentation conditions (i.e. background color and letter colors) that she developed for the previous experiment.

Participants

Twenty-five native English speakers participated in the experiment. All had normal or corrected-to-normal vision and were aged 18-28 (mean age 19.6, s.d. 1.9). Each participant received either a course credit or $2. All participants were right-handed and scored at least 80 on the Edinburgh test. Eleven were males, 14 females.

Design and Materials

Stimuli. The same stimuli as in the previous experiment were used (see Table 9.1).

Design. Each subject was assigned to one of the 3 versions of the experiment. The different versions rotated the word sets across the experimental conditions. The within-subject factors for words were N size (high, low), visual field (RVF, LVF or center) and presentation condition (control or dimmed). Each combination of the within-subject variables was repeated 13 times. The dimmed condition included dimming of the two external letters for the RVF and LVF presentations, and dimming of the first letter for the centrally-presented stimuli.

Procedure. The procedure was similar to the procedure of the previous experiment.
Results

Since the main manipulation of orthographic neighborhood was designed for the word stimuli, the repeated-measures analysis with N (high, low), visual field (right, left, center) and presentation condition (control or dimmed) as the within-subjects variables was conducted only for words. The results of one participant were not included in the analysis due to low-accuracy performance (below chance level). RTs of less than 150 ms or more than 1400 ms were discarded as anticipatory or excessively lengthy, respectively (discarded trials occurred infrequently, about 2% of the total). Mean RTs for correct responses are presented in Table 9.3 and Figure 9.4.

           L-few   L-many  C-few   C-many  R-few   R-many
control
  mean RT   582     560     495     472     537     540
  S.D.       59      65      57      60      71      66
  % error    16      12      10       8      13      11
dimmed
  mean RT   558     570     471     479     536     533
  S.D.       63      52      58      55      59      60
  % error    14      15       9      10      13      11

Table 9.3: Results for N-effect investigation 2. In the dimmed condition, the outer two letters were dimmed for RVF and LVF presentation, while only the first letter was dimmed for CVF presentation.

Reaction times. Visual field had a significant effect [F1(2,46)=10.3, p<0.01; F2(2,24)=8.1, p<0.01]. Centrally presented words (mean RT = 478 ms) yielded the fastest responses, followed by RVF words (mean RT = 536 ms), then LVF words (mean RT = 567 ms). Post-hoc differences were analyzed employing Bonferroni comparisons (p<0.05). The interaction between presentation type, visual field, and orthographic neighborhood size was also significant [F1(2,46)=5.8, p<0.01; F2(2,48)=4.9, p<0.05]. Post hoc Bonferroni (p<0.05) comparisons yielded that for LVF and CVF words, the N effect occurred for the control condition, but not the dimmed condition. For RVF words, there was no N effect in either condition.

Figure 9.4: Results for N-effect investigation 2.

Error rates.
Average error rate was 11%, and the patterns were similar to the RT data. However, no significant effects of visual field, N size, or presentation condition were found.

Discussion

In the control condition, N effects for the CVF and LVF, but not the RVF, were replicated. The dimmed condition for the LVF and RVF (wherein the outer letters were adjusted) replicated the results from the previous experiment - the LVF N effect was negated via facilitation for low-N, while dimming had no effect in the RVF. Crucially, the CVF dimmed condition (wherein only the first letter was adjusted) negated the N effect, via facilitation for low-N, but not high-N. Thus the predicted results were achieved.

9.7 Implications

Experiments 2 and 3 showed that it is possible to create or negate the N effect by altering bottom-up activation patterns via contrast manipulations, as predicted by the SERIOL account. Note that a simpler explanation of these results does not suffice. It cannot be the case that dimming the outer letters was facilitatory for LVF / low-N words simply because the internal letters were unmasked at a very low level. In that case, there should have been a similar effect in the RVF, yet none was found. Nor can it be the case that such RVF facilitation did not occur simply because the stimuli were less degraded than in the LVF, since we demonstrated a facilitation in the least degraded location, the CVF. Moreover, the creation of an RVF N effect by dimming the internal letters indicates that the reason that such an effect does not usually occur is that those letters are usually more highly activated. This places the locus of the VF x N-effect interaction squarely at the level of hemisphere-specific, orthographic activation patterns. The SERIOL model explains the source and nature of these patterns.

Locus of the N Effect in Lexical Decision

The fact that manipulations of contrast modulated the N effect indicates that its primary locus is the letter level.
Other accounts of the N effect based on word-level activations [Gra96] or phonological representations [Zie98] cannot explain the demonstrated effects of manipulating the visual properties of letters. Andrews [And97] noted that the N effect appears less strong in French and Spanish. The conclusion that feedback excitation to the letter level is the primary source of the N effect can potentially account for such a linguistic difference. Under the assumption that the reading lexicon also provides the spelling lexicon [Bur02], spelling could be represented by connections from a word node back to the letter nodes. In languages with shallower orthographies than English, such as Spanish and French, it is less necessary to encode spelling via word-to-letter connections, since spelling is predictable from phonology. Therefore, word-to-letter connections may be weaker in such languages. These weaker top-down connections would then account for the reduced influence of N in these languages. However, others have argued against such a letter-level locus [Bro93, Rey04], based on the absence of an interaction between stimulus quality and word frequency in lexical decision [Sta75, Bro93, Bal95]. That is, when letter contrast is uniformly low, the cost of this degradation does not vary with the frequency of the target word. If there were feedback from the word level to the letter level, this should cause an interaction between stimulus quality and a lexical attribute, such as frequency. The lack of such an interaction has been taken as indicating that processing is staged, rather than interactive. That is, computations are completed at the letter level before being passed on to the word level, as opposed to a continuous interaction between levels. However, this finding is not inconsistent with the model, or the experimental results. Note that the SERIOL model is not fully interactive; letter activations only occur at specific time intervals.
Although I have not fully specified all the timing relationships between levels, the implicit assumption is that there is gating between the feature and letter levels. The induction of the correct firing order at the letter level depends on the proper activation pattern at the feature level. Thus, the feature level must settle into this pattern before it activates the letter level. If the letter nodes were activated while the feature level were still settling, the wrong firing pattern would result. Moreover, feature-level input must be passed to the letter level at the start of an oscillatory cycle. Therefore there has to be some co-ordination between the feature and letter levels, so that feature-level activation affects the letter level at the right time. Thus we assume a staged activation. So, the effects of uniformly low stimulus quality may be resolved before the feature level is allowed to activate the letter level, consistent with the lack of interaction between overall stimulus quality and frequency. However, this does not rule out the possibility of feedback from the word level affecting the letter level at a later point in processing. For example, such feedback might occur during the down-phase of the oscillatory cycle. Under this scenario, word-level activation would not affect the letter level until the end of the oscillatory cycle. This feedback would then have an effect on letter activations during the next oscillatory cycle. Such feedback would not interact with overall effects of stimulus quality, which have been resolved prior to activation of the letter level. However, this feedback would interact with the resulting activation pattern passed forward from the feature level. That is, overall low stimulus quality may have a large inhibitory effect the first time that the feature level activates the letter level, and this effect may dominate as compared to any later top-down effects.
We have demonstrated an interaction between the N effect and positional manipulations of letter contrast. Thus we have demonstrated an interaction between a lexical attribute and stimulus quality, indicating that feedback from the word to the letter level does occur, and is the primary source of the N effect in lexical decision.

Orthographic Similarity

The proposal that the internal letters are the primary source of the N effect implies that the position of difference between a target and its neighbor should matter. A neighbor should be most facilitatory when it matches on the internal letters. This explains the finding that the usual N effect comes from body neighbors [Zie98]. A word node corresponding to a body neighbor would not become highly activated, because it likely would not match on the important first letter. Thus, I propose that the N effect occurs as a result of top-down input to letter nodes via the summed excitation of a large number of moderately active word nodes. This also explains why non-word targets were not affected by the body-neighbor manipulation [Zie98]. Increasing the number of body neighbors does not increase the number of highly activated word nodes. Under the assumption that only highly active word nodes slow RTs to non-words, body neighbors would not affect RTs to non-words. In contrast, increasing the number of N-metric neighbors makes a highly activated word node more likely. The proposal that facilitation results from moderately active non-target word nodes leaves open the possibility of an inhibitory effect for a highly activated non-target word node, as would be expected from lateral inhibition within the word level. This proposal explains observed influences of a single higher-frequency neighbor. For five-letter French targets, the existence of a higher-frequency neighbor mismatching at the fourth letter had an inhibitory effect in lexical decision, while the existence of one mismatching at the second letter did not [Gra89].
Perea investigated this phenomenon in English using a perceptual identification task for a briefly presented target (67 ms) which was followed by a mask [Per98]. The target was preceded by a 500-ms prime that was a higher-frequency neighbor. When the prime mismatched the target on the third or fourth letters, there was an inhibitory effect on target identification (compared to an unrelated prime). In contrast, a prime mismatching on the first, second, or fifth letter had no effect. The lack of effect for a mismatch at an external letter (first or fifth letter) [Per98] is explained by the edge bigrams. If an edge bigram is not matched, the neighbor does not become highly activated and does not have an inhibitory effect on the target. The effect of internal-letter position [Gra89, Per98] is explained by the sequential activation of bigrams. If a neighbor mismatches on the second letter, it is inhibited early and cannot accrue a high enough activation level to interfere with the target. However, a mismatch occurring later (at the third or fourth letter) has less of an effect, leading to high activation of the neighbor, and an inhibitory effect on the target. A non-target formed by transposing two letters of the target, such as SALT and SLAT, is also highly activated under the bigram metric, because most of the bigrams are shared [Gra04b]. This accounts for the finding that having such a transposed-letter neighbor can be inhibitory [And96]. In summary, I propose that the facilitatory N effect occurs via moderately active word nodes. Such non-target nodes do not strongly inhibit the target, while their summed top-down input to the letter level provides facilitation. Increased RTs occur when a single neighbor is highly activated, strongly inhibiting the target.
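The transposed-letter point can be made concrete with a small self-contained overlap calculation. The bigram inventory assumed here (contiguous and one-apart letter pairs, plus edge bigrams marked with "*") is an illustration, not the model's exact parameterization.

```python
def bigram_set(word, max_sep=2):
    """Set of open bigrams for a word: letter pairs up to max_sep
    positions apart, plus edge bigrams marked with '*'.
    (Assumed inventory, for illustration.)
    """
    w = word.lower()
    grams = {w[i] + w[j]
             for i in range(len(w))
             for j in range(i + 1, min(i + max_sep + 1, len(w)))}
    return grams | {"*" + w[0], w[-1] + "*"}

# Transposed-letter pair from the text: SALT vs. SLAT.
shared = bigram_set("salt") & bigram_set("slat")
```

Each word produces seven bigrams under this scheme, and six are shared; only the transposed pair itself ("al" versus "la") differs. So under the bigram metric, SLAT strongly activates the SALT node, consistent with the inhibitory transposed-letter neighbor effect noted above.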
Locus of Visual Field Asymmetries

The fact that the normal visual-field x N-effect interaction was overridden by our manipulations demonstrates that it cannot be a result of inherent hemispheric differences at the level of lexical access, because if it were, it would not be affected by such manipulations. Rather, an asymmetric word-level effect can result from differences in processing near the visual level. This casts doubt on the widely held assumption that hemifield effects reflect differential processing at the lexical level.

Letter-Position Encoding

The highly specific, counterintuitive predictions were based on the details of locational-gradient formation. The confirmation of these predictions provides strong support for the idea that letter-position encoding employs a spatial activation gradient, the formation of which requires hemisphere-specific processing, giving differing activation patterns across the visual fields. Although these experimental results do not directly confirm the claim that the locational gradient induces a serial encoding of letter order, the proposed dynamics do explain why top-down feedback has no effect when the internal letters already receive a relatively high level of excitatory bottom-up input.

9.8 General Discussion

The SERIOL model has elucidated the source of a phenomenon that has remained mysterious for decades, the asymmetry of the length effect [Mel57, Bou73, Ell88, Naz03, Jor03]. It has also explained the recently discovered asymmetry of the N effect, at the same time revealing the source of facilitation for high N. The model explains why the CVF patterns with the LVF for the N effect, but with the RVF for the length effect. For an N effect to occur, the slope of the locational gradient must be sufficiently steep that top-down input can assist the firing of the internal letters. The locational gradient is steeper in the LVF and CVF than in the RVF across early string positions, explaining the pattern of the N effect.
In contrast, the length effect occurs when the locational gradient is not smoothly decreasing. This only occurs in the LVF (as a result of acuity-gradient inversion across a large number of letters), explaining the pattern of the length effect. This analysis implies that it may also be possible for high N to compensate for a non-smooth LVF gradient in longer words. Indeed, [Lav02a] showed that there was no length effect for LVF high-N words of three to five letters, while there was a length effect for low-N words. This was the first demonstration of an absence of a length effect in the LVF. In the present work, we have shown for the first time how to abolish a length effect in a set of words that normally shows such an effect [Whi04c].

These results demonstrate that these hemispheric asymmetries do not entail different modes of lexical access. Rather, hemisphere-specific activation patterns are the cause. Thus, the locus of visual-field effects is lower in the processing stream than is commonly assumed. These results suggest that it is not appropriate to use visual half-field studies to investigate linguistic-level hemispheric specificity. As such experiments are currently widely used, this is an important finding. To further buttress this claim, we are currently applying our contrast-manipulation methodology to a semantic asymmetry related to primes that have two different meanings [Bur88]. Logically, if there is one mode of lexical access, as our results and brain-imaging evidence indicate, semantic asymmetries must also originate prelexically. Therefore, we expect to be able to reverse this asymmetry also. Mechanistically, degraded letter-position encoding in the LVF/RH may create more diffuse lexical activation (than RVF/LH presentation), causing more diffuse semantic activation, leading to an asymmetry in semantic priming.
Extension of our results to the semantic level would conclusively demonstrate that VF asymmetries arise at a prelexical level, which would indicate that hemifield experiments should no longer be used to make claims about hemisphere-specific processing at the lexical level and above.

The fact that the SERIOL model has led to these experimental results illustrates the utility of the overall approach. The predictions and experimental designs were generated by reasoning about a theoretical model, not by running a simulation. The theoretical model was formulated by considering a wide range of behavioral data and neurobiological constraints. This allowed the formulation of a theory of how letter position is encoded in a mature brain, and has led to novel, counterintuitive predictions that have been experimentally verified, and have elucidated long-standing questions in the area of visual word recognition.

I believe that this general approach allows one to get at what the brain is doing in a way that is not achievable by training an artificial neural network. It forces consideration of what a brain is actually doing, and how it is doing it. More realistic and complex tasks can be modeled when the work is not limited by implementational issues. Rather, computation is considered at a more abstract level, but is still heavily constrained, by neurobiological plausibility and behavioral patterns. Thus, although the model is specified at the functional level, the specification is still highly specific, much more so than box-and-arrow models. This specificity is in evidence in the range of accurate predictions generated by the SERIOL model. Once neural mechanisms in a mature brain have been established, we are then in a better position to consider how learning occurs, because we know what the end point should be, and what computational mechanisms must be available.
In the following chapter, I consider the implications of the SERIOL model for the more general arena of visual object recognition and for dyslexia. In the subsequent chapters, I then apply the overall approach to the problem of parsing.

Chapter 10

SERIOL Speculations

In section 2.2, I claimed that understanding how the brain handles LPE should shed light on fundamental processing mechanisms. In this chapter, I address this issue. I start with a consideration of which aspects of the SERIOL model are learned, and which are innate. Based on this analysis, I discuss how the presumably innate aspects could apply to object recognition in general. I then consider how disruption to the learned aspects could contribute to dyslexia. This discussion will be sketchy and speculative. To treat these subjects in detail would require several more dissertations!

10.1 Innate versus Learned Aspects of the SERIOL Model

Starting at the highest level of the model, I now consider how the proposed processing could be learned during reading acquisition. The word level of the model corresponds to the orthographic lexicon. Obviously, people must learn to associate a word's spelling and its meaning. While I have used a localist encoding of the word level in the simulations, this assumption is not central to the theoretical model. I leave the nature of the encoding of the lexicon as an open question.

The lexical level is activated by bigram nodes, which represent the ordering between two letters. Thus, all relationships between the letters in a stimulus are encoded by a set of pairs. The general capacity to represent relationships in this way in the visual system may be innate (such as above/below relationships, as discussed in the following section).

The bigram level is activated by the serial firing of letter nodes. This serial encoding depends on the oscillatory nature of letter nodes. Obviously, the brain does not learn to use oscillations to encode information.
Rather, oscillatory dynamics must be present as an innate encoding mechanism.

Serial firing also depends on a feature-level activation gradient. The left-to-right nature of this locational gradient is obviously learned, as it is based on reading direction. Furthermore, distinguishing objects by horizontal relationships is unnatural. The identity of a natural object does not change as it is rotated around the vertical axis; a lion is still a lion regardless of whether it is facing to the left or to the right. Thus the visual system must learn to distinguish horizontal order for the purpose of processing words, and it must learn to impose a monotonically decreasing activation gradient. However, the general mechanism of creating a location-invariant representation via the conversion of space into time is taken to be innate.

The edge level of the model is based on known properties of the primary visual areas, and these properties are therefore innate. The transformations between the edge and feature levels constitute the learned nature of the locational gradient.

Thus these general representational mechanisms are taken to be innate: the pairwise representation of relationships, the existence of oscillatory cells, and the capacity to use these oscillatory cells to convert a spatial representation into a temporal representation via differences in activation levels. In the following section, I discuss how these capacities could be employed in general object recognition.

Processing that is specific to visual word recognition occurs primarily at the feature level. The visual system must learn to encode letter order via a monotonically decreasing activation gradient across a retinotopic representation. I assume that this is learned in response to a top-down attentional gradient. In section 10.3, I present a simple simulation showing the feasibility of such learning. I also discuss how failure to create the locational gradient may be a causal factor in dyslexia.
10.2 Object Recognition

There has been an ongoing debate as to whether objects are recognized via interpolation of view-dependent templates [Pog90], or by matching abstract structural representations [Bie87]. However, recent work has indicated that the visual system may use both approaches [Fos02, Hay03]. Both types of recognition entail similar problems of representing the relationship between sub-parts in a location-invariant way, so that a stimulus can be matched against a stored representation. In the view-dependent approach, this would involve two-dimensional relationships between features, while in the compositional approach this would involve three-dimensional relationships between volume primitives, known as geons [Bie87].

In an implemented model of the geon approach [Hum92], spatial relationships were encoded using the above, below, and beside predicates. Thus ordering along the vertical axis was differentiated, but not along the horizontal axis. This is in line with the above observation that left-right relationships are not invariant for natural objects. In contrast, vertical relationships usually do not vary, because natural objects are not usually upside-down. Thus it is most important to represent vertical relationships. To represent the structure of the constituent geons, each geon was temporally bound to a one-place relationship. For example, if a cone appeared above a brick, cone and above would fire simultaneously (encoding that the cone is above something), while brick and below would fire simultaneously in a different time slot. However, this encoding leads to ambiguity if there are four or more geons above one another. The middle geons are each both above and below another geon, so there is ambiguity about their relationships to one another. The relationships between geons were established by an exhaustive comparison between locations.
A coarse coding of ten units was used to encode the vertical spatial location of the center of mass of each geon. For example, using 1 to represent the topmost location, the above unit is activated if 1 and 2 are active, or 2 and 3, or 1 and 3, etc. This requires an and gate for each pair of locations that satisfies the relationship, and an or gate joining all of the and gates. While it is feasible to use this approach for a small number of possible locations, the wiring necessary for a more realistic network becomes prohibitively expensive.

How could the proposed representational mechanisms overcome these difficulties? In the SERIOL model, a relational unit (i.e., a bigram) represents a two-place relationship, rather than a one-place relationship. Such units would reduce ambiguity. Thus vertical relationships could be represented by units representing above/below. For example, when a cone appears above a brick, it activates a cone-above-brick unit. The number of required units is the square of the number of geons (24) [Bie87], which is not prohibitively large.

In the SERIOL model, the left-right relationship is identified not by exhaustive comparison of locations, but rather by order of firing. The same principle could be used to identify above-below relationships. In the SERIOL model, the sequential firing is achieved via a monotonically decreasing activation gradient. However, there is no evidence for a monotonically decreasing gradient from the top to the bottom (or bottom to top) of the visual field. Instead, the visual system may use the acuity gradient directly, but differentially, in the upper visual field (UpVF) and the lower visual field (LoVF). If geon1 is above geon2 in the UpVF, geon1 would have a lower acuity than geon2. If the UpVF acuity gradient is converted into sequential firing of geons, the geon1-above-geon2 unit should be activated when geon1 fires after geon2. In contrast, if geon1 is above geon2 in the LoVF, geon1 will have a higher acuity than geon2.
If the LoVF acuity gradient is separately converted into a sequential firing pattern, the geon1-above-geon2 unit should be activated if geon1 fires before geon2. So the wiring between geon and bi-geon units would vary with visual field. This wiring would be part of the visual system's innate capacity to represent spatial relationships. Relationships could also be hardwired across the visual fields. That is, if geon1 appears in the UpVF and geon2 in the LoVF, the geon1-above-geon2 unit is activated. In contrast to the UpVF and LoVF, there would be no visual-field-specific wiring for the left and right visual fields, because left-right relationships are not usually invariant. In both visual fields, if geon1 fires after geon2, the geon1-beside-geon2 unit would become activated. In order to read, this mechanism would have to be overridden via a monotonically decreasing gradient which induces first-to-last sequential firing.1

Thus, for general object recognition, separate temporal encodings may be induced along the vertical axes in the UpVF and LoVF, and along the horizontal axes in the LVF and RVF. This would lead to the activation of bi-geon units encoding above, and bi-geon units encoding beside. This location-invariant representation could then be matched against a stored representation based on bi-geon units.2 A similar encoding mechanism could be based on features, rather than geons, for matching view-dependent templates.

1 If there is an innate mechanism for encoding above-below relationships, but not left-right relationships, why then are most scripts read horizontally? The visual field is more extensive along the horizontal axis than along the vertical axis, and acuity decreases more quickly along the vertical axis than the horizontal axis. It may be the case that the increased acuity along the horizontal axis outweighs the cost of special processing. Also, although above-below relationships may be directly computed, this would not be a result of sequential firing across the letters. For example, for the vertical word GLEN fixated in the center, L and E would fire, and then G and N. Thus there is no letter-based invariant representation of order. Recall that the phonological route requires such a representation. Therefore, acuity-gradient inversion (in the UpVF) may be necessary for languages read from top to bottom, giving no advantage over horizontal scripts.

2 Inside and/or in-front-of relationships would also be required. These principles are less applicable for determining such relationships. Rather, a mechanism to compare spatial extents would be necessary.

This sketch suggests that the basic principles of encoding in the SERIOL model could plausibly be extended to the domain of object recognition in general. Of course, many details remain to be worked out.

10.3 Feature-Level Processing and Dyslexia

According to the above discussion, the task of learning to encode letter order primarily consists of learning to create the locational gradient. What could drive this learning? I assume that it is attention based. Because print-to-sound translation proceeds from left to right (in a left-to-right language, of course), attention is first focused on the first letter, then on the second, etc. This may create a top-down attentional gradient across the letters. This top-down gradient may then drive learning on bottom-up connections (between the edge and feature levels) and lateral connections (within the feature level). Over time, the visual system learns to automatically create an activation gradient, without top-down support.

10.3.1 Simulation of Learning to Form the Locational Gradient

To test the feasibility of this scenario, I ran a simulation with one layer of feature nodes that were fully interconnected. A self-connection was excitatory, whereas connections to other nodes were inhibitory.
Bottom-up input was in the form of an acuity gradient, whereas top-down input was in the form of the locational gradient. Following the reception of bottom-up input, the network iterated for three cycles. The resulting activations were compared to the locational gradient. If a node's activation was too low, the strength of the self-connection was increased. If the activation was too high, the strengths of the inhibitory connections were increased. Then all weights decayed slightly.

The simulation was performed on a set of 10 nodes, where nodes 1-5 represented the LVF/RH and 6-10 represented the RVF/LH. Bottom-up activations BU_i increased for i = 1 to 5 (from 3.0 to 5.0), and decreased for i = 6 to 10 (from 5.0 to 3.0). The network was trained for stimuli spanning positions start to 10, where start was varied from 1 to 6. BU_i was set to 0.0 for i < start. The top-down activation TD_i was set to 0.0 for i < start, and to 5.0 - (i - start) * 0.5 for i >= start. Thus the network had to learn a single set of weights that would generate a gradient of the same shape for all stimulus locations.

Excitatory and inhibitory connection weights were initially set to 0.005 and -0.005, respectively. On each iteration, a node's activation A_i was increased by the dot product of the feature activation vector and the weight vector. After 2 iterations, if A_i < TD_i (within a tolerance of 0.05), the self-connection weight was increased as follows: w_ii = w_ii + LR * TD_i, where LR is the learning rate. If A_i > TD_i: w_ij = w_ij - LR * A_j for i != j. Then all weights were reduced: w_ij = D * w_ij, where D < 1. LR = 0.0006 and D = 0.999 produced good performance. After 10000 learning cycles, the desired monotonically decreasing gradient was created for all values of start. Although connection weights were initially symmetric, this training induced an asymmetry, such that weights on inhibitory connections from i to j were more negative for i < j than for i > j, and were more negative for i < 6 than for i > 5.
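The training procedure just described can be sketched as follows. This is my reconstruction from the text, not the original code: in particular, the schedule for varying start (cycling through positions) and the use of exactly two settling iterations are my reading of the description.

```python
import numpy as np

N = 10                   # feature nodes; indices 0-4 ~ LVF/RH, 5-9 ~ RVF/LH
LR, D, TOL = 0.0006, 0.999, 0.05   # learning rate, decay, tolerance (from text)

# Bottom-up acuity gradient: 3.0 -> 5.0 over the first five nodes,
# 5.0 -> 3.0 over the last five.
acuity = np.concatenate([np.linspace(3.0, 5.0, 5), np.linspace(5.0, 3.0, 5)])

# Initially symmetric weights: excitatory self-connections, inhibitory lateral.
W = np.full((N, N), -0.005)
np.fill_diagonal(W, 0.005)

for cycle in range(10000):
    start = cycle % 6                  # stimulus start position (assumed schedule)
    bu = acuity.copy()
    bu[:start] = 0.0                   # no bottom-up input before the stimulus
    td = np.zeros(N)                   # target locational gradient:
    td[start:] = 5.0 - 0.5 * np.arange(N - start)   # 5.0 at start, -0.5 per node

    a = bu.copy()
    for _ in range(2):                 # activation grows by W @ a on each iteration
        a = a + W @ a
    for i in range(N):
        if a[i] < td[i] - TOL:         # too low: strengthen self-excitation
            W[i, i] += LR * td[i]
        elif a[i] > td[i] + TOL:       # too high: strengthen lateral inhibition
            for j in range(N):
                if j != i:
                    W[i, j] -= LR * a[j]
    W *= D                             # slight decay of all weights
```

After training, the inhibitory weights should show the asymmetries reported in the text, though this sketch is only meant to make the update rules concrete.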
This is in line with the proposed left-to-right inhibition that is stronger for the LVF/RH. Weights on self-connections were higher for i < 6 than for i > 5. This is in line with increased excitation for the LVF/RH. Thus simple learning rules yielded connection weights with the proposed characteristics, demonstrating the plausibility of learning to form a locational gradient.

10.3.2 Dyslexia

If the visual system fails to learn to create the locational gradient, letter order will not be quickly and automatically represented. A deficit in visual processing is consistent with evidence showing that normal readers show an early (<150 ms post-stimulus) increased activation in the left posterior fusiform gyrus in response to letter strings, while dyslexic readers do not [Hle97, Tar99]. This early left-lateralization in normal readers may correspond to the initiation and performance of string-specific processing (i.e., locational-gradient formation). Lack of locational-gradient formation in dyslexics is also consistent with a study of the OVP in young readers who were normal or dyslexic [Duc03]. For normal readers, initially fixating on the first letter of a word yielded much better reading performance than fixating on the last letter. This is in line with the usual bias in OVP experiments, which I take to result from the necessity of acuity-gradient inversion in the LVF. If dyslexics do not create a locational gradient, this asymmetry should not be present, because acuity-gradient inversion would not be performed. Indeed, the dyslexic readers showed a symmetric viewing-position function.

If a rapid sequential representation of letter order cannot be induced, the visual system may compensate by performing an overt scan of the string. That is, instead of creating a sequential encoding in a single fixation, multiple fixations are carried out across the string. Thus a sequential encoding is created, but on a much longer time scale.
This proposal explains fixation data for normal versus dyslexic children. Children read words varying in length from 2 to 14 letters while their eye movements were monitored [Mac04]. (The study was in German, so long words were common in the test language.) Normal and dyslexic children showed similar patterns for words of 2-4 letters. As string length increased, the patterns diverged sharply. (The following results are the medians for each group.) For the longest words, normal subjects initially fixated 4.4 letter-widths from the beginning of the word, performed saccades of 3.5 letter-widths, and finally fixated 4.3 letter-widths from the end of the word. In contrast, dyslexic children initially fixated 2 letter-widths from the beginning of the word, performed saccades of 2 letter-widths, and finally fixated 1.8 letter-widths from the end of the word. Thus, on a single pass over a long word, dyslexic readers made twice as many saccades as normal subjects. In addition, the duration per fixation was longer in the dyslexic children. The dyslexic pattern is consistent with a strategy in which one to three letters are processed per fixation, where a slow, top-down attentional mechanism is used to scan letters within a fixation. Because it takes so long to process all the letters of the string, information about the initial letters may be lost by the time that the end of the string is reached. Thus multiple passes across the string may be required to read the string. Indeed, the number of backward saccades increased with word length for the dyslexic, but not the normal, subjects.

Such a deficit in encoding letter order could have ramifications for learning grapheme-to-phoneme correspondences. Perhaps a rapid sequential representation of letter order is necessary for learning an effortless mapping to phonology.
That is, it may be necessary to temporally align sequential orthographic and phonological representations in order to learn to effectively translate between the two types of encodings. When a suitable, robust orthographic encoding is not available, this may interfere with such learning. Thus the well-known phonological deficits observed in dyslexics may actually have their source in a visual deficit, at least in some cases.

10.3.3 Magnocellular Deficit

What underlying deficit could prevent formation of the locational gradient? As discussed in section 3.3, recent research has revealed a magnocellular deficit in some dyslexics. The dorsal route of the visual system, which processes motion, location, and attention, primarily receives inputs from the magnocellular pathway [Mau90]. Therefore, the underlying problem may be attentional, in that the proper top-down attentional gradient is not available to drive learning of the locational gradient. This proposal is consistent with evidence of LVF mini-neglect and RVF over-distractibility in dyslexics [Fac01, Har01], indicating abnormal attentional gradients. Vidyasagar has made a similar proposal, suggesting that dyslexics are unable to sequentially deploy attention across the string in a rapid, top-down manner [Vid01, Vid04]. In contrast, I propose that attentional problems prevent learning of the normal automatic, bottom-up processing that drives a sequential representation of letter order.

The ventral route of the visual system, which processes form and color, receives inputs from both the parvocellular and magnocellular pathways [Fer92]. Little is known about the role of the magnocellular pathway along the ventral route. Because magnocells are larger and process information more quickly than parvocells, magnocells may rapidly set up a low spatial-frequency representation of the visual scene, onto which the parvocells fill in detail [Car87, Del00, Van02].
In line with this fast processing, another role of the magnocellular system in locational-gradient formation may be to rapidly drive the inhibition that is necessary to invert the acuity gradient and create the locational gradient. If magnocells are functioning too slowly, it may not be possible to set up the locational gradient quickly enough to subserve the bottom-up induction of a serial encoding.

Some dyslexics do not show magnocellular problems. They may fail to develop the locational gradient for other reasons. Perhaps an auditory deficit directly prevents the development of a phonological representation that is based on individual phonemes. In the absence of such a representation, there may be less pressure to develop an orthographic representation that aligns with the phonemic representation.

Another potential role of the magnocellular system lies at a higher level of processing. Recall that a bigram node is activated by letters that fire in a particular temporal sequence. This response profile is similar to that of cells which only fire for a stimulus moving in a certain direction. Such directional sensitivity is characteristic of motion-detection cells in V5 [Mau83]. Due to this functional similarity, bigram nodes may be located in V5. Because problems with formation of bigram nodes would result in an impaired ability to form and store representations of letter order, this proposal is consistent with evidence showing a correlation between motion-detection ability and both letter-position encoding ability [Cor98] and the ability to distinguish real words from pseudohomophones (e.g., rain versus rane) [Tal00]. It is also consistent with the phenomenon of letter-position dyslexia in some subjects suffering from occipitoparietal lesions, whose error responses are anagrams of the target word [Fri01]. Such subjects may have an intact serial encoding of letter order, but may lack reliable bigram representations.
Similarly, some developmental dyslexics with magnocellular problems may fail to develop bigram nodes, leading to difficulty in developing an orthographic lexicon. An impairment at the bigram level may be directly due to a processing deficit in V5, or may originate earlier in the processing stream, perhaps due to the lack of a rapid sequential letter-based representation, which may be necessary to drive formation of bigram nodes.

10.3.4 Possible Experimental Tests of these Proposals

While highly speculative, the above analyses do suggest some avenues of experimental investigation. The proposal that dyslexics fail to learn to form a locational gradient could be tested by investigating letter-perceptibility patterns for lateralized presentation of three-letter consonant strings. For normals, the best-perceived letter in each visual field is the letter farthest from fixation, as discussed in section 4.2.3. In the LVF, this is due to the feature-level, left-to-right inhibition necessary to invert the acuity gradient. I would expect a different pattern in dyslexics, with perceptibility more in proportion to acuity. In the RVF, the final letter is the best perceived because it is not inhibited by a subsequent letter at the letter level. If dyslexics rely on a top-down attentional scan, this pattern should not be present. Thus for dyslexics, I would expect positional symmetry across the visual fields (resulting from a top-down scan), somewhat modulated by acuity, giving a V-shaped LVF pattern and an initial-letter primacy in the RVF.

If the predicted pattern is found for dyslexics, this would suggest that it may be possible to treat dyslexia via the external imposition of a locational gradient, in order to jump-start its automatic formation. This could be accomplished by creating a contrast gradient across words presented on a computer screen.
That is, the first letter has the highest contrast, the second letter has somewhat lower contrast, the third somewhat lower than the second, etc. Each word should be centrally presented for 200 ms, to force processing within a single fixation. A previous study has shown that treatment utilizing brief presentation (100 to 300 ms) of words, either centrally or randomly lateralized, improved spelling ability in dyslexics, whereas longer central presentation (1500 ms) or presentation to a single visual field did not [Lor04]. This increased spelling ability may reflect a more reliable orthographic lexicon, stemming from more robust letter-position encoding. Perhaps brief presentation by itself forced formation of the locational gradient, because visual scanning was not an option. It would be interesting to see if imposition of a contrast gradient on such stimuli would generate a greater improvement in reading ability than standard stimuli.

The proposal that V5 houses bigram units could be tested via transcranial magnetic stimulation (TMS), which temporarily disrupts neural activity in a small area of the cortex. Under TMS to V5, a task that requires encoding of relative letter order should be disrupted. A previous experiment has yielded suggestive results. TMS to V5 impaired the ability to read pseudowords [Lis95]. Pseudoword reading requires precise encoding of letter order because top-down, lexical information is not available. Interestingly, the number of transposition errors preferentially increased, consistent with an induced deficit in positional coding ability. There were 75% more transposition errors (21/29) versus 33% more replacements (33/100) and additions (6/18) under TMS as compared to no stimulation. Further investigations could be carried out on Hebrew subjects performing the reading task, to see if letter-position dyslexia [Fri01] can be induced.
(As discussed in section 3.3, Hebrew is an ideal language for revealing letter-position dyslexia because vowels are not explicitly represented.) Alternatively, the lexical-decision task used by [Cor98], wherein nonwords were formed by transposing letters of real words, could be employed in any language. As a control, nonwords formed by replacing a letter of a word should also be included. If false positives to anagrammatic non-words were selectively increased under TMS to left V5, this would indicate that letter-position encoding in particular was disrupted.

10.3.5 Summary

The primary locus of learning in the SERIOL model is at the feature level, where the locational gradient is formed. Failure to learn to produce a locational gradient may contribute to dyslexia. Such failure may stem from a magnocellular deficit, or from the lack of a robust phonemic encoding.

The general mechanisms of encoding relationships with a set of pairs, and of using a spatial activation gradient to induce a temporal, location-invariant encoding, could be used by the visual system for object recognition in general. To represent vertical relationships, the visual system would likely directly use the acuity gradients in the upper and lower visual fields. This would require each visual field to interpret the order-of-firing information differently.

This concludes the discussion of LPE. In the remaining chapters, I begin to tackle the problem of how the brain creates the representation of sentence structure.

Chapter 11

The Parsing Problem

11.1 Specification of the Problem

A sentence is interpreted to ascertain "who did what to whom". A verb specifies the "what". The participants in the actions undertake different thematic roles; the "who" is termed the Agent, and the "whom" the Theme. For example, consider the sentence:

1. The dog that Mary adopted bit Tim.
The main idea of the sentence is (Agent=dog, action=bit, Theme=Tim), where the additional information (Agent=Mary, action=adopted, Theme=dog) modifies (Agent=dog). The job of the human parser is to take a sequence of words and convert it into such a representation of meaning. This task involves computational and representational problems that are much more difficult than those of letter-position encoding! The most important differences are as follows. There must be unlimited productivity. Any word can appear in an instance of the corresponding syntactic or thematic category. In letter-position encoding, there are a small number of elements (letters); relationships between elements can be represented via (bigram) units encoding every possible pair of elements. This type of conjunctive representation is not feasible in parsing due to the large number of words. The resulting representation must be hierarchical. It must be possible to represent multiple clauses and the relationships between those clauses. For example, a relative clause is embedded within the main clause in (1). In contrast, letter-position encoding only requires a linear representation of the relationships between elements. It must be possible to associate particular non-contiguous items in particular ways. In the above example, dog must be associated with bit, while the intervening material must not affect this association. This problem is especially difficult in the case of a center-embedding, as in the above example, where a new clause is started in the middle of a higher-level clause. This leads to multiple unattached Agents (e.g., dog and Mary), and each one must be associated with the proper verb. In contrast, in letter-position encoding, all elements bear the same type of relationship to each other and this problem does not arise.
Thus, the overall problem is "What representations, and transformations on these representations, does the brain use to convert a sequence of words into a hierarchical representation of meaning?" This does not include the question of how a word's meaning is actually represented in the brain. Rather, the focus is on how words could be represented such that they could be combined into hierarchical structures. The resulting hierarchical representation of thematic roles is dubbed the thematic tree. Due to the difficulty of the above question, I initially attack this problem by focusing on the neural basis of the underlying representations, while considering the operations on those representations at the algorithmic level. That is, the nature of the thematic tree and of the intermediate representations supporting its construction are considered at the neural level. The parsing algorithm that operates over these representations is considered at the symbolic level, for now. To satisfy the first two requirements discussed above (productive and hierarchical representations), two types of operations must be available. It must be possible to bind together an arbitrary word with a thematic role, giving productivity. It also must be possible to merge multiple such associations into a single unit so that the entire unit can enter into a binding relationship, thereby allowing hierarchical structure. To understand what the third requirement above entails, a discussion of the Chomsky hierarchy of formal languages is in order. 11.2 Computational Constraints Chomsky [Cho59] identified a relationship between the complexity of formal languages and the computational machinery required to accept or reject a string as being a well-formed string of a language. The simplest class of language, the regular languages (those described by regular expressions), is recognized by a finite-state machine, which consists of a set of states, and state transitions triggered by input tokens. See Figure 11.1.
A finite-state machine can recognize strings of the form a^n b^m. Grammars of this type correspond to phrases in natural language. For example, a definite noun phrase is recognized by a finite-state machine that expects the, followed by any number of adjectives, followed by a noun. A finite-state machine can also recognize strings of the form (ab)^n. This grammar corresponds to right-branching clauses, where a's are nouns and b's are verbs. For example: 2. John knows Sue thinks Bill lied. In contrast, a finite-state machine cannot recognize strings of the form a^n b^n (where n is unbounded), because there is no way to ensure that the number of a's and b's match up when all the a's are processed first. Note that a^n b^n corresponds to center-embedding, for example (noun (noun verb) verb). Thus, a finite-state machine cannot handle the general case of center-embedded clauses. It is often pointed out that humans cannot either, as more than one center-embedding leads to an uninterpretable sentence, such as: 3. The man that the dog that Mary adopted bit screamed. However, the human parser can handle certain double center-embeddings, as in the following: 4. The fact that the dog that Mary adopted bit Tim upset her. Thus humans can indeed parse multiple center-embeddings. Recent research has suggested that the ability to process center-embeddings may be uniquely human [Fit04]. Both humans and tamarins (a type of primate) rapidly learned to recognize sequences of syllables of the form (ab)^n, where a's were in a female voice, and b's were in a male voice, for n = 2 or 3. However, only humans learned to recognize sequences of syllables of the form a^n b^n, for n = 2 or 3. This ability to recognize center-embeddings may reflect a neural adaptation that is specific to language ability [Fit04]. What computational machinery is necessary for recognizing center-embedded structures? Such processing requires the functionality of a stack. A stack is characterized by the push and pop operations.
Push adds an item to the top

Figure 11.1: Examples of finite-state machines (FSMs). Each recognizer consists of a start state, S, an accept state, A, and intermediate (numbered) states. Transitions occur between states for specific input tokens, where e represents the end-of-string token. The top FSM accepts strings of the form a^n b^m, for n >= 1 and m >= 1. For example, the string a1 b1 b2 b3 would activate the following sequence of states: S, 1, 2, 2, 2, A. The bottom FSM accepts strings of the form (ab)^n, for n >= 1. For example, the string a1 b1 a2 b2 would activate the following sequence of states: S, 1, 2, 1, 2, A.

Figure 11.2: Example of using a stack to recognize strings of the form a^n b^n. A stack S provides the push(S,x) operation, which puts x on the top of S; the pop(S) operation, which removes the top item from S and returns it; and the empty(S) operation, which is true only if there are no items on S. The string a^n b^n can be recognized using the following algorithm for token x:
if x = a then push(S,x)
else if x = b and not empty(S) then y = pop(S)
else if x = e and empty(S) then Accept
else Reject
The operation of this algorithm is illustrated for the string a1 a2 a3 b3 b2 b1, where the boxed items represent the items on the stack and a line represents an empty stack. In natural language, such a string would correspond to multiple center-embeddings, where a's are subjects and b's are verbs. The recognition algorithm could be augmented to create a representation of the structure of the input by adding appropriate structure-building operations. For example, when an item is popped, a structure could be created that represents the integration of y (a subject) with x (the current verb). This structure could be saved and attached to the structure created by the next pop operation, and so on.
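The two recognizers described in Figure 11.1 can be written out directly as transition tables. This is purely an illustrative sketch; the state names follow the figure, and the helper `run_fsm` is mine:

```python
# Transition tables for the two FSMs of Figure 11.1.
# States S, 1, 2, A follow the figure; 'e' is the end-of-string token.

def run_fsm(transitions, tokens):
    """Feed a token sequence (ending in 'e') through a transition table.

    The string is accepted iff the machine ends in the accept state 'A'.
    """
    state = 'S'
    for tok in tokens:
        if (state, tok) not in transitions:
            return False  # no transition defined: reject
        state = transitions[(state, tok)]
    return state == 'A'

# Top FSM: accepts a^n b^m for n >= 1, m >= 1.
fsm_anbm = {('S', 'a'): '1', ('1', 'a'): '1', ('1', 'b'): '2',
            ('2', 'b'): '2', ('2', 'e'): 'A'}

# Bottom FSM: accepts (ab)^n for n >= 1.
fsm_abn = {('S', 'a'): '1', ('1', 'b'): '2', ('2', 'a'): '1', ('2', 'e'): 'A'}

print(run_fsm(fsm_anbm, list('abbb') + ['e']))  # True:  visits S,1,2,2,2,A
print(run_fsm(fsm_abn,  list('abab') + ['e']))  # True:  visits S,1,2,1,2,A
print(run_fsm(fsm_abn,  list('aabb') + ['e']))  # False: rejected
```

No table of this kind can accept a^n b^n for unbounded n, since a finite set of states cannot count the unmatched a's; this is exactly the limitation noted in the text.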
of a stack, while pop removes the topmost item. Thus, items are popped in the reverse of the order that they were pushed. See Figure 11.2. Of course, in processing natural language, it is insufficient to merely accept or reject a string of words as being a well-formed sentence. Rather, a representation of meaning must be created as the words are processed. A recognizer can be augmented to construct such a representation. For example, when a pop operation is triggered, the result of the pop could be taken to be the Agent of the current verb (assuming that a's correspond to nouns and b's to verbs in our example). Natural language contains other structures, called crossed-serial dependencies, that cannot be parsed using a finite-state machine or a stack, as in: 5. John, Bill, and Tom were wearing green, blue, and purple, respectively. Here, respectively indicates that the following associations should be formed: (John, green) (Bill, blue) (Tom, purple). However, stack-based processing would yield (Tom, green) (Bill, blue) (John, purple). In this case, the functionality of a queue is required, which is characterized by the append and remove operations. Append adds an item to the end of a queue, while remove takes an item from the front of a queue. Thus items are removed in the same order that they are appended. To parse the above sentence, John, Bill, and Tom would be successively appended to a queue. Then green would trigger a remove, giving John; blue would trigger a remove, giving Bill, etc. Thus, in order to process center-embeddings and crossed-serial dependencies, the human parser must be able to perform stack-like and queue-like operations in working memory. Therefore, such operations are an important component of the intermediate representations that allow construction of the thematic tree.
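The stack recognizer of Figure 11.2 and the queue-based pairing needed for sentence (5) can be sketched as follows. This is an illustrative sketch, not an implementation of any model in this dissertation; the function names and the simple subject-verb pairing step are mine:

```python
from collections import deque

def recognize_anbn(tokens):
    """Stack-based recognizer for a^n b^n (the algorithm of Figure 11.2),
    augmented with the structure-building step suggested in the text:
    each pop pairs a subject (an 'a') with the current verb (a 'b')."""
    stack = []
    pairs = []  # (subject, verb) integrations, innermost first
    for tok in tokens:
        if tok.startswith('a'):
            stack.append(tok)
        elif tok.startswith('b') and stack:
            pairs.append((stack.pop(), tok))
        else:
            return None  # reject
    return pairs if not stack else None  # reject if subjects remain

# Center-embedding: a1 a2 a3 b3 b2 b1 is paired innermost-first.
print(recognize_anbn(['a1', 'a2', 'a3', 'b3', 'b2', 'b1']))
# -> [('a3', 'b3'), ('a2', 'b2'), ('a1', 'b1')]

def pair_respectively(nouns, predicates):
    """Queue-based pairing for crossed-serial dependencies:
    items are removed in the same order they were appended."""
    queue = deque(nouns)
    return [(queue.popleft(), p) for p in predicates]

print(pair_respectively(['John', 'Bill', 'Tom'], ['green', 'blue', 'purple']))
# -> [('John', 'green'), ('Bill', 'blue'), ('Tom', 'purple')]
```

Note that running sentence (5) through the stack version instead would pair the last-pushed noun first, yielding the incorrect (Tom, green) association described above.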
In the following two chapters, I consider constraints that narrow the possibilities for how the thematic tree and the intermediate encodings are represented in the brain. 11.3 Neurobiological Constraints For the problem of letter-position encoding, the architecture of the visual system constrained the lowest level of representation. Due to the high-level nature of the parsing problem, such explicit constraints are not available. Rather, there are more general constraints of neurobiological plausibility, as follows. There are a finite number of neurons of fixed connectivity. A node representing an association between particular words cannot magically appear. Connection weights cannot be quickly altered and then returned to their original values. While there is evidence for rapidly occurring changes in synaptic strength in the hippocampal system, these changes are enduring [Bli73]. (This phenomenon is called long-term potentiation.) Due to the large number of sentences parsed, it is unlikely that the human parser relies on such semi-permanent changes to connection weights. However, it should be possible to store the thematic tree in the hippocampal system, if desired. Therefore, it should be possible to encode the information in the thematic tree into long-term storage, based on changes to connection weights. A wide range of imaging studies have revealed brain areas and activation components associated with language processing. Such studies provide little information about the nature of the underlying neural representations, and will not be reviewed here. However, imaging studies in which frequency-band power is analyzed could potentially be informative. If power in a certain band increases during a task, this may indicate that performance of the task relies on oscillatory activity in that frequency band. A range of studies have shown an increase in theta-band power in tasks that employ verbal working memory [Kli99].
A study using intracranial electrodes (in epileptic patients) allowed a particularly precise measurement of the temporal aspect of theta-band synchronization [Rag01]. These subjects performed the Sternberg task, in which 1 to 4 digits were memorized, followed by a delay interval, and then a probe. The subject then indicated whether the probe appeared in the memorized list. Spectral analysis showed a sharp increase in theta power at the start of the memorization phase. Theta power was maintained during the delay phase, and returned to baseline levels after the probe. This pattern occurred only in the theta band. In an MEG study of the Sternberg task, theta power systematically increased in frontal areas as the number of digits to be remembered increased from 1 to 3 to 5 to 7 [Jen02]. Together, these studies suggest that items in verbal working memory are stored on an oscillatory carrier wave, as in the Lisman and Idiart model [Lis95] (discussed in section 6.1.2). This view is further supported by a clever experiment in which auditory clicks were presented at varying rates during performance of the Sternberg task [Bur00]. When the frequency of clicks fell just below 21 Hz, RTs were slowed, and when the frequency fell just above 21 Hz, RTs were speeded. The largest changes occurred on those trials in which the largest number of items had to be remembered. These results suggest that the clicks affected the duration of gamma cycles on which items were stored in working memory. (As gamma cycles fall in the range of 40 Hz, the 21 Hz stimuli would correspond to a harmonic of that frequency.) An EEG study has linked these phenomena to sentence processing, showing that theta power in particular increased as a sentence was read [Bas02]. Another study has shown effects of grammatical class (noun vs. verb) on theta power [Kha04]. However, semantic processing seems to have no effect on theta power.
A comparison of two tasks (reading a sentence versus reading a sentence and giving the superordinate category of one of the words) showed no difference in the theta range between the reading-only and semantic tasks, while alpha power increased for the semantic task [Roh01]. In sum, these results suggest that theta oscillations may play a role in syntactic encoding in working memory during sentence processing. Chapter 12 Behavioral Results on Parsing Of course, it is also more difficult to investigate parsing behaviorally than letter-position encoding. The most informative data come from when the parser breaks down. Such breakdown is generally measured by off-line difficulty ratings, or by the on-line measure of reading times in a self-paced reading study. In such a study, the words or phrases are sequentially revealed, and the timing is controlled by the subject. This allows a record of how long the subject spends on processing each word or phrase. The human parser experiences difficulty in two situations: complexity and reanalysis. If the structure of a sentence is too complex, it becomes too difficult to process, as for the doubly center-embedded relative clauses in (3). Alternatively, difficulty can arise when the structure of the sentence is ambiguous, and the wrong analysis is initially chosen. In some cases, an initial incorrect analysis can be easily reanalyzed to give the correct structure, while in other cases, it cannot. For example, the following sentence is very difficult to understand. 6. The doctor told the patient that he was seeing that it was time to leave. Here that he was seeing is initially taken as the Theme (what the doctor told the patient), and it is difficult to reinterpret it as a relative clause modifying the patient when the actual Theme (that it was time to leave) is encountered. It is possible that both types of phenomena arise from the way that intermediate representations in working memory are used to construct the thematic tree.
In the complexity case, intermediate representations may become unable to support the generation of the thematic tree. For reanalysis, the nature of these representations may determine why some reanalyses are easy, and some are not. In the following, I will concentrate on complexity phenomena. I first review the experimental results, and then discuss psycholinguistic models and metrics that have been proposed to account for these results. 12.1 Complexity Phenomena 12.1.1 Center-Embedding versus Crossed-Serial Dependencies In English, center-embedding occurs when an embedded clause follows the subject. In languages with other word orders, center-embedding can occur under different circumstances. In German, nested infinitival clauses result in center-embedding. For example, the sentence: 7. Joanna helped the men teach Hans to feed the horses. is expressed as follows in German: 8. Johanna hat den Männern Hans die Pferde füttern lehren helfen. Joanna has the men Hans the horses to-feed to-teach helped. In Dutch, the same sentence would be expressed using crossed-serial dependencies, where the first subject is associated with the first verb, the second with the second verb, etc.: 9. Jeanine heeft de mannen Hans de paarden helpen leren voeren. Joanna has the men Hans the horses helped to-teach to-feed. A study of the relative ease of comprehension for the Dutch versus German constructions showed that the Dutch version is easier [Bac86]. Despite the fact that center-embeddings are generally more common across natural and artificial (computer) languages, and despite crossed-serial dependencies being more complex according to the Chomsky hierarchy [Cho59], crossed-serial dependencies are easier to process. 12.1.2 Different Types of English Doubly Center-Embedded Clauses A center-embedded clause in English can be either a relative clause (RC) or a noun complement (NC). In a relative clause, there is a "gap" corresponding to the noun phrase being modified. For example, in the sentence: 10.
The dog that Mary adopted bit Tim. that Mary adopted is a relative clause with a gap following adopted. This gap corresponds to the dog (i.e., Mary adopted the dog). In contrast, there is no gap in a noun complement, which can only follow a word whose meaning is related to a proposition. For example: 11. The fact that the dog bit Tom caused him to scream. Here that the dog bit Tom is a noun complement. It is a complete clause, elaborating on the fact. As mentioned in section 11.2, some doubly center-embedded clauses are very difficult to understand, while some are not. A sentence in which both are relative clauses (RC/RC) belongs to the former category, for example: 12. The man that the dog that Mary adopted bit screamed. Yet, if the outer embedded clause is an NC (NC/RC), such a construction seems much easier [Gib98]: 13. The fact that the dog that Mary adopted bit Tom upset her. However, the opposite ordering, an RC/NC, seems at least as difficult as an RC/RC [Gib98]: 14. The woman who the fact that Rover bit Tim upset yelled at the dog. Next I present some phenomena related to embedded clauses. 12.1.3 Interference in Working Memory One possible source of difficulty in center-embeddings is that multiple similar items (i.e., unattached subjects) must be maintained in working memory. Across a range of domains, it has been shown that similarity among items in working memory interferes with the ability to remember and differentiate those items. Thus, this general difficulty may also apply to syntactic working memory [Lew96]. Lewis and Nakayama [Lew02] investigated the effects of similarity in Japanese using off-line complexity ratings. In Japanese, objects precede verbs. Thus, a sentence with a sentential complement like: 15. John knows that Bill likes Sue. would be expressed with the following word order, giving a center-embedding: 16. John [Bill Sue likes] that knows.
Noun phrases are case-marked with suffixes, indicating their role in the sentence, where -ga indicates a subject (nominative case), -o indicates an object (accusative case), and -ni indicates an indirect object (dative case). Thus the above sentence would have the following form: 17. NP-ga NP-ga NP-o V that V. Due to these factors, many unattached NPs can be accumulated in working memory, and the effects of similarity can be easily investigated. In a pilot study, twenty different syntactic structures were used in which the following factors were manipulated: level of embedding (0 or 1 embedded clause), number of NPs (1 to 5), similarity (maximal number of NPs with the same case, 1 to 3), and adjacency (maximal number of adjacent NPs with the same case, 0, 2, or 3). Ease of understanding was rated on a scale from 1 (easy) to 7 (difficult). Regression analyses showed that a combination of similarity and adjacency was the best predictor of difficulty ratings, accounting for 73% of the variance. That is, perceived complexity increased as the number of NPs with the same case marking increased, and as their proximity to each other increased. This phenomenon was investigated further in a study in which the number of nominative NPs was held constant at 2, and the total number of NPs (3 or 4) and number of adjacent nominative NPs (0 or 2) were manipulated. With 0 adjacent nominative NPs, the total number of NPs affected perceived difficulty (ratings of 3.0 vs. 4.2 for 3 vs. 4 NPs). With 2 adjacent NPs, difficulty was higher, and was unaffected by the total number of NPs (ratings of 5.07 vs. 5.22 for 3 vs. 4 NPs). Thus, the proximity of nominative NPs had the largest impact on difficulty ratings. However, these findings do not reveal whether it is the syntactic category (i.e., nominative) or the surface form (i.e., both -ga marked) that matters in determining similarity.
To get at this question, Lee and Nakayama [Lee03] performed a similar investigation in Korean, which is structurally similar to Japanese. However, Korean has two different nominative case markings (-ka or -i), depending on whether the noun ends in a vowel. Syntactic class was varied by topicalizing the main subject. A topicalized NP indicates the focus of the sentence. It carries a different case marking (-nun), and is not necessarily a subject. In a self-paced reading study of sentences with sentential complements, the first NP had either the -ka, -i, or -nun case-marking and the second NP had either the -ka or -i marking. The results showed that topicalized sentences were easier than the nominative sentences. Within the nominative sentences, those with dissimilar sequences (-ka,-i or -i,-ka) were easier than those with similar sequences (-i,-i or -ka,-ka). Thus both syntactic class and surface form influenced difficulty. 12.1.4 NP-Type Effects Experiments in English and Dutch have shown that the syntactic type of subject NPs influences difficulty. In the following, I will refer to the first NP as N1, the second NP as N2, etc. English Off-line complexity ratings have shown that the difficulty of an RC/RC is influenced by the type of the innermost subject [Gib98, War02a]. If N3 is an indexical (first- or second-person) pronoun, an RC/RC seems easier than if N3 is a name or a full noun phrase (FNP, e.g., the woman), for example: 18. The man that the dog that I adopted bit screamed. If N3 is a third-person pronoun with or without a referent, it seems somewhat more difficult than an indexical pronoun, but easier than a name or FNP [War02a]: 19. According to Sue, the man that the dog that she adopted bit screamed. The man that the dog that she adopted bit screamed. One possible explanation for these effects is that a pronoun reduces interference in working memory, because the subjects are less similar to each other.
However, if N3 is a quantified pronoun (such as everyone), an RC/RC seems easier than if N1 or N2 is a quantified pronoun [War02a]. There are two consecutive non-pronouns when either N1 or N3 is a pronoun, yet ease of processing differs. This effect of position suggests that the influence of N3-type is not merely a result of reducing the number of similar adjacent items in working memory. Dutch Next we consider effects in crossed-serial dependencies, based on self-paced reading studies [Kaa04]. In each experiment, three subjects (N1-N3) and an object (N4) preceded three verbs, and the syntactic types of N2 and N3 were varied. In Exp. 1, N2 and N3 were either both pronouns or names, while N1 and N4 were both FNPs. In this case, NP-type (of N2 and N3) had no effect on reading times at any of the verbs. In Exp. 2, N2 and N3 were either both pronouns or FNPs, while N1 was a name and N4 was an FNP. In this case, reading times increased at V1 under the FNP condition. Why did the results differ across experiments? In Exp. 1, NP-type did not affect the maximal number of similar adjacent items (2 pronouns vs. 2 FNPs). In Exp. 2, NP-type did affect similarity (2 pronouns vs. 3 FNPs, because N4 was an FNP). Thus an effect of NP-type only arose when similarity increased, suggesting that the effect of NP-type in Exp. 2 was due to interference in working memory. In line with this analysis, the disadvantage in the FNP condition was numerically twice as large when all three FNPs shared the same determiner as when they did not (114 ms vs. 60 ms), suggesting a sensitivity to surface form. Summary In English, but not in Dutch, making the innermost subject a pronoun affects processing difficulty. The dependence on position and the contrast with Dutch suggest that this effect involves factors other than interference in working memory. In contrast, the effect of N2- and N3-type observed in Exp. 2 of the Dutch study can be accounted for by interference in working memory.
12.1.5 The RC/RC V2-Drop Effect In the following, I will refer to the verbs of the inner RC, the outer RC, and the main clause as V1, V2, and V3, respectively. If V2 is omitted, an RC/RC seems as, or more, acceptable than the grammatical version [Gib99]: 20. The man that the dog that Mary adopted screamed. This effect is specific to V2; if V1 or V3 is dropped, the sentence is not acceptable. However, if V2 is part of a right-branching RC, V2 cannot be acceptably dropped [Gib99]: 21. I know the man that the dog that Mary adopted bit. I know the man that the dog that Mary adopted. 12.1.6 V2-Drop × N3-Type Interaction I was curious whether the V2-drop and N3-type effects for English RC/RCs would interact. It may not be felicitous to drop V2 when N3 is a pronoun. To test this, I performed a self-paced reading study in which N3-type (first-person pronoun, third-person pronoun with a referent, or name) was crossed with grammaticality (V2 present or not) [Whi04d]. Thus stimuli were of the form: 22. [According to Sue], The/the trophy that the athlete that I/Sue/she admired greatly [won at the track meet] was stolen from the display case. The preamble, According to Sue, was only present for the third-person pronoun conditions. A statistically significant interaction between N3-type and grammaticality was found in the region of the final verb phrase. For the grammatical sentences, there was a slow-down for the name condition relative to the two pronoun conditions. In contrast, for ungrammatical sentences, the name condition was numerically faster than the pronoun conditions. This reversal indicates that V2-drop was felicitous when N3 was a name, but not when it was a pronoun. Thus, these results show a non-local effect of N3-type. That is, the nature of the subject of the inner RC affects the processing of higher-level clauses (the outer RC and the main clause).
In contrast to off-line complexity ratings [War02b], there was no difference in performance in the verbal regions for the first-person versus the third-person pronouns. Thus the increased off-line complexity ratings for third-person pronouns may reflect an overall increase in difficulty related to binding the pronoun to its referent, or to not having a referent. The present results indicate that the integration of subjects and verbs is unaffected by the type of pronoun. 12.1.7 Summary Studies of consecutive NPs have indicated that processing difficulty increases as the number of similar NPs increases, and as the proximity between those items increases [Lew02, Lee03, Kaa04]. Similarity seems to depend on both syntactic and surface features of the NPs [Lee03, Kaa04]. Cross-linguistic comparisons have shown that crossed-serial dependencies are easier to process than center-embeddings [Bac86]. In English, an NC/RC is easier to process than an RC/RC or an RC/NC [Gib98]. The processing of an RC/RC is facilitated when N3 is a pronoun, and this effect seems to go above and beyond interference in working memory [War02a]. A similar effect does not arise for crossed-serial dependencies in Dutch [Kaa04]. In English, when N3 is an FNP, it is felicitous to drop V2 [Gib99], but when N3 is a pronoun, it is not [Whi04d]. 12.2 Accounts Next I review some proposals as to the source of these complexity phenomena. The first is a psycholinguistic model, while the following two are complexity metrics. Thus, none of these proposals is couched in terms of a neurobiologically plausible model. However, it is of interest to examine the ability of these approaches to account for the above data. 12.2.1 Vosse & Kempen [Vos00] This is an implemented, localist model, which is based on a lexicalist grammar. Each word is associated with a lexical frame, which is a predefined, elementary syntactic tree. The model creates and operates over a network of nodes which represent connections between lexical frames.
A lexical frame A can attach to a lexical frame B when there is an empty slot in B that is of the same phrasal type as A. Thus there is no grammar per se. Rather, lexical frames compete with one another for attachment sites. This allows potential attachments that grammar-based parsing systems would never consider. The implemented model specifies the lexical frames and the dynamics of the attachment competitions. The issue of how attachments could actually be represented in neural tissue is not considered; rather, the modeling is at a higher level. A sentence is parsed correctly if all the proper attachments are made, and no improper attachments are formed. Like humans, the system could not parse RC/RCs or RC/NCs. This failure arose because the verbs engendered competitions that could not be resolved, due to the number of potential attachment sites (arising from three subjects). In contrast, the system could parse NC/RCs. However, the given explanation of why an NC/RC is successfully processed (p. 124) is unclear, and further discussion with the authors has not clarified the matter [pers. comm.]. Unlike humans, the system was not sensitive to N3-type; a pronoun N3 in an RC/RC still led to parsing failure. Also, the system cannot explain the V2-drop phenomenon. If V2 cannot be attached, replacing V2 with the final verb would simply result in that verb not being attached. In contrast, humans appear not to expect V2, but to attach the final verb properly. 12.2.2 Interference in Working Memory Lewis [Lew96] notes that similarity between items stored in working memory causes interference in a range of different modalities. He suggests that such interference may also apply to syntactic representations in working memory. Such an approach could account for the effects of similarity-based interference for NPs held in working memory [Lew02, Lee03, Kaa04]. However, this approach cannot fully capture other aspects of complexity phenomena.
In particular, Lewis suggests that it may not be possible to maintain three unattached subjects in working memory, due to their syntactic similarity. This would account for the difficulty of an RC/RC or an RC/NC. However, it does not explain the relative ease of an NC/RC or of crossed-serial dependencies in Dutch. 12.2.3 Dependency Locality Theory Gibson and colleagues were responsible for elucidating many of the above complexity phenomena. Their extensive work in this area has led to the Dependency Locality Theory (DLT) [Gib00], which provides a distance-based complexity metric. It is based on the idea that complexity increases as the distance increases between two items that must be integrated together in the syntactic tree. Distance is measured as the number of new discourse referents that intervene between these items, where a new discourse referent is a tensed verb, or an NP that is not an indexical (first- or second-person) pronoun.[1] Integration cost is taken to increase with distance because the activation of the first item is taken to decrease as activation is redirected to new discourse referents; thus more energy is required to reactivate the first item during integration. A cost of 1 Energy Unit (EU) is generated for each intervening discourse element, and for generating the new discourse referent itself. Perceived complexity corresponds to maximal integration cost. For example, for an RC/RC construction such as: [1] The discourse is presumed to always include a speaker and a listener, so pronouns referring to either do not introduce a new referent. 23. The vase that the man who Jen dated bought fell. the highest cost occurs at the verb bought, which has a cost of 7 EUs: 1 EU for the construction of bought + 2 EUs for attachment to man (across Jen and dated) + 4 EUs for co-indexing the gap following bought with the relativizer that (across man, Jen, dated, and bought).
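The bookkeeping in this example can be made concrete with a toy calculation. This is only my simplified reading of the DLT's cost assignment, not Gibson's implementation; the function and its arguments are illustrative:

```python
def integration_cost(new_referent, crossed_counts):
    """Toy DLT cost at a single word: 1 EU if the word itself introduces
    a new discourse referent, plus 1 EU per new discourse referent crossed
    by each integration performed at that word (simplified from [Gib00])."""
    return (1 if new_referent else 0) + sum(crossed_counts)

# At 'bought' in (23): constructing 'bought' (1 EU); attaching to 'man'
# crosses Jen and dated (2 EUs); co-indexing the gap with 'that' crosses
# man, Jen, dated, and bought (4 EUs).
print(integration_cost(True, [2, 4]))  # 7 EUs
```

Under this metric, perceived complexity is then the maximum of such per-word costs over the sentence.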
It is proposed that this high cost corresponds to the unacceptability of such a structure. For an NC with a transitive verb, an RC/NC has a larger maximal cost than an RC/RC (due to integrating across an explicit object in the NC). However, an NC/RC has a lower cost than an RC/RC, because a long-distance integration of a gap across an embedded clause is not required. This accounts for the difference in difficulty between an RC/NC and an NC/RC [Gib98, Gib00].

While the DLT can explain a range of complexity phenomena, it has difficulty in fully accounting for some aspects of the phenomena associated with double center-embeddings. Under the DLT's assumption that an indexical pronoun does not introduce a new discourse referent, integrating across such an entity does not generate any cost, accounting for the N3-type effect. This would imply that there should also be an effect of N3-type in crossed-serial dependencies. However, as we have seen, making N2 and N3 pronouns had no effect, contrary to the DLT prediction [Kaa04].

Furthermore, Warren and Gibson [War02b] tested the discourse-referent hypothesis, and did not get the predicted results. In this study, subjects read critical sentences in which an object-extracted RC modified the main subject. The subject of the RC was a definite NP. Whether or not this NP had a referent was manipulated in a contextual sentence presented just before a critical sentence. The presence or absence of a previous referent affected reading times at the RC's verb, but not at the main verb. That is, an NP that added a new discourse referent incurred a local cost (at its own verb), but did not affect processing in the higher clause (at the main verb), contrary to the DLT's prediction. A local effect of discourse-referent processing could still potentially explain the N3-type effect in off-line complexity ratings.
However, it cannot account for the interaction of N3-type with the felicity of V2-drop [Whi04d], because this is a non-local effect concerning higher-level clauses (the outer RC and the main clause).

The DLT has difficulty in accounting for the V2-drop effect itself. An earlier version of the DLT, the SPLT [Gib98], posited that complexity corresponds to the storage cost of syntactic predictions, not integrations. Under that metric, it was proposed that the parser drops the prediction for the outer RC's verb due to high memory costs [Gib99]. However, an assumption underlying the SPLT was contradicted by experimental evidence [Gib00, Gib04]; the SPLT was transformed into the DLT, where prediction cost is constant, and integration cost increases with distance. Hence, under the DLT, prediction cost cannot explain V2-drop, since the prediction costs for the outer RC and the inner RC are the same. While it is true that V2 induces the highest integration cost, this cost is incurred after the verb is encountered, and thus cannot account for dropping the prediction of that verb before it occurs. Furthermore, integration cost at V2 is independent of whether the outer RC is center-embedded or right-branching, but V2-drop is felicitous only when the outer RC is center-embedded [Gib99]. Thus, integration cost cannot account for the V2-drop effect.

The DLT also makes the wrong prediction about complexity in some important cases. The RC/RC's high cost results from the summation of the integration costs for the second verb and the first RC's gap. However, if these costs are decoupled, as in the following sentence:

24. The woman who the man who Sue dates flirted with hit him.

complexity is still very high, while the maximal integration cost is lower. Here the intransitive verb flirted signals that the gap for the first who is not in the object position. So the integration cost of flirted is only 3 EUs.
The integration cost of the gap following with is 4 EUs, and the integration cost of hit is 5 EUs. Thus the maximal cost is only 5 EUs. However, a much easier sentence like:

25. The fact that the man who Sue is dating rides a motorcycle scares her.

has a higher integration cost, of 6 EUs (at scares).

This analysis depends on the assumption that integration of a gap is not attempted following an intransitive verb, as is consistent with studies on filler-gap processing for intransitive verbs [Bol91, Sus01]. However, if it were argued that such an integration is attempted and does incur a cost, this claim would then destroy the DLT account of the difference between an NC/RC and an RC/NC. That account hinges on the assumption that there is no long-distance integration of a gap across the RC for an NC/RC. However, there is evidence that the RC possibility for an NC is actively evaluated. In a potential NC, a manipulation of the potential filler's appropriateness as the verb's object had an effect at the verb, indicating that the possibility of a gap is actively considered [Pea98]. Thus, if it were argued that an integration cost for a possible gap is incurred at an intransitive verb, such a cost would surely also apply to a potential NC's verb. However, in that case, there would be no difference in integration cost for an NC/RC versus an RC/NC. Nor could it be argued that the possibility of a gap in an NC is dropped in a potential NC/RC due to increased complexity; this would incorrectly predict that a potential NC/RC which turns out to be an RC/RC, such as (26), is uninterpretable.

26. The proposal that the student who Bill advises made at the meeting impressed everyone.

Another incorrect prediction occurs for Japanese. A sentential complement within a sentential complement (SC/SC) of the form:

27. NP-nom [NP-nom [NP-nom V1 Comp] V2 Comp] V3

has its highest integration cost at V3 = 5 EUs. An SC of the form:

28.
NP-nom NP-dat [NP-nom NP-dat NP-acc V1 Comp] V2

has its highest integration cost at V2 = 6 EUs. However, the SC is easier than the SC/SC [Bab99].

12.2.4 Summary

We have seen that none of the above approaches can fully account for the data. Vosse & Kempen's model [Vos00] replicates some complexity phenomena, but cannot explain the V2-drop or N3-type effects. The proposal of interference in working memory [Lew96] cannot explain the pattern of an NC/RC versus an RC/RC or an RC/NC. The DLT metric [Gib00] is based on the distance between items that must be integrated together. It is currently the leading account of complexity phenomena. However, it cannot account for the V2-drop effect, the interaction of the V2-drop effect with N3-type, the lack of an N3-type effect in crossed-serial dependencies, or the difficulty of an RC/RC when V2 is an intransitive verb.

Chapter 13

Parsing Models

In this chapter, I review those models that deal more directly with parsing and hierarchical representations. The desiderata for such a model are as follows:

- Neurobiologically plausible hierarchical representation of thematic roles (thematic tree).
- Thematic tree should be suitable for long-term storage.
- Neurobiologically plausible working-memory representations that support construction of the thematic tree.
- Explanation of similarity-based interference in working memory.
- Parsing algorithm for using working-memory representations to construct the thematic tree.
- Parsing algorithm should account for all complexity phenomena not explained by similarity-based interference.

First I consider possible solutions to general problems related to representing the thematic tree. Then I review various parsing models. In each section, I discuss how well these models and theories meet the above criteria.
13.1 Representation of the Thematic Tree on a Computer

I start with a discussion of how a thematic tree would be represented on a computer, and which aspects of such an encoding are neurally plausible and which are not. It is hoped that such a discussion will illuminate the difficulties involved in formulating a neurally plausible representation of the thematic tree.

13.1.1 How

Computer memory can be conceptualized as an array of registers. A memory address is associated with each register, where memory addresses systematically increase as array position increases. An address allows access to a particular register. High-level computer languages allow a variable name to be mapped onto a memory address. (This mapping is done automatically by the compiler.) Thus items can be stored in and retrieved from memory based on variable names.

The most fundamental requirement of the thematic tree is that words are associated with thematic roles. On a computer, this is accomplished by creating variables and setting those variables to certain values. For example, an Agent variable could be set to a pattern that encodes Mary. Thus some memory register is labeled Agent and set to a particular value, which represents Mary.

To represent a hierarchy, it must be possible to combine multiple bindings together into a unit, and to refer to that entire unit. A high-level computer language allows a data structure, called a record, which groups different items together. For example, a record might consist of Theme, Agent, and Verb variables. The compiler maps these variables to consecutive memory addresses. Thus variables are grouped together by putting them next to each other in memory. The memory address of the first variable can then be used to refer to the entire entity. This is known as a pointer. Thus, a variable could take a pointer as its value, indicating that the value corresponds to the entire unit starting at that memory address.
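The record-and-pointer idea can be sketched in a few lines of Python, where object references play the role of memory addresses. The Clause class and its field names are illustrative only, not part of any proposal discussed here.

```python
from dataclasses import dataclass

@dataclass
class Clause:          # plays the role of the record type
    agent: object
    verb: object
    theme: object      # a word, or a reference ("pointer") to another Clause

# "Mary knows that Ted likes Sue"
sub = Clause(agent="Ted", verb="likes", theme="Sue")
main = Clause(agent="Mary", verb="knows", theme=sub)   # Theme points to sub

# Following the pointer recovers the embedded bindings
assert main.theme.agent == "Ted"
```

The key move is the last field of `main`: the Theme slot does not hold a word at all, but a reference to the entire embedded record.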
For example, to encode (Agent = Mary, Verb = knows, Theme = (Agent = Ted, Verb = likes, Theme = Sue)), two records are created, each having the Agent, Verb, and Theme variables. Call one record Main, and the other Sub. The variables in Sub would be set to the corresponding values from the embedded clause (i.e., Ted, likes, and Sue). The Agent and Verb in Main would be set to their corresponding values from the main clause, while the Theme would be set to the address of Sub. See Figure 13.1.

In summary, binding is done by assigning a particular pattern to a particular memory address. Hierarchy is created by assigning variables to consecutive memory addresses, and referring to the resulting unit by its memory address. Note, however, that the two kinds of structure-building operations - binding and hierarchy formation - are not necessarily logically different. A binding is an association of terminal items - a word and a role. Hierarchy is created by associations of bindings with other bindings. Thus, in both cases, two or more items are associated together. In a computer, the basic binding operation corresponds to assigning a pattern to a memory location. This operation cannot be directly recursively applied, because it is not possible to physically assign one location to another location. Rather, to perform higher-level associations, a location is referred to by its memory address. Thus, there is a dichotomy between the way in which a basic binding is implemented and a hierarchy is formed. This dichotomy arises because one component of a basic binding is a physical location; this forces a different way of associating bindings with each other, based on referring to a location by a unique identifier (its memory address).

    1200  Mary
    1232  knows
    1264  1392
    1296
    1328
    1360
    1392  Ted
    1424  likes
    1456  Sue

Figure 13.1: Example of encoding Mary knows that Ted likes Sue in computer memory. The left column represents memory addresses, which systematically increase.
The right column represents registers. The programmer would declare a record having Agent, Verb, and Theme variables. For each instance of this record, the compiler would map these variables onto specific consecutive addresses. Here the record Main starts at 1200 and the record Sub starts at 1392. The value of Main's Theme variable is a pointer to Sub. Mary, knows, Ted, etc. correspond to numbers that have been associated with each token. (For simplicity, the problem of how to determine whether a register's value should be interpreted as a memory address is ignored.)

13.1.2 Difference from Neural Networks

In a computer, a central executive governs serial access to memory. In contrast, in a biological neural network, there are many simple, massively interconnected processing units. Of course, it would be possible to construct a computer-like memory in an artificial neural network. For example, a set of nodes could be wired together to form a register-like group, and such registers could be wired together to form record-like units. Each unit could have an identifying number associated with it (perhaps coded within its connection weights), which would act like a memory address. Thus values could be filled into the record-like units, and the identifier of a unit could be used like a pointer to link together different units. Marcus [Mar01] has proposed such a scheme, where each record-like unit (called a treelet) encodes hierarchical relationships between the registers. However, given its massive parallelism, it seems highly unlikely that the brain emulates a computer-like architecture.

It is plausible that the basic binding operation could be performed in the same way as in a computer. That is, a group of nodes could encode a particular role, and a pattern across those nodes could represent the value. For example, a certain group of nodes could be used to represent the Agent, where the activation pattern across those nodes could encode Mary.
A different set of nodes could encode the Theme, etc. However, this computer-like approach breaks down when it comes to encoding hierarchy. Without memory addresses, it is unclear how items can be grouped into a unit. Two basic approaches have been proposed: combining activity patterns to yield a new pattern, or inducing correlated firing between two patterns.

13.2 Possible Neural Network Representations of the Thematic Tree

13.2.1 Production of a New Pattern

One approach to the binding problem is to represent each item by a large vector (i.e., a distributed activation pattern over n nodes), and to define operations which combine two or more vectors to yield a new vector (activation pattern). This new vector could then be combined with other vectors to produce a hierarchical encoding. Touretzky and Hinton proposed a scheme based on the outer product of the two vectors [Tou88]. (That is, the resultant vector is comprised of all pairwise products between the items in the two vectors.) However, the size of the resultant vector is the product of the dimensions of the constituent vectors, giving an unbounded increase in size as more and more bindings are performed. Instead, to avoid exponential explosion and to allow calculations to be performed iteratively over a fixed set of cells, the combinatory operation should yield a vector that is the same length as the constituent vectors. Thus the combination is a reduced representation (RR) of the constituent vectors [Hin90].

Reduced Representations that are Learned

Pollack [Pol90] proposed a scheme wherein the reduced representation is comprised of the hidden units' activations in a network trained by back-propagation to auto-associate. See Figure 13.2. Rohde [Roh02] used a similar approach in a system which developed an RR representation of the syntactic structure of a sentence. This encoding could be queried to yield the relationships specified by the sentence.
Such an approach has the advantage that the rules of processing (the grammar) are learned along with the representations. However, we will see below that such an architecture is not actually robust enough to parse and encode arbitrary hierarchical structure.

Reduced Representations based on Statistical Properties

A different approach is to predefine combinatory vector operators with the desired properties. Under this method, each item vector is large (dimension 1,000 to 10,000) and satisfies certain statistical properties; the combinatory operators rely on these statistical properties. Thus item representations do not directly encode any semantic information about an item, but rather act as an abstract representation that allows combination with other items. Plate [Pla95] has proposed a binding scheme for real-valued vectors, based on the convolution of their outer product. Kanerva [Kan95] has proposed a scheme that operates in a bit-wise fashion over binary vectors, where each element has an equal probability of being 0 or 1. Because Kanerva's scheme is simpler, we will focus on it.

Both proposals employ two different combinatory operators, corresponding to the bind and group (merge) operations. Let '@' represent the binding operation, and '+' represent the grouping (merge) operation. In Kanerva's scheme, the bind operation is bit-wise exclusive-or. That is, the two constituent vectors are aligned; at each position, if exactly one element is a 1, the result is a 1; otherwise it is a 0. See Figure 13.3. Merging is implemented as a normalized sum of the constituent vectors, by taking a bitwise majority. That is, at each position, if there are more 1's than 0's, the result is a 1; otherwise it is a 0. Ties (which could arise for an even number of constituent vectors) are broken probabilistically, with equal chance of giving a 0 or 1.

[Figure 13.2 here: input layer (Agent, Verb, Theme boxes), a hidden layer, and output layer (Agent, Verb, Theme boxes).]

Figure 13.2: Example of a network that learns to form an RR encoding.
Each box represents a group of nodes of the same size, and each arrow represents full interconnectivity between two groups of nodes. For each training item, the input and output layers are set to the same value. Using the back-propagation training algorithm, the network learns to recreate the input on the output layer. As a result, the hidden layer (in conjunction with the learned weights) forms a condensed representation of the input. This condensed representation could then be used as one of the values on the input layer. For example, in the Mary knows Ted likes Sue example, the patterns for Ted, likes, and Sue would first be activated over the corresponding sets of input nodes. The resulting pattern on the hidden layer constitutes an RR encoding of this information. Then the input layer is set to Agent = Mary, Verb = knows, and Theme = the hidden-layer pattern. The new hidden-layer pattern then represents the encoding of the entire sentence. Such an encoding is decoded by activating the pattern on the hidden layer to get the component values on the output layer. An output item that is itself an RR encoding can then be fed back to the hidden layer again to be decoded.

      0 1 1 0 1 0 ...          0 1 1 0 1 0 ...
    @ 1 1 0 0 1 1 ...        + 1 1 0 0 1 1 ...
    ----------------         + 1 1 0 0 1 0 ...
      1 0 1 0 0 1 ...        ----------------
                               1 1 0 0 1 0 ...

Figure 13.3: Example of bind and merge operations.

Of course, inverse operators must also be specified, so that information can be extracted from an RR encoding. Because composition of two vectors yields a vector in the same representational space, there is compression of the constituent vectors, thereby introducing noise. In order to clean up noisy vectors, it is assumed that there is an item memory which stores the patterns of all base vectors. Such a memory could be based on an associative recurrent network. When presented with a vector, the item memory activates any vector that has a similarity measure above some threshold.
Given two vectors, similarity is measured as the fraction of elements that have the same value in both vectors. For unrelated vectors, this measure has an expected value of 0.5. That is, because each bit in a vector has equal probability of being a 0 or 1, the probability that the corresponding bit in another, unrelated vector has the same value is 50%. The merge operation yields a result that is similar to its constituent vectors. For example, a + b gives a similarity with a of .75 and with b of .75. In contrast, the bind operator yields a result that is not similar to its constituent vectors.

Because merge yields a vector similar to its constituents, the unmerge operation is performed by comparing a vector to item memory, to retrieve all similar vectors. The unbind operator is the same as the bind operator - bitwise exclusive-or. When this operation is used for unbinding, it will be represented as '#'. An unbind is then followed by a comparison to item memory. For example, consider the vector a@b + c@d. It is the case that exclusive-or distributes over the merge operation.[1] Thus, unbinding with b would give a@b#b + c@d#b = a + c@d#b. The resulting vector is similar to a. It is also similar to c@d#b, but that vector is not stored in item memory. Thus, unbinding with b and comparing with item memory gives a, as desired.

Systematic application of the compositional operations allows encoding of hierarchical structure. First, let's consider why two operators are necessary. In order to encode structure, an operator cannot be associative. That is, for some operator ∘, it should not be the case that (a ∘ b) ∘ c = a ∘ (b ∘ c), because if this were the case, grouping information would be lost, so hierarchy could not be represented. The merge and bind operators are both associative, so neither one alone can create a structured representation. However, it is the case that (a + b + c)@d ≠ a + b + (c@d).
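Kanerva's operators are simple enough to sketch directly. The following is a minimal illustration of bind, merge, similarity, and clean-up against item memory, as described above; the function names, the seed, and the choice of dimension are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000   # high dimension keeps unrelated vectors near 0.5 similarity

def vec():                       # random binary vector, P(1) = 0.5
    return rng.integers(0, 2, N)

def bind(x, y):                  # '@' : bitwise exclusive-or
    return x ^ y

def merge(*vs):                  # '+' : bitwise majority, ties broken randomly
    s = np.sum(vs, axis=0)
    out = (s * 2 > len(vs)).astype(int)
    ties = (s * 2 == len(vs))
    out[ties] = rng.integers(0, 2, int(ties.sum()))
    return out

def similarity(x, y):            # fraction of matching bits; ~0.5 if unrelated
    return float(np.mean(x == y))

a, b, c, d = vec(), vec(), vec(), vec()
trace = merge(bind(a, b), bind(c, d))   # a@b + c@d
probe = bind(trace, b)                  # unbind ('#') with b

# Clean-up: probe is ~0.75 similar to a and near chance for b, c, d,
# so comparing against item memory {a, b, c, d} recovers a.
best = max([a, b, c, d], key=lambda v: similarity(probe, v))
assert best is a
```

Unbinding is the same exclusive-or, which is why a single probe against item memory suffices as the inverse operation.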
This can be seen by expanding the first expression to a@d + b@d + c@d, which does not equal a + b + (c@d). Thus the combination of these two operators allows grouping information to be encoded.

[1] That is, a@(b+c+d) = a@b + a@c + a@d. When an odd number of items is being merged, this equality is exact. For an even number of items, a@(b+c) ≈ a@b + a@c, because noise is introduced during each merge (to break ties).

How can these operations be used to represent the thematic tree? Under such an encoding scheme, a role is not represented by a certain group of nodes, but rather is also represented by a pattern. The representation of our favorite example could be Agent @ Mary + Verb @ knows + Theme @ (Agent @ Ted + Verb @ likes + Theme @ Sue). Here each constituent item is a large binary vector with the property that each element has an equal probability of being a 0 or a 1. The whole expression denotes another vector with the same dimension and properties. As discussed above, an item vector does not encode any lexical information about that word. Rather, an item vector could be associated with a particular word via corresponding connection weights, which allow activation of a neural assembly that does encode semantic and phonological information about that word.

As more and more combinatory operations are performed, the resulting representation becomes noisier and noisier, until it may become impossible to reliably extract the encoded information. This problem can be solved by recording intermediate results. In the above example, the vector Agent @ Ted + Verb @ likes + Theme @ Sue could be saved in item memory.

13.2.2 Temporal Encoding

An alternative possibility is to represent relationships by the timing of firing of the constituent items. Shastri [Sha93, Sha99], and Hummel and Holyoak [Hum97], have used synchronous firing to encode thematic binding relations for certain classes of propositions/sentences.
For example, Mary is encoded as an Agent by having the nodes representing Mary and Agent fire together. Other pairings fire synchronously during other time slots. See Figure 13.4.

Electrophysiological studies have produced evidence that the brain may indeed rely on temporal synchrony for some types of binding. Various studies on the visual system have shown that cells representing the features of a single object synchronize, while those representing features of different objects do not. Such low-level synchronization occurs in the gamma range [Eng01]. It is unclear if synchronization plays a role in higher-level processing, although brain-imaging evidence suggests that verbal working memory relies on a temporal encoding involving oscillatory activity in the theta range [Kli96, Kli99, Rag01, Jen02]. Accordingly, it has been suggested that different types of binding rely on different frequencies, with low-level sensory binding occurring in the gamma band, access to distributed representations in semantic long-term memory relying on the alpha band, and encoding in working and short-term memory occurring in the theta band [Kli96, Kli99].

Figure 13.4: Example of temporal encoding of Ted = Agent and Sue = Theme. The lines to the right of each node represent the firing pattern for that node. For simplicity, each word and role is represented here as a single node. However, the same type of encoding could be used for a distributed representation of each item.

However, a weakness of the temporal approach for a parse representation is that time is linear. It is not clear how to map hierarchical structure onto a linear encoding in a general manner. While one frequency could be nested inside another, it seems that there would be an insufficient number of possible harmonics to represent complex syntactic structure. Another possibility, adopted by [Hum97], is for all of the nodes along some path of the tree to fire synchronously.
Each path fires in a different temporal slot, until the whole tree is traversed. However, this raises the issue of how the cells representing the nodes are recruited, and how proper timing is coordinated among these cells. In the Hummel and Holyoak model, timing was coordinated by excitatory and inhibitory connections. Thus the desired tree had to already be directly encoded by the connectivity between cells in order to instantiate the temporal encoding. [Sha93] assumed that temporal pairings could be generated on the fly. However, they did not address how this was accomplished, nor how nested relationships would be represented.

Thus, no one has shown how a temporal encoding of hierarchical structure could be generated on the fly in a general way. This requires solving two problems: (1) how syntactic structure is mapped onto the temporal encoding; (2) how this temporal encoding is activated during processing. Another problem is that a temporal encoding is not directly suitable for long-term storage. However, storage could be based on a record of information that would allow a temporal encoding to be re-instantiated.

13.2.3 Summary and Conclusions

Due to the distributed nature of neural representations, it is likely that bindings are represented by associating patterns of activity. There have been two types of proposals as to how this may be accomplished. In an RR encoding, patterns are combined via the creation of a new pattern. This approach seems well suited to representing hierarchical structure. It also provides a representation suitable for long-term storage, as the resulting pattern can be stored via connection weights. In a temporal encoding, patterns are associated by correlated firing of the activated nodes. It is difficult to see how to encode arbitrary hierarchical structure in such a framework, and the resulting representations are not directly suitable for storage in long-term memory.
However, brain-imaging evidence suggests that verbal working memory employs a temporal encoding (based on a theta-band carrier wave), and that the same type of working-memory representation may be used during sentence processing. Thus, an RR encoding seems more suitable for representing the thematic tree, but experimental evidence indicates that a temporal encoding may be employed during sentence processing.

However, recall that it is not sufficient to simply specify how relationships are represented in a thematic tree. It is also necessary to specify how the thematic tree is constructed from a sequence of words. To process center-embeddings and crossed-serial dependencies, stack-like and queue-like representations are required in working memory. Note that such working-memory representations are separate from the thematic tree. Thus, it may be the case that working memory relies on a temporal representation which subserves the construction of an RR encoding of the thematic tree.

Furthermore, a temporal encoding is more appropriate than an RR encoding for intermediate representations. In an RR encoding, the constituent vectors lose their identity, because they are combined into a new pattern. In contrast, in a temporal coding, the constituents retain their individuality and remain directly accessible. Because the purpose of working memory is to provide access to individual, previously processed constituents (e.g., unattached subjects resulting from center-embedded clauses), a representation that retains their individuality is desirable. Furthermore, we have seen from the LPE model how to represent order information temporally. Given that processing of center-embeddings and crossed-serial dependencies relies on order information, such a representation may be well suited for intermediate representations.

In sum, an RR encoding is well suited for the long-term storage of arbitrary hierarchical structure.
A temporal encoding is well suited for representing individual, ordered constituents in working memory, and is consistent with brain-imaging data.

13.3 Parsing Models

In the previous section, I reviewed how a thematic tree might be represented. In the following, I review models that deal more directly with the parsing process - how a sequence of words is converted into the thematic tree.

13.3.1 SRNs

A recurrent neural network has feedback connections. (See Figure 13.5.) Therefore, information from one time step of processing can influence a subsequent time step. Such feedback connections essentially provide a memory that can be used to process time-varying input, such as the sequence of words comprising a sentence.

Figure 13.5: Architecture of a recurrent network (Input, Hidden, Output, and Context layers). The hidden units connect into the context units, which feed back to the hidden units. Thus the hidden units' previous activations can affect their subsequent activations.

Elman [Elm90] showed that Simple Recurrent Networks (SRNs) trained on sequences via a variant of the back-propagation algorithm can learn to predict the next item in a sequence. Thus, the network learns the structure of a set of sequences (i.e., a grammar). Although some have hailed such results as demonstrating the ability of SRNs to handle natural language [Chr99], such a conclusion is clearly unwarranted. It is insufficient to simply predict upcoming words. Rather, as words are encountered they must be integrated into a representation of the sentence's meaning. Thus some researchers have created parsing systems that use an SRN-like network to learn grammatical rules, supplemented by another network which allows hierarchical representations. One approach used a temporal specification of the syntactic tree, in which each constituent was assigned to a separate phase, and the relationships between constituents were encoded by the firing of Parent, Sibling, and Grandparent nodes [Lan01].
However, to represent the structure of a complex sentence, a large number of distinct phases would be required, which is an unrealistic assumption.

In another approach, the meaning of a sentence was represented as a list of three-part propositions, and a subset of the system was first trained to create a distributed (RR) encoding of any sequence of propositions [Roh02]. For example, the encoding of:

29. Jim knows Sue likes big dogs.

is:

(knows, agent, jim)
(knows, theme, likes)
(likes, agent, sue)
(likes, theme, dogs)
(dogs, mod, big)

Here, each word denotes a distributed encoding, based on semantic features. Note that this representation does not directly encode hierarchical structure; rather, structure is inferred from matching words. However, if a word is repeated in a sentence, this will lead to ambiguity. For example, the encoding of

30. Jim knows Sue knows Don.

is:

(knows, agent, jim)
(knows, theme, knows)
(knows, agent, sue)
(knows, theme, don)

This encoding could also mean Sue knows Jim knows Don. This problem could be remedied by associating an identifier with each instance of a word, requiring an additional mechanism. In contrast, if hierarchy is directly represented, this is not a problem, as repeated words are differentiated by their positions in the hierarchy.

Once the RR encoding mechanism was trained, the system was then trained to produce such an RR encoding for a sentence, as follows. The propositions representing the meaning of a sentence were fed through the system to get their RR encoding. Then the words in that sentence were presented, and the SRN part of the system was trained to produce that RR encoding in response to that sequence of words. Following training on a wide range of sentences, the representational ability of the system was tested by presenting a novel sentence, and querying the resulting RR encoding on each proposition comprising the meaning of that sentence.
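The repeated-word ambiguity noted above for sentence (30) can be made concrete in a few lines; the propositions are rendered here as Python tuples purely for illustration.

```python
# Encoding of 30. "Jim knows Sue knows Don."
reading_1 = {("knows", "agent", "jim"), ("knows", "theme", "knows"),
             ("knows", "agent", "sue"), ("knows", "theme", "don")}

# Encoding of "Sue knows Jim knows Don." -- jim and sue swap clauses,
# but matching on words yields exactly the same proposition set
reading_2 = {("knows", "agent", "sue"), ("knows", "theme", "knows"),
             ("knows", "agent", "jim"), ("knows", "theme", "don")}

assert reading_1 == reading_2   # the two readings are indistinguishable
```

A representation that directly encoded hierarchy would keep the two instances of knows distinct, and the two sets would differ.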
While the model showed good generalization abilities for simple sentences, performance rapidly deteriorated for more complex structures. For example, the average error rate for a sentence with six propositions was approximately 10%, according to the less stringent multiple-choice criterion. That is, the response to a query was counted as correct if it was closer to the correct response than to other distractor items. This error rate is per proposition, so the probability of correctly encoding the entire sentence is (1.0 - 0.1)^6, about 50%. For eight-proposition sentences, the error rate was about 20%, so the probability of correctly representing a sentence was only about 15%.

Error rates on center-embedded structures were particularly high. For a four-proposition sentence having a subject-modifying RC, the error rate was 25%, so the probability of correctly encoding the entire sentence was about 30%. In contrast, humans can act out the meaning of such sentences with 90% accuracy [Cap96]. Thus, these simulations have not demonstrated that such a connectionist system is capable of developing representations that can parse and encode complex syntactic structure.

While the inability of recurrent networks to handle center-embeddings has been touted as a desirable feature because humans also have difficulty with center-embeddings [Chr99, Ore00], we have seen that some doubly center-embedded clauses are actually rather easily processed by humans [Gib98].

Thus SRN-based parsers have not demonstrated the strong generativity necessary for handling natural language. This problem is especially acute for center-embeddings, but is also present for right-branching structures. Because such systems do not explicitly model variables and recursive processing, embeddings cannot reliably be processed.

13.3.2 LSTMs

Some researchers have investigated networks more suitable for processing center-embeddings.
The primary reason that an SRN has difficulty in processing center-embeddings is that previous information becomes more and more degraded at each time step. This degraded information cannot then sufficiently influence the error measures that drive learning in the back-propagation algorithm. Thus, the network has difficulty in learning the correspondence between widely separated items (such as the main subject and main verb in a sentence with a center-embedded clause). One way to solve this problem is to provide separate, gated blocks of context units (i.e., registers). That is, each register has its own gating network. When the gating network is "open", information can be written to a register. When it is "closed", the information in a register cannot be overwritten, but register activations can still drive the learning process. Thus information can be held over time without becoming degraded. This architecture is called a Long Short-Term Memory (LSTM) network [Hoc97].

Gers and Schmidhuber showed that an LSTM network can learn to predict the upcoming token for strings of the form a^n b^n [Ger01] (corresponding to center-embedded clauses, as discussed in section 11.2). The network could generalize to larger values of n than it had been trained on, because it had learned to implement a counter in a register. For each a, activation within the register increased by a fixed amount. When a b occurred, activation was decreased by the same amount. Thus, an end-of-string token was predicted when the counter reached 0.

Again, this system makes predictions, without forming a structured representation. To create a structured representation, a counter would not be sufficient. Rather, the system would also have to learn to store previous a values, to be integrated with the appropriate b values to form some type of hierarchical encoding. For example, each a value could be stored in its own register.
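The counter solution can be sketched directly. This is a hand-coded abstraction of the behaviour the LSTM learned [Ger01], not an LSTM itself:

```python
# A register increments on 'a' and decrements on 'b'; end-of-string is
# predicted whenever the register returns to 0. For well-formed a^n b^n
# strings this happens exactly once, at the end.

def predict_end_positions(s):
    """Return positions (after each token) at which end-of-string is predicted."""
    count, predictions = 0, []
    for i, tok in enumerate(s):
        count += 1 if tok == "a" else -1
        if count == 0:
            predictions.append(i + 1)
    return predictions

# The counter generalizes to any n, including values never seen in training:
for n in (2, 5, 50):
    s = "a" * n + "b" * n
    assert predict_end_positions(s) == [2 * n]
```

The generalization falls out of the mechanism: nothing in the counter depends on the particular values of n it was exposed to, which is exactly why the same mechanism cannot, by itself, store the identities of the individual a items.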
If this approach were taken, the network would not be able to generalize to higher values of n than it had been trained on, because it would have to learn to store each a in a particular, separate register. As two center-embeddings do seem to be the limit for humans, this is not necessarily a problem. However, it is unclear how such a system could account for the complexity phenomena. For example, if such a system could parse an NC/RC, it would probably be able to parse an RC/NC without difficulty, contrary to human performance.

Furthermore, parsing based on a counter mechanism is suspect. Linguists have noted that natural languages are all based on structural grammatical rules, rather than numeric rules [Cho80, Ber84]. For example, no language has a rule that applies to the nth word of a sentence.

The SRN and LSTM have the advantage that grammatical rules are learned, based on the input characteristics. However, linguists have forcefully argued that children possess grammatical knowledge that cannot be derived solely from the statistics of the input [Cho59, Jac02]. It may be the case that an LSTM-like system in the brain learns to parse language based on some pre-existing linguistic primitives. For example, the ability to manipulate stack-like representations may be innate; an LSTM-like system may then learn what operations to perform when.

O'Reilly and Frank [Ore03] have proposed an LSTM-like system which uses neurobiologically plausible supervised and reinforcement learning. The reinforcement-learning part of the system (modeled after the basal ganglia) learns to gate relevant information into registers, while the supervised-learning part of the system (modeled after the prefrontal cortex) learns what transformations to perform on that information. Such a system has good potential for learning grammatical rules (which would operate over pre-existing linguistic representations).
13.3.3 Pulvermüller

The previous models used distributed representations, and learned from examples. The present model uses localist representations, with hand-coded weights and activation functions. In [Pul03], Pulvermüller presents a proposal for how a grammar could be neurally implemented. This proposal is based on a set of pairwise sequence detectors. A sequence detector is activated by A followed by B, not by B followed by A, where A and B are syntactic categories activated by word nodes. This mechanism is based on nodes that support differing activation states (inactivation, priming, ignition, and reverberation) and on connections of differing strengths (strong, weak).

Figure 13.6: Example of a detector S which recognizes the sequence A B; S receives a strong connection from A and a weak connection from B. From [Pul03].

Essentially, external input leads to ignition and then reverberation. Nodes that receive reverberating input over strong connections enter the primed state. If a node is already in a primed state and receives a second volley of reverberatory input (over a connection of any strength), it too ignites. These dynamics allow the sequence node S in Figure 13.6 to become ignited for A then B, but not for B then A. If A is activated first, node S is primed, and then receives reverberatory input when B is activated, leading it to ignite. However, if B is activated first, node S is not primed (due to the weak connection), and then A fails to ignite S.

Pulvermüller proposes that grammars are based on sets of such pairwise sequence detectors. He also assumes that there are different levels of reverberation, which could be used to implement stack-like processing. Thus this work focuses on how word sequences could be recognized. Given the localist representations assumed in the model, it is difficult to see how a neurally plausible representation of hierarchical structure could be obtained.
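The detector dynamics can be sketched as a small state machine. The state names and the priming rule below are simplifications of [Pul03], intended only to show why the order A-then-B, but not B-then-A, ignites S:

```python
# Detector S receives a strong connection from A and a weak one from B.
# Reverberating input over a strong connection primes S; a second volley
# of reverberatory input (over a connection of any strength) ignites a
# primed node.

def run(sequence):
    state = "inactive"                  # state of detector S
    for node, strength in sequence:     # each input is a reverberation volley
        if state == "inactive" and strength == "strong":
            state = "primed"            # a strong volley alone only primes
        elif state == "primed":
            state = "ignited"           # any second volley ignites
    return state

A = ("A", "strong")                     # A's connection to S is strong
B = ("B", "weak")                       # B's connection to S is weak

assert run([A, B]) == "ignited"         # A then B: S recognizes the sequence
assert run([B, A]) == "primed"          # B then A: S is never ignited
```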
13.3.4 Summary

These models have focused on demonstrating the capacity to parse natural language, and cannot account for the detailed pattern of complexity phenomena discussed in the previous chapter. In Rohde's model [Roh02], the thematic tree was represented as an RR-encoded list of triplets, where the RR encoding was developed via learning (back-propagation). The grammatical rules for constructing this representation from a string of words were also learned. However, the trained network could not reliably process complex sentences, showing particular problems with center-embedding. Learning of grammatical rules based on an LSTM-like model [Hoc97, Ore03] could potentially produce better results. In order to allow strong generativity, such learning should be based on grammatical primitives. Localist models have not demonstrated how hierarchical structure could be represented in a neurally plausible fashion [Vos00, Pul03].

Chapter 14

The TPARRSE Model

Having discussed the neurobiological and experimental constraints on parsing, and related research, I now turn to my proposed parsing model. As we have seen, there are actually three somewhat independent aspects to parsing: (1) the rules which determine what to do with an incoming word, based on the previously processed words and the grammar of the language; (2) the working memory representations that support the application of the grammatical rules to form the thematic tree; (3) the representation of the thematic tree itself. As discussed previously, the model focuses on neurobiologically plausible accounts of (2) and (3). For now, the algorithm that operates over these representations is considered at a symbolic level. Also as discussed, this is a theoretical model. It has not been implemented in full. Rather, the representations and connectivity are specified based on computational principles. I start with a brief overview of the model, and then specify the model in detail.
As discussed in section 13.2.3, an RR encoding is suitable for representing the thematic tree, in that it allows representation of hierarchy, and is suitable for long-term storage. However, it is less suitable for a working memory representation, because individual constituents are no longer directly accessible, due to the distributed nature of the representation. In contrast, a temporal encoding retains the individuality of constituent items, but is less suitable for representing hierarchy and for long-term storage. Therefore, a temporal encoding is more suitable for the working memory encoding. This assumption is in line with EEG evidence for oscillatory phenomena associated with holding items in verbal working memory.

Thus, the model proposes a dual representation - a distributed RR encoding of hierarchical structure which is generated from a temporal working-memory representation. The model consists of the following specifications:

- The basic RR encoding operations (RR primitives)
- How the thematic tree is represented via RR primitives
- The basic WM operations (WM primitives)
- How syntactic information is recorded via WM primitives
- The parsing algorithm that operates over the WM primitives to create the RR encoding of the thematic tree.

The resulting model is dubbed TPARRSE (Temporal Parsing And Reduced Representation Semantic Encoding).

The underlying principles of the model are best presented incrementally. For unambiguous, right-branching sentences, the thematic tree can be produced without reliance on the temporal WM representation. Therefore, I first present the RR encoding, and the portion of the parsing algorithm that handles right-branching sentences. I then present the temporal WM encoding, and the full parsing algorithm.

14.1 RR encoding

Unlike models in which the system is trained to form an RR encoding, I assume that the RR primitives are innate. This has several advantages. (a) It provides a systematic way of representing structure.
Therefore, it is possible to encode any complex set of relationships, because the combinatory operators can reliably be recursively applied. (b) Innate combinatory operations allow different brain areas to use a uniform representation. It is a general feature of neural processing that task performance is distributed across different brain areas. If each brain area involved in parsing were to develop its own RR encoding, this would make communication between areas much less efficient. (c) Specific properties of the RR encoding can be exploited in the parsing algorithm.

Therefore, I assume that the thematic tree is represented via an RR encoding such as those proposed by Plate and Kanerva, as discussed in section 13.2.1. In Chapter 15, I present an example using Kanerva's system. This system was chosen because it is simpler. However, the general properties of the primitives are common to both systems. The parsing model is based on these general properties, as described next.

14.1.1 Primitives

Each terminal item (word, morpheme, or thematic role) is represented by a large vector with certain statistical properties. There are two combinatory operations: bind (@) and merge (+). Merge creates a vector that is similar to the constituent vectors, while bind creates a vector that is not similar to the constituent vectors. In the following, vectors will be given in boldface.

There is an item memory which records the identity of all terminal items. When a vector is compared to item memory, item memory returns all vectors which are more similar to the comparison vector than would be expected by chance. Thus unmerging is performed by comparing a vector to item memory. In addition, there is an unbind operation (#), such that a@b#b = a. Under the decoding operations, bind distributes over merge. That is, using unbind and unmerge to decode a@(b+c) gives the same result as decoding a@b + a@c.
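These general properties can be illustrated with a Kanerva-style binary spatter code, in which bind is elementwise XOR and merge is a bitwise majority vote. This is a sketch of one possible realization; the model commits only to the general properties, not to this particular implementation:

```python
import numpy as np

# Binary spatter-code sketch: bind (@) = XOR, merge (+) = bitwise majority
# with random tie-breaking, unbind (#) = XOR again (in this realization,
# bind and unbind coincide, the drawback noted for Kanerva's system).

rng = np.random.default_rng(1)
N = 10_000                                    # vector dimensionality

def item():                                   # a random terminal-item vector
    return rng.integers(0, 2, N)

def bind(x, y):                               # x@y: dissimilar to x and to y
    return x ^ y

unbind = bind                                 # x@y # y = x exactly

def merge(*vs):                               # x+y: similar to x and to y
    s = np.sum(vs, axis=0)
    ties = 2 * s == len(vs)
    out = (2 * s > len(vs)).astype(int)
    out[ties] = rng.integers(0, 2, ties.sum())  # break ties at random
    return out

def sim(x, y):                                # fraction of matching positions
    return np.mean(x == y)                    # (chance level is 0.5)

a, b, c = item(), item(), item()
assert sim(unbind(bind(a, b), b), a) == 1.0   # unbind exactly inverts bind
assert sim(merge(b, c), b) > 0.7              # merge preserves similarity
assert abs(sim(bind(a, b), a) - 0.5) < 0.05   # bind encapsulates (chance sim)
# Bind distributes over merge under decoding: unbinding a from a@(b+c)
# recovers a vector similar to both b and c, just as for a@b + a@c.
assert sim(unbind(bind(a, merge(b, c)), a), b) > 0.7
```

Item memory would then be a table of all terminal vectors, returning those whose similarity to a probe exceeds chance; unmerging is exactly such a comparison.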
As we see in section 14.1.3, this distributivity is crucial in allowing incremental construction of the thematic tree.

In section 13.2.1, Kanerva's specification of such a system [Kan95], which relied on binary vectors, was presented. This system has the drawback that the bind and unbind operators are the same as each other. Thus, it is not possible to represent a@b@b, because this gives a. Plate [Pla95] presented a more complex system based on real-valued vectors, which does not have this drawback. However, it relies on high precision in activation values, which may not be realistic. As discussed above, the TPARRSE model relies on the general properties given here. Plate's and Kanerva's proposals are existence proofs that systems with these general properties exist. The proposal is that the brain uses an encoding with similar properties, although this encoding may not directly correspond to either system.

14.1.2 Representation of the Thematic Tree

Next we consider how these RR primitives are used to represent the thematic tree. This section focuses on how information is encoded, not how the representation is created. (The latter topic is discussed in the sections specifying the parsing algorithm.) We first discuss some basic linguistic definitions, and then specify how these ideas are implemented in the RR encoding.

In analyzing language, verbs are often thought of as functions, or predicates, which take arguments. For example, the verb loves is a function that takes Agent and Theme arguments, and specifies a relationship between those two entities. The number and type of these argument categories are determined by the predicate. Love takes two arguments, while a verb like sleep takes only one (the Agent). Other parts of speech are also predicates and can impose similar restrictions. For example, the adverb because requires an entire clause or proposition for syntactic and semantic completeness:

31. *Because John, ...
    Because John is sick, ...
A category whose occurrence is not restricted by any semantic feature of another category, but rather can co-occur with any member of a certain part-of-speech class, is called a modifier or adjunct. For example, the time modifier on Tuesday may appear with any verb.

32. I slept on Tuesday.
    I loved the movie on Tuesday.

Recall that the bind operator creates a new item which is not similar to its constituent items. Therefore, it is used to represent the predicate-argument relationship. Because the resultant vector is unlike the constituent vectors, the bind operation encapsulates the constituent vectors, allowing a hierarchical encoding of argument structure. In contrast, the merge operator is used to join together items within a clause, such as a verb's arguments, or an argument and an adjunct. For example, the RR encoding of:

33. Sue kissed Bill on Saturday.

is:

sue + kissed@bill + Vmod@on@saturday

This encodes that sue is the subject of kissed, and Bill is the object of kissed.[1] Saturday is the object of on, and the prepositional phrase (PP) on Saturday modifies the verb kissed. The verb's arguments and adjunct (the PP) are joined together by the merge operator. Verbs and prepositions are bound directly to their objects, as the presence of these categories is predicted by the semantic properties of that verb or preposition. The subject of the sentence remains unbound.

Other semantically determined relationships are represented by special predefined items (identified with capital letters). For example, a PP which is a verb adjunct is bound to the predefined item Vmod.

As we discuss in section 15.1, in order to decode an RR encoding, it is necessary to know the identities of the predicates. Therefore, each predicate is also bound to a predefined "hook" P, which can be used to retrieve the predicate.
Thus the encoding of (33) is actually:

sue + kissed@(P + bill) + Vmod@(P + on@(P + saturday))

For brevity, we will continue to use the notation predicate@argument to mean predicate@(P + argument). If a verb does not have an object, it is bound only to P. In this case, P will be given explicitly.

[1] The representation of kissed would actually be Past + kiss@(...). However, for brevity, I will treat verbs as unitary items.

In a passive sentence, the subject is the Theme and not the Agent, as in the sentence:

34. Bill was kissed by Sue.

In this case, the verb is bound to a trace of the subject, denoted Subj, and the Agent is expressed as a predicate, giving:

bill + kissed@Subj + Agent@sue

This encodes the thematic roles of the verb's arguments, while retaining the information that the subject Bill is the focus of the sentence.

A ditransitive verb, such as gave, requires an additional thematic role, the Goal. For example, in the sentence:

35. John gave Bill the dog.

Bill is the Goal. The RR encoding of the sentence is:

john + gave@(the + dog) + Goal@bill

If an argument is itself a clause, the same encoding rules recursively apply. For example, in:

36. John said that the man arrived from the beach.

the Theme of the verb said is the sentential complement the man arrived from the beach. The RR encoding is:

john + said@(the + man + arrived@P + Vmod@from@(the + beach))

Here said is bound to the encoding of its sentential complement.[2]

[2] We only consider sentential clauses whose complementizer does not add additional semantics. That is, we don't consider sentences involving wh-movement, such as John knows when Mary came. Such sentences would have to be handled differently, in order to include the complementizer.

A traditional syntactic tree uses geometry to encode semantic dependencies. For example, all phrases which constitute a clause C lie below the clausal node representing C.
In the RR encoding, we delineate such relationships by using the term enclosing scope. This refers to the items to which an item x is bound. The enclosing scope determines where x is attached in the thematic tree. For example, in the sentential complement above, the phrases the + man, arrived@P, and Vmod@from@(the + beach) all have the enclosing scope said. This gives an implicit representation of co-constituency. Because Vmod@from@(the + beach) has the same enclosing scope as the verb arrived, it modifies that verb. The enclosing scope of the PP from@(the + beach) is Vmod, indicating that it modifies a verb. If this PP were not bound to Vmod, it would be associated with the subject of the complement clause instead, yielding a reading equivalent to:

37. John said that the man from the beach arrived.

Because bind distributes over merge, the RR encoding of the above example is equivalent to:

john + said@(the + man) + said@arrived@P + said@Vmod@from@(the + beach)

Thus, an RR encoding is comprised of NPs having various enclosing scopes; each NP's enclosing scope specifies its role. For example, the enclosing scope of the + beach is said@Vmod@from, indicating that this NP is the object of the preposition from, and that this PP modifies the verb having the enclosing scope said. Therefore, an RR encoding can be incrementally constructed by maintaining the enclosing scopes that apply to each NP or clause, as we see next.

14.1.3 Generating the RR encoding

We now turn to how the RR encoding is generated incrementally during parsing. For now, we address unambiguous, right-branching sentences. A right-branching clause begins to the right (i.e., at the end) of the parent clause. These structures are easy to process because there are no incomplete dependencies in the parent clause when the embedded clause is introduced. Therefore, the RR encoding of such sentences can be produced directly from the input, without relying on the temporal WM representation.
Two Stages of RR encoding

Within the RR portion of the system, there are two stages. The first stage groups together words into phrases. When a new word signals the conclusion of a category, its RR encoding and its syntactic type are passed to the second stage. Thus the types of items received at the second stage are: noun phrases, adjective phrases, verbals (verb plus auxiliaries), adverbs, prepositional phrases modifying verbs, complementizers (which introduce complete embedded clauses), relative pronouns (which introduce relative clauses), and conjunctions. The second stage uses the syntactic information to incrementally attach the first-stage pieces into higher-level structure, yielding clauses.

The assumption of multiple stages of RR encoding is driven by several factors. Processing of phrases qualitatively differs from processing of clauses in that phrases can be parsed via a finite-state machine, while clauses cannot. That is, phrases cannot be center-embedded within one another. For example, consider the adjective phrase pretty in pink and the noun phrase the girl. It is not possible to say the pretty in pink girl. It is computationally more efficient to process phrases differently than clauses because phrases can be processed via a more restricted mechanism [Abn89]. The segmentation of sentences into phrases is also supported by prosodic patterns [Abn95].

Furthermore, this two-stage approach is reminiscent of the dynamic programming schemes proposed in symbolic, context-free processing systems [She76]. These systems decouple the processing of the internal details of a phrase from the determination of the hierarchical position of the phrase in the tree. Intuitively, this captures the insight that a category is likely to have the same internal structure irrespective of where it is attached in the tree.
If reanalysis is required, we save processing resources if reprocessing the internal details of the category is not part of restructuring the category as a whole within a tree [She76, Lew95]. This is another reason that it is computationally more efficient to process phrases separately from clauses.

The distinction between phrasal processing and clausal processing is borne out by reanalysis phenomena. When processing a noun phrase, its internal structure can easily be revised as more information becomes available. For example, the interpretation of the brown dog is readily changed if the word house follows. However, once processing of a noun phrase is complete, it is difficult to restructure that phrase. Consider [Mar80]:

38. The cotton clothing is made from grows in Mississippi.

Once the cotton clothing has been detected as a noun phrase, it is difficult to reinterpret it as the NP the cotton followed by a relative clause starting with clothing. In this unusual case, the detection of a phrase boundary fails, yielding a difficult reanalysis. These phenomena are consistent with a parsing strategy that uses a greedy algorithm to process phrases (i.e., if the next word can be incorporated into the current phrase, do so), followed by a separate process that composes phrases into clauses [Abn91]. In the remainder of this article, we concentrate on how these phrasal building blocks are combined in the clausal stage of analysis.[3]

RR Processing Units

The RR encoding of a sentence is generated clause by clause, in order to minimize the amount of information stored in WM. In addition to the temporal representation of syntactic structure, WM contains "variables" that encode other information needed for the parsing process. Each variable corresponds to a neural area dedicated to representing specific information.
The variable CurRR holds the RR encoding of the clause currently being processed, CurSc holds the enclosing scope within the current clause, and TotSc holds the enclosing scope for the current clause as a whole. TotRR holds the RR encoding of the entire sentence.

A sentence is processed as follows. A verb, thematic role, or Vmod is stored in CurSc. Each NP or PP is bound to CurSc, and the result is merged with CurRR, forming the encoding of the current clause. When the current clause is complete, it is chunked: CurRR is bound to TotSc, and the result is merged with TotRR; if appropriate, CurSc and a clausal predicate are incorporated into TotSc. At the conclusion of the sentence, TotRR holds the encoding of the entire thematic tree. This algorithm is specified in more detail in Figure 14.1.[4]

[3] This is not to say that there is no interaction between the two stages. For example, in processing the sentence John gave her earrings, the first stage could be "aware" that consecutive arguments are expected after gave. Therefore, it is more efficient to interpret her earrings as two NPs (where her is an accusative pronoun) than as a single NP (where her is a possessive pronoun), because the former possibility completes the argument structure of the verb. Thus it is assumed that clausal information could influence processing in the first stage.

An Example

Table 14.1 presents the processing of the following sentence:

39. Sue likes the vase that Joe bought.

At the conclusion of the sentence:

CurRR = sue + likes@(the + vase) + likes@C@(joe + bought@(the + vase))

C is a predefined predicate applied to an embedded clause which is not a verbal or adverbial argument, such as a relative clause. Because bind distributes over merge, this is equivalent to:

sue + likes@(the + vase + C@(joe + bought@(the + vase)))

Thus, placing the encoding of the RC within the enclosing scope likes attaches it to the NP which is also in that enclosing scope, namely the + vase.
Therefore, upon encountering the RC following the vase, it is not necessary to alter the existing RR encoding of the vase in order to convert it into an NP modified by an RC. Rather, merging of new information is all that is required. Thus the specific form and properties of the RR encoding allow incremental construction of the thematic tree. Once categories are bound to each other and RR encoded, they are not directly decomposable, but operations can apply to the vector as a whole. For example, CurRR can merge as a unit into TotRR.

[4] This algorithm assumes that the verb is in the active voice. The passive voice will be addressed in future work on reanalysis.

x        | CurSc  | CurRR                     | TotSc
sue      |        | sue                       |
likes    | likes  | sue                       |
the+vase | likes  | sue + likes@(the + vase)  |
that     |        |                           | likes@C
joe      |        | joe                       | likes@C
bought   | bought | joe                       | likes@C
the+vase | bought | joe + bought@(the + vase) | likes@C

Table 14.1: WM variables after each item x is processed from sentence 39. The relative pronoun that introduces the predicate C and starts a new clause, giving TotRR = sue + likes@(the + vase). It also causes its referent, the + vase, to be stored, so that it can be accessed when a gap is encountered. During processing of the relative clause, the parser determines that the object of bought is a gap, corresponding to the referent of the relative pronoun. At the end of the sentence, chunking is invoked, yielding the final value of TotRR given in the text.

In addition to the processing outlined here, there must also be integration of grammatical and semantic features. For instance, the subject must match the verb's features, such as number, person, and animacy. The details of these integrations are beyond the scope of the article. However, we do assume that such information must be available, and that the RR encoding of a phrase allows access to these features.

It is not always the case that a clause following a verb is part of its argument. Consider:

40. Mary kissed Bill when Joe won.
Here a main clause is followed by an adverbial clause. The enclosing scope of the adverbial should indicate that the attachment point is outside of the verb phrase.[5] The desired RR encoding is:

mary + kissed@bill + when@(joe + won@P)

In this case, kissed should not be transferred from CurSc to TotSc after when is encountered. Thus, when a new clause is initiated with an adverb, CurSc is erased without being incorporated into TotSc.

[5] Syntactic tests of possible co-referents for pronouns in the adverbial clause show that the adverb should be attached outside of the verb phrase, in what linguists refer to as an adjunction structure.

  /* initialize */
  set WM variables to empty
  /* process input */
  for each item x
      if (current clause is complete)    /* chunk */
          /* integrate current clause */
          TotRR = TotRR + TotSc @ CurRR
          CurRR = empty
          /* integrate current scope */
          if (x is not an adverb)
              TotSc = TotSc @ CurSc
          CurSc = empty
      end if
      if (start of new clause)           /* branch */
          /* integrate new scope */
          if (x is a relative pronoun)
              TotSc = TotSc @ C
          else if (x is an adverb)
              TotSc = TotSc @ x
          end if
      end if
      /* integrate x itself */
      if (x is a verb)
          CurSc = x
      else if (x is a PP)
          CurSc = Vmod
      if (x is an NP or PP)
          CurRR = CurRR + CurSc @ x
  end for

Figure 14.1: Basic algorithm for generating the RR encoding of a sentence having only right-branching clauses.

14.2 Temporal Working Memory

As discussed in section 11.2, stack-like functionality is necessary for parsing center-embedded clauses. I propose that a serial list provides this functionality. The serial list is based on the same principles as the serial representation of letter order used in the SERIOL model. In the TPARRSE model, two lists are used, and the relative timing of firing across lists encodes syntactic information. Such a representation could be used like a stack, or could be used to parse crossed-serial dependencies.
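The bookkeeping of Figure 14.1 can be illustrated with a short symbolic sketch. Strings stand in for RR vectors (bind written "x@y", merge "x + y"); the token tags and the pre-resolved gap referent in sentence (39) are illustrative assumptions, not part of the model itself:

```python
# Symbolic sketch of the right-branching algorithm (Figure 14.1).
# Tags: NP, V, PP, REL (relative pronoun), ADV (adverb); the third field
# marks tokens that start a new clause.

def bind(scope, item):
    if not scope:
        return item
    return f"{scope}@({item})" if " + " in item else f"{scope}@{item}"

def merge(a, b):
    if not b:
        return a
    return f"{a} + {b}" if a else b

def parse(tokens):
    cur_rr = cur_sc = tot_sc = tot_rr = ""
    for word, tag, starts_clause in tokens:
        if starts_clause:                       # chunk the finished clause
            tot_rr = merge(tot_rr, bind(tot_sc, cur_rr))
            cur_rr = ""
            if tag != "ADV":                    # adverbs erase CurSc instead
                tot_sc = bind(tot_sc, cur_sc) if tot_sc else cur_sc
            cur_sc = ""
            if tag == "REL":                    # relative pronoun introduces C
                tot_sc = bind(tot_sc, "C") if tot_sc else "C"
            elif tag == "ADV":
                tot_sc = bind(tot_sc, word) if tot_sc else word
        if tag == "V":                          # integrate x itself
            cur_sc = word
        elif tag == "PP":
            cur_sc = "Vmod"
        if tag in ("NP", "PP"):
            cur_rr = merge(cur_rr, bind(cur_sc, word))
    return merge(tot_rr, bind(tot_sc, cur_rr))  # final chunk

# Sentence (33): Sue kissed Bill on Saturday.
sentence_33 = [("sue", "NP", False), ("kissed", "V", False),
               ("bill", "NP", False), ("on@saturday", "PP", False)]

# Sentence (39): Sue likes the vase that Joe bought. The final "the+vase"
# is the gap referent re-supplied by the relative-pronoun mechanism.
sentence_39 = [("sue", "NP", False), ("likes", "V", False),
               ("the+vase", "NP", False), ("that", "REL", True),
               ("joe", "NP", False), ("bought", "V", False),
               ("the+vase", "NP", False)]

print(parse(sentence_33))   # sue + kissed@bill + Vmod@on@saturday
print(parse(sentence_39))   # sue + likes@the+vase + likes@C@(joe + bought@the+vase)
```

The sketch reproduces the trajectories of Table 14.1 step by step; what it omits is exactly what the temporal WM representation is for, since only the most recent CurSc and TotSc survive as flat strings here.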
I first present the basic operations on a serial list, and then discuss how syntactic information is represented. Then we will be in a position to see how temporal WM is used to parse center-embedded clauses.

14.2.1 Primitives

The neural substrate of a temporal list is the same as in the SERIOL model, described in section 6.2.2. For ease of presentation, I will first consider list items that are comprised of a single node. Then vector list items will be addressed.

Nodes that represent list items oscillate in synchrony and send lateral inhibition to each other [Lis95]. Timing of firing of an oscillatory node is driven by input level. A high input level allows a node to fire near the trough of the cycle (where excitability is low). Lower input levels push firing later into the cycle, because firing is delayed until excitability increases enough to cross threshold. It has been proposed that an after-depolarization (ADP) can maintain short-term memory across oscillatory cycles in the absence of external input [Lis95]. The ADP is a slow, steady increase in excitability observed in cortical cells following spiking, peaking at approximately 200 ms post-spike [And91]. The temporal gradient of the ADP can maintain the firing order of elements across oscillatory cycles, as demonstrated in a simulation [Lis95].

Figure 14.2: Illustration of the timing of firing of list elements A, B, and C. Each new element is activated at the peak of the oscillatory cycle. Previously activated items move forward with respect to the cycle, due to the ADP. Over time, A, B, and C come to fire successively within a single cycle.

For example, consider nodes A, B, and C firing in sequence during one oscillatory cycle. During the next cycle, node A will have the highest ADP (because its ADP has been increasing for the longest period of time), and node C will have the lowest ADP (because it fired most recently). Therefore, node A will cross firing threshold first, then node B, then node C.
Thus, the firing order is preserved across cycles, providing a working memory. The lateral inhibition between nodes is required to maintain this sequential firing. If this lateral inhibition is removed, nodes A, B, and C will eventually start to fire at the same time as each other.

How, then, is the initial firing pattern established? As long as a node is first activated after all active nodes have already fired (near the peak of the oscillatory cycle), the correct firing order will be maintained. In successive cycles, each newly activated item will fire earlier within the cycle (as a result of the ADP), until it can fire no earlier, due either to lateral inhibition from the previous node, or to reaching the trough of the cycle (for the node activated first). See Figure 14.2. Thus, one basic operation is the Append operation, which adds a new item to a list by activating the corresponding node(s) during the peak of the oscillatory cycle. The new item then comes to fire after all of the previously active items.

Read-out from a list occurs implicitly during every oscillatory cycle, as each item fires. This firing could be used to drive other computations. It is assumed that any such computations are always activated during the trough of the oscillatory cycle, so as to recover the full list.

The proposed list items are not single units, but rather large, binary vectors. I propose that a bank of cells exists for each vector position. A vector is represented by synchronous firing across these banks. All vectors in a list are represented by these same banks of cells. That is, a subset of cells in each bank is activated for each vector. Thus, for a 1 (or 0) in the same position in two different vectors, different subsets of cells from the same bank are active on different temporal subcycles. Each bank of cells consists of two populations, one population representing 1, and the other population representing 0.
I assume that each subset is activated stochastically from the population of cells; it is possible for a cell that is already representing an item x to be recruited to represent a different item y. However, it is unlikely that all cells representing x will be reassigned. Thus, the activation of a new item can somewhat reduce the activation level of previous items.

The two populations of cells within a bank reciprocally inhibit each other through fast connections, because the two possibilities (0 or 1) are mutually exclusive within a single subcycle (i.e., within a vector position). Cells across banks (positions) send fast but weak excitation to each other, to promote nearly synchronous firing within a subcycle, preventing firing drift across vector positions. Cells also inhibit one another across banks via slower inhibitory connections. These slower inhibitory connections create the subcycles and maintain sequential firing. Thus the fast excitatory and inhibitory connections serve to coordinate firing within a vector (subcycle), while the slower inhibitory connections serve to keep different vectors separate from one another (across subcycles). See Figure 14.3.

Indeed, Abbott [Abb91] has demonstrated that a network with fast excitation, and fast as well as slow inhibition, allows convergence to a series of attractor states. The fast connections allow convergence to a stored pattern, while the slow inhibition deactivates that pattern, allowing formation of a new pattern. In the Abbott model, the patterns are determined by connection weights. In the present model, the patterns (vectors) are determined by ADP level, and the slow inhibition serves primarily to separate patterns, rather than to directly deactivate patterns.[6]

Thus oscillatory cells with the above inter-connectivity allow a repeating encoding of a sequence of vector items. It would be possible to have another, separate set of oscillatory cells with the same inter-connectivity (within that set).
Thus separate temporal lists could be maintained. If the cells oscillate in synchrony across lists, this would allow synchronization of firing across lists. For example, the first item activated on list A (denoted A1) would come to fire at the trough of the oscillatory cycle, as would the first item activated on list B (B1). Thus A1 and B1 could be initially activated at different times, but would come to fire in synchrony.

The other important memory function is the Delete operation. A list item is deleted via inhibition. If all active items are to be deleted, a general inhibitory signal could be broadcast. However, if a subset of the active items is to be deleted, the specific items to be inhibited must be identified. How might this occur? One possibility is that deletion is based on serial position. That is, it could be recorded that deletion should start at the nth item on a list. However, this would require the addition of a counting mechanism. Furthermore, such a mechanism is suspect, as linguists have observed that natural language seems to crucially eschew the use of counting predicates [Cho80, Ber84]. Rather, natural language operates under structural constraints. Therefore, it is assumed that partial deletion from a list is based on the structural identity of a list item. That is, when a list item that will later require a partial deletion is first activated, the syntactic features of that item are stored in a WM variable. When partial deletion is required, inhibition is triggered when the currently firing list item matches the stored value. Comparison between activation patterns is generally taken to be a fundamental function of neural networks. Thus identification based on identity does not require any additional mechanisms. Further details of this deletion process will be discussed in the following section.

[6] The specification of separate populations representing 0 and 1 is contingent on the particular RR encoding used in the model (based on binary vectors). However, the general scheme of an array of cell banks is applicable to any type of RR encoding, although it would be more difficult to maintain a WM representation if the activation level within a position mattered. In the current scheme, a cell is either active or not; the positional value is determined by which cells are firing (0 or 1). If the positional value depended on activation level, then both activation level and timing of firing would have to be preserved in the WM representation.

Figure 14.3: Proposed architecture for a WM list, illustrated for positions N to N+2. In this example, 100, 110, and 001 are encoded across those positions on successive oscillatory subcycles. Each large circle represents a bank of nodes coding for the same value and position. A subset of those nodes is shown by the small circles. Each column represents a vector position. The top row encodes 0's, while the bottom row encodes 1's. The number in each node reflects the oscillatory subcycle in which it fires. Fast connections coordinate firing within a subcycle, while slower inhibitory connections separate subcycles.

14.2.2 Representation of Syntactic Information

We have seen how separate, but synchronized, temporal lists could be neurally instantiated in WM. I propose that working memory uses such lists to store syntactic information about all incomplete clauses. This parallel representation of the sentence serves two purposes. (1) When the RR encoding cannot be directly generated from the input (in the case of center-embedding or crossed-serial dependencies), it allows processing of interrupted clauses to be re-instantiated. (2) When a parsing error occurs because the wrong choice was made at a point of ambiguity, the temporal encoding allows an alternative RR encoding to be generated (in some cases).
The present work focuses on the first possibility. The second will be addressed in future work.

I propose that syntactic structure is encoded by employing two separate lists: one for noun phrases, and one for predicates and verb adjuncts. The lists are synchronized so that items which are in the same position in different lists fire at the same time. The relationship between an NP and a verb is encoded by their relative firing times. Subjects and adjuncts fire asynchronously with their verbs (with subjects firing prior to their verbs in English), while objects fire synchronously with their verbs. For example, the WM representation of (41) is:

    sue    bill
    E      called    on@Monday

41. Sue called Bill on Monday.

The top row is the noun list, and the bottom row is the verb list. Each column represents a temporal slot. That is, the first items in each list fire together, then the next items, and so on. The filler item E occupies the first slot on the verb list in order to establish the proper firing relationships. A PP which modifies a verb is recorded on the predicate list.

Each item on the lists is accompanied by a tag field, which records syntactic features. Recall that the first stage of RR processing returns syntactic information along with the encoding of the phrase. The syntactic information specifies the type of the phrase; this type determines which list the phrase is Appended to, and is used to update the internal state of the parser. When the lists are used to generate an RR encoding, it is necessary to access this syntactic information so that the parser can return to the correct internal state. A tag-field item is not a vector, but rather a unitary feature. For example, the encoding of (41) with the tag fields is:

    NP     NP
    sue    bill
    E      called    on@Monday
    E      V         PP

where the corresponding tag fields are given in the outer rows. NP, V, etc. indicate that nodes representing that feature are active during that temporal slot. The start of an embedded clause is also marked in the tag field.
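The two-list bookkeeping just described can be illustrated with ordinary arrays standing in for the oscillatory lists. This is an illustrative sketch, not the model itself: slots are array indices, the filler E keeps the lists aligned, the role labels (subj, verb, obj, pp) are hypothetical, and a verb-modifying PP also appends a filler E to the noun list, following the Append conventions described for generating these encodings.

```python
class TemporalWM:
    """Two synchronized lists; a column (one index) is one temporal slot."""
    def __init__(self):
        self.nouns = []   # noun-phrase list: (item, tags)
        self.verbs = []   # predicate/adjunct list: (item, tags)

    def append(self, item, role, tags=()):
        if role == "subj":            # subject fires before its verb
            self.nouns.append((item, tags))
            self.verbs.append(("E", ()))      # filler keeps lists aligned
        elif role == "verb":          # verb takes the next verb-list slot
            self.verbs.append((item, tags))
        elif role == "obj":           # object fires with its verb:
            self.nouns.append((item, tags))   # it lands in the verb's slot
        elif role == "pp":            # verb-modifying PP
            self.verbs.append((item, tags))
            self.nouns.append(("E", ()))

    def slots(self):                  # columns, in firing order
        return list(zip([n for n, _ in self.nouns],
                        [v for v, _ in self.verbs]))

# Encoding of "Sue called Bill on Monday":
wm = TemporalWM()
wm.append("sue", "subj", ("NP",))
wm.append("called", "verb", ("V",))
wm.append("bill", "obj", ("NP",))
wm.append("on@Monday", "pp", ("PP",))
```

Here wm.slots() produces the columns (sue, E), (bill, called), (E, on@Monday): the subject precedes its verb, and the object is synchronous with it.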
For example, the encoding of (42) up to fell is:

    E             NP           NP, Cl, GapReq   Gap
    E             the + vase   sue              Gap
    on@wednesday  E            E                bought
    PP            E            E                V

42. On Wednesday, the vase that Sue bought fell.

Cl marks sue as the start of an embedded clause, and GapReq indicates that a gap is required (i.e., it is a relative clause). If the relative clause is subject-extracted, the verb is tagged instead. For example, the encoding of (43) is:

    E             NP            NP
    E             the + woman   bill
    on@wednesday  E             knows
    PP            E             V, Cl, Gap

43. On Wednesday, the woman who knows Bill ...

This unambiguously and efficiently encodes a relative clause with a gap in the subject position. The proposed temporal encodings can be generated by sequentially Appending each item, as it is received, to the proper list (under the assumption that a subject also generates an Append(E, verb list), and a PP generates an Append(E, noun list)). Recall that each item comes to fire as early as possible on its respective list. Thus the first items on both lists will fire together, then the second items, etc.

As discussed in section 13.2.2, the problem of how to represent a hierarchical structure temporally is a difficult one. There must be a mapping from a two-dimensional structure (a tree) onto a one-dimensional structure (time). Given this mapping, there must be a way to initiate the correct firing times on the fly. Our proposal for the structure of WM solves these problems. Because the oscillatory cycle provides a reference frame, two items that are activated at different times on different lists can come to fire synchronously. Because the order of firing is maintained across cycles, information can be encoded by the sequence of firing, not just by the synchrony of firing. Because each list item is an RR encoding, structure can be represented within a single temporal slot. Thus this scheme solves the problem of how to temporally represent arbitrary hierarchical structure on the fly.
This temporal representation maintains the identity of phrasal subcomponents, allowing them to be referenced separately from each other. In the next section, I discuss how this representation is used for parsing of center-embedded clauses.

14.3 Processing Center-embedded Clauses

Now we will see how the addition of the temporal encoding allows us to handle the problem of center-embedding. Recall that the encoding of a clause is constructed in CurRR, and then transferred to TotRR when a new clause begins. However, when a center-embedded clause is encountered, the RR encoding of the current clause is not integrated into TotRR. Rather, CurRR and CurSc are set to empty, so that they will encode only the embedded clause. When the embedded clause is complete, its temporal representation is deleted from the lists and is replaced with its RR encoding. The information on the lists is then used to re-instantiate the RR encoding of the higher clause; this process is denoted WM-RR encoding. During WM-RR encoding, the information on the lists is RR-encoded as each pair of list entries fires. Afterward, the parser is in the same state that it was in prior to the center-embedded clause, except that the RR encoding of the center-embedded clause is included in CurRR.

Next we consider an example. This processing requires an additional WM variable, CntrSc, which maintains the enclosing scope of a center-embedded clause. We will see how the parsing of (42) proceeds. Initially, the WM variables and lists are empty. After the vase, CurRR = Vmod@(on@wednesday) + the + vase, CurSc = empty, and the lists are:

    E             NP
    E             the + vase
    on@wednesday  E
    PP            E

At that, CntrSc is set to C, to record the enclosing scope of the upcoming clause. CurSc and CurRR are set to empty, so that only the upcoming clause will be RR-encoded.
After bought, CurRR = sue + bought@(the + vase), CurSc = bought, and the lists are:

    E             NP           NP, Cl, GapReq   Gap
    E             the + vase   sue              Gap
    on@wednesday  E            E                bought
    PP            E            E                V

At fell, the center-embedded clause is complete. A partial delete of list items is initiated at the start of the embedded clause, and then CntrSc@CurRR is Appended to the lists, giving:

    E             NP           ChunkedCl
    E             the + vase   C@(sue + bought@(the + vase))
    on@wednesday  E            E
    PP            E            E

CurRR, CurSc, and CntrSc are set to empty. The information on the lists is then used to recover the RR encoding of the main clause, giving:

CurRR = Vmod@(on@wednesday) + the + vase + C@(sue + bought@(the + vase))

Importantly, the + vase is encountered as a separate entity, in principle allowing access to grammatical and semantic features necessary for integration with the upcoming verb. (However, details of such integration are beyond the scope of the present work.) Now fell can be processed as usual (i.e., directly incorporated into the current clause).

Thus the lists work like a stack to maintain unattached subjects. A center-embedded clause overwrites processing of the higher clause; the RR encoding of the higher clause is later re-generated from the temporal WM encoding. Next we consider in more detail the deletion of the center-embedded clause from the lists.

14.4 Partial Deletion

Recall that partial deletion is performed by storing the identity of the target item, and then matching to this stored value. Given that rules in natural language operate on syntactic structures, I assume that an item is identified by its syntactic features. When a center-embedded clause is encountered, the corresponding syntactic features are recorded in a WM variable denoted Dtag. In our example, Dtag would be set to (NP, Cl, GapReq). When the embedded clause is complete, inhibition is initiated at the item having these syntactic features.
That item, and all successive items on both lists, are inhibited, thereby deleting the temporal representation of the relative clause from working memory.

Next we consider lower-level details of how the inhibition is triggered. I propose that matching works on the principle of dis-inhibition. That is, Dtag features inhibit the inhibition of list items, while tag-field features inhibit Dtag features. When deletion is required, its initiation is inhibited by the activity of Dtag features. When the corresponding features are active in the tag field, they inhibit the Dtag features. Therefore, inhibition is no longer inhibited by Dtag, and is triggered. Inhibition then continues for the remainder of the oscillatory cycle. If no features are initially active in Dtag, a full deletion of all list items is automatically carried out. See Figure 14.4 for a schematic of the proposed network.

When deletion occurs, only those cells that are currently firing should be inhibited. This is accomplished via a gating mechanism that allows the inhibitory signal to reach a cell only if that cell is active. (See Figure 14.4.)

Figure 14.4: Proposed architecture of the deletion network. The tag field is comprised of syntactic features F1, F2, F3, ..., Fn, with multiple instances of each feature (two instances shown here). Each feature has inhibitory connections to the corresponding feature in Dtag, and each feature in Dtag inhibits the node which drives the deletion process. When the tag-field features inhibit all of the Dtag features, the perform-deletion node is activated and deletion is initiated. Deletion is sustained via the self-excitatory connection. The gating node becomes activated only if it receives excitation from both the perform-deletion node and the list node. In that case, the list node is inhibited.
Thus inhibition only applies to active list nodes, and does not affect list nodes that fired prior to the initiation of deletion. (Only a single list node is shown; a similar circuit is required for each list node.)

Note that the proposed matching mechanism does not require an exact match between the tag field and Dtag. There is an asymmetry. For deletion to be initiated, all of the active Dtag features must also be active in the tag field, but all tag-field features do not have to be active in Dtag. Thus, a match occurs when the tag-field features are a superset of the Dtag features, but not when the Dtag features are a superset of the tag-field features. As we will see in Chapter 16, this asymmetry is crucial in explaining complexity phenomena.

14.5 Arbitrary Hierarchical Structure

Thus far we have seen how a single right-branching or center-embedded clause is handled. However, arbitrary combinations of branching patterns can occur in natural language. The parsing algorithm can easily be extended to handle arbitrary hierarchical structure.

Aside from right-branching and center-branching structures, a clause could also be preposed or left-branching. A preposed clause occurs when an adverbial is moved to the front of the higher clause, as in:

44. When the vase fell, Mary was upset.

A left-branching clause is a sentential subject, as in:

45. That the vase fell upset Mary.

Preposed and left-branching clauses are processed in the same way as center-embedded clauses. If a right-branching clause occurs within a non-right-branching clause, its predicate is stored in CntrSc, so that it can be deleted when the higher clause is complete. Thus CntrSc holds any clausal predicates that must be deleted at some point, while TotSc holds purely right-branching predicates. If a predicate is assigned to CntrSc, it is also recorded on the lists, so that it can be recovered if necessary.
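The superset asymmetry can be stated compactly: deletion fires at the first slot whose tag field contains every active Dtag feature, and everything from that slot onward is removed from both lists. A minimal sketch, assuming tag fields and Dtag are represented as feature sets:

```python
def matches(tag_field, dtag):
    # Deletion is triggered when the tag-field features are a
    # superset of the Dtag features (not vice versa).
    return set(dtag) <= set(tag_field)

def partial_delete(noun_list, verb_list, dtag):
    """Delete, from both lists, the first Dtag-matching item and all
    successive items; with an empty Dtag, everything is deleted."""
    for i, (_, tags) in enumerate(noun_list):
        if matches(tags, dtag):
            del noun_list[i:]
            del verb_list[i:]
            return

# Deleting the inner relative clause of example (42):
nouns = [("the + vase", {"NP"}), ("sue", {"NP", "Cl", "GapReq"})]
verbs = [("E", set()), ("bought", {"V"})]
partial_delete(nouns, verbs, {"NP", "Cl", "GapReq"})
```

After the call, only the slot for the + vase remains. Note that matches({"NP"}, {"NP", "Cl"}) is False while matches({"NP", "Cl", "Gap"}, {"NP", "Cl"}) is True, which is exactly the asymmetry described above.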
That is, if a non-right-branching clause A is being processed, and another embedded clause B is encountered, B will overwrite the information in CntrSc pertaining to A. Once the processing of B is complete, the information pertaining to A can be re-instantiated during WM-RR encoding. This allows recursion, and therefore arbitrary branching patterns can be processed. In contrast, predicates that are assigned to TotSc are not maintained on the lists, because they cannot be overwritten, and should never be re-processed. For example, the Cl syntactic feature (which corresponds to the C predicate) is activated in the tag field of a center-embedded RC, but is not activated for a purely right-branching RC (because C is stored in TotSc in this case). In section 15.3, the full parsing algorithm and simulations of the algorithm are presented.

Chapter 15
Computational Demonstrations

In this chapter, I present implemented demonstrations of some aspects of the above theoretical model. In the first section, I present the decoding of an RR encoding of a sentence. In the second section, I show that the proposed single-node dynamics of a WM allow a serial encoding of items activated during different oscillatory cycles. In the third section, a simulation of the full parsing algorithm is presented.

15.1 Decoding an RR encoding

In the following, I use Kanerva's [Kan95] scheme to demonstrate that the information recorded in the RR encoding of a sentence can indeed be extracted. Recall that the unmerge operation is performed by comparing to item memory, and that all predicates in the RR encoding are bound to P to allow them to be recovered. Each unbind is followed by a comparison to item memory, to clean up the result of the unbind. Decoding proceeds as follows. First, the vector is compared to item memory. This retrieves the unbound constituent, i.e., the Agent. Then the vector is unbound with P to retrieve any predicates.
The vector is unbound with each of those predicates to retrieve their arguments.

We first illustrate decoding at the conceptual level; then we present a numerical example. Consider the vector:

Q = ann + loves@(P + joe)

where ann, loves, and joe are stored in item memory. Comparing Q to item memory yields the Agent ann, because it is the only constituent vector that is unbound. The vector Q is then unbound with P, yielding:

Q#P = ann#P + loves@P#P + loves@joe#P

which is similar to loves. It is also similar to ann#P and to loves@joe#P, but these vectors are not stored in memory and act as noise. Thus, comparing Q#P to memory yields loves. The vector Q can then be unbound with loves to yield joe.

If too many relationships were encoded within a single vector, so much noise could be introduced that it would not be possible to retrieve the base vectors. This can be remedied by storing intermediate results. The result of an unbinding which is not a base vector can then be cleaned up by retrieving the similar item from memory. Therefore, we assume that if an argument is itself a clause, its encoding has also been recorded.

Next I present a numerical example of the encoding and decoding of a sentence containing an embedded clause. The sentence:

46. John told Mary that Bill gave Sue money.

was encoded, where john, mary, bill, sue, told, gave, Goal, and money were vectors stored in item memory. 3000 distractor vectors were also stored in item memory. Vectors were of dimension 10,000, and were randomly generated under the constraint that each vector position had equal probability of being a 0 or a 1. The encoding of the sentential complement Bill gave Sue money was also recorded in memory:

V1 = bill + Goal@(sue + P) + gave@(money + P)

The encoding of the entire sentence was:

V = john + Goal@(mary + P) + told@(V1 + P)

V was then decoded as follows. (All items exceeding a similarity cutoff of 0.52 are presented, with the similarity value in parentheses.)
V was compared to item memory to get the subject. This yielded john (0.75; i.e., 75% of V's positional values matched john's). V was unbound with P, and the result was compared to memory to get any predicates. This yielded told (0.63) and Goal (0.62). V was unbound with told to get the Theme, yielding V1 (0.62) and bill (0.57). This similarity to bill is appropriate, since V contains told@bill (because bill is the subject of V1).[1] V was unbound with Goal, yielding mary (0.62). This process was repeated with V1. The subject was bill (0.75). Unbinding V1 with P yielded gave (0.63) and Goal (0.62). Unbinding V1 with gave yielded money (0.63), and unbinding with Goal yielded sue (0.63). Thus, it was possible to retrieve the information stored in the RR encoding of the sentence.

The goal of this implementation was to demonstrate the feasibility of RR encoding and decoding. Thus the operations were performed algorithmically (i.e., not implemented as a neural network). However, the proposed operations are neurobiologically plausible. Because the merge, bind, and unbind functions operate on corresponding bits across two vectors, they could be implemented within a neural network using one-to-one connections between the areas over which the input vectors are represented. The unmerge operation requires an auto-associative memory; recurrent networks employing distributed representations have long been touted as a natural framework for this kind of memory.

[1] Thus the Theme and the subject of a sentential complement are represented in the same way; both are bound to the verb. This similarity in representation may be related to the ease with which a Theme can be reanalyzed as the subject of a sentential complement in ambiguous sentences lacking that, such as John knows Bill likes Sue.
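The binary operations used above are easy to reproduce. The following sketch uses Kanerva-style binary spatter codes: binding (@, and its inverse #) is bitwise XOR, merge (+) is a bitwise majority vote with random tie-breaking, and item-memory retrieval is comparison by fraction of matching bits. The dimension and the 0.52 cutoff follow the text; everything else (the NumPy formulation, the number of distractors, the fixed seed) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000

def randvec():
    return rng.integers(0, 2, DIM, dtype=np.int8)

def bind(a, b):                 # "@" and "#": XOR is its own inverse
    return a ^ b

def merge(*vs):                 # "+": bitwise majority, random tie-break
    s = np.sum(vs, axis=0, dtype=np.int32)
    out = (2 * s > len(vs)).astype(np.int8)
    ties = (2 * s == len(vs))
    out[ties] = rng.integers(0, 2, int(ties.sum()), dtype=np.int8)
    return out

def similarity(a, b):           # fraction of matching positions
    return float(np.mean(a == b))

# Item memory: the base vectors plus distractors
memory = {name: randvec() for name in ["ann", "loves", "joe", "P"]}
memory.update({f"d{i}": randvec() for i in range(100)})

# Q = ann + loves@(P + joe)
Q = merge(memory["ann"],
          bind(memory["loves"], merge(memory["P"], memory["joe"])))

def cleanup(v, cutoff=0.52):
    """Return the memory items exceeding the similarity cutoff."""
    sims = {k: similarity(v, m) for k, m in memory.items()}
    return {k: s for k, s in sims.items() if s > cutoff}
```

Running the decoding steps described above, cleanup(Q) retrieves ann (similarity near 0.75), cleanup(bind(Q, memory["P"])) retrieves loves, and cleanup(bind(Q, memory["loves"])) retrieves joe, each well separated from the distractors (which sit near 0.5).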
15.2 Temporal WM

The goal of the following simulation is to demonstrate the basic functionality of a WM list, which allows a serial representation of the order of items activated at different times. This is a network simulation, with each list item represented by a single node. A list node is always initially activated during the peak of the oscillatory cycle. Thus a mechanism that coordinates the timing of activation with the oscillatory cycle is assumed. However, this mechanism is beyond the scope of the present work; thus, it was not implemented as part of the network.

Following [Lis95], list nodes are modeled as units that undergo a sub-threshold oscillatory drive, exhibit an increase in excitability after firing (ADP), and send lateral inhibitory inputs to each other. We use i to denote the ith node to be Appended. The membrane potential, V, of a node is given by:

V(i, t) = O(t) + A(i, t) - I(i, t) + E(i, t)

where O denotes the oscillatory drive, A denotes the ADP, I denotes inhibitory input, and E denotes excitatory external input. A node fires when V exceeds a threshold, TH. TH is specified relative to resting potential, and is set to 10 mV. Firing causes the node's ADP component to be reset, and inhibition to be sent to the other nodes.

The oscillatory function O has a cycle length of 200 ms, and linearly increases from -5 mV to 5 mV during the first half of the cycle, and decreases back to -5 mV during the second half. The ADP and inhibition are modeled by functions of the form:

F(t; M, T) = M (t/T)^1.5 exp(1 - t/T)

which increases to a maximal value (controlled by M) and then decreases (on a time scale controlled by T). The ADP is given by:

A(i, t) = F(t - t_i; M_A, T_A)

where t_i denotes the time at which the ith node last fired. (A(i, t) is 0 if the node has not yet fired.) The inhibition is given by:

I(i, t) = sum_{j=1..n} F(t - t_j; M_I, T_I)

where n gives the number of nodes, and F is 0 if node j has not yet fired, or if i = j.
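These single-node dynamics are straightforward to re-implement. The following sketch uses the parameter values reported below (TA = 230 ms, MA = 13 mV, TI = 5 ms, MI = 3 mV, TH = 10 mV), with Euler stepping at 1 ms; the integration step and other details are my own choices, so exact firing times need not match the reported ones. It checks only the qualitative claim: items appended at successive peaks come to fire sequentially, in order, within a cycle.

```python
import math

TH, CYCLE = 10.0, 200.0         # threshold (mV), oscillatory cycle (ms)
TA, MA = 230.0, 13.0            # ADP time constant / magnitude
TI, MI = 5.0, 3.0               # inhibition time constant / magnitude
FSTART, P = 100.0, 200.0        # first external input at the first peak
N, DT, CYCLES = 8, 1.0, 10

def osc(t):                     # triangular drive, -5 mV .. +5 mV
    ph = (t % CYCLE) / CYCLE
    return -5.0 + 20.0 * ph if ph < 0.5 else 15.0 - 20.0 * ph

def kern(dt_, M, T):            # F(t; M, T) = M (t/T)^1.5 exp(1 - t/T)
    return M * (dt_ / T) ** 1.5 * math.exp(1.0 - dt_ / T) if dt_ > 0 else 0.0

last = [None] * N               # time of each node's most recent spike
ever = [False] * N              # has node i fired at least once?
cycle_fires = []                # (node, time-in-cycle) for current cycle

t = DT
while t <= CYCLES * CYCLE:
    if (t - DT) % CYCLE == 0:
        cycle_fires = []        # record firing per cycle
    spikes = []
    for i in range(N):          # evaluate all nodes on the stale state
        a = kern(t - last[i], MA, TA) if last[i] is not None else 0.0
        inh = sum(kern(t - last[j], MI, TI)
                  for j in range(N) if j != i and last[j] is not None)
        e = MA if (t >= FSTART + i * P and not ever[i]) else 0.0
        if osc(t) + a - inh + e > TH:
            spikes.append(i)
    for i in spikes:            # synchronous update
        last[i] = t
        ever[i] = True
        cycle_fires.append((i, t % CYCLE))
    t += DT
```

After the run, cycle_fires holds the final cycle's spikes; with this re-implementation the eight appended nodes fire once each, in append order, reproducing the qualitative behavior of the simulation described below.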
The following values were used: T_A = 230 ms, M_A = 13 mV, T_I = 5 ms, M_I = 3 mV. The external input, E, is such that node i receives an input of amount M_A commencing at time F + iP, where F and P are constants. Node i continues to receive this input at each time step until it fires. F is set to 100 ms (the peak of the first cycle), and P is assumed to be a multiple of 200 ms, so that items to be Appended are activated at the peak of a cycle.

A simulation using the above equations, with P = 200 and 8 nodes, yielded all nodes firing in the correct sequence after 9 cycles. Nodes 1-8 fired at times 31, 48, 61, 71, 80, 88, 95, and 103, respectively (times are given relative to the start of the cycle). Thus, when a new item was activated at each successive peak of the oscillatory cycle, all items came to fire sequentially within a cycle, as desired. If, however, external input is applied out of sync with the oscillatory cycle, incorrect orderings result. For example, P = 173 yielded the following after 9 cycles: nodes 5, 4, 1, 2, 3, 6, 7, and 8 fired at times 30, 47, 60, 70, 70, 85, 85, and 85, respectively, while P = 227 yielded nodes 5, 6, 4, 7, 1, 2, 3, and 8 firing at times 30, 47, 60, 60, 76, 76, 89, and 89.

15.3 Parsing Algorithm

In the present work, the aim is to show that the proposed algorithm is viable at the computational level, by demonstrating that the algorithm is powerful enough to handle the complex structures found in natural language. Therefore, the algorithm was implemented at the symbolic level.

15.3.1 Implementation

The full parsing algorithm is given in Figures 15.1 and 15.2. This algorithm was implemented using a positional variable taking one of four values (before subject, before verb, before object, or after object) to determine the branching direction of an embedded clause. Each input sentence was represented by a sequence of two-character symbols, representing phrases formed by a first stage of RR encoding.
The first character was alphabetic, specifying syntactic type, and the second character was numeric, distinguishing different instances of the same syntactic type. For example, the input representation of a sentence with a sentential complement containing a preposed adverbial clause having a right-branching relative clause, like:

47. John said that after Mary dropped the vase that Jim bought, Jane got a new vase.

is:

N1 V1 C1 A1 N2 V2 N3 R1 N6 V3 N4 V4 N5

where 'N' specifies an NP, 'V' a transitive verb, 'C' a complementizer, 'A' an adverb, and 'R' a relative pronoun. The output for each sentence is a string specifying the RR encoding of the sentence. For the above sentence, the desired output is the string:

N1 + V1@A1@(N2 + V2@N3) + V1@A1@C@(N6 + V3@N3) + V1@(N4 + V4@N5)

The model was tested on a variety of sentences containing multiple embeddings of relative, sentential, adverbial, and noun-complement clauses. These inputs are given in section 15.3.2. The correct output was generated for all of the sentences, except for RC/RCs and an NC/RC, consistent with human performance. The reason that the algorithm failed on these structures is discussed in the following chapter.

15.3.2 Stimuli

The following lists the stimuli used for the parsing simulation. For ease of comprehension, an example sentence is presented for each input sequence. The correct output was generated for all sentences, except 12 and 13 (RC/RCs) and 27 (NC/RC).
Chunk Current Clause

/* Remove current clause from lists */
if (Dtag is empty)
    Empty lists
else
    Partial delete starting at Dtag; Dtag = empty

/* Integrate current clause */
if (part of a center-embedded clause)
    Append CntrSc @ CurRR to lists
    if (starting right branch)
        CntrSc = CntrSc @ CurSc
    else
        WM-RR encode
else /* right branching only */
    TotRR = TotRR + TotSc @ CurRR
    TotSc = TotSc @ CurSc
end if

Branch on predicate x

/* Record start of clause */
if (lists not empty)
    Dtag will get tag field of clause

/* Integrate new scope into clausal scope */
if (starting a center-embedded clause)
    CntrSc = x
else if (starting a right branch inside a center-embedded clause)
    CntrSc = CntrSc @ x
else /* right branching only */
    TotSc = TotSc @ x
end if

/* Reset encoding of current clause */
CurSc, CurRR = empty

Figure 15.1: Chunking and branching procedures for the full RR encoding algorithm.

/* Initialize */
set WM variables and lists to empty

/* Process input */
for each item x
    if (x starts an embedded clause)
        if (x is an adverb)
            CurSc = empty
        if (no incomplete dependencies)
            Chunk current clause
        Branch on x
    else if (x resumes a higher clause)
        Chunk current clause
    end if
    if (x is a verb)
        CurSc = x
    else if (x is a PP)
        CurSc = Vmod
    if (x is a subject NP)
        CurSc = empty
    if (x is an NP or PP)
        CurRR = CurRR + CurSc @ x
    if (x is an NP, PP, or verb)
        Append x to lists
end for

Figure 15.2: Full RR encoding algorithm, using the Chunk and Branch operations specified in Figure 15.1.

N1 V1 N2
1. The cat chased the rat.

*Two clauses*

N1 V1 C2 N2 V2 N3
2. The man knows that the cat chased the rat.

N1 R1 I2 V3 N3
3. The cat which was chased ate the fish.

N1 R1 N2 V2 V1 N3
4. The cat which the dog chased ate the fish.

N1 R1 V2 N2 V1 N3
5. The cat which chased the rat ate the fish.

N1 V1 N2 R1 V2 N3
6. The cat chased the rat which ate the cheese.

N1 V1 N2 R1 N3 V2
7. The cat chased the rat which the dog bit.

A1 N1 V1 N2 N3 V2 N4
8. After the cat chased the rat, the dog ate the meat.
N1 V1 N2 A1 N3 V2 N4
9. The dog ate the meat after the cat chased the rat.

C1 N1 V1 N2 V2 N3
10. That the dog ate the chocolate bothered Bill.

N4 C1 N1 V1 N2 V2 N3
11. The fact that the dog ate the chocolate bothered Bill.

*Three clauses*

N1 R1 N2 R2 N3 V3 V2 V1 N4
12. The rat which the cat which the dog hates chased ate the cheese.

N1 R1 N2 R2 V3 N3 V2 V1 N4
13. The rat which the cat which hates the dog chased ate the cheese.

N1 R1 V2 N2 R2 N3 V3 V1 N4
14. The dog which chased the cat which the rat feared ate the meat.

N1 R1 V2 N2 R2 V3 N3 V1 N4
15. The dog which chased the cat which chased the rat ate the meat.

N1 V1 N2 R1 V2 N3 R2 V3 N4
16. The dog chased the cat which chased the rat which ate the cheese.

N1 V1 N2 R1 V2 N3 R2 N4 V3
17. The dog chased the cat which chased the rat which the lion liked.

N1 V1 N2 R1 N3 R2 V4 N4 V3
18. The dog chased the cat which the rat which ate the cheese feared.

N1 V1 N2 R1 N3 R2 N4 V4 V3
19. The dog chased the cat which the rat which the lion liked feared.

N1 R1 V2 C1 N3 V3 N4 V1 N5
20. The man who thinks that the cat chased the rat ate the cheese.

N1 R1 N2 V2 V1 C3 N3 V3 N4
21. The man who the lion chased thinks that the rat ate the cheese.

N1 V1 N2 R1 V2 C3 N3 V3 N4
22. The lion chased the man who thinks that the rat ate the cheese.

N1 V1 C2 N2 V2 C3 N3 V3 N4
23. The man knows that the girl thinks that the cat chased the rat.

N1 V1 C2 N2 V2 N3 A1 N4 V4 N5
24. The man thinks that the rat ate the cheese after the dog bit the cat.

N1 V1 C1 A0 N2 V2 N3 N4 V4 N5
25. The man knows that when the cat chases the rat, the lion chases the dog.

N4 C1 N5 R1 N1 V1 V2 N2 V3 N3
26. The fact that the dog which Sue adopted ate the chocolate bothered Bill.

N4 R1 N5 C1 N1 V1 N2 V2 V3 N3
27. The woman who the fact that the dog ate the chocolate bothered hit Bill.

*Four clauses*

N1 R1 V2 C2 N2 V3 N3 A1 N4 V4 N5 V1 N6
28. The man who knows that the cat chased the rat after the dog ate the meat ate the pie.
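To make the within-clause portion of Figure 15.2 concrete, here is a minimal Python sketch. It is my own illustration under simplifying assumptions (a single clause, no chunking, branching, or center-embedding; the function name is hypothetical): a verb sets the current scope CurSc, and each NP contributes CurSc @ NP, with an empty scope for the subject.

```python
def encode_clause(tokens):
    """RR-encode one simple clause: a verb becomes the current scope;
    each NP contributes scope@NP (the subject has empty scope)."""
    cur_sc = None
    terms = []
    for tok in tokens:
        if tok.startswith("V"):
            cur_sc = tok  # verb sets the current scope (CurSc)
        elif tok.startswith("N"):
            # subject NP enters unscoped; later NPs are scoped by the verb
            terms.append(tok if cur_sc is None else f"{cur_sc}@{tok}")
    return " + ".join(terms)
```

On stimulus 1 ("The cat chased the rat"), `encode_clause(["N1", "V1", "N2"])` yields `"N1 + V1@N2"`, the same clause-level form that appears as terms like `(N4 + V4@N5)` in the output for sentence 47.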
Chapter 16

Complexity

Next we consider how the proposed processing accounts for complexity phenomena. Recall that we want to explain the following.

RC/RC
- Very difficult
- Effect of NP type
  - Easier if N3 is a pronoun.
  - An N3 pronoun is easier than an N1 or N2 pronoun.
- V2-drop
  - Felicitous for double center-embedding.
  - Not felicitous for a center-embedded RC within a right-branching RC.
- N3-type x V2-drop interaction
  - V2-drop is not felicitous for a pronoun N3.

Noun Complements
- NC/RC easier than RC/RC.
- RC/NC as hard as RC/RC.

Crossed-serial Dependencies
- Easier than double center-embeddings.
- Pronoun N3 has no effect.

Similarity-based interference
- Increase in difficulty as the number of similar items increases.
- For a fixed number of similar items, increase in difficulty as their proximity to each other increases.

First we will see how the proposed processing of center-embedded clauses accounts for the RC and NC phenomena. Then the relative ease of crossed-serial dependencies will be addressed, followed by a discussion of similarity-based interference.

16.1 Center Embedding

16.1.1 RC/RC

Consider processing of the sentence:

48. The vase that the man that Sue dated bought fell.

At the first that, center-embedded processing is invoked. At the second that, center-embedded processing is again invoked, overwriting the information in the WM variables pertaining to the first RC. When bought is encountered, the lists are:

    tag:   NP           NP, Cl, GapReq   NP, Cl, GapReq   Gap
    noun:  the + vase   the + man        sue              Gap
    verb:  E            E                E                dated
    tag:   E            E                E                V

and Dtag is set to the values of the tag field of the inner RC (NP, Cl, GapReq). At this point, deletion of the inner RC from temporal WM is required. Recall that explicit read-out of a list is always initiated at the trough of the oscillatory cycle. Thus, the "Deletion Required" node is activated at that point. During the first temporal slot, inhibition is prevented because the tag field does not match Dtag.
However, a match does occur during the second slot, and inhibition is triggered. However, this inhibition is premature; it really should have been initiated at N3. Therefore, N2 is erroneously deleted from the lists, giving:

    tag:   NP
    noun:  the + vase
    verb:  E
    tag:   E

Thus, because list items are read out in order, and N2's syntactic features match N3's, information about the outer RC is erroneously deleted from WM. I propose that this premature deletion is the fundamental cause of the difficulty of an RC/RC.

Note that the V2-drop effect [Gib99] falls out naturally from this account. During WM-RR encoding following deletion, processing of the main clause is re-instantiated, because only the main-clause subject remains on the lists. Thus only the main-clause verb is expected. However, in the case of a center-embedded RC within a right-branching RC, such as (49), incorrect deletion does not arise.

49. I like the vase that the man that Sue dated bought.

Following the vase, the word that signals the start of a right-branching clause. All items on the lists (i.e., the temporal encoding of the main clause) are deleted. In this case, the subject of the outer RC is not tagged with the Cl feature, because the C predicate is permanently stored in TotSc. Therefore the lists up to bought are:

    tag:   NP, GapReq   NP, Cl, GapReq   Gap
    noun:  the + man    sue              Gap
    verb:  E            E                dated
    tag:   E            E                V

and Dtag is (NP, Cl, GapReq). During deletion of the inner RC, a match does not occur at the first temporal slot because the required Cl feature is not active. Therefore the sentence is processed correctly, explaining why V2-drop is not felicitous in this case [Gib99].

An RC/RC could be processed correctly if N2 could be distinguished from N3. I propose that this is why a pronoun N3 makes an RC/RC seem easier [Gib98, War02a]. When N3 is a pronoun, this additional syntactic information is reflected in its tag field. Thus, N3 will have the syntactic features (NP, Pr, Cl, GapReq). This tag field will overwrite N2's features in Dtag.
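The matching step that drives deletion, where inhibition triggers only when all of Dtag's features are active in an item's tag field (not necessarily vice versa), can be sketched as a subset test. This is a toy illustration of my own; the function name is hypothetical.

```python
def first_match(dtag, tag_fields):
    """Return the first list slot whose tag field activates all of
    Dtag's features (the asymmetric match that triggers inhibition)."""
    for slot, tag in enumerate(tag_fields):
        if dtag <= tag:  # every Dtag feature is active in the item's tag
            return slot
    return None

# RC/RC (sentence 48): N2's features match N3's, so inhibition fires
# prematurely at slot 1 (N2) instead of slot 2 (N3).
rc_rc = [{"NP"}, {"NP", "Cl", "GapReq"}, {"NP", "Cl", "GapReq"}]

# Pronoun N3: the extra Pr feature in Dtag blocks the premature match,
# so the match correctly lands on N3.
pron_n3 = [{"NP"}, {"NP", "Cl", "GapReq"}, {"NP", "Pr", "Cl", "GapReq"}]
```

Here `first_match({"NP", "Cl", "GapReq"}, rc_rc)` returns 1 (premature, at N2), whereas `first_match({"NP", "Pr", "Cl", "GapReq"}, pron_n3)` returns 2 (correct, at N3).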
During deletion of the inner RC, matching is performed against N3's syntactic features. In this case, a match does not occur at N2 because it does not possess the Pr feature. Thus deletion of the inner RC proceeds correctly. During WM-RR encoding, center-embedded processing of the outer RC is re-instantiated. The outer RC is processed, and then processing of the main clause is re-instantiated. Thus the sentence is processed correctly. This implies that V2-drop should not be felicitous for a pronoun N3, as we have demonstrated [Whi04d].

However, this analysis only applies when N3 is a pronoun. Recall that a match occurs when all of Dtag's features are active in the tag field. If one of the other subjects is a pronoun (while N3 is not), a premature match will occur at N2, because N2 would possess all of N3's features. Thus the match asymmetry arising from the disinhibition mechanism explains why an N3 pronoun is easier than an N1 or N2 pronoun [War02a].

16.1.2 Noun Complements

A clause that must be a relative clause contains the GapReq feature, while a potential noun complement does not. Note that this syntactic feature is independently motivated. Consideration of the following examples demonstrates that the parser needs to keep track of some important differences between these constructions.

50. a. The fact that John read magazines surprised Sue.
    b. The fact that John read in the newspaper surprised Sue.
    c. *The fact which John read magazines surprised Sue.
    d. The fact which John read in the newspaper surprised Sue.
    e. The item that John read in the newspaper surprised Sue.
    f. *The item that John read magazines surprised Sue.

Read is a verb that can be used in either transitive or intransitive constructions. This is why (a) is grammatical as a noun complement, with or without the presence of the direct object magazines. This structure has a third reading as a relative clause, as shown in (b).
The contrast between (b) and (c) shows that the complementizer which can only introduce a relative clause. The fact that the ungrammaticality of (c) is easily detectable shows that the parser must have some device to indicate that the requirement of a gap to be bound to the head of the relative clause is not satisfied. The easily detected contrast between (a) and (f) shows that the parser must directly mark whether a noun complement is permissible; it is not sufficient to indicate the lexical choice of the complementizer. The GapReq feature fulfills these requirements, specifying an additional constraint on an embedded clause. It is necessary to store this feature in the tag field to allow retrieval of this information (during WM-RR encoding) if a center-embedded clause interrupts the processing of an RC or a potential NC. Thus the temporal encoding of (51) up to fell is:

    tag:   NP           NP, Cl       NP, Cl, GapReq   Gap
    noun:  the + fact   the + vase   sue              Gap
    verb:  E            E            E                bought
    tag:   E            E            E                V

51. The fact that the vase that Sue bought fell upset her.

Deletion of the RC will proceed correctly because N2 does not possess the GapReq feature, explaining the ease of an NC/RC [Gib98]. However, in the case of an RC/NC, N2 would contain all of N3's features. Again, due to the matching asymmetry, Dtag (containing N3's features) would match at N2, triggering premature inhibition and the erroneous removal of the RC from temporal WM. This explains the difficulty of an RC/NC [Gib00]. Therefore, an RC/NC should show the same pattern of N3-type and V2-drop effects as an RC/RC. Intuitively, this seems to be the case, although experimental studies have not been done to confirm this.

This analysis implies that an RC/RC that could have been an NC/RC, such as (52), should be processed correctly.

52. The proposal that the student who Amy advises made at the meeting intrigued us.

This is because the lack of a GapReq feature is determined at the start of the clause.
Thus, N2 would not have the GapReq feature, allowing the inner RC to be correctly deleted. For such an RC/RC, N3-type should not have an effect, in contrast to an unambiguous RC/RC. This prediction is unique to the TPARRSE model, and will be experimentally tested in the future. (However, note that perceived overall complexity may still remain high, due to the reanalysis triggered by made at. The specific prediction is that N3-type should have no effect at the processing of the final verb phrase in a self-paced reading study, in contrast to the effect previously observed for the unambiguous case [Whi04d].)

16.1.3 Summary

The above complexity phenomena are explained by the following key assumptions:

- Subject NPs are stored serially in working memory, with accompanying syntactic features.
- Items are deleted from working memory by sequentially "searching" for target syntactic features; the search is initiated at the first item.
- There is an asymmetry to the search process due to the disinhibition mechanism; for inhibition to be triggered, all target features must be active in the list item, but not necessarily vice versa.

Premature deletion (starting at N2 rather than N3) follows from the first two assumptions, and explains RC/RC difficulty and the V2-drop effect. The blocking of this premature deletion via syntactic differences between N2 and N3 explains the relative ease of an NC/RC and a pronoun N3. Because deletion of the inner RC proceeds correctly, V2-drop is not felicitous for a pronoun N3, or for a center-embedded RC within a right-branching RC. The third assumption explains the difficulty of an RC/NC and a pronoun N2.

Thus, the proposed account is based on the nature of WM representations. This approach differs from previous accounts, which depend on capacity in some way. The DLT proposes capacity limitations in re-exciting previously processed constituents [Gib98]. The interference account is based on a maximal number of unattached subjects [Lew96].
In the Vosse & Kempen model [Vos00], correct attachment depends on relative inhibitory strength. We have seen in Chapter 13 that none of these proposals can fully explain the complexity phenomena. In contrast, the novel account offered by the TPARRSE model covers all of these phenomena. Next we see how the model also explains crossed-serial dependencies.

16.2 Crossed-Serial Dependencies

The processing of a center-embedding requires deletion of the embedded clause from WM, in order to correctly associate the separated higher-level subjects and predicates. However, processing of crossed-serial dependencies does not require such deletion, because the subjects and verbs occur in the same order. Thus the verbs can be slotted into WM, and then the RR encoding can be read off. For example, consider processing of the English gloss of (9):

53. Joanna has the men Hans the horses helped to-teach to-feed.

The NPs are processed by Appending them to the noun list, giving [1]:

    tag:   NP       NP          NP     NP
    noun:  joanna   the + men   hans   the + horses
    verb:  E        E           E      E
    tag:   E        E           E      E

Then the verbs are Appended, giving:

    tag:   NP       NP           NP         NP
    noun:  joanna   the + men    hans       the + horses
    verb:  E        has-helped   to-teach   to-feed
    tag:   E        V            V          V

Now the verbs line up with their objects, and the information in WM can be RR-encoded in the manner of a right-branching sentence, giving:

joanna + has-helped@(the + men) + has-helped@to-teach@hans + has-helped@to-teach@to-feed@(the + horses)

which is equivalent to:

joanna + has-helped@(the + men + to-teach@(hans + to-feed@(the + horses)))

Thus the sentence can be processed without deleting individual embedded clauses from WM. [2] Therefore, processing of crossed-serial dependencies is more efficient than center-embeddings, accounting for their reduced complexity [Bac86].

[1] It is assumed that the auxiliary between the first and second subjects is saved, to be joined with the first verb.
[2] Partial RR encodings could be created by performing WM-RR encoding after each verb.
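The queue-like read-out just described can be sketched in a few lines. This is my own illustration, not the dissertation's code: each verb joins the scope chain at its slot, and each noun is emitted under the scope accumulated so far.

```python
def rr_crossed_serial(nouns, verbs):
    """RR-encode a crossed-serial clause: verb i lands in the slot of
    its object NP, so each noun is read out under the verbs seen so far."""
    terms, scope = [], []
    for noun, verb in zip(nouns, verbs):
        if verb != "E":
            scope.append(verb)  # verb extends the scope chain at its slot
        terms.append("@".join(scope + [noun]) if scope else noun)
    return " + ".join(terms)

# WM contents for sentence 53 after both Append passes; the empty first
# verb slot reflects the saved auxiliary (footnote 1).
nouns = ["joanna", "(the + men)", "hans", "(the + horses)"]
verbs = ["E", "has-helped", "to-teach", "to-feed"]
```

With these inputs, `rr_crossed_serial(nouns, verbs)` reproduces the flat encoding given above: `joanna + has-helped@(the + men) + has-helped@to-teach@hans + has-helped@to-teach@to-feed@(the + horses)`.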
Because partial deletion is not required, making N3 a pronoun should not influence processing. This is exactly the result observed in [Kaa04].

We have now seen how the WM lists can be used like stacks for processing center-embeddings, and like queues to process crossed-serial dependencies. Because the lists are not actually stacks (i.e., there is no pop operation that removes the last element on a list), processing can break down, explaining complexity phenomena associated with double center-embeddings. In contrast, the dual seriality of the lists can be used directly to represent crossed-serial dependencies, enabling more reliable processing.

16.3 Interference in Working Memory

Recall that a WM list is comprised of banks of cells, and each activated list item draws on a subset of each of those banks. A new list item can therefore "steal" cells from already activated items. In this way, adding new items to WM can degrade the representation of previous items. The more similar the new item is to a previous item (i.e., the greater the number of matching bits), the greater the opportunity for degradation of the previous item. Thus WM representations will become more degraded as the number of items increases, and as their similarity to each other increases.

In addition to these factors, WM representations are also likely to be degraded over time. As an item continues to fire across oscillatory cycles, its activation may decay, the synchronization across positional banks may decrease, and/or constituent cells may fire in the wrong subcycle.

Recall that fast within-position inhibition and fast, weak across-position excitation are proposed to stabilize the representation of a single item (within an oscillatory subcycle), whereas slow inhibition serves to separate items.
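The cell-stealing dynamic described above can be illustrated with a toy sketch of my own. Items are modeled as sets of active cell indices, and steal_p is an assumed free parameter, not a quantity from the model.

```python
import random

def add_item(memory, new_item, steal_p=0.5, rng=None):
    """Adding an item may 'steal' cells it shares with earlier items,
    degrading their representations; more shared cells (greater
    similarity) means more opportunity for degradation."""
    rng = rng or random.Random(0)
    for item in memory:
        for cell in item & new_item:  # contested (shared) cells
            if rng.random() < steal_p:
                item.discard(cell)    # earlier item loses the cell
    memory.append(set(new_item))
```

A dissimilar item (no shared cells) leaves earlier items intact, while with steal_p=1.0 a similar item removes every contested cell, mirroring the claim that degradation grows with the number of matching bits.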
The amount of fast excitation would have to be rather narrowly tuned in order to avoid causing cells that should fire at upcoming subcycles to fire prematurely, yet still promote almost-synchronous firing within a subcycle. Within each position, it is easiest to maintain separation between subcycles n and n+1 when the polarity switches (0 to 1 or 1 to 0) across subcycles, due to the additional support of the fast inhibition. When the polarity doesn't switch between subcycles, it is more likely that cells belonging to subcycle n+1 will fire prematurely in subcycle n. If the subcycles remain separated, this is not a problem. However, if this occurs on a large scale, n and n+1 may collapse into a single subcycle. As the number of positions in which the polarity stays the same increases, this merging of subcycles becomes more likely, as there are fewer switching positions to drive strong separation between subcycles. Thus, when two similar items fire in consecutive subcycles, it is more difficult to maintain the separation between those items, accounting for the observed increase in perceived complexity as proximity between similar items increases [Lew02, Lee03, Kaa04].

In the DLT [Gib98, Gib00], capacity constraints emerge from the general assumption of a fixed pool of resources. The TPARRSE model provides a more detailed proposal for the nature of such capacity limitations. However, we have seen that capacity limitations per se cannot explain all of the complexity phenomena. Rather, the underlying structure of WM in the TPARRSE model explains interference effects and general distance effects, while the proposed manipulations over those representations explain the specific pattern of complexity effects observed for doubly center-embedded clauses.

Chapter 17

Conclusion

I first speculate briefly on future directions of research related to the TPARRSE model, and then summarize the most important points of this dissertation.
17.1 Future TPARRSE Research

As for explaining behavioral data, I have concentrated on complexity effects in the present work. I believe that the proposed TPARRSE representations could also explain reanalysis phenomena, namely why some ambiguous sentences are easy to reanalyze, and some are not. Current explanations assume that reanalysis operations are carried out over a representation corresponding to a syntactic tree. In contrast, I propose that reanalysis operations are carried out over the WM lists. When an unexpected word or phrase occurs, the information encoded on the WM lists is reinterpreted to generate a new RR encoding of the previously processed material. If the established WM representation is incompatible with the correct interpretation, the sentence will be difficult to reanalyze. One avenue of future work will focus on specifying the nature of these reanalysis operations in detail, to allow a comprehensive account of the standard reanalysis cases.

Thus far, I have concentrated on the representations that encode hierarchical structure, and have left the specification of the neural implementation of the parsing algorithm for future work. In general, I assume that the parsing rules are implemented by gating nodes which appropriately direct the flow of activation to neural areas which implement the WM variables, the WM lists, and the merge, bind, Append, and Delete operations. I assume that the ability to perform these basic operations is innate. Thus language acquisition entails learning to store the relevant information in WM variables, and to invoke the appropriate primitive operations. A recent model of sequential-task learning, in which the basal ganglia gate individual "stripes" of prefrontal cortex [Ore03], seems ideal for this task. A stripe could correspond to a WM variable. Perhaps such an implemented model could develop the functionality of CurSc, CurRR, etc., as well as triggering of the required control operations.
Therefore, future work on a neurally plausible implementation of the proposed parsing rules will focus on the application of this learning algorithm.

I would also like to pursue experimental investigations into the TPARRSE model. The model predicts that N3-type should not have an effect at the final verb for an RC/RC that could have been an NC/RC. This will be tested in a self-paced reading study. An EEG study has shown that theta-band amplitude increases as a sentence is processed [Bas02]. This amplitude may index syntactic working-memory load [Bas02]. If so, this amplitude should be sensitive to the syntactic structure of a sentence. For example, working-memory load should decrease following the completion of a center-embedded clause, implying that theta-band amplitude should also decrease at that point. This prediction will be tested in an EEG study.

17.2 Conclusion

The goals of this work have been three-fold: (1) to advocate a particular approach to computational modeling; (2) to apply this approach to the problem of letter-position encoding; (3) to apply this approach to the problem of sentence parsing and the representation of hierarchical structure.

The approach places an emphasis upon developing computational theories, rather than on implementation of models. It emphasizes first understanding mature neural systems, rather than developing learning algorithms. Understanding what the mature system is doing would then provide strong constraints for investigations into how the system develops, because the endpoint is known.

The approach is truly interdisciplinary. Strong emphasis is placed on explaining the details of a wide range of relevant behavioral data. Such data provide clues as to what algorithms the brain is using. Although brain imaging is amazing, revolutionary, etc., I believe that behavioral data actually reveal more about how the brain is doing what it is doing.
Consideration of neural architecture also provides information and constraints on what algorithms the brain is using. Knowledge of computational theories of neural processing provides the building blocks for formulating a model that meets the behavioral and neurobiological constraints. The resulting model of a particular task specifies how information is mapped onto neural representations, and how one type of neural representation is transformed into another type. This abstract approach allows consideration of the big picture, without being limited by implementational constraints. Ideally, it leads to novel, verifiable predictions.

I first applied this approach to understanding how the brain encodes letter order during visual word recognition. The architecture of the visual system determined the lowest level of the model. Behavioral data on letter perceptibility, word priming, error patterns, and visual-field effects provided information about how this initial representation is transformed into a lexical representation. The goal of explaining these behavioral data led to the SERIOL model. The fact that the model led to an experiment which identified the source of the asymmetry of the length effect, which had been a subject of debate for half a century [Mel57, Bou73, Ell88, Jor03, Naz03], confirms the viability of the model and of the overall approach. These results, in conjunction with the experimental results on the N effect, demonstrate that although the SERIOL model is abstract, it is highly specific.

I have also applied this approach to the question of how hierarchical information is encoded during sentence processing. This is a more difficult task because (1) the problem is much harder and (2) there is much less relevant data available. Neural constraints were limited to generalities (fixed connectivity and local processing), although imaging data on oscillatory phenomena associated with verbal working memory were suggestive.
Behavioral data were primarily in the form of complexity effects. The consideration of these factors, the computational demands of the task, and insights from the SERIOL model have led to the TPARRSE model. This model is unique in several ways: (1) the use of a neurobiologically motivated sequential representation [Lis95] for stack-like and queue-like processing; (2) the dichotomy of representations used in temporal WM versus the thematic tree; (3) an account of complexity effects based on the specifics of WM representations and manipulations. It is hoped that this model too will lead to informative experimental results.

Bibliography

[Abb91] Abbott, L.F. (1991) Firing-rate models for neural populations. In Benhar, O., Bosio, C., Del Giudice, P. & Tabet, E. (Eds.), Neural Networks: From Biology to High-Energy Physics. ETS Editrice: Pisa.

[Abn89] Abney, S. (1989) A computational model of human parsing. Journal of Psycholinguistic Research, 18, 129-144.

[Abn95] Abney, S. (1995) Chunks and dependencies: Bringing processing evidence to bear on syntax. In J. Cole, G. Green, & J. Morgan (Eds.), Computational Linguistics and the Foundations of Linguistic Theory. CSLI.

[Abn91] Abney, S. & Johnson, M. (1991) Memory requirements and local ambiguities of parsing strategies. Journal of Psycholinguistic Research, 20, 233-250.

[And91] Andrade, R. (1991) Cell excitation enhances muscarinic cholinergic responses in rat association cortex. Brain Research, 548, 81-93.

[And89] Andrews, S. (1989) Frequency and neighborhood effects on lexical access: Activation or search? Journal of Experimental Psychology: Learning, Memory and Cognition, 15, 802-814.

[And96] Andrews, S. (1996) Lexical retrieval and selection processes: Effects of transposed-letter confusability. Journal of Memory and Language, 35, 775-800.

[And97] Andrews, S. (1997) The effect of orthographic similarity on lexical retrieval: Resolving neighborhood conflicts. Psychonomic Bulletin and Review, 4, 439-461.
[Auc01] Auclair, L. & Chokron, S. (2001) Is the optimal viewing position in reading in uenced by familiarity of the letter string? Brain and Cog- nition, 46, 20-24. [Bab99] Babyonyshev, M. & Gibson, E. (1999) The complexity of nested struc- tures in Japanese. Language, 75, 423-450. [Bac86] Bach, E., Brown, C., & Marslen-Wilson, W. (1986) Crossed and nested dependencies in German and Dutch: A psycholinguistic study. Lan- guage and Cognitive Processes, 1, 249-262. [Bal95] Balota, D.A., & Abrams, R.A. (1995) Mental Chronometry: Beyond onset latencies in the lexical decision task. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 1289-1302. [Bal94] Balota, D.A., Cortese, M.J., Sergent-Marshall, S.D., Spieler, D.H. & Yap, M.J. (2004) The English Lexicon Project: A web-based repository of descriptive and behavioral measures for 40,481 English words and nonwords. http://elexicon.wustl.edu, Washington University. 281 [Bas02] Bastiaansen, M., van Berkum, J. & Hagoort P. (2002) Event-related theta power increases in the human EEG during online sentence pro- cessing. Neuroscience Letters, 323, 13-16. [Beh98] Behrman, M. et al. (1998) Visual complexity in letter-by-letter reading: \pure" alexia is not pure. Neuropsychologia, 36, 1115-1132. [Ber97] Berry, M.J., Warland, D.K. & Meister, M. (1997) The structure and precision of retinal spike trains. Proceedings of the National Academy of Science, 94, 5411-5416. [Ber84] Berwick R. & Weinberg, A. (1984) The Grammatical Basis of Linguistic Performance, Cambridge, MA: MIT Press. [Bie87] I. Biederman, I. (1987) Recognition-By-Components: a theory of hu- man image understanding. Psychological Review, 94, 115-147. [Bin92] Binder, J. & Mohr, J. (1992) The topography of callosal reading path- ways, a case-control analysis. Brain, 115, 1807-1826. [Bli73] Bliss, T. V. & Lomo, T. 
(1973) Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. Journal of Physiology, 2, 331-356. [Bro93] Browosky, R. & Besner, D. (1993) Visual word recognition: A multi- stage activation model. Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 813-840. [Bou73] Bouma, H. (1973) Visual interference in the parafoveal recognition of initial and nal letters of words. Vision Research, 13, 767-782. 282 [Bol91] Boland, J.E. & Tanenhaus, M.K. (1991) The role of lexical represen- tations in sentence processing. In G.B. Simpson (Ed.) Understanding Word and Sentence. Amsterdam: North-Holland. 331-366. [Bra04] Brain and Language (2004), 88(3). [Bur88] Burgess, C. & Simpson G. B. (1988) Cerebral hemispheric mechanisms in the retrieval of ambiguous word meanings. Brain and Language, 33, 86-103. [Bur02] Burt, J.S. & Tate, H. (2002) Does a reading lexicon provide ortho- graphic representations for spelling? Journal of Memory and Language, 46, 518-543. [Bri68] Brindley, G. & Lewin, S. (1968) The sensations produced by electrical stimulation of the visual cortex. Journal of Physiology, 196, 479-493. [Bro72] Brown, J.W. (1972) Aphasia, Apraxia, and Agnosia. Charles C. Thomas: Spring eld, Ill. [Bry94] Brysbaert, M. (1994) Interhemispheric transfer and the processing of foveally presented stimuli. Behavioural Brain Research, 64, 151-161. [Bry96] Brysbaert, M., Vitu, F. & Shroyens, W. (1996) The right visual eld advantage and the optimal viewing position e ect: On the relation between foveal and parafoveal word recognition. Neuropsychology, 10, 385-395. [Bry04] Brysbaert, M. (2004) The importance of interhemispheric transfer for foveal vision: A factor that has been overlooked in theories of visual 283 word recognition and object perception. Brain and Language, 88, 259- 267. [Buc04] Buckmaster, P.S., Alonso, A., Can eld, D. R. & Amaral, D. G. 
(2004) Dendritic morphology, local circuitry, and intrinsic electrophysiology of principal neurons in the entorhinal cortex of macaque monkeys. Journal of Comparative Neurology, 470, 317-329. [Bur00] Burle, B. & Bonnet, M. (2000) High-speed memory scanning: A behav- ioral argument for a serial oscillatory model. Cognitive Brain Research, 9, 327-337. [Cap96] Caplan, D., HildeBrandt, N. & Makris, N. (1996) Location of lesions in stroke patients with de cits in syntactic processing in sentence com- prehension. Brain, 119, 933-949 [Car87] Carpenter, G.A. & Grossberg, S. (1987) A massively parallel architec- ture for a self-organizing neural pattern recognition machine, Computer Vision, Graphics, and Image Processing, 37, 54-115. [Cho59] Chomsky, N. (1959) On certain formal properties of grammars. Infor- mation and Control, 1, 91-112. [Cho59b] Chomsky, N. (1959) Review of B. F. Skinner, Verbal Behavior. Lan- guage, 35, 26-58. [Cho80] Chomsky, N. (1980) Rules and Representations. Columbia University Press: New York. [Chr99] Christiansen, M.H. & Chater, N. (1999) Connectionist natural lan- guage processing: The state of the art. Cognitive Science, 23, 417-437. 284 [Cis03] Cisse, Y., Grenier, F., Timofeev, I. & Steriade, M. (2003) Electrophys- iological properties and input-output organization of callosal neurons in cat association cortex. Journal of Neurophysiology, 89, 1402-1413. [Coh00] Cohen, L. et al. (2000) The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain, 123, 291-307. [Coh02] Cohen, L. et al. (2002) Language-speci c tuning of visual cortex? Func- tional properties of the Visual Word Form Area. Brain, 125, 1054-1069. [Coh03] Cohen, L. et al. (2003) Visual word recognition in the left and right hemispheres: anatomical and functional correlates of peripheral alex- ias. Cerebral Cortex, 13, 1313-1333. 
[Col77] Coltheart, M., Davelaar, E., Jonasson, J.T., & Besner, D. (1977) Ac- cess to the internal lexicon. In S. Dornic (Ed.) Attention and Perfor- mance VI: The Psychology of Reading. Academic Press. [Cor98] Cornelissen, P. et al. (1998) Coherent motion detection and letter po- sition encoding. Vision Research, 38, 2181-2191. [Dav99] Davis, C. (1999) The Self-Organising Lexical Acquisition and Recogni- tion (SOLAR) model of visual word recognition. Unpublished Doctoral Dissertation, University of South Wales. [Deh87] Dehaene, S., Changeux, J.P. & Nadal J.P. (1987) Neural networks that learn temporal sequences by selection. Proceedings of the National Acadamy of Science, 84, 2727-2731. 285 [Deh02] Dehaene, S. et al. (2002) The visual word form area: A prelexical representation of visual words in fusiform gyrus. Neuroreport, 13, 321- 325. [Deh04] Dehaene, S., Jobert, A., Naccache, L., Ciuciu, P., Poline, J.B., Le Bihan, D. & Cohen, L. (2004) Letter binding and invariant recognition of masked words: behavioral and neuroimaging evidence. Psychological Science, 15, 307-313. [Del00] Delorme, A., Richard, G. & Fabre-Thorpe, M. (2000) Ultra-rapid cat- egorisation of natural scenes does not rely on colour cues: a study in monkeys & humans. Vision Research, 40, 2187-2200. [Dem92] Demonet, J. et al. (1992) The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753-1768. [Duc03] Ducrot, S., Lete, B., Sprenger-Charolles, L., Pynte, J. & Bil- lard, C (2003) The Optimal Viewing Position E ect in Be- ginning and Dyslexic Readers. Current Psychology Letters, 10, http://cpl.revues.org/document99.html. [Eng01] Engel, A. K. & Singer, W. (2001) Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Science, 5, 16-25. [Evi99] Eviatar, Z. (1999) Cross-language tests of hemispheric strategies in reading nonwords. Neuropsychology, 13, 498-515. [Ell88] Ellis, A. W., Young, A.W. & Anderson, C. 
(1988) Modes of word recognition in the left and right cerebral hemispheres. Brain and Language, 35, 254-273.
[Elm90] Elman, J.L. (1990) Finding structure in time. Cognitive Science, 14, 179-211.
[Est76] Estes, W.K., Allmeyer, D.H. & Reder, S.M. (1976) Serial position functions for letter identification at brief and extended exposure durations. Perception & Psychophysics, 19, 1-15.
[Eve81] Evett, L.J. & Humphreys, G.W. (1981) The use of abstract graphemic information in lexical access. Quarterly Journal of Experimental Psychology, 30, 569-575.
[Far96] Farid, M. & Grainger, J. (1996) How initial fixation position influences visual word recognition: a comparison of French and Arabic. Brain and Language, 53, 351-368.
[Fac01] Facoetti, A. & Molteni, M. (2001) The gradient of visual attention in developmental dyslexia. Neuropsychologia, 39, 352-357.
[Fel01] Fellous, J.M., Houweling, A.R., Modi, R.H., Rao, R.P., Tiesinga, P.H. & Sejnowski, T.J. (2001) Frequency dependence of spike timing reliability in cortical pyramidal cells and interneurons. Journal of Neurophysiology, 85, 1782-1787.
[Fer92] Ferrera, V.P., Nealey, T.A. & Maunsell, J.H. (1992) Mixed parvocellular and magnocellular geniculate signals in visual area V4. Nature, 358, 756-761.
[Fie02] Fiebach, C. et al. (2002) fMRI evidence for dual routes to the mental lexicon in visual word recognition. Journal of Cognitive Neuroscience, 14, 11-23.
[Fit04] Fitch, T. & Hauser, M. (2004) Computational constraints on syntactic processing in a nonhuman primate. Science, 303, 377-380.
[Fos02] Foster, D.H. & Gilson, S.J. (2002) Recognizing novel three-dimensional objects by summing signals from parts and views. Proceedings of the Royal Society of London, Series B: Biological Sciences, 269, 1939-1947.
[Fre76] Frederiksen, J.R. & Kroll, J.F. (1976) Spelling and sound: Approaches to the internal lexicon. Journal of Experimental Psychology: Human Perception and Performance, 2, 361-379.
[Fri01] Friedmann, N. & Gvion, A.
(2001) Letter position dyslexia. Cognitive Neuropsychology, 18, 673-696.
[Fuk88] Fukushima, K. (1988) Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1, 119-130.
[Ger01] Gers, F.A. & Schmidhuber, J. (2001) LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 12, 1333-1340.
[Ges95] Geschwind, N. (1965) Disconnection syndromes in animals and man. Brain, 88, 237-294 and 585-644.
[Gib98] Gibson, E. (1998) Linguistic complexity: Locality of syntactic dependencies. Cognition, 68, 1-75.
[Gib99] Gibson, E. & Thomas, J. (1999) Memory limitations and structural forgetting: The perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes, 14, 225-248.
[Gib00] Gibson, E. (2000) The dependency locality theory. In Marantz, Miyashita & O'Neil (Eds.), Image, Language, Brain. Cambridge, MA: MIT Press, 95-126.
[Gib04] Gibson, E., Desmet, T., Watson, D., Grodner, D. & Ko, K. (2004) Reading relative clauses in English. Submitted.
[Gra89] Grainger, J., O'Regan, J.K., Jacobs, A.M. & Segui, J. (1989) On the role of competing word units in visual word recognition: The neighborhood frequency effect. Perception & Psychophysics, 45, 189-195.
[Gra96] Grainger, J. & Jacobs, A. (1996) Orthographic processing in visual word recognition: A multiple readout model. Psychological Review, 103, 518-565.
[Gra04a] Granier, J.P. & Grainger, J. (2004) Letter position information and printed word perception: The relative-position priming constraint. Submitted.
[Gra04b] Grainger, J. & Whitney, C. (2004) Does the huamn mnid raed wrods as a wlohe? Trends in Cognitive Sciences, 8, 58-59.
[Ham82] Hammond, E.J. & Green, D.W. (1982) Detecting targets in letter and non-letter arrays. Canadian Journal of Psychology, 36, 67-82.
[Har75] Harcum, E.R. & Nice, D.S. (1975) Serial processing shown by mutual masking of icons.
Perceptual and Motor Skills, 40, 399-408.
[Har01] Hari, R., Renvall, H. & Tanskanen, T. (2001) Left minineglect in dyslexic adults. Brain, 124, 1373-1380.
[Hau04] Hauk, O. & Pulvermuller, F. (2004) Effects of word length and frequency on the human event-related potential. Clinical Neurophysiology, 115, 1090-1103.
[Hay03] Hayward, W.G. (2003) After the viewpoint debate: where next in object recognition? Trends in Cognitive Sciences, 7, 425-427.
[Hel95] Hellige, J.B., Cowin, E.L. & Eng, T.L. (1995) Recognition of CVC syllables from LVF, RVF, and central locations: Hemispheric differences and interhemispheric interactions. Journal of Cognitive Neuroscience, 7, 258-266.
[Hle97] Hellige, J.B. & Scott, G.B. (1997) Effects of output order on hemispheric asymmetry for processing letter trigrams. Brain and Language, 59, 523-530.
[Hell99] Hellige, J.B. & Yamauchi, M. (1999) Quantitative and qualitative hemispheric asymmetry for processing Japanese kana. Brain & Cognition, 40, 453-463.
[Hel99] Helenius, P., Tarkiainen, A., Cornelissen, P.L., Hansen, P.C. & Salmelin, R. (1999) Dissociation of normal feature analysis and deficient processing of letter-strings in dyslexic adults. Cerebral Cortex, 9, 476-483.
[Hin90] Hinton, G.E. (1990) Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46, 47-75.
[Hoc97] Hochreiter, S. & Schmidhuber, J. (1997) Long short-term memory. Neural Computation, 9, 1735-1780.
[Hop95] Hopfield, J.J. (1995) Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33-36.
[Hum90] Humphreys, G.W., Evett, L.J. & Quinlan, P.T. (1990) Orthographic processing in visual word identification. Cognitive Psychology, 22, 517-560.
[Hum92] Hummel, J. & Biederman, I. (1992) Dynamic binding in a network for shape recognition. Psychological Review, 99, 487-517.
[Hum97] Hummel, J.E. & Holyoak, K.J.
(1997) Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104, 427-466.
[Ino09] Inouye, T. (1909) Die Sehstörungen bei Schussverletzungen der kortikalen Sehsphäre nach Beobachtungen an Verwundeten der letzten japanischen Kriege. W. Engelmann.
[Jac02] Jackendoff, R. (2002) Foundations of Language. Oxford University Press.
[Jen02] Jensen, O. & Tesche, C. (2002) Frontal theta activity in humans increases with memory load in a working memory task. European Journal of Neuroscience, 15, 1-6.
[Jor03] Jordan, T.R., Patching, G.R. & Thomas, S.M. (2003) Assessing the role of hemispheric specialisation, serial-position processing, and retinal eccentricity in lateralised word recognition. Cognitive Neuropsychology, 20, 49-71.
[Kaa04] Kaan, E. & Vasic, N. (2004) Cross-serial dependencies in Dutch: Testing the influence of NP type on processing load. Memory and Cognition, 32, 175-184.
[Kan95] Kanerva, P. (1995) A family of binary spatter codes. In F. Fogelman-Soulie & P. Gallinari (Eds.), ICANN '95, Proceedings of the International Conference on Artificial Neural Networks, 1, 517-522.
[Kha04] Khader, P. & Rosler, F. (2004) EEG power and coherence analysis of visually presented nouns and verbs reveals left frontal processing difference. Neuroscience Letters, 354, 111-114.
[Kli96] Klimesch, W. (1996) Memory processes, brain oscillations and EEG synchronization. International Journal of Psychophysiology, 24, 61-100.
[Kli99] Klimesch, W. (1999) EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis. Brain Research Reviews, 29, 169-195.
[Kli01] Klimesch, W., Doppelmayr, M., Wimmer, H., Schwaiger, J., Rohm, D., Gruber, W. & Hutzler, F. (2001) Theta band power changes in normal and dyslexic children. Clinical Neurophysiology, 112, 1174-1185.
[Kor85] Koriat, A. & Norman, J. (1985) Reading rotated words. Journal of Experimental Psychology: Human Perception and Performance, 11, 490-508.
[Kwa99a] Kwantes, P.J. & Mewhort, D.J. (1999) Modeling lexical decision and word naming as a retrieval process. Canadian Journal of Experimental Psychology, 53, 306-315.
[Kwa99b] Kwantes, P.J. & Mewhort, D.J. (1999) Evidence for sequential processing in visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 25, 276-231.
[Lan01] Lane, P. & Henderson, J. (2001) Incremental syntactic parsing of natural language corpora with Simple Synchrony Networks. IEEE Transactions on Knowledge and Data Engineering, 13(2).
[Lav01a] Lavidor, M., Ellis, A., Shillcock, R. & Bland, T. (2001) Evaluating a split processing model of visual word recognition: Effects of word length. Cognitive Brain Research, 12, 265-272.
[Lav01b] Lavidor, M., Babkoff, H. & Faust, M. (2001) Analysis of standard and non-standard visual format in the two hemispheres. Neuropsychologia, 39, 430-439.
[Lav02a] Lavidor, M. & Ellis, A. (2002) Word length and orthographic neighborhood size effects in the left and right cerebral hemispheres. Brain and Language, 80, 45-62.
[Lav02b] Lavidor, M. & Ellis, A. (2002) Orthographic neighborhood effects in the right but not in the left cerebral hemisphere. Brain and Language, 80, 63-76.
[Lav02c] Lavidor, M., Ellis, A.W. & Pansky, A. (2002) Case alternation and length effects in lateralized word recognition: Studies of English and Hebrew. Brain and Cognition, 50, 257-271.
[Lav03] Lavidor, M. & Walsh, V. (2003) A magnetic stimulation examination of orthographic neighborhood effects in visual word recognition. Journal of Cognitive Neuroscience, 15, 354-363.
[Lav04a] Lavidor, M., Hayes, A., Shillcock, R. & Ellis, A.W. (2004) Evaluating a split processing model of visual word recognition: effects of orthographic neighborhood size. Brain and Language, 88, 312-320.
[Lav04b] Lavidor, M. & Walsh, V. (2004) The nature of foveal representation. Nature Reviews Neuroscience, 5, 729-735.
[Lee03] Lee, S. & Nakayama, M.
(2003) Effects of syntactic and phonological similarity in Korean center-embedding constructions. Poster presented at the 16th Annual CUNY Conference on Sentence Processing, Cambridge, MA.
[Lef04] Leff, A. (2004) A historical review of the representation of the visual field in primary visual cortex with special reference to the neural mechanisms underlying macular sparing. Brain and Language, 88, 268-278.
[Leg01] Legge, G.E., Mansfield, J.S. & Chung, S.T. (2001) Psychophysics of reading. XX. Linking letter recognition to reading speed in central and peripheral vision. Vision Research, 41, 725-743.
[Lef78] Lefton, L.A., Fisher, D.F. & Kuhn, D.M. (1978) Left-to-right processing of alphabetic material is independent of retinal location. Bulletin of the Psychonomic Society, 112, 171-174.
[Lev00] Levitan, S. & Reggia, J.A. (2000) A computational model of lateralization and asymmetries in cortical maps. Neural Computation, 12, 2037-2062.
[Lew96] Lewis, R.L. (1996) Interference in short-term memory: The magical number two (or three) in sentence processing. Journal of Psycholinguistic Research, 25, 93-115.
[Lew95] Lewis, R.L. (1998) Reanalysis and limited repair parsing: leaping off the garden path. In J. Fodor & F. Ferreira (Eds.), Reanalysis in Sentence Processing. Dordrecht: Kluwer, 247-285.
[Lew02] Lewis, R.L. & Nakayama, M. (2002) Syntactic and positional similarity effects in the processing of Japanese embeddings. In Nakayama, M. (Ed.) Sentence Processing in East Asian Languages. Stanford: CSLI Publications.
[Lie03] Liederman, J., McGraw-Fisher, J., Schulz, M., Maxwell, C., Theoret, H. & Pascual-Leone, A. (2003) The role of motion direction selective extrastriate regions in reading: a transcranial magnetic stimulation study. Brain and Language, 85, 140-155.
[Lis95] Lisman, J.E. & Idiart, M.A.P. (1995) Storage of 7 ± 2 short-term memories in oscillatory subcycles. Science, 267, 1512-1515.
[Lor04] Lorusso, M.L., Facoetti, A. & Molteni, M.
(2004) Hemispheric, attentional, and processing speed factors in the treatment of developmental dyslexia. Brain and Cognition, 55, 341-348.
[Lov93] Lovegrove, W. (1993) Weakness in the transient visual system: A causal factor in dyslexia? Annals of the New York Academy of Sciences, 682, 57-69.
[Mac04] MacKeben, M., Trauzettel-Klosinski, S., Reinhard, J., Durrwachter, U., Adler, M. & Klosinski, G. (2004) Eye movement control during single-word reading in dyslexics. Journal of Vision, 4, 388-402, http://journalofvision.org/4/5/4/, doi:10.1167/4.5.4.
[Mag01] Magee, J.C. (2001) Dendritic mechanisms of phase precession in hippocampal CA1 pyramidal neurons. Journal of Neurophysiology, 86, 528-532.
[Mar80] Marcus, M. (1980) A Theory of Syntactic Recognition for Natural Language. MIT Press: Cambridge, MA.
[Mar01] Marcus, G. (2001) The Algebraic Mind. MIT Press: Cambridge, MA.
[Mau83] Maunsell, J.H.R. & Van Essen, D.C. (1983) Functional properties of neurons in the middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation. Journal of Neurophysiology, 49, 1127-1147.
[Mau90] Maunsell, J.H., Nealey, T.A. & De Priest, D.D. (1990) Magnocellular and parvocellular contributions to responses in the middle temporal visual area (MT) of the macaque monkey. Journal of Neuroscience, 10, 3323-3334.
[Mas82] Mason, M. (1982) Recognition time for letters and nonletters: Effects of serial position, array size, and processing order. Journal of Experimental Psychology, 8, 724-738.
[McC03] McCandliss, B., Cohen, L. & Dehaene, S. (2003) The visual word form area: expertise for reading in the fusiform gyrus. Trends in Cognitive Sciences, 7, 293-299.
[McC81] McClelland, J.L. & Rumelhart, D.E. (1981) An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
[Mel57] Melville, J.P. (1957) Word-length as a factor in differential recognition. American Journal of Psychology, 70, 316-318.
[Mew69] Mewhort, D.J.K., Merikle, P.M. & Bryden, M.P. (1969) On the transfer from iconic to short-term memory. Journal of Experimental Psychology, 81, 89-94.
[Mon04] Monaghan, P., Shillcock, R. & McDonald, S. (2004) Hemispheric asymmetries in the split-fovea model of semantic processing. Brain and Language, 88, 339-354.
[Mon98] Montant, M., Nazir, T.A. & Poncet, M. (1998) Pure alexia and the viewing position effect in printed words. Cognitive Neuropsychology, 15, 93-140.
[Moz91] Mozer, M. (1991) The Perception of Multiple Objects: A Connectionist Approach. MIT Press.
[Naz03] Nazir, T.A. (2003) On hemispheric specialization and visual field effects in the perception of print: A comment on Jordan, Patching, and Thomas. Cognitive Neuropsychology, 20, 73-80.
[Naz04a] Nazir, T.A. (2004) Reading habits, perceptual learning, and recognition of printed words. Brain and Language, 88, 294-311.
[Naz04b] Nazir, T.A., Kajii, N., Frost, R. & Osaka, N. (2004) Script characteristics modify the way we perceive isolated words: Visual field effects in the perception of French, Hebrew, Kanji and Hiragana words. In preparation.
[New04] New, B., Ferrand, L., Pallier, C. & Brysbaert, M. (2004) Re-examining word length effects in visual word recognition: New evidence from the English Lexicon Project. Submitted.
[Nic76] Nice, D.E. & Harcum, E.R. (1976) Evidence from mutual masking for serial processing of tachistoscopic letter patterns. Perceptual and Motor Skills, 42, 991-1003.
[Nig93] Nigrin, A. (1993) Neural Networks for Pattern Recognition. MIT Press.
[Nob94] Nobre, A., Allison, T. & McCarthy, G. (1994) Word recognition in the human inferior temporal lobe. Nature, 372, 260-263.
[Ore84] O'Regan, J.K., Levy-Schoen, A., Pynte, J. & Brugaillere, B. (1984) Convenient fixation location within isolated words of different length and structure.
Journal of Experimental Psychology: Human Perception and Performance, 18, 185-197.
[Ore03] O'Reilly, R.C. & Frank, M.J. (2003) Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia. ICS Technical Report 03-03, University of Colorado, Boulder.
[Pea98] Pearlmutter, N. & Mendelsohn, A. (1998) Serial versus parallel sentence processing. Paper presented at the 11th Annual CUNY Conference on Human Sentence Processing, Rutgers University, New Jersey.
[Per98] Perea, M. (1998) Orthographic neighbors are not all equal: Evidence using an identification technique. Language and Cognitive Processes, 13, 77-90.
[Per03] Perea, M. & Lupker, S.J. (2003) Transposed-letter confusability effects in masked form priming. In S. Kinoshita & S.J. Lupker (Eds.), Masked Priming: State of the Art. Psychology Press, 97-120.
[Per04] Perea, M. & Lupker, S.J. (2004) Can CANISO activate CASINO? Transposed-letter similarity effects with nonadjacent letter positions. Journal of Memory and Language, 51, 231-246.
[Per95] Peressotti, F. & Grainger, J. (1995) Letter-position coding in random consonant arrays. Perception & Psychophysics, 57, 875-890.
[Per99] Peressotti, F. & Grainger, J. (1999) The role of letter identity and letter position in orthographic priming. Perception & Psychophysics, 61, 691-706.
[Plsa93] Plaut, D.C. & McClelland, J.L. (1993) Generalization with componential attractors: Word and nonword reading in an attractor network. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, 824-829. Erlbaum.
[Pla95] Plate, T.A. (1995) Holographic reduced representations. IEEE Transactions on Neural Networks, 6, 623-641.
[Pog90] Poggio, T. & Edelman, S. (1990) A network that learns to recognize three-dimensional objects. Nature, 343, 263-266.
[Pol90] Pollack, J. (1990) Recursive distributed representations. Artificial Intelligence, 46, 77-105.
[Pol02] Polk, T. & Farah, M.
(2002) Functional MRI evidence for an abstract, not perceptual, word-form area. Journal of Experimental Psychology: General, 131, 65-72.
[Pri03] Price, C. & Devlin, J. (2003) The myth of the visual word form area. Neuroimage, 19, 473-481.
[Pul03] Pulvermuller, F. (2003) The Neuroscience of Language: On Brain Circuits of Words and Serial Order. Cambridge University Press.
[Ore00] O'Reilly, R.C. & Munakata, Y. (2000) Computational Explorations in Cognitive Neuroscience. MIT Press.
[Rag01] Raghavachari, S., Kahana, M., Rizzuto, D., Caplan, J., Kirschen, M., Bourgeois, B., Madsen, J. & Lisman, J. (2001) Gating of human theta oscillations by a working memory task. Journal of Neuroscience, 21, 3175-3183.
[Ray75] Rayner, K. (1975) Parafoveal identification during a fixation in reading. Acta Psychologica, 4, 271-282.
[Ray76] Rayner, K. & McConkie, G. (1976) What guides a reader's eye movements? Vision Research, 16, 829-837.
[Reg01] Reggia, J.A., Goodall, S.M., Shkuro, Y. & Glezer, M. (2001) The callosal dilemma: explaining diaschisis in the context of hemispheric rivalry via a neural network model. Neurological Research, 23, 465-471.
[Rie97] Rieke, F., Warland, D., De Ruyter van Steveninck, R. & Bialek, W. (1997) Spikes: Exploring the Neural Code. MIT Press.
[Rey04] Reynolds, M. & Besner, D. (2004) Neighborhood density, word frequency and spelling-sound regularity effects in naming: Similarities and differences between skilled readers and the Dual Route Cascaded computational model. Canadian Journal of Experimental Psychology, 13-31.
[Roh02] Rohde, D.L.T. (2002) A connectionist model of sentence comprehension and production. Unpublished PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
[Roh01] Rohm, D., Klimesch, W., Haider, H. & Doppelmayr, M. (2001) The role of theta and alpha oscillations for language comprehension in the human electroencephalogram. Neuroscience Letters, 310, 137-140.
[Sch04] Schoonbaert, S. & Grainger, J. (2004)
Letter position coding in printed word perception: Effects of repeated and transposed letters. Language and Cognitive Processes. In press.
[Sei89] Seidenberg, M.S. & McClelland, J.L. (1989) A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
[Sha93] Shastri, L. & Ajjanagadde, V. (1993) From simple associations to systematic reasoning. Behavioral and Brain Sciences, 16, 417-494.
[Sha99] Shastri, L. (1999) Advances in SHRUTI: A neurally motivated model of relational knowledge representation and rapid inference using temporal synchrony. Applied Intelligence, 11, 79-108.
[She76] Sheil, B.A. (1976) Observations on context-free parsing. Statistical Methods in Linguistics, 6, 71-109.
[Sik02] Siakaluk, P.D., Sears, C.R. & Lupker, S.J. (2002) Orthographic neighborhood effects in lexical decision: The effects of nonword orthographic neighborhood size. Journal of Experimental Psychology: Human Perception and Performance, 28, 661-681.
[Sta75] Stanners, R.F., Jastrzembski, J.E. & Westbrook, A. (1975) Frequency and visual quality in a word-nonword discrimination task. Journal of Verbal Learning and Verbal Behavior, 14, 259-264.
[Ste97] Stein, J. & Walsh, V. (1997) To see but not to read: the magnocellular theory of dyslexia. Trends in Neurosciences, 20, 147-152.
[Ste03] Stevens, M. & Grainger, J. (2003) Letter visibility and the viewing position effect in visual word recognition. Perception & Psychophysics, 65, 133-151.
[Sus01] Sussman, R.S. & Sedivy, J.C. (2001) The time-course of processing syntactic dependencies: Evidence from eye movements during spoken narratives. In J.S. Magnuson & K.M. Crosswhite (Eds.) University of Rochester Working Papers in the Language Sciences, 2, 52-70.
[Tar99] Tarkiainen, A., Helenius, P., Hansen, P.C., Cornelissen, P.L. & Salmelin, R. (1999) Dynamics of letter string perception in the human occipitotemporal cortex. Brain, 122, 2119-2132.
[Tal00] Talcott, J.B., Witton, C., McLean, M.F., Hansen, P.C., Rees, A., Green, G.G. & Stein, J.F. (2000) Dynamic sensory sensitivity and children's word decoding skills. Proceedings of the National Academy of Sciences, 97, 2952-2957.
[Tou88] Touretzky, D.S. & Hinton, G.E. (1988) A distributed connectionist production system. Cognitive Science, 12, 423-466.
[Van02] Van Rullen, R. & Thorpe, S.J. (2002) Surfing a spike wave down the ventral stream. Vision Research, 42, 2593-2615.
[Vic96] Victor, J.D. & Purpura, K.P. (1996) Nature and precision of temporal coding in visual cortex: a metric-space analysis. Journal of Neurophysiology, 76, 1310-1326.
[Vid01] Vidyasagar, T.R. (2001) From attentional gating in macaque primary visual cortex to dyslexia in humans. Progress in Brain Research, 134, 297-312.
[Vid04] Vidyasagar, T.R. (2004) Neural underpinnings of dyslexia as a disorder of visuo-spatial attention. Clinical and Experimental Optometry, 87, 4-10.
[Vos00] Vosse, T. & Kempen, G. (2000) Syntactic structure in human parsing: A computational model based on competitive inhibition and a lexicalist grammar. Cognition, 75, 105-143.
[War02a] Warren, T. & Gibson, E. (2002) The influence of referential processing on sentence complexity. Cognition, 85, 79-112.
[War02b] Warren, T. & Gibson, E. (2002) Evidence for a constituent-based distance metric in distance-based complexity theories. Poster presented at the CUNY Conference on Human Sentence Processing.
[War80] Warrington, E. & Shallice, T. (1980) Word-form dyslexia. Brain, 103, 99-112.
[Was95] Wassle, H., Grunert, U., Rohrenbeck, J. & Boycott, B. (1989) Cortical magnification factor and the ganglion cell density of the primate retina. Nature, 341, 643-646.
[Wes87] Westheimer, G. (1987) Visual acuity. Chapter 17 in Moses, R.A. & Hart, W.M. (Eds.) Adler's Physiology of the Eye, Clinical Application. St. Louis: The C.V. Mosby Company.
[Whi01a] Whitney, C.
(2001) How the brain encodes the order of letters in a printed word: The SERIOL model and selective literature review. Psychonomic Bulletin and Review, 8, 221-243.
[Whi01b] Whitney, C. (2001) Position-specific effects within the SERIOL framework of letter-position coding. Connection Science, 13, 235-255.
[Whi02] Whitney, C. (2002) An explanation of the length effect for rotated words. Cognitive Systems Research, 3, 113-119.
[Whi04a] Whitney, C. (2004) Hemisphere-specific effects in word recognition do not require hemisphere-specific modes of access. Brain and Language, 88, 279-293.
[Whi99] Whitney, C. & Berndt, R.S. (1999) A new model of letter string encoding: Simulating right neglect dyslexia. Progress in Brain Research, 121, 143-163.
[Whi04b] Whitney, C. & Lavidor, M. (2004) Orthographic neighborhood effects: The SERIOL model account. Submitted.
[Whi04c] Whitney, C. & Lavidor, M. (2004) Why word length only matters in the left visual field. Neuropsychologia. In press.
[Whi04d] Whitney, C. & Weinberg, A. (2004) Interaction between Subject Type and Ungrammaticality in Doubly Center-Embedded Relative Clauses. Poster presented at the 17th Annual CUNY Sentence Processing Conference, University of Maryland.
[Wol74] Wolford, G. & Hollingsworth, S. (1974) Retinal location and string position as important variables in visual information processing. Perception & Psychophysics, 16, 437-442.
[You85] Young, A.W. & Ellis, A.W. (1985) Different methods of lexical access for words presented to the left and right visual hemifields. Brain and Language, 24, 326-358.
[Zie98] Ziegler, J. & Perry, C. (1998) No more problems in Coltheart's neighborhood: resolving neighborhood conflicts in the lexical decision task. Cognition, 68, B53-B62.