ABSTRACT

Title of dissertation: PROCESSING INFORMATION ON INTERMEDIATE TIMESCALES WITHIN RECURRENT NEURAL NETWORKS
Oliver L. C. Rourke, Doctor of Philosophy, 2016
Dissertation directed by: Professor Dan A. Butts, Department of Biology

The cerebral cortex has remarkable computational abilities; it is able to solve problems which remain beyond the most advanced man-made systems. This complexity arises from the structure of the neural network which controls how the neurons interact. One surprising fact about this network is the dominance of 'recurrent' and 'feedback' connections. For example, only 5-10% of connections into the earliest stage of visual processing are 'feedforward', in that they carry information from the eyes (via the Lateral Geniculate Nucleus). One possible reason for these connections is that they allow information to be preserved within the network; the underlying 'causes' of sensory stimuli usually persist for much longer than the time scales of neural processing, and so understanding them requires continued aggregation of information within the sensory cortices. In this dissertation, I investigate several models of such sensory processing via recurrent connections. I introduce the transient attractor network, which depends on recurrent plastic connectivity, and demonstrate in simulations how it might be involved in the processes of short-term memory, signal de-noising, and temporal coherence analysis. I then show how a certain recurrent network structure might allow transient associative learning to occur on the timescale of seconds using presynaptic facilitation. Finally, I consider how auditory scene analysis might occur through 'gamma partitioning'. This process uses recurrent excitatory and inhibitory connections to preserve information within the neural network about its recent state, allowing for the separation of auditory sources into different perceptual cycles.

PROCESSING INFORMATION ON INTERMEDIATE TIMESCALES WITHIN RECURRENT NEURAL NETWORKS

by Oliver L. C. Rourke

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2016

Advisory Committee:
Professor Dan Butts, Chair/Advisor
Professor Radu Balan
Professor Wojciech Czaja
Professor David Jacobs
Professor Quentin Gaudry

© Copyright by Oliver Rourke 2016

Table of Contents

List of Figures
List of Abbreviations
1 Introduction
  1.1 Basic Anatomy and Physiology of a Neuron
  1.2 Neural Processing
    1.2.1 Leaky Integrate-and-Fire model
    1.2.2 Firing Rate Models
  1.3 Synaptic Plasticity
    1.3.1 Long-Term Plasticity
    1.3.2 Short-Term Synaptic Plasticity
      1.3.2.1 Presynaptic Facilitation
      1.3.2.2 Presynaptic Depression
      1.3.2.3 Transient Associative Plasticity
  1.4 From Neurons to Networks of Neurons
    1.4.1 Structures in Networks of Neurons
      1.4.1.1 Hierarchical Processing in Networks of Neurons
      1.4.1.2 Recurrent and Feedback Connections
    1.4.2 Attractors in Neural Networks
      1.4.2.1 The Hopfield Network
        1.4.2.1.1 Temporary Information Storage
        1.4.2.1.2 Classifying Inputs
      1.4.2.2 Periodic Attractors and Cortical Oscillations
      1.4.2.3 Slow Manifolds
      1.4.2.4 Dynamic Field Theory (Bifurcation Theory)
  1.5 Models of Short-Term Memory
    1.5.1 Nature of Short-Term Memory
    1.5.2 Experimental Evidence of Short-Term Memory
    1.5.3 Persistent Activity Models of Short-Term Memory
    1.5.4 Activity Silent models of Short-Term Memory
      1.5.4.1 STM from Transient Associative Modifications
      1.5.4.2 STM via Synaptic Facilitation
  1.6 The Auditory System
    1.6.1 Sound generation
    1.6.2 Detection and Representation of Sounds
    1.6.3 Auditory Tasks
      1.6.3.1 Identification and Localization
      1.6.3.2 Auditory Streaming
  1.7 Models of Auditory Streaming
    1.7.1 Segmentation
    1.7.2 Segregation
    1.7.3 Integration
    1.7.4 Neural Networks for Auditory Streaming
      1.7.4.1 "A Neural Cocktail-Party Processor"
      1.7.4.2 Local Excitatory Global Inhibitory Oscillator Network (LEGION)
2 Cortical Computations via Transient Attractors
  2.1 Overview
  2.2 Introduction
  2.3 Results
    2.3.1 Short-Term Memory via Transient Attractors
    2.3.2 Maintenance of Information over Time
  2.4 Associating Distinct Patterns of Input via Temporal Coherence
    2.4.1 Separating Signal from Noise
    2.4.2 Modeling Attention and the Role of Inhibition
    2.4.3 Model Robustness
  2.5 Discussion
    2.5.1 Alternative Models for Short-Term Memory
    2.5.2 Experimental Evidence for Transient Associative Synaptic Plasticity
    2.5.3 Extensions of the Transient Attractor Network
3 Achieving Transient Associative Plasticity through Synaptic Facilitation
  3.1 Overview
  3.2 Introduction
  3.3 Results
    3.3.1 Short-Term Memory in a Complete Facilitating Network
    3.3.2 Facilitating Feature Network with Generalized Features
    3.3.3 Extracting Information using Temporal Coherence
    3.3.4 Signal De-noising
  3.4 Discussion
4 Auditory Streaming via Gamma Partitioning
  4.1 Overview
  4.2 Introduction
  4.3 Results
    4.3.1 Recurrent Excitatory Connections
    4.3.2 Stimulus Pre-Processing
    4.3.3 Stimuli from Musical Instruments
    4.3.4 Dynamic Vocal Stimuli
  4.4 Discussion
5 Conclusions

List of Figures

1.1 A diagram of a neuron
1.2 LIF neuron as a circuit
1.3 Firing rate nonlinearity
1.4 Hebbian LTP/LTD
1.5 Hierarchy in V1
1.6 Visual processing hierarchy
1.7 Recurrent network structure
1.8 The 'Hopfield Network'
1.9 Delayed recognition task for STM
1.10 STM via facilitation
1.11 Harmonics and formants
1.12 Auditory segmentation
1.13 Continuous features perceived as a stream
1.14 The Local Excitatory Global Inhibitory Oscillator Network
2.1 Transient attractors in single layer network via associative weight modifications
2.2 Transient attractors can simultaneously store several patterns
2.3 Short-term memory in a ring attractor
2.4 Maintenance of transient attractor by uniform input
2.5 Network can distinguish between patterns using temporal coherence
2.6 Transient attractors able to de-noise and fill in occluded inputs
2.7 Inhibition as proxy for attention
2.8 Transient attractor model robustness to variations in parameters and network homogeneity
3.1 A sketch of associative plasticity due to synaptic facilitation
3.2 Short-term memory in a two-layer, pairwise complete network
3.3 Two-layer network with generalized features
3.4 Temporal coherence analysis with facilitating network
3.5 Signal de-noising with facilitating network
4.1 Network structure
4.2 Cochlear pre-processing
4.3 Gamma partitioning applied to two instruments
4.4 Quantifying source separation
4.5 Gamma partitioning across all samples
4.6 Gamma partitioning of 'Bohemian Rhapsody'
4.7 Gamma Partitioning of 'Flower Duet'

List of Abbreviations

Hz      Hertz
LEGION  Local Excitatory Global Inhibitory Oscillator Network
LTD     Long Term Depression
LTP     Long Term Potentiation
PC      Principal Component (from PCA)
PCA     Principal Component Analysis
PFC     Prefrontal Cortex
PNG     Polysynchronous Neuronal Group
PPV     Positive Predictive Value
s       Second
STM     Short Term Memory
STP     Short Term Potentiation
TPR     True Positive Rate

A note on shorthand for synaptic strengths: S = Stimulus, E = Excitatory, I = Inhibitory. Numbers indicate layer (where appropriate): E1 = first layer of excitatory neurons. X is a wildcard for a number. WAXBX = weight from type A in any layer to type B in the same layer.

Chapter 1: Introduction

1.1 Basic Anatomy and Physiology of a Neuron

Neurons are a type of cell responsible for receiving, processing, and transmitting information. Although there are various types of neurons, they share three main anatomical features: the dendrites, the cell body, and the axon (Figure 1.1). Inputs to the cell body come from the neuron's dendrites. These inputs change the neuron's membrane potential, causing either an increase (excitatory inputs) or a decrease (inhibitory inputs). Should the membrane potential be raised to a sufficiently high level, the neuron generates an action potential, also known as a spike. Although action potentials do vary somewhat in their duration, amplitude, and shape (Dayan and Abbott, 2001), they are typically treated as identical events in neural encoding. This action potential then propagates along the axon, at the end of which are located junctions with other cells. These junctions are known as synapses, and cause a postsynaptic current (PSC) into the postsynaptic cell whenever there is a spike in the presynaptic cell. Although the presynaptic spikes are similar, the magnitude of the PSC varies in accordance with some synaptic strength. The factors that influence this synaptic strength are summarized below (Section 1.3).

Figure 1.1: A diagram of a neuron. A simple diagram showing cell body, dendrites, and axon. Inset: Details of a chemical synapse. Modified from 'Complete neuron cell diagram.svg' by LadyOfHats, which has been released into the public domain.

All neurons may be classified as either excitatory or inhibitory depending on their effect on any postsynaptic neurons. Spikes from excitatory cells always induce a positive PSC, whereas those from inhibitory cells induce a negative PSC. This division into excitatory vs. inhibitory is decided by cell type; a single neuron cannot contain both excitatory and inhibitory synapses, and cells cannot transform from excitatory to inhibitory.

1.2 Neural Processing

The processes underlying neural behavior may be quantified; indeed, this is necessary if we are to construct computational simulations of networks of neurons. In this section I briefly outline two models that I use in later chapters.

Figure 1.2: LIF neuron as a circuit. Left: Circuit diagram. Right: Neuron behavior.

1.2.1 Leaky Integrate-and-Fire model

In the Leaky Integrate-and-Fire (LIF) model, the neuron's state is determined by its membrane potential, V_j(t). This is governed by the derivative form of the capacitor equation with a leak current and both excitatory and inhibitory synaptic input currents,

C \frac{dV_j}{dt} = I^{Leak}_j(t) + I^E_j(t) + I^I_j(t). \qquad (1.1)
This is equivalent to a simple circuit (Figure 1.2). The leak current can then be derived from Ohm's Law,

I^{Leak}_j(t) = g_m [V_{rest} - V_j(t)], \qquad (1.2)

where g_m is the cell membrane conductance, and V_{rest} is the resting potential. The synaptic input currents are calculated by summing the currents from each individual synapse:

I^E_j(t) = \sum_{i=1}^{N_E} g^E_{ij}(t) [E^E_{rev} - V_j(t)] \qquad (1.3)

I^I_j(t) = \sum_{i=1}^{N_I} g^I_{ij}(t) [E^I_{rev} - V_j(t)]. \qquad (1.4)

Terms of the form E^*_{rev} represent the reversal potentials for the excitatory and inhibitory currents. We note that E^I_{rev} < V < E^E_{rev} in a cell which is near its spiking threshold, meaning an increase in excitatory conductance will lead to a positive current (and vice versa for inhibitory). At rest, the excitatory and inhibitory currents are balanced. Presynaptic spikes change the conductances of the channels, which in turn leads to a shift in the cell membrane potential. In this dissertation, the term synaptic weight (notated as W^E_{ij} and W^I_{ij}) will be used synonymously with the synaptic conductance (g^E_{ij} and g^I_{ij}).

The LIF model does not explicitly contain a description of spike generation. Instead, a firing threshold is set. Should the membrane potential reach that threshold, a binary spike is produced, and the neuron's membrane potential is reset to its rest value:

t_{spk} : V_j(t_{spk}) = \theta \qquad (1.5)

\lim_{t \to t^+_{spk}} V_j(t) = V_{rest}. \qquad (1.6)

Alternatively, in order to approximate inactivated Na+ channels after each spike, we may include an absolute refractory period, t_{ref}:

I^E_j(t) = \begin{cases} 0 & t_{spk} < t < t_{spk} + t_{ref} \\ I^E_j(t) & \text{otherwise} \end{cases} \qquad (1.7)

(Brody and Hopfield, 2003; Miconi and Vanrullen, 2010; Mongillo et al., 2008).

Neural behavior can appear to be highly random; there is a wide variability in firing rates of neurons within sensory cortices between trials with the same sensory stimulus (Fourcaud and Brunel, 2002; Masquelier, 2013; Movshon, 2000). This variation comes from a variety of external and intrinsic properties; a large amount of effort has gone into teasing these two sources apart. In neural models, however, it is common to combine these factors into one additive noise term,

C \frac{dV_j(t)}{dt} = -R V_j(t) + I_{Total,j}(t) + \alpha_N dW_t, \qquad (1.8)

where dW_t is the increment of Brownian motion (distributed as N(0, \sqrt{\Delta t}) over a time step \Delta t). This additive noise is of particular interest when subpopulations of neurons receive the same excitatory and inhibitory currents, but where variability within the subpopulations is important.
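As a concrete illustration of Equations 1.1 to 1.8, the following is a minimal forward-Euler sketch of a single LIF neuron. All parameter values, and the constant synaptic drive, are illustrative assumptions rather than the values used in later chapters.

```python
import numpy as np

# Minimal forward-Euler simulation of one LIF neuron (Eqs. 1.1-1.8).
# All parameter values below are illustrative assumptions only.
dt      = 0.1e-3           # time step (s)
T       = 0.5              # simulated duration (s)
C       = 200e-12          # membrane capacitance (F)
g_m     = 10e-9            # leak conductance (S)
V_rest  = -70e-3           # resting / reset potential (V)
theta   = -54e-3           # firing threshold (V)
E_rev_E, E_rev_I = 0e-3, -80e-3   # excitatory / inhibitory reversal potentials (V)
t_ref   = 2e-3             # absolute refractory period (s)
alpha_N = 5e-12            # additive noise amplitude (A)

n_steps = int(T / dt)
V = np.full(n_steps, V_rest)
spikes = []
last_spike = -np.inf

# Example synaptic conductances: constant excitatory drive, no inhibition.
g_E = np.full(n_steps, 5e-9)   # total excitatory conductance (S)
g_I = np.zeros(n_steps)        # total inhibitory conductance (S)

for k in range(1, n_steps):
    t = k * dt
    if t - last_spike < t_ref:
        V[k] = V_rest                                   # refractory clamp (Eq. 1.7)
        continue
    I_leak = g_m * (V_rest - V[k - 1])                  # Eq. 1.2
    I_E = g_E[k] * (E_rev_E - V[k - 1])                 # Eq. 1.3 (summed conductance)
    I_I = g_I[k] * (E_rev_I - V[k - 1])                 # Eq. 1.4
    noise = alpha_N * np.random.randn() * np.sqrt(dt)   # additive noise (Eq. 1.8)
    V[k] = V[k - 1] + dt * (I_leak + I_E + I_I) / C + noise / C
    if V[k] >= theta:                                   # threshold crossing (Eq. 1.5)
        spikes.append(t)
        V[k] = V_rest                                   # reset (Eq. 1.6)
        last_spike = t

print(f"{len(spikes)} spikes in {T:.1f} s")
```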
1.2.2 Firing Rate Models

For large networks of neurons, individually modeling each neuron's membrane potential can be computationally expensive; it is often easier to consider groups of neurons with similar properties and track their averaged firing rates (Ermentrout and Terman, 2010). The firing rate may also be thought of as each individual neuron's probability of spiking; the probability of spiking over time is assumed to be constant for any given stimulus even if individual spike times are highly variable. Such firing rate models are the basis for most artificial neural networks used in machine learning. We now construct a typical firing rate model for use in simulating biological networks.

The dynamics of the firing rate model are highly similar to those of the LIF model described above. The synaptic currents and membrane potential are calculated using Equations 1.1 to 1.4. The firing rate is then calculated as a nonlinear function applied to a hidden variable,

y_j(t) = F(V_j(t)). \qquad (1.9)

In this dissertation, I use a saturating, rectified function F(x) = \max[0, 1 - \exp(-a(x - b))] (Figure 1.3). Other popular options include the logistic, rectified linear, and softplus functions.

Figure 1.3: Firing rate nonlinearity. Saturating, rectified function which may be used in firing rate models.

1.3 Synaptic Plasticity

The function of networks of neurons is dictated by the strengths of the synapses which connect the neurons to one another. These strengths are constantly changing through processes known as synaptic plasticity. These changes in synaptic strengths can therefore have a significant influence on the function of these networks, and this is a point to which we return several times throughout this dissertation. This section, therefore, gives a brief biological and mathematical overview of synaptic plasticity.

We divide these synaptic plasticity processes into two categories, long-term and short-term, depending on the time scales for which the effects persist. Long-term plasticity plays a key role in deciding how networks of neurons are typically connected together, whereas short-term plasticity may play a significant role in affecting how networks of neurons process incoming information.

1.3.1 Long-Term Plasticity

Long-term plasticity encompasses mechanisms that lead to changes in synaptic connectivity that persist for long amounts of time (at least several minutes). Two forms of long-term plasticity are long-term potentiation (LTP) and long-term depression (LTD), which respectively strengthen and weaken the synapses. The mechanisms underlying LTP and LTD are not completely understood, and these processes may vary significantly between different cortical regions. For example, experiments by Dudek and Bear (1992) in the hippocampus revealed LTP in the presence of high presynaptic firing rates, and LTD in the presence of low presynaptic firing rates (Figure 1.4).

Figure 1.4: Hebbian LTP/LTD. Experimental recordings of Hebbian plasticity (Dudek and Bear, 1992). When paired with postsynaptic activity, low rates of presynaptic activity lead to LTD (left) while high rates lead to LTP (right).

These long-term plasticity mechanisms, when combined with the history of the network's activation, play a significant role in establishing the network connectivity (the synaptic strengths between the various neurons).

One possible explanation for the adaptation of synapses is provided by Hebbian theory (Hebb, 1949). The theory suggests that changes in synaptic strength follow an associative learning rule, as summarized by the maxim "Cells that fire together, wire together" (Shatz, 1992). This is a form of correlation-based learning, similar to classical ('Pavlovian') conditioning: in trying to decode the causation behind stimuli, the network records correlation. This associative plasticity is believed to play a role in the formation of memories and learning within neural networks, and is a topic to which we will return when discussing the Hopfield Network (Section 1.4.2.1) and information storage within neural networks (Section 1.5).

1.3.2 Short-Term Synaptic Plasticity

Synaptic strengths are also known to undergo temporary modifications due to neural activity; these effects typically decay within seconds. These processes may be classified as either facilitation or depression depending on whether they increase or decrease synaptic strengths.
Multiple different processes have often been found to function at individual synapses, and these processes may function to varying extents at different synapses (Hennig, 2013; Regehr, 2012).

1.3.2.1 Presynaptic Facilitation

Several mechanisms are known to increase the effective transmission of signals following periods of elevated presynaptic firing. Decay time is often used to categorize such effects into facilitation (decays over tens of milliseconds), augmentation (decays over seconds), and post-tetanic potentiation (decays over tens of seconds to minutes) (Abbott and Regehr, 2004; Purves et al., 2012). These processes have all been linked to the build-up of Ca2+ in the presynaptic terminal caused by the large amount of presynaptic activity. In the neural network modeling literature, these processes are typically combined together under the name of 'facilitation', with an assumed decay time of a few seconds (Barak and Tsodyks, 2007; Itskov et al., 2011; Mejias and Torres, 2009; Mongillo et al., 2008). Such a process can be modeled through the inclusion of an additional presynaptic facilitation variable, u_i(t), which has a multiplicative effect on the synaptic strength,

W'_{ij}(t) = u_i(t) W_{ij}. \qquad (1.10)

The facilitation variable is governed by

\frac{du_i(t)}{dt} = \frac{1}{\tau_{u+}} (u_{max} - u_i(t))\, y_i(t)\, \alpha - \frac{1}{\tau_{u-}} (u_i(t) - 1) \qquad (1.11)

(Tsodyks et al., 1998). Thus in the presence of high presynaptic firing rates, y_i(t), the facilitation variable increases towards an upper limit u_{max}, and decays towards 1 in the absence of presynaptic firing.

1.3.2.2 Presynaptic Depression

Another mechanism exists which reduces the effective synaptic strength following significant presynaptic activity; this is known as synaptic depression. This occurs due to a depletion of resources in the presynaptic terminal, and decays over around 100 ms. This process is modeled in a similar manner to the enhancement above by incorporating a presynaptic depression variable, x_i(t),

W''_{ij}(t) = x_i(t) W'_{ij}(t), \qquad (1.12)

which is governed by

\frac{dx_i(t)}{dt} = \frac{1}{\tau_{x+}} (1 - x_i(t)) - \frac{1}{\tau_{x-}} x_i(t)\, u_i(t)\, y_i(t) \qquad (1.13)

(Tsodyks et al., 1998). In the presence of presynaptic activity, the depression variable is driven towards some lower limit (0), and rebounds towards 1 in the absence of firing.
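The following is a minimal sketch of how the facilitation and depression variables of Equations 1.10 to 1.13 might be integrated for a single synapse. The time constants, the firing-rate profile, and the scaling constant are illustrative assumptions, not the values used in later chapters.

```python
import numpy as np

# Sketch of the presynaptic facilitation and depression dynamics (Eqs. 1.10-1.13)
# for one presynaptic neuron driving one synapse. All parameters are assumptions.
dt       = 1e-3                  # time step (s)
T        = 6.0                   # duration (s)
tau_u_p, tau_u_m = 0.15, 1.5     # facilitation growth / decay time constants (s)
tau_x_p, tau_x_m = 0.2, 2.0      # depression recovery / depletion time constants (s)
u_max    = 2.0                   # upper limit of the facilitation variable
alpha    = 1.0                   # scaling of the firing-rate drive
W        = 1.0                   # baseline synaptic weight W_ij

n = int(T / dt)
t = np.arange(n) * dt
y = np.where((t > 1.0) & (t < 2.0), 20.0, 0.0)   # presynaptic rate: 20 Hz burst for 1 s

u = np.ones(n)            # facilitation variable u_i(t), rests at 1
x = np.ones(n)            # depression variable x_i(t), rests at 1
W_eff = np.ones(n) * W    # effective weight u_i(t) * x_i(t) * W_ij

for k in range(1, n):
    du = (u_max - u[k - 1]) * y[k - 1] * alpha / tau_u_p - (u[k - 1] - 1.0) / tau_u_m
    dx = (1.0 - x[k - 1]) / tau_x_p - x[k - 1] * u[k - 1] * y[k - 1] / tau_x_m
    u[k] = u[k - 1] + dt * du
    x[k] = x[k - 1] + dt * dx
    W_eff[k] = u[k] * x[k] * W    # Eqs. 1.10 and 1.12 combined

print(f"peak effective weight: {W_eff.max():.2f}, final: {W_eff[-1]:.2f}")
```

With these assumed parameters the effective weight rises transiently at burst onset (facilitation acts faster than depletion), sags as resources deplete, and relaxes back towards baseline after the burst ends.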
1.3.2.3 Transient Associative Plasticity

The above short-term mechanisms all depend solely on the pre- or postsynaptic firing rates, not on the combination of the two. It has also been conjectured that a short-term associative effect might exist (Brenowitz and Regehr, 2005), modifying synaptic strengths according to both pre- and postsynaptic firing rates. This process has been associated with several tasks including short-term memory (Sandberg et al., 2003; Szatmáry and Izhikevich, 2010), auditory streaming (Von der Malsburg and Schneider, 1986), visual object separation (Becker and Plumbley, 1996), corrective prediction in behavioral tasks (Schultz and Dickinson, 2000), and modification of persistent activity (Brunel, 2003). Additionally, experiments in vivo (Shamma et al., 2013; Sugase-Miyamoto et al., 2008) have suggested that network connectivity does have some short time-scale associative properties.

One possible process for this is 'Short-Term Potentiation': sometimes, following activity that would normally cause LTP, the synapse is only momentarily strengthened before returning to its original state. The time scales involved for strengthening are nevertheless on the order of several minutes (Erickson et al., 2011; Hennig, 2013; Malenka, 1991), beyond the 'intermediate time-scales' considered in this dissertation.

1.4 From Neurons to Networks of Neurons

In the body of this dissertation, the majority of the work is done using computer simulations of large populations of neurons and synapses. The rules which individually govern each part have been discussed above in Sections 1.2 and 1.3. Understanding their collective action, however, requires understanding how the network as a single entity behaves. We briefly discuss a few such issues in this section. We start by considering what forms of connectivity are typically present in networks of neurons, and then describe how their collective behavior might be understood by examining their attractors.

1.4.1 Structures in Networks of Neurons

The manner in which stimuli are processed in the sensory cortices is largely decided by the connectivity within the networks of neurons. Understanding the general connectivity pattern is therefore of great importance when constructing models of sensory processing. In this section, we summarize a few key facts that are known about the structure of cortical networks of neurons, and then discuss how this structure might allow for various functional objectives to be achieved.

1.4.1.1 Hierarchical Processing in Networks of Neurons

One of the early significant insights into how neural networks process information was made by observing how neurons in the primary visual cortex of an anesthetized cat respond to visual stimuli. It was already known that inputs from the retina were passed to a region known as the Lateral Geniculate Nucleus (LGN). Cells in the LGN had been observed to respond according to a 'center-surround' receptive field (RF), with excitatory connections at the center of the RF surrounded by inhibitory inputs (Figure 1.5, top). Hubel and Wiesel (1959) recorded that some cells in the primary visual cortex were sensitive to a more complex feature, specifically to an oriented line at a particular point in space (Figure 1.5, middle). They called such cells 'simple'. They went on to discover a second type of neuron, 'complex' cells, which would respond to a line with a certain orientation but anywhere within a larger region of space (Hubel and Wiesel, 1962) (Figure 1.5, bottom). They realized that this could be explained by the sequential combination of features; each 'simple' cell's response may be calculated by summing the output from a collinear set of LGN cells, and likewise 'complex' cells may be formed by combining sets of nearby 'simple' cells with the same orientation (Figure 1.5). This led to the idea of hierarchical processing (Wurtz, 2009), that is, that the cortex may detect complicated properties of stimuli by combining a variety of simpler properties. This form of processing has been investigated in a variety of sensory systems, including the visual (Felleman and Van Essen, 1991) and auditory (Kaas et al., 1999) systems, and can extend to higher-level recognition tasks.

Figure 1.5: Hierarchy in V1. Successive combinations of simple features into more complicated features explain the early receptive fields of neurons in early sensory processing.
For example, the output from multiple simple or complex cells might be combined to detect the existence of a certain shape, output from multiple 'shape-sensitive' neurons could be combined to detect a certain object, and the output from multiple 'object-selective' neurons could be combined to form abstract notions such as a group or identity.

This hierarchical organization of the sensory cortices has inspired many neural network models (Figure 1.6). Similar hierarchical neural network models have enjoyed great success in the fields of machine learning and machine vision.

Figure 1.6: Visual Processing Hierarchy. A comparison of hierarchy in the visual cortex (left) and in an artificial neural network used for machine vision (right) (Serre et al., 2005).

1.4.1.2 Recurrent and Feedback Connections

The above hierarchical understanding of neural networks is enticing in its ability to explain how a neural structure might allow the network to perform higher-order tasks such as object recognition. Investigations of the sensory cortices have, however, suggested that this explanation of the network structure is, at best, incomplete. For example, only 5-10% of the inputs into the monkey primary visual cortex come from the eyes (via the LGN) (Peters et al., 1994), suggesting much of what we 'see' is highly dependent on recurrent processing within the visual cortex (results from another study are included in Figure 1.7). The reasons behind this connectivity are not well understood, and many different suggestions abound (Olshausen and Field, 2004). Indeed, this is a topic to which we return many times in the body of this dissertation. In the next section, we analyze the attractors in neural networks as dynamical systems in order to better understand the effects that these connections may have on the behavior of networks of neurons, and what functional outcomes might be achieved.

Figure 1.7: Recurrent Network Structure. Map of excitatory-to-excitatory synapses between different layers in the cat visual cortex. Numbers represent the proportion of all excitatory-to-excitatory synapses in the region (about 30% could not be identified in this study). Only a minority of inputs come in from the LGN, with no clear structural hierarchy present between the different layers.

1.4.2 Attractors in Neural Networks

Several questions about neural networks revolve around their long-term behavior. For example: "Will the network ever settle into some steady pattern?", "How long does this take?", and "What types of patterns might we expect to see?". Questions such as these may be answered by examining the nature and location of the dynamic attractors within the neural network (Ermentrout and Terman, 2010; Izhikevich, 2007). We initially define the attractor set of a neural network to be the set of asymptotically stable states in the network. This stability means that all points that are sufficiently close to the attractor evolve towards it; such points are said to lie in the basin of attraction. This also means that attractors are self-sustaining; once a network has entered an attractor, it remains there until it is made to change by some external force. As an illustration of attractors within neural networks, we start by considering a well known recurrent artificial neural network known as the 'Hopfield Network'. We then discuss other variants of attractors within neural networks which prove relevant to the work in the body of my dissertation.
1.4.2.1 The Hopfield Network

The Hopfield network is a recurrent neural network first proposed in 1982 (Hopfield, 1982). Many different variants exist; here we review the model as it was initially presented.

All units (neurons) in the model are binary, either on (1) or off (0). A set of N patterns is selected, with each pattern corresponding to a subset of all units being active. Using this set of patterns, weights are then set between all units using

w_{ij} = \frac{1}{N} \sum_{n=1}^{N} (2p^n_i - 1)(2p^n_j - 1), \qquad (1.14)

where p^n_i represents the state of the i-th neuron in the n-th pattern, and w_{ij} is the weight between neurons i and j (Figure 1.8A). This rule is associative in nature, meaning that strong positive connections form between neurons which are typically co-active (and negative connections between those with anti-correlated states). It is therefore broadly similar to the long-term plasticity observed in cortical networks of neurons (Section 1.3), although the presence of both positive and negative weights coming from the same neuron is not biologically faithful. The behavior of the neural units is governed by

V_i = H\left[\sum_{j \neq i} w_{ij} V_j - \theta\right], \qquad (1.15)

where V_i represents the state of the i-th unit, H is the Heaviside step function, and \theta is the firing threshold. This is a firing-rate type model with instantaneous synaptic integration and a binary activation function. In the original formulation, each node is randomly selected to have its state updated, although deterministic variants with discrete time steps do exist.

Figure 1.8: The 'Hopfield Network'. A. Associative training of a Hopfield network. Two patterns (P^1 and P^2) are used to determine recurrent weights. B. Information storage by activating an attractor. Activity is self-sustaining, allowing information to be later recalled by observing the state of the network. C. Different inputs may be classified by the basin of attraction in which they lie, which is often the pre-learned pattern with which they share the greatest similarity.

The network established above now contains multiple different attractors. Ideally, the set of attractors created matches the set of patterns which was used to train the network. It is, however, possible that attractors exist which don't match any of the patterns used, and that not all patterns have their own attractor (Orhan, 2014). We now consider two different functions that these attractors might achieve in the networks.
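As a concrete illustration, the following is a minimal sketch of the training rule of Equation 1.14 and the update rule of Equation 1.15, applied to randomly generated binary patterns. The network size, number of patterns, threshold, and number of update sweeps are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the Hopfield network of Eqs. 1.14-1.15: associative weights
# built from a set of binary patterns, then asynchronous updates from a
# corrupted cue. Sizes, threshold, and sweep count are illustrative assumptions.
rng = np.random.default_rng(0)
n_units, n_patterns = 100, 3
theta = 0.0

patterns = rng.integers(0, 2, size=(n_patterns, n_units))   # p^n_i in {0, 1}

# Eq. 1.14: w_ij = (1/N) * sum_n (2 p^n_i - 1)(2 p^n_j - 1)
bipolar = 2 * patterns - 1
W = bipolar.T @ bipolar / n_patterns
np.fill_diagonal(W, 0.0)            # no self-connections

def update(state, n_sweeps=10):
    """Asynchronous updates following Eq. 1.15."""
    state = state.copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(n_units):
            state[i] = 1 if W[i] @ state - theta > 0 else 0
    return state

# Cue the network with a corrupted copy of the first pattern (20% of bits flipped).
cue = patterns[0].copy()
flip = rng.choice(n_units, size=20, replace=False)
cue[flip] = 1 - cue[flip]

recalled = update(cue)
print("overlap with stored pattern:", np.mean(recalled == patterns[0]))
```

Starting the updates from the corrupted cue typically returns the network to the stored pattern, illustrating both the information-storage and classification roles discussed next.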
1.4.2.1.1 Temporary Information Storage

The attractors in the network are stable, and therefore self-sustaining. This allows for the temporary storage of information within the network, by activating one particular attractor (Figure 1.8B). This information persists unless interrupted, and may later be recalled by observing the state of the network. This form of information storage does not require (or cause) any permanent changes to the network's structure. For this reason, attractors in neural networks have often been used as a model for short-term memory. This is discussed further in Section 1.5 and Chapters 2 and 3.

1.4.2.1.2 Classifying Inputs

Novel incoming stimuli cause some pattern of activation within the network; from there, the attractor dynamics decide how the network behaves (Hopfield, 1982). In particular, the network will evolve towards some 'close' attractor, or more accurately to the attractor whose basin of attraction contained the initial activation pattern. This allows novel inputs to be mapped onto the set of attractors, ideally classifying novel inputs using the set of training patterns (Figure 1.8C). This behavior might allow neural networks to recognize patterns when they have been partially corrupted or distorted, or to understand novel patterns in terms of pre-learned classes.

The Hopfield network is a useful starting point when considering the roles attractors may play in neural networks. There are, however, several other properties of neural networks which may be attributed to attractor dynamics but which are not demonstrated in the classical Hopfield network.

1.4.2.2 Periodic Attractors and Cortical Oscillations

Oscillations in neural activity have been observed in many different regions of the brain and at a variety of frequencies. It has often been argued that these oscillations play multiple roles in cortical processing (Arnal and Giraud, 2012; Wang et al., 2010). For example, gamma waves (waves with frequency between 30 Hz and 100 Hz) have been associated with attention (Jensen et al., 2007), stimulus processing (Gray et al., 1989), short-term memory (Lundqvist et al., 2016), and motor control (Bragin et al., 1995). Moreover, gamma waves have been observed to synchronize over large areas of the cortex (Engel et al., 1990; Roskies, 1999), with the degree of synchrony being correlated with attention (Roelfsema et al., 1997). These gamma waves will play a crucial role in my model of stimulus processing in the auditory cortex (Chapter 4). It is possible to better understand these oscillations by examining the network as a whole; in particular, these oscillations may be thought of as periodic attractors in the neural network (attractors which are not a single stable state, but a periodic cycle of states). Analyzing these attractors may provide deeper understanding of these oscillations. For example, Brunel and Hakim (1999) examined interconnected populations of excitatory and inhibitory neurons for a variety of connection strengths. They discovered a Hopf bifurcation in the system when it transforms from having a fixed-point attractor to a periodic attractor. These oscillations (the periodic attractor) exist because intervals of significant excitatory activity cause waves of dampening inhibitory activity. Properties of this periodic attractor match well with observed gamma waves. Other analyses have discussed lower-frequency oscillations (Holcman and Tsodyks, 2006), and how different scales of oscillations might become coupled (Fontolan et al., 2013).

1.4.2.3 Slow Manifolds

The earlier approach to attractors only considered those which are asymptotically stable (all nearby points will approach the attractor). Alternatively, we may consider a broader family of attractors, those which are Lyapunov stable. Under this new definition, trajectories which start near the attractor need not tend towards it; however, no trajectory may travel far away. In the neural network literature, these attractors are often referred to as line attractors, ring attractors, or plane attractors, depending on their topography (Druckmann and Chklovskii, 2010; MacNeil and Eliasmith, 2011; Sandberg et al., 2003; Seung, 1996). In the dynamics literature, this is known as the slow manifold, that is, a subspace for a system of differential equations in which no eigenvalues are positive but there is at least one zero eigenvalue.
In such systems, nearby trajectories approach the attractor (the manifold) and stay at one point on it (or move relatively slowly). These types of attractors have been proposed to allow for not just the storage of information, but also its aggregation (Goldman et al., 2007). The attractors essentially contain a continuum of stable points. This allows for the storage of continuous variables, and also allows for continual changes to be made. These slow manifolds have been hypothesized to allow for tracking eye location in the goldfish brain (Aksay et al., 2000), and physical location in the monkey brain (Wimmer et al., 2014).

1.4.2.4 Dynamic Field Theory (Bifurcation Theory)

The above discussion has considered attractors in a neural network to have predetermined locations and properties; these depend on factors such as the neural connectivity and the firing rate function. It is also possible to consider how the set of attractors might change as the various factors which influence the network shift. One example of this was the analysis performed by Brunel and Hakim (1999), who investigated how networks with different levels of inhibitory feedback have different attractor dynamics. In reality, the dynamics of the network are constantly modulated by external forces such as further sensory information, top-down attention (Larocque et al., 2014; Shamma et al., 2011), and the state of the physical body (the so-called 'Embodied Mind Thesis' (Varela et al., 1991)).

Dynamic Field Theory attempts to understand the interplay between these forces through bifurcation theory; the various influencing factors are understood as parameters which modify the location/stability of various attractors within the system (Sandamirskaya et al., 2013; Schneegans and Schöner, 2008). This area of study is only starting to be investigated in the literature, and is highly related to the concept of 'transient attractors' introduced in Chapter 2.

1.5 Models of Short-Term Memory

In this section we consider the challenge of short-term memory (STM) in networks of neurons. We start by defining various terms and properties of STM, and then discuss several models which have been proposed. This section provides a background to my work on STM presented in Chapters 2 and 3.

1.5.1 Nature of Short-Term Memory

In this dissertation we use the phrase Short-Term Memory (STM) to generally describe the ability to preserve information for short periods of time, typically from 100 ms to a few minutes. This is an essential property behind a wide variety of cortical computations. For example, reading this very sentence involves keeping track of various words and combining them into sensible meaning; hopefully the meaning itself persists, but the knowledge of individual words is quickly lost. Working Memory (WM) describes a specific type of STM that deals with the active focus upon and manipulation of stored information while performing a task, such as navigating a maze (Dudai, 2002). However, not all papers make such a clear distinction, often using the two terms interchangeably (Mongillo et al., 2008; Szatmáry and Izhikevich, 2010).

STM has two limiting features. These are:

1. Limited Duration. For example, in tests of remembering separate items, short-term memory lasts 15-30 seconds (Atkinson and Shiffrin, 1971). This can be consciously extended by repetitively attending to each object ("P Sherman, 42 Wallaby Way, Sydney, P Sherman, 42 Wallaby Way, Sydney, ...").

2. Limited Capacity.
Capacity is strictly limited, independent of the task, according to the classic "Seven, Plus or Minus Two" theory (Miller, 1956). Although the concept of some exact number has been challenged (Doumont, 2002), and may even have originally been intended as little more than a joke (Miller, 1989), there is general agreement that capacity is highly limited. The memory capacity can be expanded by grouping objects together.

1.5.2 Experimental Evidence of Short-Term Memory

The exact location in the cortex in which short-term memory is stored is an area of active debate. Properties of short-term memory are often tested through the delayed recognition task (Figure 1.9), or similar tests. In this task, the subject must store some piece(s) of information for a period of time if they are to receive a later reward.

Figure 1.9: Delayed recognition task for STM. A standard test of short-term memory. A set of objects is shown to the subject, followed by a delay period. Later, one object is shown, and the subject must decide if this object is novel or familiar.

Early experiments found that a brain region known as the prefrontal cortex (PFC) exhibited continued activity for the duration of such tasks (Fuster and Alexander, 1971; Kubota and Niki, 1971). This activity was later shown to be encoding task-relevant information (Courtney et al., 1997; Zarahn et al., 1997). This apparently indicated that the information was stored in continued activity, leading to the idea that STM is maintained in certain specialized brain regions by persistent neural activity.

A series of more recent experimental results, however, have drawn doubt upon such a mechanism underlying all short-term memory (Stokes, 2015). Firing rates are far lower, and far more variable, than expected from a standard persistent attractor (Shafi et al., 2007). Moreover, should the subject know in advance when a memory is required, firing rates reflect the memory only immediately before it is required (Barak et al., 2010; Watanabe and Funahashi, 2007). In addition, such a method of memory storage is energetically expensive (Attwell and Laughlin, 2001) and may interfere with the network performing other tasks (Curtis and Lee, 2010).

Information storage might also occur within the sensory cortices themselves (Pasternak and Greenlee, 2005; Postle, 2015; Weinberger, 2012). For example, Petrides (2000) investigated the properties of short-term memory in monkeys with lesions in either parts of the PFC, or parts of the primary sensory cortices. They found that lesions in the sensory cortices influenced how long memories could be maintained, but not how many memories could be simultaneously maintained, whereas lesions in the PFC had the opposite effect. Other experiments have suggested that persistent activity does not represent all information, but instead information pertinent to the immediate mental processes (the so-called 'Focus of Attention') (Lewis-Peacock et al., 2012; Olivers et al., 2011). This suggests an alternative model in which the PFC provides executive control over the memories stored in the sensory cortices (Postle, 2015).

We next consider various models of short-term memory, classifying them according to whether information is stored in the pattern of activity or in the network connection strengths.

1.5.3 Persistent Activity Models of Short-Term Memory

The standard model of short-term memory is one using a fixed-point attractor (Barak and Tsodyks, 2014), as demonstrated in the earlier discussion of the Hopfield Network (Section 1.4.2.1.1).
Such a persistent attractor mechanism both predicts and depends on persistent neural activity, as described above. Self-sustenance is typically achieved in neural networks through different combinations of recurrent excitation (Goldman, 2009), inhibition (McDougal, 2011), or both (Aksay et al., 2007; Lim and Goldman, 2013; Machens et al., 2005). Variations on these models have discussed issues such as multi-item memory (Edin et al., 2009; Wei et al., 2012) or storage in cross-regional networks (Dubreuil and Brunel, 2016; Verduzco-Flores et al., 2009). Other models have proposed different forms of self-sustaining activity such as cycle attractors (Lisman and Idiart, 1995) and purely feed-forward circuits (Goldman, 2009). Finally, it has been suggested that short-term memory might be stored in 'dynamic coding' within the network (Stokes et al., 2013), which is essentially a chaotic attractor or a periodic attractor with very long orbits; no model has shown how information may be stored or reliably recalled in this way.

One interesting subset of papers has investigated how persistent neural activity due to a fixed-point attractor might be influenced by transient synaptic modifications. In particular, they have shown how such networks might extend the duration (Barak and Tsodyks, 2007) and stability (Itskov et al., 2011) of the information, while also increasing the number of memories which may be simultaneously stored (Rolls et al., 2013). These models share attributes with 'activity silent' models (covered below in Section 1.5.4) in their use of transient plasticity; I have chosen to classify them as persistent activity models as they still depend upon some continued neural activity.

Although the above models vary greatly in the underlying mechanisms, all store information within self-sustaining patterns of neural activity. Such a process is often credited with the observed delay-period activity during working memory tasks in the PFC. Nevertheless, for the reasons mentioned in Section 1.5.2, these processes are unlikely to fully explain the short-term memory faculties of the mammalian cortex.

1.5.4 Activity Silent models of Short-Term Memory

An alternative mechanism that might allow for the storage of information is temporary modifications to the effective connectivity of the network. This in turn changes the attractor structure of the network. In these models, ongoing information storage is not dependent on continued activity, but it may modify the characteristics of any ongoing firing patterns. Here we examine models which are capable of transiently storing information solely within synaptic modifications.

1.5.4.1 STM from Transient Associative Modifications

Information about recent patterns of neural activation might be stored in short-term associative modifications to the network's connectivity. Such modifications temporarily strengthen the connections between neurons which are co-active, in turn meaning these neurons are more likely to be co-active in the future.

This method was first proposed by Buhmann and Schulten (1986). The neurons in this model were governed by Leaky Integrate-and-Fire dynamics (Section 1.2.1). Recurrent plastic connections exist between all neurons, governed by a transient associative rule. This causes positive connections to form between two neurons which are active at the same time, and negative connections to form between two neurons which are active at different times. In the absence of any activity, the weights return to their initial values. A sketch of this kind of rule is given below.
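To make the idea concrete, the following is a minimal sketch of a transient associative (covariance-style) weight update that decays back to baseline. It is an illustrative stand-in, not the exact formulation of Buhmann and Schulten (1986), and the network size, learning rate, decay time, and stimulation protocol are all assumptions.

```python
import numpy as np

# Sketch of a transient associative weight rule of the kind described above:
# co-active pairs are transiently strengthened, pattern/non-pattern pairs are
# weakened, and all modifications decay back toward the baseline weights.
n = 50
dt, T = 1e-3, 4.0
tau_decay = 1.0                  # decay time of the transient modification (s)
eta = 0.5                        # learning rate of the associative term
W_base = np.zeros((n, n))        # baseline recurrent weights
dW = np.zeros((n, n))            # transient modification on top of the baseline

pattern = np.zeros(n)
pattern[:20] = 1.0               # the first 20 neurons form the stored pattern

for k in range(int(T / dt)):
    t = k * dt
    # Present the pattern for the first second, then silence.
    y = pattern if t < 1.0 else np.zeros(n)
    # Covariance-style associative term: positive for co-active pairs,
    # negative for pairs with anti-correlated activity.
    assoc = np.outer(y - y.mean(), y - y.mean())
    dW += dt * (eta * assoc - dW / tau_decay)
    if abs(t - 1.0) < dt / 2:
        print("after presentation - within: %.3f, cross: %.3f"
              % (dW[:20, :20].mean(), dW[:20, 20:].mean()))

print("after 3 s of decay - within: %.3f, cross: %.3f"
      % (dW[:20, :20].mean(), dW[:20, 20:].mean()))
```

The printout shows strengthened within-pattern weights and weakened cross weights immediately after presentation, and both modifications relaxing back toward zero once activity stops.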
When exposed to a pattern for a period of time, this network forms strong positive connections between pairs of neurons which both lie within the pattern, and negative connections between neurons which lie in the pattern and those that do not. It was demonstrated how this may lead to the completion of the pattern should it be shown later with some parts omitted. More recent work has extended this idea of short-term memory via transient associative plasticity. For example, Sandberg et al. (2003) demonstrated how this mechanism could store regions of activity along a line attractor, and recall them in the presence of background noise. This idea has also been extended to periodic attractors, in which information is stored in a repeating pattern of neuronal activation (Szatmáry and Izhikevich, 2010).

1.5.4.2 STM via Synaptic Facilitation

Mongillo et al. (2008) demonstrated that short-term memory might also be stored using non-associative mechanisms such as facilitation (Section 1.3.2). This is only possible in a network with some underlying asymmetry in the connections between neurons; if the connectivity were perfectly homogeneous, facilitation of any individual neuron would influence all other neurons equally. The model presented by Mongillo et al. (2008) contained several non-overlapping groups of excitatory neurons embedded in a larger population of excitatory neurons. Connections between neurons in the same group were, on average, several times stronger than other connections (Figure 1.10A). The network also included a population of inhibitory neurons to mediate competition within the excitatory population.

A memory is stored in this network by stimulating one of the interconnected groups, causing the neurons within the group to facilitate. Although this facilitation is applied to all connected neurons, the asymmetries in connectivity mean that this facilitation does not affect all neurons equally. Specifically, when a neuron is facilitated, it disproportionately excites neurons in the same group.
The stored memory will then spontaneously reactivate in the presence of background noise (Figure 1.10B) due to these stronger connections. They also investigated how such a network might store two memories. This work demonstrates that facilitation can indeed store information in networks in an 'activity silent' manner; the network proposed, however, requires pre-wired groups to store each potential memory.

Figure 1.10: STM via facilitation by Mongillo et al. (2008). A: Network arrangement, with excitatory neurons divided into groups and stronger connections between neurons in the same group. B: Network behavior, showing spikes from a subset of neurons from two groups (1-80 and 81-160). Grey bars show periods of extra background noise; the red line (x) records the average depression in each group, and the blue line (u) the average facilitation. Activity at t0 and t1 shows external imprinting of memories. When one memory has been imprinted (t0 < t < t1), background noise causes retrieval of that memory. After a second memory is added later (t > t1), noise prompts retrieval of one memory at a time.

1.6 The Auditory System

Chapter 4 of this dissertation considers how the auditory system processes incoming sounds. In this section, we cover some background about how information is encoded in sounds, the steps involved in extracting that information, and how this information is represented within the auditory cortex.

1.6.1 Sound generation

In order to understand how sound can be analyzed, it is instructive to consider its generation. Many sounds are produced by acoustic oscillators, such as a vibrating vocal cord or a plucked guitar string. These produce sound waves which are more or less periodic. These periodic waves can be decomposed into a harmonic stack, i.e. a sum of sinusoids with frequencies that are integer multiples of some fundamental frequency. Not all harmonics are represented equally; sound generation may lead to emphasis on certain harmonics (e.g. reed instruments typically contain only odd-numbered harmonics), or emphasis at certain frequencies (known as formants; the "ee" vowel contains energy peaks at 280 Hz, 2300 Hz, and 2900 Hz). This is illustrated in Figure 1.11.

Figure 1.11: Harmonics and formants.
A: The nature of the acoustic oscillator influences which harmonics are generated (black bars), a key component in the timbre of a sound. B: The shape of the vocal tract dictates which ranges of frequency are accented or muted. Peaks in energy are known as formants, and the nature of a vowel is highly dependent upon their locations. Unvoiced speech maintains the general energy distribution but loses the harmonics (i.e. grey shading under curves without the black bars).
Not all sounds, however, are generated from continuous vibrations of an oscillator of some kind. Examples of such non-harmonic sounds include several consonants and unvoiced speech (whispering). In this dissertation, however, I concentrate on the processing of harmonic stimuli.
1.6.2 Detection and Representation of Sounds
Next we consider how the auditory system detects and responds to auditory stimuli. This process starts at the cochlea, which detects the amount of energy present at various frequencies (20 Hz to 20 kHz). The phase of the components is also recorded for low-frequency waves (under 3 kHz); this information is used in sound localization using the inter-aural time difference. This information is then transferred, via the auditory nerve and subsequent subcortical stages, to the primary auditory cortex. Individual neurons in the primary auditory cortex respond to energy at a certain frequency, and these neurons are arranged tonotopically according to this frequency. These neurons are also known to be sensitive to various other, higher order features. These include the onset and offset properties of the sound (Qin et al., 2007), the distance between harmonics (Bendor and Wang, 2005) (known as 'spectral pitch'), gradual shifts in spectral composition (Theunissen and Elie, 2014), apparent location of the source (Oswald et al., 1999; Rose et al., 1966), and several other factors. How these various elements are combined to allow for a coherent auditory percept is not well understood.
1.6.3 Auditory Tasks
1.6.3.1 Identification and Localization
Perhaps the most obvious reason to process incoming sound waves is to determine information about the process which generated the sound. Such information may be deliberately encoded in the sound (e.g. mating calls including the pop hit "Gangnam Style") or incidental (a predator's footsteps, a falling branch). In order to consider the various ways in which information is encoded in sound, it is enlightening to consider the language that has developed to describe sounds (Table 1.1). We note that the term 'timbre' is generally used to refer to the combined effect of several factors such as harmonics, coloration, and temporal envelope.
Volume: Amplitude
Tremolo: Oscillations in amplitude
Pitch: Fundamental frequency
Vibrato: Oscillations in fundamental frequency
'Clean' sound: Fraction of energy in harmonics
'In tune': Simple ratios between harmonics present (e.g. 3:2 = 'Perfect Fifth')
Rhythm: Long-scale periodicity (>100 ms)
Coloration/Formant (e.g. different vowels): Spectral envelope
Coloration/Formant glide (e.g. diphthong): Changes in spectral envelope
Attack/Decay/Sustain/Release: Temporal envelope
Where (left/right): Interaural time difference
Where (left/right): Interaural intensity difference
Where (all): Filter resonance from body and outer ear
Where (all): Dynamic cues (deliberate head/ear movements)
Table 1.1: Examples of perceived features, and their underlying causes (Erickson, 1975; Haykin and Chen, 2005)
1.6.3.2 Auditory Streaming
The auditory landscape typically includes sounds from multiple different sources; for example, the sound at a party may be a mix of several individual voices along with music, traffic, and several other background noises. The ease with which both humans and animals may separate this scene into its constituent parts belies the difficulty of such a calculation. We refer to this process as auditory streaming, that is, the separation of an input signal into its constituent streams. Streaming is highly linked to, but slightly different from, the so-called 'cocktail party effect' (Cherry, 1953), which is concerned with the separation and amplification of one particular stream from an input composed of several sources. The precise definition of an auditory stream (or of its constituent auditory objects) remains much debated (Bizley and Cohen, 2013).
However, an important point of agreement is that they are perceptual; streaming describes how the auditory system interprets the input signal. We might hope that each auditory object represents an auditory event, or that each auditory stream represents an auditory source, but this is due to the functioning of the cortex as opposed to being inherent in the definition. The factors which impact the division of auditory input into streams include:
• Frequency differences (Miller and Heise, 1950). Alternating between two tones (a trill) is perceived as two separate streams if the two notes are sufficiently different in frequency
• Temporal similarity (Schnupp et al., 2011). Minimizing the time difference between tones increases the probability they will be perceived as a single stream
• Timbre (Wessel, 1979). Notes with a similar timbre are associated into a single stream
• Spectral Continuity (Darwin and Bethell-Fox, 1977). Alternating vowels may be perceived as one or two streams depending on the exclusion or inclusion of intervening glides
• Spatial Location (Deutsch, 1979). One melody played from multiple places will not be perceived as a single stream
• Pitch (Brokx and Nooteboom, 1982). When differences in pitch are removed from multiple voices (both voices are modified to be monotone at the same pitch), it is increasingly difficult to separate the voices
• Attention (Carlyon et al., 2001). Attending to competing tasks can prevent stream segregation
• Fine spectral-temporal features (Ding et al., 2014; Qin and Oxenham, 2003). Removing fine spectral-temporal structure does not affect comprehension of a lone speaker, but makes it far more difficult to attend to a speaker if another, distracting speaker is present.
See Darwin (1997) for further discussion of the factors influencing streaming. As streams are perceptual, streaming experiments typically ask the subject to react depending on what was perceived. Nevertheless, stream segregation must occur relatively early in cortical processing. For example, subjects have great difficulty determining the relative timing of events when they are perceived to occur within different streams (Bregman and Campbell, 1971).
1.7 Models of Auditory Streaming
Chapter 4 of this dissertation presents a novel, neural network approach to explain the process of auditory streaming. Consequently, in this section I will provide a review of different mechanisms which might be involved in this process. I do so using the three key aspects of streaming as proposed by Bregman (1990): segmentation, segregation, and integration. I will then review some neural network models which have been suggested to explain auditory streaming in cortex.
1.7.1 Segmentation
Segmentation involves examining the instantaneous auditory information represented in the waveform, and deciding which parts should be combined together into distinct auditory objects (Figure 1.12). This involves analyzing the sound presented along multiple different dimensions (Table 1.1). It is also known that these features are indeed represented by the neural code (Section 1.6.2), and that different parts of the auditory stimulus are more likely to be perceived as a single object if they match in several respects (Section 1.6.3.2). This makes intuitive sense; sounds generated from one object typically share multiple features, and so forming streams with multiple shared features allows the system to label features from the different sources. Such a decomposition of auditory stimuli according to clusters of shared features is popular in the CASA literature (Brown and Cooke, 1994; Elhilali and Shamma, 2008; Wang, 1996).
Figure 1.12: Auditory segmentation. A snapshot of frequencies over time may allow the sound to be separated into distinct segments using multiple features such as periodicity over frequencies (pitch), width of individual bands (warmth) and shape over multiple bands (color). Possible clustering into segments is shown by the colored circles above the plot.
1.7.2 Segregation
Segregation describes the network's ability to simultaneously represent the multiple objects detected through segmentation. This becomes particularly challenging when we consider that each incoming sound is decomposed into multiple different features; somehow, the cortex must store a label for each feature to know to which auditory object it has been allocated. This challenge is also known as the binding problem, i.e. how the cortex tracks which features are currently 'bound' into a single perceptual object (Roskies, 1999; Von der Malsburg, 1981).
One of the most widely discussed solutions to auditory segregation is correlation theory, which suggests that neurons representing different features of the same object will typically fire synchronously with one another. This synchronicity is thought to be related to the various cortical oscillations occurring within biological neural networks; it is hypothesized that subpopulations of neurons firing in phase during such oscillations are 'bound' together. Evidence for binding via synchronicity has been most thoroughly investigated in the visual cortex, in which synchronous populations of neurons have been found to be associated with the same object (Fries et al., 1997; Singer and Gray, 1995). Such synchronous populations of neurons have also been observed in several different parts of the sensory cortex (Engel and Singer, 2001), including the auditory cortex (Ehret, 1997), although the role that these might play in object segregation is highly contested (Roskies, 1999).
1.7.3 Integration
The third step in auditory streaming is integration; that is, joining together the instantaneous segments into a single, continuous, perceptual stream. Experimental evidence suggests that this process is the last to happen, after both segmentation and segregation (Sussman, 2005). Several different mechanisms have been hypothesized to be involved in this auditory integration.
One possible mechanism for integration is to assume a continuous neural representation of each stream. This is possible because many features that come from one source of sound evolve only slowly.
For example, even while the formant structure and pitch of a voice may shift quickly during conversation, the timbre and location remain mostly constant. Psychoacoustic experiments have indeed confirmed that continuity within the source influences the number of streams perceived (Darwin and Bethell-Fox, 1977) (Figure 1.13A). Algorithms have been proposed which form streams by maximizing the continuity of features within each stream. For example, Elhilali and Shamma (2008) (Figure 1.13B) suggested an algorithm using the concept of a Kalman filter (Kalman, 1960). This algorithm is cyclical; incoming sounds are segmented according to the perceived nature of the streams, and the streams are then updated using these segments. One significant advantage of this process is that calculations may be done online, with new segments being instantly joined to the streams to which they bear the greatest similarity.
An alternative approach to integration is to use a property known as temporal coherence. When a source generates a sound, it may output multiple components (e.g. harmonics). However, the various components of the sound are all generated by the same underlying cause, and so are likely to share some temporal envelope: that is, they start at the same time, undergo similar and simultaneous changes in strength, and finish together. This idea underlies integration via temporal coherence; features which have similar temporal envelopes were most likely generated by the same process. Each stream is decomposed into its component features and a temporal envelope. Moreover, these properties of sound have been shown to influence the perception of streams (Teki et al., 2013).
Figure 1.13: Continuous features perceived as a stream. A. Continuous deformations to a sound's features (top) are more easily bound into a single stream than 'jumps' of the same size (bottom). B. Algorithm presented in Elhilali and Shamma (2008) for joining segments into streams by maximizing continuity between successive segments. Based on a Kalman filter.
Several models have been described which derive integration via temporal coherence. Variants have used Independent Subspace Analysis (Casey and Westner, 2000), non-negative matrix factorization (Smaragdis, 2004) and autoencoders (Krishnan et al., 2014). One issue with these modeling approaches is in identifying the features; many models simply assume that the features are the inputs, so that all the features must be stationary across time. Unfortunately, features are typically nonstationary. For example, consider the changes to a harmonic's location as the underlying pitch changes. Smaragdis (2004) suggested a solution that would allow features to be defined over time; the proposed algorithm could then detect moving features, as long as they all move in exactly the same manner. An alternative approach was suggested by Krishnan et al. (2014), who constructed a series of correlation matrices over a variety of frequency and temporal filters. These correlation matrices are then decomposed using an autoencoder. In this way, objects which are sufficiently close in frequency and time are associated together. Such an algorithm reproduced some experimental findings concerning streaming; however, it is unlikely that such a calculation could be performed by a neural network.
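As a toy illustration of the temporal coherence principle, the sketch below groups frequency channels by the correlation of their amplitude envelopes. It is a minimal example rather than the algorithm of Elhilali and Shamma (2008) or Krishnan et al. (2014); the envelopes, sampling rate, and grouping threshold are all assumed purely for illustration.

```matlab
% Toy sketch of grouping channels by temporal coherence. The stimulus is
% assumed to be already decomposed into channel envelopes (rows of E);
% all values here are illustrative.
fs    = 100;                              % envelope sampling rate (Hz)
t     = (0:1/fs:2)';                      % two seconds of "sound"
nChan = 6;

envA = 0.5 + 0.5*sin(2*pi*3*t);           % temporal envelope of source A
envB = double(mod(floor(4*t), 2) == 0);   % temporal envelope of source B (on/off)

E = zeros(nChan, numel(t));
E(1:3, :) = repmat(envA', 3, 1) + 0.05*randn(3, numel(t));  % channels 1-3 follow A
E(4:6, :) = repmat(envB', 3, 1) + 0.05*randn(3, numel(t));  % channels 4-6 follow B

C = corrcoef(E');                         % coherence (correlation) between envelopes

% Channels strongly correlated with channel 1 are assigned to its stream
streamOfChannel1 = find(C(1, :) > 0.7);   % expected: [1 2 3]
```

Channels driven by the same source share an envelope and therefore end up with high pairwise correlations, while channels from different sources do not; a network-based version of this computation is what the temporal coherence models above attempt to provide.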
1.7.4 Neural Networks for Auditory Streaming
We finish this section by reviewing two neural networks which have been proposed to solve some of the above challenges associated with auditory streaming.
1.7.4.1 "A Neural Cocktail-Party Processor"
Shortly after the paper introducing the binding problem and correlation theory (Von der Malsburg, 1981), Von der Malsburg and Schneider (1986) proposed a neural network that could separate auditory stimuli into distinct streams. The network proposed was intended as a sketch; consequently, it does not discuss particular cellular mechanisms or timescales. The basic units in the model are neural oscillators; when connected to a persistent input (all inputs being binary), each oscillator automatically alternates between periods of activity and inactivity. In this model, segmentation is performed using transient associative plasticity; if two inputs are present at a similar time, the connection between them increases in strength, meaning they are more likely to be co-active in the future. Segregation then occurs in keeping with correlation theory due to these strengthened connections: paired units are observed to start oscillating in time with one another. The network also contains inhibition-mediated competition, ensuring that multiple perceptual objects will oscillate out of phase with one another. In this model, the auditory sources emit a constant sound, so there is no need for integration to occur. This model demonstrates how a network of neural oscillators might achieve segregation using learned correlations.
1.7.4.2 Local Excitatory Global Inhibitory Oscillator Network (LEGION)
The concept of streaming occurring through a network of neural oscillators was extended by the Local Excitatory Global Inhibitory Oscillator Network. This model was first proposed as a solution to the binding problem in 1995 (Wang and Terman, 1995), and has since been investigated in a series of papers related to auditory streaming (Brown and Wang, 1997; Shao and Wang, 2009; Terman and Wang, 1995; Wang, 1996; Wang and Chang, 2008; Wang and Brown, 1999; Wrigley and Brown, 2004). As with the Neural Cocktail-Party Processor, the basic units are neural oscillators (highly reminiscent of the van der Pol oscillator) which undergo alternating periods of activity and inactivity. These units each represent one particular frequency, possibly also with a certain time lag. Recurrent connections exist between 'close' units, that is, those units which represent similar frequencies (and possibly also similar time lags) (Figure 1.14).
In this case, segmentation is performed due to the pre-established recurrent excitatory connections: the units which are connected are more likely to oscillate in time. The key difference from von der Malsburg's work above is that these associations are pre-encoded, rather than learned by temporal coherence. This allows for some memory in the network that informs which features should be 'bound'. Segregation then occurs due to the oscillating nature of the network, in keeping with correlation theory. Finally, integration occurs with the inclusion of the lagged inputs: if two features are close in frequency space, but are not present at the same time, they may still be associated together using a time-lagged version of the first feature. This model demonstrates how a recurrent neural network might achieve the segmentation, segregation, and integration hypothesized to underlie auditory streaming.
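The behavior of a single LEGION unit can be sketched with a relaxation oscillator. The code below uses the standard Terman-Wang oscillator equations, which the LEGION units closely follow (the variant shown in Figure 1.14 differs slightly in the form of the recovery term); coupling and noise terms are omitted and the parameter values are illustrative rather than taken from Wang and Terman (1995).

```matlab
% Sketch of a single relaxation oscillator of the type used in LEGION.
% Standard Terman-Wang form; coupling and noise omitted, parameters illustrative.
eps_  = 0.02;  gamma = 6.0;  beta = 0.1;
I     = 0.2;                     % I > 0: the unit is stimulated and oscillates
dt    = 0.05;  T = 600;  n = round(T/dt);

x = -2;  y = 0;                  % fast excitatory and slow recovery variables
X = zeros(1, n);
for k = 1:n
    dx = 3*x - x^3 + 2 - y + I;               % fast, cubic excitatory dynamics
    dy = eps_*(gamma*(1 + tanh(x/beta)) - y); % slow recovery/inhibition
    x  = x + dt*dx;
    y  = y + dt*dy;
    X(k) = x;
end
plot((1:n)*dt, X);               % alternating active/silent phases
xlabel('time (arbitrary units)'); ylabel('excitatory activity x');
```

With a positive input the unit cycles between a high-activity and a low-activity branch, which is the basic behavior the LEGION network exploits: stimulated units oscillate, locally coupled units synchronize, and the global inhibitor pushes different groups out of phase.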
Figure 1.14: The Local Excitatory Global Inhibitory Oscillator Network. A: Each oscillator alternates between periods of high and low excitatory activity, with fast transitions between the states; its dynamics follow \( \dot{x} = 3x - x^3 + 2 - y + I \) and \( \dot{y} = \alpha(1 - \tanh(\beta x)) - y \), where x is the excitatory and y the inhibitory activity. B: Network structure, with local excitatory connections across frequency and time.
Chapter 2: Cortical Computations via Transient Attractors
2.1 Overview
The ability of sensory networks to transiently store information on the scale of seconds can confer many advantages in processing time-varying stimuli. How a network could store information on such intermediate time scales, between typical neurophysiological time scales and those of long-term memory, is typically attributed to persistent neural activity. In this chapter, I will describe an alternative, a Transient Attractor network, supported by plasticity of individual recurrent connections that decays on the same seconds-long time scale. I hypothesize that such functionality could be usefully embedded within sensory cortex, and confer not only aspects of short-term memory, but also the ability to automatically de-noise inputs and separate them into distinct perceptual objects. From the perspective of short-term memory, this network can learn any pattern presented to it, store several simultaneously, and robustly recall them on demand using targeted probes in a manner reminiscent of Hopfield networks. The stored information can be refreshed to extend storage time, is not sensitive to noise in the system, and can be turned on or off by simple neuromodulation. Moreover, I will show how the same mechanisms may assist in the processing of sensory information on seconds-long time scales. The diverse capabilities of transient attractors, as well as their resemblance to many features observed in sensory cortex, suggest the possibility that their actions might underlie neural processing in many sensory areas.
2.2 Introduction
The real world "causes" of sensory inputs usually persist for much longer than the time scales of neural processing in sensory areas. As a result, there is great use for neural and circuit mechanisms within sensory cortex that can hold information over time scales much longer than neural time scales, on the order of seconds. Storage of information on this time scale is commonly addressed in the context of "short-term memory" (Maex and Steuber, 2009), but there are also other uses for seconds-long storage of information. For example, such aggregation of information over time can be used to segregate auditory stimuli into perceptual auditory objects (Krishnan et al., 2014). Similarly, features of visual objects can be assembled over time using such associations despite temporary occlusions and visual noise (Becker, 1992).
The most common models of short-term memory rely on the concept of a "persistent attractor" (Barak and Tsodyks, 2014; Seung, 1996). A fixed set of recurrent connections can support a number of different patterns of activity ("attractors"), whereby the activity patterns are self-reinforcing due to excitatory connectivity. Persistent activity is typically maintained by a combination of excitatory and inhibitory activity (Aksay et al., 2007; Machens et al., 2005), and persistent states can even exist in random networks with particular properties (Ganguli et al., 2008).
The unifying feature of persistent attractor networks is that information is stored in neural activity itself, thus keeping it readily accessible. Although there is some evidence of information storage in persistent attractors in specialized brain regions such as the pre-frontal cortex (D'Esposito and Postle, 2015), it is unlikely to explain short-term memory in other areas such as the sensory cortices (Pasternak and Greenlee, 2005; Petrides, 2000; Postle, 2015), and may not even explain all information storage within the specialized regions (Barak and Tsodyks, 2014; Sreenivasan et al., 2014; Stokes, 2015).
An alternative location for the storage of information about recent inputs is in the local connectivity within the network itself. Indeed, such memory storage is implicit in models of long-term memory (Hopfield, 1982), where memories are encoded in the excitatory connectivity, which is established using a simple form of associative plasticity. Such a scheme could also be used for short-term memory if such changes in synaptic connectivity were temporary, which would allow for the short-term preservation of information within the network without affecting the network's long-term structure (Buhmann and Schulten, 1986; Mongillo et al., 2008; Sandberg et al., 2003; Szatmáry and Izhikevich, 2010). This allows the network to return to a certain state of activity (an attractor) in the presence of appropriate inputs (Schneegans and Schöner, 2008). I label such attractors "transient" as they only exist during appropriate input and due to relevant changes to network connectivity (which are themselves temporary).
In this chapter, I will propose transient attractors as a unifying mechanism within cortical networks that can support multiple types of computation that require combining information across time scales much longer than those of the underlying neurons. I will first demonstrate how a transient attractor functions in the context of a classic short-term memory task. Several memories can be stored in the network structure, allowing for their recall in the presence of suitable inputs. These memories then fade over several seconds. The same network can be used to extract information from time-varying stimuli, specifically in the tasks of stream segregation and signal de-noising. I finish by considering some issues that impact the various uses of transient attractors, including transient attractor maintenance, the effect of top-down attention, and the overall robustness of the network.
2.3 Results
I considered a simple form of a transient attractor network, composed of a single layer of excitatory and inhibitory neurons (Figure 2.1A), in which neural activity is represented with firing rates governed by leaky integrator dynamics. Excitatory neurons receive feedforward inputs representing the stimulus, as well as recurrent connections. For this simple network, only a single inhibitory unit is required. This neuron receives inputs from, and projects back to, the excitatory neurons and itself. This inhibitory circuit mediates competition between ensembles of excitatory neurons to represent the input. This minimal circuit can produce the behavior described below, but can be expanded into a larger network, and can also be produced by many other variants (see Discussion).
Short-term memory in this model is supported via plastic recurrent excitatory connections.
The total synaptic strength of connections between each pair of excitatory neurons is the product of three terms: a baseline synaptic weight Sij between neurons i and j, an associative (Hebbian) gain Hij that can be increased by coincident pre- and postsynaptic activity, and a synaptic depression xi, governed by the equations
\[ W_{ij}(t) = S_{ij}\, H_{ij}(t)\, x_i(t) \qquad (2.1) \]
\[ \frac{dH_{ij}(t)}{dt} = \frac{1}{\tau_{H+}}\left[H_{\max} - H_{ij}(t)\right] y_i(t)\, y_j(t) - \frac{1}{\tau_{H-}}\left[H_{ij}(t) - H_{\min}\right] \qquad (2.2) \]
\[ \frac{dx_i(t)}{dt} = \frac{1}{\tau_{x+}}\left[1 - x_i(t)\right] - \frac{1}{\tau_{x-}}\, x_i(t)\, y_i(t). \qquad (2.3) \]
I assume that all such recurrent connections have a uniform baseline strength (Sij = S0), and connections to and from the inhibitory neuron will also be assumed to be uniform. As I will describe below, this gives the network the maximum potential for memory storage, but alternatives will be considered below. Excitation is regulated (and stable) due to two mechanisms: feedback inhibition and synaptic depression. The inhibitory circuit suppresses all neurons by an amount proportional to the total excitatory activity, resulting in competition between the excitatory neurons. Synaptic depression controls recurrent excitation within a neuron group, limiting the dominance that persistent attractors can have so that activity is more directly influenced by current stimuli (as filtered by recent history). Further details about the model can be found in the Methods.
In the presence of a constant external input, firing rates in the network will settle into some pattern (an attractor) that depends on both the external input and the state of the network. Note that this definition of an attractor is broader than that used in much of the persistent attractor literature, which only considers attractors that remain active when external input is removed. The dominant attractor at any moment will depend on both the stimulus present and the effective synaptic strengths, which in turn depend on the recent history of network activity through the associative gain term (Hij).
2.3.1 Short-Term Memory via Transient Attractors
I start by illustrating how the transient attractor network works within a minimal network with just four excitatory neurons (Figure 2.1A). I selected two patterns to store: the first with neurons #1 and #3 coactive, and the second with neurons #2 and #4 coactive (Figure 2.1B). Before the memory is stored, I presented "probe" stimuli, each driving a single neuron (Figure 2.1C, left), in order to verify that there are no preexisting network attractors. Indeed, such probe stimuli only evoke activity in the neurons that were externally stimulated (Figure 2.1D, left). To imprint the memory, the two patterns are displayed alternately at 4 Hz for 1 sec (Figure 2.1C, center). Following this, both probe stimuli are displayed again (Figure 2.1C, right) to determine if the memories are recalled in the network activity. Indeed, while only the stimulated neurons fire in response to the probe stimuli at the beginning (Figure 2.1D, left), the patterns emerge after training (right). During the training period the memory is imprinted in the increased recurrent weights between co-active neurons over repeated presentations (Figure 2.1E). These strengthened connections then lead to increases in a neuron's membrane voltage in response to even a part of a pattern in which it has been involved (Figure 2.1F). This in turn causes an increase in inhibitory firing rates proportional to the additional excitatory activity (Figure 2.1G), and an increase in suppression of the non-paired neurons.
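To make the role of the three terms concrete, the following is a minimal sketch of how equations (2.1)-(2.3) could be integrated with the Euler method for a single synapse. The firing rates are clamped to illustrative values rather than computed from the full network, S0 is an illustrative baseline weight, and the remaining parameter values are those listed in Table 2.1 of the Methods.

```matlab
% Minimal Euler-method sketch of equations (2.1)-(2.3) for a single synapse
% from neuron j to neuron i. Firing rates are clamped for illustration.
dt    = 1e-4;                    % 0.1 ms, as in the Methods
tauHp = 0.1;   tauHm = 2.0;      % tau_H+ and tau_H- (s), Table 2.1
tauXp = 0.05;  tauXm = 0.1;      % tau_x+ and tau_x- (s), Table 2.1
Hmax  = 5;     Hmin  = 1;
S0    = 0.1;                     % baseline weight (illustrative)

T = 3;  n = round(T/dt);         % simulate 3 s
H = Hmin;  x = 1;
W = zeros(1, n);
for k = 1:n
    t  = k*dt;
    yi = double(t < 1);          % i and j are coactive for the first second
    yj = double(t < 1);

    dH = (Hmax - H)*yi*yj/tauHp - (H - Hmin)/tauHm;   % eq. (2.2)
    dx = (1 - x)/tauXp - x*yi/tauXm;                  % eq. (2.3)
    H  = H + dt*dH;
    x  = x + dt*dx;
    W(k) = S0*H*x;                                    % eq. (2.1)
end
plot((1:n)*dt, W); xlabel('time (s)'); ylabel('W_{ij}');
% W is boosted while the pair is coactive (tempered by depression), and then
% relaxes back towards S0*Hmin over roughly tau_H- (about 2 s).
```

The sketch illustrates the key property exploited throughout the chapter: coincident activity transiently raises the effective weight, and the memory of that coincidence outlives the activity itself by a couple of seconds.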
These inhibitory effects are small relative to excitation due to the size of the pattern being stored in this small network, and in general will have larger effects that create competition between simultaneously active neurons. I next extended this simple example to a much larger network, capable of learning multiple, overlapping patterns. This network has 100 excitatory neurons, arranged in a 10 by 10 grid. Note that the grid arrangement is only to make visualizing the patterns of activity easier, and it does not represent any biases in connectiv- ity; again, all excitatory connections are all-to-all, and of equal strength. I trained this network with three patterns, two digits (to be easily recognizable) and a third composed of randomly selected neurons. This set of patterns illustrates how any pattern can be stored in the network, but also note that the two digits chosen have a large number of shared elements. Random subsets of each pattern are selected 53 Figure 2.1: Transient attractors in single layer network via associative weight modifications. A. Diagram of a minimal transient attractor network composed of four excitatory neurons and one inhibitory neuron. Recurrent con- nections between excitatory neurons are plastic, and all connections to/from the inhibitory neuron are equal. B. Top: When presented with inputs to neurons 1 and 3, recurrent connections between simultaneously active neurons are strengthened through transient associative plasticity. Bottom: A second stimulus (inputs 2 and 4) is stored through the strengthening of recurrent connections. (C-G) Two stim- ulus patterns are stored and recalled by the minimal transient attractor network in (A). C. The inputs to the network, composed of: a probe stimulus activating sin- gle neurons (left), followed by two repetitions of two different two-neuron patterns (shown in B), followed by the same probe stimulus (right). D. Activity of excitatory neurons in response to stimulus. E. The synaptic gains Hij strengthen as a result of the coincident neural activity, shown for a representative sample of connections. F. Membrane potential of neuron #3. Initially, both probes cause some inhibition, while after training the in-pattern probe causes elevated potential (firing), while other probe causes increased inhibition. G. Inhibitory neuron’s firing rate, which increases when any pattern is presented. Resulting competition between patterns suppress one pattern in favor of the other when final probe stimuli are presented (right). 54 Figure 2.2: Transient attractors can simultaneously store several patterns. A network of 100 excitatory neurons stores and recalls three different patterns. A. Input to the network: Center : three patterns are each presented twice during training. Left/Right : Probe stimuli before and after training, each probe random subset (25%) of relevant pattern. B. Neural activity (firing rate represented in grayscale) only recapitulates the probe inputs before (left), but separately recalls each pattern (right) after training. as probe stimuli, and the network is tested to have no pre-existing attractors, and trained as described above (Figure 2.2A). The successful storage of the memories in the network can be verified by comparing the levels of activity of the excitatory neurons to the initial and final probes (Figure 2.2B). This shows that an attractor has been created for each pattern. 
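The construction of the training patterns and probes for this larger network can be sketched as follows. The digit shapes used in Figure 2.2 are not reproduced here, so all three patterns are drawn at random, and the pattern size is illustrative.

```matlab
% Sketch of the stimulus construction for the 100-neuron network (Figure 2.2):
% three stored patterns plus probes made from random 25% subsets.
nE      = 100;                        % excitatory neurons (10 x 10 grid)
patSize = 20;                         % neurons per pattern (illustrative)
nPat    = 3;

patterns = false(nPat, nE);
probes   = false(nPat, nE);
for p = 1:nPat
    idx = randperm(nE, patSize);                        % neurons in pattern p
    patterns(p, idx) = true;
    keep = idx(randperm(patSize, round(0.25*patSize))); % 25% subset as probe
    probes(p, keep) = true;
end
```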
Furthermore, due to the inhibition-mediated competition, activity does not leak between overlapping attractors, and the stored information is recalled in the presence of a relevant probe. This demonstrates that this network is capable of performing short-term memory tasks involving multiple (potentially overlapping) memories held simultaneously. As with Hopfield networks, the memory capacity of this network (i.e., the number of patterns that can be stored simultaneously in memory) increases with the number of neurons (Amit et al., 1985), but in practice such a capacity cannot realistically be used due to the limitation of 55 the transient time scale over which the trained patterns of connectivity maintain themselves. Stored short-term memories in this network have an additional attractive property in contrast to persistent-activity-based attractors: namely that they are stable while being stored. Such stability can be demonstrated in an example network where there is a clear topography between different activity states of the network. Thus, I next consider a ring attractor (Ben-Yishai et al., 1995). A ring attractor is composed of a circle of neurons, with each neuron preferentially connected to its neighbors (Figure 2.3A). In principle, ring attractors based on persistent activity can store a continu- ous variable because activity at any point on the ring can be stable. However, it has been shown that any noise in recurrent connections will cause a severe reduction in the number of stable equilibriums: typically down to a handful (Itskov et al., 2011). In practice, this means that the system will always drift to one of the relatively few global attractors (Figure 2.3B). Transient attractors avoid this drift by having the network inactive in-between train- ing and read-out (Figure 2.3C), meaning that the memory cannot drift. This obser- vation complements earlier work (Itskov et al., 2011) showing plastic synapses will reduce the rate of drift in the case of persistent activity (Figure 2.3D). Furthermore, analogous to the more general network considered above (Figure 2.2), this network is capable of storing multiple locations simultaneously (Figure 2.3E), each re-activated by their own probe. This demonstrates how storing information in modified synap- 56 Figure 2.3: Short-term memory in a ring attractor. A. The network structure of a ring attractor, showing a section of the full ring. Each excitatory neuron (blue) connects to its nearest neighbors (with a Gaussian weight profile, see inset) and to a single inhibitory unit (red). B. After initializing a standard ring attractor to a single input (left) and then giving weak uniform input to keep the network active (gray), the position of the peak activity (right) drifts, often converging to one of several attractors. The trajectories of peak activity for several different initial inputs are shown. C. A ring attractor with transient synaptic plasticity allows for information storage in transient attractors at any single location. In this case, once information is first stored (ten initial inputs shown, left), there is no activity in the network (right). However, when probed with nonspecific uniform input, the network recalls the initialization with no drift. D. Transient attractors stabilize activity in case of persistent activity (ten initializations as before). The transient departure from the initialization is due to synaptic depression, but stabilized after the synapses recover. E. 
Transient attractors also allow for simultaneous storage of multiple locations. Here, two positions are stored, and recalled with more specific but still broad activation. 57 tic connections, as opposed to persistent activity, prevents slow distortion of the information by small errors within the network (in this case, attractor drift). 2.3.2 Maintenance of Information over Time By design, information stored in transient attractors degrades at the time scale of the underlying transient synaptic plasticity. While this would appear to limit the amount of time a memory can be stored by the transient attractor, such a net- work can extend to storage over longer periods of time through reactivation of the attractor (Mongillo et al., 2008). Such reactivation will strengthen all relevant con- nections, and thereby allow information to be stored for durations well past the time scales of the decay of the transient synaptic plasticity. To demonstrate how the transient attractor is capable of this, I first stored two overlapping patterns (Figure 2.4A, left). Without any further activity, the infor- mation stored will become inaccessible over several seconds due to the timescale of decay of the induced synaptic plasticity. However, here the stored information is refreshed by regular reactivation of the attractors via pulsing background activ- ity (Figure 2.4A, center). Background stimulation causing the refresh need not be specific to any stored pattern; in this example, background stimulation is uni- form across all channels, but as a result momentarily activates individual attractors within the network. Furthermore, the pulsing nature allows for sequential activation of multiple attractors due to the synaptic depression of synapses which were most 58 recently activated. The pulsing uniform activity is not the only conceivable method of refreshing memories; for example, specific memories might be targeted using an appropriate probe. As a result of this attractor reactivation, it can be seen that the duration of the memories has been extended (Figure 2.4A, right and Figure 2.4B). This demonstrates how transient attractors could store information over variable time scales. 2.4 Associating Distinct Patterns of Input via Temporal Coherence For the above examples of memory, stimuli were presented separately in time in order to focus on the storage and retrieval of patterns. However, real world stimuli will often not be so conveniently separated in time, with different components that can only be distinguished by detecting shared temporal features. Such a theory of temporal coherence has been suggested as a solution for the cocktail party problem, that is the ability to associate the features comprising different sounds and focus on those components while suppressing others (Bizley and Cohen, 2013; Shamma et al., 2013). Temporal coherence has likewise been used for visual object separation (Becker, 1992). The network described above can perform a simple example of such segregation based on temporal coherence. I generated a training stimuli composed of two ran- dom, non-overlapping patterns of activation, modulated by uncorrelated temporal envelopes (Figure 2.5A). As with earlier examples, probes are displayed before and 59 Figure 2.4: Maintenance of transient attractor by uniform input. A. The transient attractor network from Fig. 2 is initially trained with two patterns, and probed to test for memory storage almost 4 seconds later, well longer than the decay time of the transient plasticity. 
The intermediate period is either silent or consists of pulsing low-intensity uniform network inputs (inset labeled Maintenance). B. The excitatory response to the probe stimuli either without (left) or with (right) the maintenance activity reveals how continued activity allows the network to maintain transient attractors and extends the duration of information storage.
Figure 2.5: Network can distinguish between patterns using temporal coherence. A. The same 100-neuron transient attractor network from Figure 2 is trained on two patterns that are presented simultaneously, although with distinct temporally varying amplitudes (red, blue), with probe stimuli before and after, as with previous figures. B. The excitatory activation to the probe stimuli before (left) and after (right) reveals the ability to recall the two separate patterns, despite their not being presented separately.
after exposure to patterns to demonstrate the creation of transient attractors. While both patterns were present at some amplitude throughout the training period, the network responses to the probes (Figure 2.5B) following training reveal that the network has learned both patterns. This happens due to the inhibitory feedback which prevents both patterns from being represented simultaneously. As patterns in the network are not represented simultaneously (even if both are present in the stimulus), they are essentially temporally segregated within the network, allowing the associations to be learned. I conclude that the transient attractor network is capable of performing some form of on-line temporal coherence analysis, and completes a simple streaming task.
2.4.1 Separating Signal from Noise
Just as networks with persistent activity may act as neural integrators (Shen, 1989), the transient attractor network may also act as an integrator, allowing it to filter out noise and store an uncorrupted version of the signal. This works because changes to network connectivity accumulate on time scales shorter than that of their decay. I now demonstrate this ability with an example where the signal corruption is due to both occlusion (part of the pattern temporarily absent) and uniform noise (additional spurious inputs). I constructed a stimulus composed of two parts, signal and noise (Figure 2.6A). Different partially occluded versions of the pattern are presented briefly. Noise is also introduced, with other inputs randomly active such that the average firing rate is constant across all inputs.
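A stimulus of this kind can be constructed as sketched below; the channel counts, frame duration, and occlusion fraction are illustrative, with the noise probability chosen so that every channel has the same average activity as an unoccluded signal channel.

```matlab
% Sketch of the occlusion-plus-noise stimulus used for de-noising (Figure 2.6).
nIn      = 100;                      % input channels
sigIdx   = 1:20;                     % channels belonging to the signal pattern
noiseIdx = 21:nIn;                   % all remaining channels carry noise
nFrames  = 40;                       % e.g. 0.1 s frames over 4 s

stim = zeros(nIn, nFrames);
for f = 1:nFrames
    % signal: the pattern with a random 25% of its channels occluded
    occl = sigIdx(randperm(numel(sigIdx), round(0.25*numel(sigIdx))));
    stim(setdiff(sigIdx, occl), f) = 1;

    % noise: each non-signal channel active with the same probability (75%)
    % as an average signal channel, so mean rates match across channels
    stim(noiseIdx(rand(1, numel(noiseIdx)) < 0.75), f) = 1;
end
imagesc(stim); xlabel('frame'); ylabel('input channel');
```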
When the overall strength of in- hibition is increased, recurrent activation of attractors will be suppressed such that the network will have no attractors (beyond faithfully relaying the stimulus). To demonstrate this, I take the network described in Figure 2.2, and rerun the simula- tions with the level of inhibition increased by doubling the strength of all inhibitory synapses. Although exposure to patterns still leads to synaptic strengthening, such changes are insufficient to create a stable attractor, and the final probe no longer leads to pattern recall (Figure 2.7). Such basic modulation coincides with observa- tions of the requirement of attention or engagement for the storage of short-term 62 Figure 2.6: Transient attractors able to de-noise and fill in occluded inputs. A. The same transient attractor network in Fig. 2 is presented with input generated from partially occluded signal pattern (top left). The inputs corresponding to the pattern are labeled in green to the left of the vertical axis. However, the full pattern is never presented to the network, and instead 25% occluded signals (examples bottom left and bottom right) are presented at 0.1 sec intervals. In addition to signal, noise (blue) is presented in all non-signal inputs over the entire interval (e.g., bottom middle). Quantity of noise set so that each input has the same level of activity. B. Network activity in response to stimulus across all time (top) and in response to the three example inputs (below). Initially the network responds to noise and signal indiscriminately, but over time correlations in input allow it to filter out noise and fill in the occluded parts of the signal. 63 memories (D’Esposito and Postle, 2015), as well as for changes associated with audi- tory streaming (Shamma et al., 2013), and is generally useful to selectively perform the various functions of a transient attractor network. 2.4.3 Model Robustness Stability is often a large concern in neural networks with recurrent excitation; a slight disturbance in the excitatory/inhibitory balance can either lead to runaway excitation or silence activity throughout the network. I tested how fine this bal- ance is in the model by changing the baseline synaptic strengths (i.e., S0 in eq. 2.1) of all neurons of a certain type, for example halving all feedback inhibition, and determining if the network continues to successfully store and recall patterns. Each individual parameter could be varied by at least 25% in either direction (Fig- ure 2.8A), showing the model to be highly resilient to the average sizes of synaptic strengths. I attribute this stability to the close link between inhibition and excita- tion, as inhibition adjusts in line with general excitation, and the effect of saturating firing rates within the model. Furthermore, while I have thus far assumed homogeneity in connections within the transient attractor network in order to maximize the number of patterns that could be robustly stored, the underlying functionality of the network is robust to signifi- cant variations of this homogeneity. To demonstrate this, I randomly removed a per- centage of recurrent connections while keeping total recurrent connection strength 64 Figure 2.7: Inhibition as proxy for attention. The transient attractor net- work from Fig. 2 with varying levels of inhibitory feedback. A. Original levels of inhibitory feedback lead to sufficient increases in membrane potential at second probe (left), and hence complete pattern recall (right). 
The membrane potential is plotted for the red-circled cell. B. Doubling the feedback from inhibitory to excitatory neurons suppresses the membrane potential overall (left), and prevents memory recall (right), while still faithfully relaying the probe pattern.
Figure 2.8: Transient attractor model robustness to variations in parameters and network homogeneity. Tests of parameter robustness are performed on the transient attractor network from Fig. 2, while storing two random non-overlapping patterns. A. Range over which each parameter may be varied without preventing successful storage/recall of information (WEE: -31% to +87%; WEI: -51% to +41%; WIE: -51% to +41%; WII: -32% to +104%; WSE: -76% to more than +1000%). Reported quantities are relative to default parameter values. B. The performance of the network in recalling both memories when the homogeneity condition is violated by randomly removing some fraction of recurrent excitatory connections. Performance is measured as Positive Predictive Value (PPV) and True Positive Rate (TPR) (see Methods). Dashed/solid lines are the median before/after training; the shaded region lies between the first and third quartiles.
constant. It was found that the network still functions remarkably well at recalling any pattern for connection densities as low as 20% (Figure 2.8B). This result comes from the manner in which memories are stored as associations between many different pairs of neurons, which is only perturbed when a large proportion of connections have been removed. I conclude that the model is not highly dependent on the assumption of uniform connectivity.
The transient attractor model scales well to larger networks. In a larger network, the probability that two patterns significantly overlap decreases (assuming either the relative or absolute size of each pattern is fixed), so memories are less likely to get confused. This is related to the fact that the memory capacity of a Hopfield network scales linearly with network size. Likewise, memories in larger networks are stored across multiple synapses, so that irregularities at any single synapse are less likely to cause issues.
2.5 Discussion
In this chapter I have presented the transient attractor network, defined primarily by recurrent excitatory connections that are governed by an associative (Hebbian) plasticity that only lasts on the scale of seconds. I have demonstrated that such a network is capable of a wide range of useful behaviors, including short-term memory (Figs. 2.1-2.3), memory maintenance (Figure 2.4), source (or stream) segregation (Figure 2.5), and signal de-noising (Figure 2.6), and shown how these abilities can easily be modulated by a simple top-down signal that modulates the level of inhibition (Figure 2.7). Furthermore, I demonstrated the robustness of the model with respect to both synapse strength and homogeneity (Figure 2.8). The concept that the same underlying network mechanism might have several uses in sensory computation is compelling in its simplicity. In fact, each of the tasks in Figures 2.2 and 2.4-2.7 was performed using the exact same network with the same parameters. Furthermore, while many of the above functions of transient attractor networks are demonstrated with these simplified networks, larger network sizes should actually make their desirable properties more robust.
The mechanisms and network structure underlying transient attractors are known to exist in the cortex except, perhaps, for associative transient plasticity (Section 2.5.2). It does not depend on a set of stable attractors, or some finely prescribed structure. This allows it to be a candidate for short-term memory in a wide vari- ety of regions, such as the primary sensory cortex (Pasternak and Greenlee, 2005; Postle, 2015). This is in contrast with a large number of short-term memory mod- els which prescribe such tasks to particularly specialized regions of the brain. The broad applicability of short-term memory benefits from widely applicable mecha- nisms, perhaps working in tandem with more specialized regions. 67 2.5.1 Alternative Models for Short-Term Memory The classic model for short-term memory stores information in persistent attrac- tors (Barak and Tsodyks, 2014), that is through a self-sustaining state within the network. Once such an attractor is activated, activity will persist until externally stopped, while the identity of the persistent attractor stores the information. This self-sustenance is typically achieved in neural networks through different combina- tions of recurrent excitation (Goldman, 2009), inhibition (McDougal, 2011), or both (Aksay et al., 2007; Lim and Goldman, 2013; Machens et al., 2005). Of the many models of persistent attractors, an interesting subset makes use of synaptic modi- fications to the attractor to aid in the persistence of activity (Barak and Tsodyks, 2007; Itskov et al., 2011). The combination of persistent activity and underlying synaptic modifications does resemble the transient attractor network (Figure 2.3D), but nevertheless information storage in these networks relies on persistent activity. While some experimental results appear to support the idea of persistent activity underlying short-term memories (Courtney et al., 1997; Fuster and Alexander, 1971; Seung, 1996), a number of recent studies in different brain areas have cast doubt on this general conclusion (Barak and Tsodyks, 2014; Mongillo et al., 2008; Stokes, 2015). As a result, other models for short-term memory have been proposed, using pro- cesses such as cell assemblies (Lansner et al., 2003), non-stationary activity (Amit et al., 1997), cross-regional networks (Dubreuil and Brunel, 2016; Verduzco-Flores 68 et al., 2009), or purely feed-forward circuits (Goldman, 2009). These other ideas all rely on neural activity for information storage, and thus are still distinct from the idea of storing information in neural connectivity. The concept of storing information within temporarily strengthened synapses goes back at least 25 years; for example, Shen (Shen, 1989) demonstrated its potential use for neural integrators. More recently, it has been shown how synaptic dy- namics could be used to store short-term and temporarily inactive memories using either direct associative plasticity (Buhmann and Schulten, 1986; Sandberg et al., 2003; Szatma´ry and Izhikevich, 2010) or synaptic facilitation (Mongillo et al., 2008). However, in all of these models the scope of the memories was pre-defined by the structure of the network: Sandberg et al. essentially used a ring attractor which could store individual variables due to the ring structure, Szatmary/Izhikevich used randomly created periodic attractors, while Mongillo et al. facilitated pre-defined cell assemblies. 
My work builds on this by incorporating the ideas of Dynamic Field Theory (DFT) (Schneegans and Schöner, 2008), that is, considering how the attractor structure shifts in the presence of particular inputs. The most obvious ramification of this is the ability for targeted memory retrieval. More subtly, this expands the scope of memories that can be stored in the network; they no longer need to be stable attractors in their own right (as explored in the field of persistent attractors), or in the presence of background noise (in the three papers discussed above), but only during certain types of input. This in turn allows for novel features such as a completely flexible set of memories, the use of attractors in automatically processing sensory inputs, their regulation by attention, and their general robustness to noise.
2.5.2 Experimental Evidence for Transient Associative Synaptic Plasticity
The transient attractor network above relies on an associative learning rule that decays on the order of seconds. There is scattered experimental evidence for transient associative effects (i.e., where strengthening of connectivity occurs between coactive neurons), which have been observed in ferret auditory cortex (Shamma et al., 2013), macaque PFC (Sugase-Miyamoto et al., 2008), and dissociated networks (Dranias et al., 2013). It is known that associative learning takes place over a variety of timescales due to multiple mechanisms (Kandel, 2014), including some direct associative connections which decay in minutes (Erickson et al., 2011; Malenka, 1991). It is conceivable that such processes might exist on shorter timescales, but they have proven difficult to separate from non-associative plasticity on similar timescales (such as synaptic facilitation and depression). It may also be possible to achieve associative changes in effective coupling using non-associative facilitation within certain network structures; this is the subject of Chapter 3.
2.5.3 Extensions of the Transient Attractor Network
It is hypothesized that the pre-existing wiring of neural networks in the sensory cortices is informed by the structure of natural stimuli (Hebb, 1949), which is equivalent to non-uniform connectivity (Sij) in the transient attractor network. While such non-uniformity would bias the network towards some attractors, this could be advantageous in the sensory cortices, as the location of transient attractors will be guided both by the immediate history and by the pre-learned nature of typical stimuli. When presented with a novel stimulus, the network's interpretation may be biased by learned stimuli, which are presumably the stimuli that have proven the most useful (given the rules of long-term plasticity). This coordination of short- and long-term plasticity is distinct from earlier work that stored short-term memories by strengthening some pre-existing attractors: in the transient attractor model, recent activity may change the nature of pre-existing attractors (e.g. strengthen, stabilize or shift them). This allows for much greater flexibility in memory storage; the number of possible transient attractors (as influenced by pre-learned patterns, recent history, and the nature of the instantaneous input) is far larger than that of pre-existing attractors.
Methods
Simulations
All simulations were performed in MATLAB using the Euler method with ∆t = 0.1 ms.
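As a sketch of what one such Euler step looks like for the firing-rate model described in the Neuron Model section below, the following update could be used. The network size, weight values, and external input are illustrative, while the leak rate, firing-rate parameters, and inhibitory reversal potential follow Table 2.1; only the excitatory-population update is shown, and the inhibitory unit would be updated analogously.

```matlab
% One Euler step (dt = 0.1 ms) of the firing-rate model, for a network with
% nE excitatory neurons and a single inhibitory unit. Sizes, weights, and the
% external input are illustrative.
dt   = 0.1;                     % ms
tau  = 0.5;                     % leak rate (1/ms), Table 2.1
a    = 1;  b = 1;  Erev = -1;   % firing nonlinearity and inhibitory reversal

nE   = 100;
WEE  = 0.1*ones(nE);            % recurrent excitatory weights (baseline)
WEI  = 1.0*ones(nE, 1);         % inhibitory -> excitatory weights
v    = zeros(nE, 1);            % excitatory "membrane voltages"
yE   = zeros(nE, 1);  yI = 0;   % excitatory and inhibitory firing rates
Iext = rand(nE, 1);             % illustrative external input

dv = -tau*v + WEE*yE + (Erev - v).*(WEI*yI) + Iext;   % voltage equation
v  = v + dt*dv;
yE = max(0, 1 - exp(-a*(v - b)));                     % rectified, saturating rate
```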
Neuron Model
In all simulations I used a continuous firing rate model with cell voltage, vi(t), depending on the weighted sum of recurrent excitatory, inhibitory and input currents,
\[ \frac{dv_i(t)}{dt} = -\tau\, v_i(t) + \sum_{j \in \mathrm{Exc}} W_{ij}\, y_j(t) + \left(E^{I}_{\mathrm{rev}} - v_i(t)\right) \sum_{j \in \mathrm{Inh}} W_{ij}\, y_j(t) + \mathrm{In}_i(t). \]
In this formulation, the excitatory input currents are independent of the cell membrane potential. This approximation is valid so long as the excitatory reversal potential is far higher than the firing threshold, which is the case for cortical neurons. In contrast, this cannot be assumed for the inhibitory currents because the inhibitory reversal potential is typically close to the cell membrane potential at rest. Firing rates, yi(t), are then calculated as a rectified, saturating function of the cell voltage,
\[ y_i(t) = \max\!\left[0,\; 1 - \exp\!\big(-a\,(v_i(t) - b)\big)\right]. \]
Parameters
Simulation parameters which remain constant across all simulations are listed in Table 2.1.
Max Hebbian: Hmax = 5
Min Hebbian: Hmin = 1
Hebbian increase: τH+ = 100 ms
Hebbian decrease: τH− = 2000 ms
Depression increase: τx+ = 50 ms
Depression decrease: τx− = 100 ms
Leak current: τ = 0.5 ms−1
Firing scale: a = 1
Firing threshold: b = 1
Inhibitory reversal potential: EIrev = −1
Table 2.1: Transient Attractor Network Parameters. The above parameters were used in all simulations in the chapter.
Weights between neurons depend on the network structures used in each figure.
For Figure 2.1: WSE = 5, WIE = 5, WII = 20, WEI = 10, WEE = 1
For Figure 2.3: WSE = 1, WIE = 10, WEI = 2, WEE = 1.5
For Figures 2.2, 2.4-9: WSE = 5, WIE = 5, WII = 20, WEI = 1, WEE = 0.1
Ring Model
Network as portrayed in Figure 2.3. The recurrent excitatory baseline weights are Gaussian with mean = 1.5 (from Parameters) and standard deviation = 10. All weights are affected by a random multiplicative noise term, drawn from a normal distribution, µ = 1, σ = 0.05.
Robustness analysis
Results in Figure 2.8. Parameter changes: Each parameter is changed individually until memory recall is no longer successful. Recall is successful if, during a relevant probe, the average firing rate within either pattern is five times greater than the maximum firing rate outside the pattern (including the alternate pattern) and the average firing rate in each pattern is at least 0.1 (10% of maximal firing rate).
Sparsity sensitivity: Connection density is tested over the range [0.05:0.05:1], with 100 trials for each value. Each trial has a new, randomly selected connection matrix with each connection set to 0 with probability 1 − density, and all remaining weights scaled uniformly to ensure total recurrent excitation remains constant. Each pattern contains 20 randomly selected excitatory neurons while each prompt is a subset of 5 from each pattern. An excitatory neuron is considered active if its average firing rate is over 0.1 while the probe is displayed. The analysis does not consider probe neurons in any category. PPV = number active within pattern / total number active; TPR = number active within pattern / number within pattern.
Chapter 3: Achieving Transient Associative Plasticity through Synaptic Facilitation
3.1 Overview
Associative ('Hebbian') synaptic plasticity acts to strengthen the connections between neurons which are active at the same time; this allows neural networks to store information for the duration of the plasticity. Such learning is of use to networks on a variety of time scales, including the ability to form transient associative connections which decay within seconds.
However, experimental evidence for a direct type of this transient associative plasticity is lacking. Instead, typical mechanisms of short-term plasticity, such as synaptic facilitation, are not associative, meaning they are activated regardless of whether activity in one neuron is paired with its postsynaptic targets. In this chapter, I will demonstrate a network architecture by which facilitation could lead to short-term associative strengthening, and as a result be used to create a ‘transient attractor’ network capable of short term memory. In addition, it can perform temporal coherence analysis useful for auditory streaming, as well as robust stimulus de-noising. Thus, facilitation acting within specific cor- tical microcircuits could play a key role alongside other known synaptic effects to 75 control how information is processed throughout the cortex. 3.2 Introduction The real world ‘causes’ of sensory inputs, such as objects generating visual or audi- tory stimuli, typically persist on time scales many times longer than those of activity from a single neuron. Making sense of natural sensory stimuli would therefore be greatly assisted by the ability to store information about the sensory inputs for some period of time. One common means to preserve information within a network for longer than individual neuron reaction times is in self-sustaining (or persistent) patterns of activity (Barak and Tsodyks, 2014; Seung, 1996). Recently, however, several experimental findings have raised doubt on whether such persistent activity is responsible for all the short term retention of information in the cortex (Postle, 2015; Sreenivasan et al., 2014). In this chapter, I will concentrate on an alternative mechanism for the storage of such short-term information: temporary modifications to the network’s functional connectivity. Changes to a network’s connectivity will affect how the network responds to incom- ing signals; these changes might therefore allow for the storage and recall of short- term memory. One example of this is the transient attractor network presented in Chapter 2. This network is based on recurrent, excitatory connections that are strengthened due to coincident pre- and postsynaptic activity. Strengthening con- nections between coactive neurons means it is more likely they will be coactive in 76 the future. This strengthening is transient, and decays on the order of seconds. Such transient changes in connectivity modify the network’s attractor structure: when a subset of neurons in a pattern are externally stimulated, excitatory currents to other neurons within the pattern are amplified by the strengthened connections. Later, the whole pattern may be recalled using an appropriate probe. Such transient at- tractors have been associated with cortical processes including short-term memory, temporal coherence analysis and signal de-noising. The function of the transient attractor network relies on transient associative plas- ticity. Indeed, such modifications are known to occur directly on an individual neuron to neuron basis through mechanisms of long-term potentiation (LTP) (Tet- zlaff et al., 2012). However, LTP is usually defined by changes occurring over time scales of minutes to hours. Although it may be possible that information that is only useful for seconds is saved for much longer periods of time due to such mechanisms, accumulating large quantities of obsolete information will make recalling individual pieces of information very difficult. 
One such candidate mechanism for short-term plasticity on the time scale required for the transient attractor network is synaptic facilitation, whereby the strength of a synaptic input is temporarily strengthened due to presynaptic neural activity. This occurs due to a buildup of calcium in the presynaptic terminal which increases the probability of vesicle release (Zucker and Regehr, 2002). Such strengthening will occur regardless of whether it is coincident with postsynaptic activity. Such 77 non-associative plasticity cannot store memories in the all-to-all transient attractor network; any facilitation at one neurons synapses will affect all other neurons equally. Sparser networks, however, may temporarily store information using non-associative short-term plasticity. This was demonstrated by Mongillo et al. (2008), who pre- sented a network which contained several pre-determined cell assemblies, i.e. groups of cells with strong connections between cells in the same group. When a given as- sembly is active, all cells comprising it undergo facilitation. As a result, should one cell in the assembly fire soon afterwards, the facilitation will increase the post synaptic currents in all connected cells. This disproportionately affects cells within the same assembly. Mongillo demonstrated how this could lead to the spontaneous reactivation of any recently active assemblies in the presence of background noise, allowing for short-term memory via facilitation. The memories that may be stored in this network are, however, pre-defined by the set of assemblies: the network can- not arbitrarily associate any pair of features. In this chapter, I will demonstrate how a neural network may achieve associative plasticity using only non-associative mechanisms (namely, synaptic facilitation). I start with a simple model which demonstrates how facilitation can lead to associa- tive plasticity in networks with a predefined circuit motif. I will then apply this motif to a transient attractor network, demonstrating that it can store short-term memories. Finally, I will consider a large network in which the recurrent connections represent a set of more generalized features, and examine how facilitation in such 78 a network might allow for short-term memory via transient attractors. Together, these networks describe how short-term associative plasticity may be achieved using synaptic facilitation. 3.3 Results In order to demonstrate how short-term associative plasticity might be achieved in a network of facilitating neurons, I start with a simple example. A small network is constructed which is capable of transient associative learning within a population of three ‘basis’ neurons without any form of associative plasticity. This means that stimulating any two basis neurons at the same time will lead to a temporary increase in recurrent excitation between those two neurons, but not to the third neuron. This was achieved through a sparse network structure; a diverse system of connections between neurons allows for non-associative mechanisms to have a targeted effect. In particular, the network includes a second population of neurons, ‘feature’ neurons. Each feature neuron connects to and from a pair of basis neurons, and one feature neuron exists for each pair of basis neurons (Figure 3.1A). In addition, the network contained one ‘basis’ inhibitory neuron and one ‘feature’ inhibitory neuron, to medi- ate competition within the two excitatory populations. All connections are initially equal in strength. 
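To make this wiring concrete, below is a minimal MATLAB sketch of how such a pairwise basis/feature connectivity might be constructed; the variable names and the uniform initial weight are illustrative assumptions rather than the exact simulation code.

% Illustrative wiring of a pairwise basis/feature network.
% One feature neuron is created for each pair of basis neurons, and all
% initial weights are equal; names (W_bf, W_fb, w0) are hypothetical.
nB    = 3;                      % number of basis neurons
pairs = nchoosek(1:nB, 2);      % one feature neuron per pair of basis neurons
nF    = size(pairs, 1);         % three feature neurons when nB = 3
w0    = 1;                      % common initial weight

W_bf = zeros(nF, nB);           % basis -> feature connections
W_fb = zeros(nB, nF);           % feature -> basis connections
for f = 1:nF
    W_bf(f, pairs(f, :)) = w0;  % each feature neuron receives from its pair
    W_fb(pairs(f, :), f) = w0;  % and projects back to the same pair of basis cells
end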
Synaptic strengths between excitatory neurons are the product of three factors: a constant baseline weight S_ij, a synaptic facilitation factor u_i(t), and a synaptic depression factor x_i(t). The latter two are affected by the presynaptic firing rate, y_i(t). The synaptic strengths are governed by the equations

$$W_{ij}(t) = S_{ij}\, u_i(t)\, x_i(t) \qquad (3.1)$$

$$\frac{du_i(t)}{dt} = \frac{1}{\tau_{u+}}\left(u_{\max} - u_i(t)\right)\left(y_i(t)/y_0\right)^{\alpha} - \frac{1}{\tau_{u-}}\left(u_i(t) - 1\right) \qquad (3.2)$$

$$\frac{dx_i(t)}{dt} = \frac{1}{\tau_{x+}}\left(1 - x_i(t)\right) - \frac{1}{\tau_{x-}}\, x_i(t)\, u_i(t)\, y_i(t) \qquad (3.3)$$

(Tsodyks et al., 1998). Here, I used common values for the time scales of both synaptic facilitation and depression decay, τ_u− = 1.5 s and τ_x− = 0.2 s (Mongillo et al., 2008). Note that the dynamic terms depend only on presynaptic activity, and so are applied equally to all synapses from the same neuron. In the presence of presynaptic activity, the synaptic facilitation will increase towards u_max and the synaptic depression x_i(t) will decrease towards 0. In the absence of activity, both the facilitation and the depression asymptote to 1, with the depression recovering faster than the facilitation decays (τ_u− > τ_x−) (Hennig, 2013). This results in an immediate weakening of the synapse during activity, followed by a long period in which the synapse is stronger.

I next designed a stimulus to test associative learning between the three basis neurons. It is composed of three periods. During the first, neurons B1 and B2 are activated; if associative learning occurs, this will lead to them being paired. In the second period, neuron B3 is activated, so that all neurons have been active an equal amount. This rules out excitability due to non-associative effects. I then tested for associative learning by stimulating one of the paired neurons (B2) and observing the resulting activation. During this final period, the paired neuron (B1) is active, whereas the unpaired neuron (B3) is not. This indicates associative learning has occurred. As this learning is controlled by facilitation, it will decay on the timescale of facilitation. Therefore, this demonstrates transient associative plasticity using non-associative mechanisms.

Figure 3.1: A sketch of associative plasticity due to synaptic facilitation. A. Network structure composed of three excitatory basis cells, three intervening feature cells (two shown), and a single (feature) inhibitory cell. The external stimulus is designed to test Hebbian learning; the first two input neurons are initially co-active, while the third is not co-active with any other neuron. B. Basis layer activity reveals that Hebbian association has been achieved, as the network now exhibits (indirect) excitation between the neurons which were initially co-active, but not between those which weren’t. C/D. Associative learning is achieved through the level of activity and facilitation of the feature neurons. The feature neuron corresponding to the co-active pair of neurons shows a much higher level of activity, and therefore facilitation. This in turn leads to more current flowing to the paired neuron (B1), providing further input current to the facilitated feature neuron (F1), and allowing the facilitated feature to outcompete the other feature neuron.

This process depends on facilitation (u_i(t)) of the feature neurons (Figure 3.1C). Feature neurons respond strongest, and hence facilitate most strongly, when both of their basis neurons are active. This means that those feature neurons which represent previously co-active pairs of neurons will have strongly facilitated synapses, strengthening the connection between the neurons.
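As a concrete illustration of these synaptic dynamics, below is a minimal MATLAB sketch of a forward Euler update of the facilitation and depression variables for a single presynaptic neuron, following Eqs. 3.1-3.3; the firing-rate trace, the value of α, and the baseline weight are illustrative assumptions.

% Illustrative Euler update of facilitation (u) and depression (x) for one
% presynaptic neuron (Eqs. 3.1-3.3). The firing-rate trace y, alpha = 1 and
% Sij = 1 are assumptions for illustration, not exact simulation settings.
dt      = 0.1;                        % ms
tau_up  = 100;   tau_um = 1500;       % facilitation increase / decrease (ms)
tau_xp  = 50;    tau_xm = 100;        % depression increase / decrease (ms)
umax    = 5;     y0     = 5;          % from Table 3.1 and the weight listings
alpha   = 1;     Sij    = 1;          % assumed exponent and baseline weight
T  = 3000;  nT = round(T/dt);
y  = [ones(1, round(nT/3)), zeros(1, nT - round(nT/3))];  % active, then silent
u  = 1;  x = 1;  W = zeros(1, nT);
for t = 1:nT
    du = (umax - u)*(y(t)/y0)^alpha/tau_up - (u - 1)/tau_um;
    dx = (1 - x)/tau_xp - x*u*y(t)/tau_xm;
    u  = u + dt*du;
    x  = x + dt*dx;
    W(t) = Sij*u*x;                   % effective synaptic strength, Eq. 3.1
end
% Once activity ends, x recovers quickly while u decays slowly, so W stays
% elevated above Sij for roughly the facilitation timescale: the transient
% strengthening exploited by the feature neurons.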
Should one neuron in the pair become active, the other will also become significantly active, which will also increase the excitatory input into the feature neuron (positive feedback). In addition, the in- hibitory neuron mediates competition between the feature neurons; those feature neurons which connect to one active and one inactive neuron may fire initially, but will soon lose out to other feature neurons which connect to two active neurons. This is also supported by synaptic depression; many postsynaptic currents may be initially strong enough to cause neural activity, but only those which then receive strong positive feedback (that is, they are associated with the dominant assembly) may remain active. 82 3.3.1 Short-Term Memory in a Complete Facilitating Network Having established how transient associative plasticity can be achieved in our facil- itating feature network, I next considered a specific application of transient associa- tive plasticity: the transient attractor network. This network made extensive use of direct transient associative plasticity, and was shown to be capable of multiple types of cortical tasks within the sensory cortices. I now demonstrate how a facilitating feature network can mimic one of these tasks being the ability to store and recall multiple short-term memories. I constructed a network based on the above schema, except with 9 basis cells (and consequently 36 feature cells), and test if two memories might be stored and recalled. A simple stimulus was designed for this task (Figure 3.2B). The majority of this stimulus involved imprinting the network with the patterns; the patterns were shown alternately at 4 Hz for 1 second. A probe was also selected for each pattern to test whether the pattern has been stored in the network. These probes are simply one neuron from each pattern. Each probe was shown briefly before and after the training period. Before training, the probe stimuli only led to the activation of the probe neurons themselves, indicating no memory is initially stored in the network. After training, however, each probe led to the recall of the entire pattern with which it is associated (Figure 3.2C). This transient attractor network is therefore capable of flexibly storing multiple patterns within its facilitated synapses. 83 Figure 3.2: Short-term memory in a two-layer, pairwise complete network. A. Network structure with two layers with one neuron in second layer for each pair of neurons in first layer (subset of all second layer neurons shown). C. Stimulus composed of two presentations of each pattern (250 ms duration each, 50 ms be- tween), capped at beginning and end by probe stimulus (subset of each pattern, 20 ms duration, 80 ms between). D-E. Activity of first (D) and second (E) layers of excitatory input in response to input. First layer successfully completes patterns after training. 84 3.3.2 Facilitating Feature Network with Generalized Features The above transient attractor network contains simplistic features (a pair of neu- rons), with every possible feature represented. In this way, it can temporarily learn associations between any possible inputs. This is not necessary for processing natu- ral stimuli which typically contain higher order correlations; not all pairs of inputs are equally likely to occur, or are as behaviorally important to the individual. Instead, cortical networks are thought to successively decompose inputs into sets of increasingly complex features. 
The set of features chosen is not random, but instead works to efficiently represent natural stimuli (Section 1.4.1.1). This also means that the set of features is not complete; not every possible feature is included (many are of little use in processing natural stimuli). This motivates the design of a more realistic neural network, where the network is primarily concerned with storing in- formation about some set of ‘natural’ patterns, as opposed to any pattern. It does this through the use of smaller, pre-learned features, which store information about those same patterns. Once again, I constructed a network composed of basis and feature cells. I also ran- domly selected a set of P ‘usual’ patterns. These patterns will inform the selection of features, and the network will be tested to see how well it can store and recall one or two of these patterns. I do not assume that a stable attractor already exists for each distinct pattern (as would be required for persistent activity). Instead, re- 85 peated exposure to a pattern will cause facilitation of the relevant feature neurons, increasing disynaptic excitation between the basis neurons within the same pattern. This will then assist in later recalling the pattern. The two-layer network contains 100 basis neurons (each with their own input), 500 feature neurons, and one inhibitory neuron for each layer (Figure 3.3A). Each feature represents a set of 5 basis neurons. Before forming connections between the layers I randomly selected P patterns, with each pattern consisting of 20 basis neurons. Each feature cell will represent a random subset of 5 basis neurons from a randomly assigned pattern. Note that only a very small proportion of all features are present in the network (0.1% of all features within a pattern when P = 50, or 0.0005% of all possible features). As an illustrative example of how the network functions, I considered an example where the network is initially aware of 50 different patterns (P = 50). I then selected two of these patterns to be presented to the network, and checked if the network can store them in its facilitated synapses. Prompts were randomly chosen (subsets of 5 neurons from both pattern), and the stimulus was created by alternating the chosen patterns at 4 Hz for 1 sec, with prompts at start and end (Figure 3.3B, top). At the initial prompt, the network failed to complete the patterns (Figure 3.3B, bottom left), although some extra channels are active due to the choice of features. After training, however, the network successfully recalled both patterns (Figure 3.3B, bot- tom right). This demonstrates how facilitation has allowed for the accumulation of 86 Figure 3.3: Two-layer network with generalized features. A. Network struc- ture with wiring between layers determined by set of P randomly chosen patterns. B. Network successfully learns patterns. Top: Stimulus created using two patterns selected from set of P = 10 patterns, shown alternately (duration 250 ms, 50 ms between), capped at beginning and end by probe stimulus (subset of each pattern, 20 ms duration, 80 ms between). Bottom: First layer response before and after training showing successful recall of both patterns. C. Probability of firing for dif- ferent classes of first excitatory layer neurons firing vs P, averaged over 1000 trials (prompt excluded). Blue line represents in pattern, red the alternate pattern chosen for training, orange all other neurons. D. 
Probability pattern most resembles cor- rect pattern from set vs P , averaged over 1000 trials. Black dotted line is expected performance due to chance (100%/P ). associations, creating transient attractors and allowing for the storage and recall of memories. The degree to which the features in the network are tuned to represent each indi- vidual pattern depends on the number of recognizable patterns, P ; an increase in pre-wired patterns in the network will mean fewer second layer neurons assigned to each pattern and more distractions from other feature neurons. For these reasons I anticipated a drop in performance with an increase in the number of recognizable patterns. I measured this performance in two ways. 87 First I examined which neurons are active in response to each prompt (Figure 3.3C). Below P = 50 the network recalls all patterns with very high accuracy, but even out to P = 250 the basis neurons within the pattern remain several times more likely to respond than the other neurons. To determine whether this indicates the network is recalling the correct pattern, or just a similar one, I measured how often the output is most similar to the desired pattern (compared with P − 1 other patterns used to determine connections). This probability decays with increasing P , so that the network’s response has a 50% chance of most resembling the desired pattern when a total set of 250 patterns is used to determine feature set (Figure 3.3D). I remark that the network still performs remarkably well for high P values, when only a small number of second layer neurons are allocated to each pattern. In this case, successfully storing and recalling patterns involves incorporating features from other similar patterns without confusing the patterns entirely. I conclude that this network can store and recall a wide variety of patterns. 3.3.3 Extracting Information using Temporal Coherence The transient attractor network (Chapter 2) is also able to segregate simultaneously presented patterns whose amplitude varies in time via temporal coherence (Shamma et al., 2011), and this property extends to the network presented here. Specifically, two patterns were chosen, and each pattern’s amplitude over time was governed by a temporal envelope. I now demonstrate how this property extends to the network 88 presented here. I used the larger transient attractor network described above, with wiring deter- mined by a set of P = 50 patterns. Two patterns are selected from this set, and prompts chosen as subsets of each pattern. For training, each pattern is allocated a randomly varying temporal envelope (Figure 3.4A). In order to separate out the patterns, the network must learn which inputs share a temporal envelope. By com- paring the network’s responses to the initial and final prompts, it can be seen that the network can store, and recall, each individual pattern (Figure 3.4B). I conclude that this network is capable of performing some on-line temporal coherence analysis, and complete a simple streaming task. 3.3.4 Signal De-noising By selectively strengthening connections within the networks, transient attractors may also act as integrators, and in doing so work to de-noise the input signal. Pre- viously this was done using Hebbian synaptic learning; here I show it can also be achieved using facilitating synapses. (Figure 3.5A). 
This stimulus incorporates oc- clusion (signal partially missing) and noise in a way so that it is not possible to extract the signal using only information from any one point in time, or information from any single input channel. The stimulus used is the same as that used in chapter 2 on signal de-nosing. 89 Figure 3.4: Temporal coherence analysis with facilitating network. From simulation with 100 first layer excitatory neurons, 500 second layer excitatory neu- rons and P = 50. A. Stimulus structure. Two patterns (top, center) are each assigned their own random temporal kernel (bottom, center). A subset from each pattern is chosen as a prompt (top, left/right), displayed briefly at beginning/end to test memories stored in network (bottom, left/right). B. Activation of first layer of excitatory neurons before (left) and after (right) training reveal separations of stim- ulus into two patterns using temporal coherence, with patterns recalled via suitable prompt. Figure 3.5: Signal de-noising with facilitating network. From simulation with 100 first layer excitatory neurons, 500 second layer excitatory neurons and P = 50. A. Stimulus composed of two parts. Top: Occluded patterns (25% occluded at a time) shows for 25 ms, repeats every 75 ms. Bottom: Additive noise random across all non-pattern channels, designed so all channels have approximately equal average firing rates and temporal duration. Right: The total stimulus at two points. B. Network activity in response to stimulus. Initially network responds to noise and signal equally, but over time correlations in input allow it to filter out noise and complete the pattern. 90 Initially, the network structure failed to discriminate signal from noise; the biased wiring structure does meant a non-trivial attractor was seen (Figure 3.5B). By the end, however, the network effectively distinguished noise from signal, and even included many parts of the signal which were temporarily occluded. Note that this is not the same as merely matching the stimulus with some attractor set (if so, the initial presentation would have led to a successful recall of the pattern). Instead, the network accumulated information about the likely nature of the signal by facilitating relevant synapses, and used this to usefully process the signal. 3.4 Discussion In this chapter, I have demonstrated how short-term facilitation might mimic tran- sient associative plasticity. Examples in this chapter are largely concerned with the transient attractor network introduced in chapter 2, which was shown to perform a variety of tasks related to sensory processing. These findings might prove to be applicable to several other neural processes. Associative plasticity allows neural networks to directly track correlations between neurons, and is widely associated with cortical processing. Direct associative plasticity has been observed to occur on many different timescales, but typically enduring at least tens of minutes (Tetzlaff et al., 2012). In contrast, non-associative mechanisms are known to endure on a variety of shorter timescales, from seconds up to a few minutes (Fisher et al., 1997). The above work might therefore allow those cortical processes which depend on as- sociative plasticity to be performed on any time scales, including the timescales on 91 which stimulus objects typically persist (hundreds of milliseconds to tens of seconds). Additionally, several models have already been proposed which rely on transient as- sociative plasticity which decays within seconds. 
This includes models of short-term memory (Sandberg et al., 2003; Szatma´ry and Izhikevich, 2010), auditory streaming (Von der Malsburg and Schneider, 1986), corrective prediction (Schultz and Dickin- son, 2000) and in control of persistent activity (Brunel, 2003). The above concept of transient associative plasticity via short-term facilitation might allow for more realistic implementations of many of these models. The short term storage of information in neural network through facilitation is only possible in asymmetric (e.g. sparse) networks; biased connections permit biased ef- fects from facilitation. The use of sparse networks in this way was first exploited by Mongillo et al. (2008). In the model presented by Mongillo et al., only objects which correspond to pre-existing cell assemblies may be stored. The network presented in this chapter extended this idea by including a subpopulation of feature neurons; memories are then stored by facilitating a subset of all features, from which the object may be recreated. This framework relies on a particular network structure within cortex. For one, it distinguishes between neurons as either basis or feature cells; such a division is not known to exist in cortex. Nevertheless, it is possible that a similar mechanism could allow for the short-term retention of information in a network without a clear 92 division between basis and feature cells. For example, when two cells are co-active, their shared neighbors are disproportionately likely to be coactive, and may there- fore facilitate. Later, should one cell become active, it will disproportionately affect the other cell by way of the various facilitated, shared neighbors. The framework also assumes that recurrent connections between excitatory neurons are largely sym- metric; there is some experimental evidence to support this (Ko et al., 2011; Song et al., 2005). This chapter presented two different structures for facilitating networks. The first assumed that every possible feature is present; this may be of use in some small neural networks, but is inefficient for large networks. I then extended this idea to a more general concept of associative plasticity: when only a small subset of all possible features are represented by feature cells, facilitation may allow for recurrent connectivity between neurons associated with a recently seen feature. In biological neural networks, it is widely believed that successive layers of stimulus processing do indeed decompose stimuli into successive features (Section 1.4.1.1), and it is quite plausible that many of these connections would be bidirectional. Moreover, due to shared higher order correlations in natural stimuli, the number of features used to represent natural stimuli is far smaller than the set of all possible features. This means that the above facilitating transient attractor network makes use of a network structure that has been well documented within the sensory cortices to temporarily store information about any natural stimuli to which it was recently exposed. 93 Methods Simulations All simulations were performed in MATLAB using the Euler method with ∆t = 0.1 ms. Neuron Model In all simulations I used a continuous firing rate model with cell voltage, vi(t), de- pending on the weighted sum of recurrent excitatory, inhibitory and input currents, dvi(t) dt = −τvi(t) + ∑ Excj Wijyj(t) + (E I rev − vi(t)) ∑ Inhj Wijyj(t) + In(t) In this formulation, the excitatory input currents are independent of the cell mem- brane potential. 
This approximation is valid so long as the excitatory reversal potential is far higher than the firing threshold, which is the case for cortical neurons. In contrast, this cannot be assumed for the inhibitory currents because the inhibitory reversal potential is typically close to the cell membrane potential at rest. Firing rates, y_i(t), are then calculated as a saturating rectified function of the cell voltage,

$$y_i(t) = \max\left[0,\; 1 - \exp\!\left(-a\,(v_i(t) - b)\right)\right].$$

Parameters

Simulation parameters which remain constant across all simulations are listed in Table 3.1.

Name                           Symbol     Value
Max Facilitation               u_max      5
Facilitation increase          τ_u+       100 ms
Facilitation decrease          τ_u−       1500 ms
Depression increase            τ_x+       50 ms
Depression decrease            τ_x−       100 ms
Leak current                   τ          0.5 ms^−1
Firing scale                   a          1
Firing threshold               b          0
Inhibitory reversal potential  E^I_rev    −1

Table 3.1: Facilitating Network Parameters. The above parameters were used in all simulations in the chapter.

Weights between neurons depend on the network structure used in each figure.
For Figures 3.1-3.2: W_SE1 = 10, W_E1E2 = 20, W_E2E1 = 20, W_EXIX = 1, W_IXEX = 20, W_IXIX = 10, y_0 = 5
For Figures 3.3-3.5: W_SE1 = 10, W_E1E2 = 5, W_E2E1 = 5, W_EXIX = 1, W_IXEX = 10, W_IXIX = 10, y_0 = 5

Chapter 4: Auditory Streaming via Gamma Partitioning

4.1 Overview

Understanding the auditory landscape around us involves discriminating between several auditory sources. The cortical processes that allow for this are not well understood. In this chapter, I will investigate how a biologically realistic network might represent multiple perceptual objects through a process of gamma partitioning. This partitioning incorporates both segmentation, that is, the division of the stimulus into multiple objects, and segregation, that is, the separation of the cortical representation of each object. I demonstrate how this online mechanism might be applied in processing several different types of auditory stimuli. I finish by considering how a biologically plausible online clustering technique might allow for the formation of coherent streams (so-called integration). This method is of particular interest due to its ability to exploit multiple features in forming streams, giving a plausible mechanism behind streaming in the auditory cortex.

4.2 Introduction

The auditory landscape that surrounds us is highly complex. Consider, for example, a cocktail party with music, multiple voices, and a variety of background noises. The sound we hear is a single waveform, the sum of the individual sound waves from each source. ‘Auditory streaming’ refers to how the auditory system separates this combined information into auditory objects which represent the sources of the sound. This process has been divided into three parts (Bregman, 1990): segmentation, segregation, and integration (defined in Section 1.7).

Understanding the challenges involved in auditory streaming can be assisted by comparing it with visual object recognition. Distinct visual objects typically appear in separate regions of the two-dimensional visual field, stimulating different regions of the primary visual cortex and therefore allowing for spatial segmentation and segregation. In contrast, sounds are primarily represented over a single dimension, frequency. When decomposed into their constituent frequencies, the sounds from individual sources are not continuous and localized, but disjoint and spanning a large range of frequencies. This means that the representations of different auditory sources significantly overlap.
Auditory streaming is influenced by some non-spectral features such as spatial location, this information is not necessary for streaming; this is evidenced by the ability to stream sounds using a single audio channel. The auditory system employs a large feature expansion when interpreting incoming 98 sounds; there are approximately 8,000 times more neurons in the auditory cortex than receptors in the cochlea (Worden, 1971). These neurons are collectively sensi- tive to a wide array of characteristics (Section 1.6.2). It is also known that sounds which are more similar with respect to these characteristics are more likely to be perceived as a single source (Section 1.6.3.2). We are left, however, with the so- called binding problem (Section 1.7.2): how do you represent the sets of features which are bound into a single stream? A solution to the binding problem is suggested by correlation theory (Von der Mals- burg, 1981): features that are represented at the same points in time are ‘bound’ into a single object. This theory has been employed in neural network models to explain auditory streaming (Von der Malsburg and Schneider, 1986; Wang, 1996) via targeted recurrent excitatory connections (Section 1.7). These models use net- works with simple, local connectivity, concentrate on non-overlapping objects, and are composed of idealized ‘oscillator units’ as opposed to true neurons. These mod- els will be further compared with the model presented here in the Discussion. In this chapter, I will investigate a new model for auditory segmentation and seg- regation based on synchrony of neurons in the auditory cortex during gamma wave oscillations. These are oscillations that occur in cortex at between 30 Hz and 100 Hz (Arnal and Giraud, 2012) (Section 1.4.2.2). Gamma partitioning of sensory stimuli has previously been used to explain visual processing (Miconi and Vanrullen, 2010). This model used localized recurrent connections (local using a combination of loca- 99 tion and orientation). Auditory segmentation provides a much more challenging task due to the low dimensionality of the input, and the manner with which the auditory sources significantly overlap in their frequency representation. In this chapter, I will demonstrate how gamma partitioning can process auditory stimuli with minimal delay using non-localized recurrent connections. I will start by describing the model structure, then use it to process a variety of auditory stimuli. I also consider how this partitioning may lead to complete auditory streaming by combining it with a clustering algorithm over successive windows of time. 4.3 Results A neural network is created which is composed of spiking, noisy, leaky integrate- and-fire neurons with 2 ms absolute refractory periods. Within the network there are three populations of neurons: one excitatory population and two inhibitory pop- ulations. The excitatory neurons are responsible for receiving and interpreting the stimulus, while the two inhibitory populations respectively mediate local and global competition within the excitatory population. Groups of excitatory neurons are arranged tonotopically over the range 50 Hz to 2 kHz, spaced every 5 Hz. Each group contains multiple (100) spiking neurons; this allows us to smoothen the firing rate over time while maintaining realistic spiking behavior. The input current to each group is calculated using a Difference of Gaus- sians weight scheme (center σ = 10 Hz, surround σ = 100 Hz). There are also 100 Figure 4.1: Network structure. 
Network composed of excitatory neurons, global inhibitory neurons and local inhibitory neurons. Recurrent excitatory connections change between different networks investigated. recurrent excitatory connections between the groups of excitatory neurons; these inform the grouping of frequencies into objects, and are further discussed below. Local inhibitory neurons enforce competition within groups of excitatory neurons, and between groups which represent similar frequencies. These local inhibitory neu- rons are arranged tonotopically over the range 50 Hz to 2 kHz, spaced every 20 Hz. They take inputs from, and project back to, nearby excitatory neurons; in my simulations, the connections are described by a Gaussian function centered on the neurons assigned frequency with standard deviation of 100 Hz. 101 The global inhibitory neuron connects equally to and from all excitatory neurons in the network. By integrating over all excitatory cells, this global inhibitory neuron monitors the total level of network activity, inhibiting all neurons if some threshold is passed. This behavior ensures that only a minority of cells may be active in any small window of time: in a sense, the global inhibitory neuron acts as a gatekeeper. The existence of this cell leads to gamma waves within the network - activity slowly builds, the global inhibitory circuit is activated, and all activity is subdued before starting the cycle once again. 4.3.1 Recurrent Excitatory Connections This network may contain recurrent connections between excitatory neurons. These connections inform which neurons will be co-active; neurons which are connected are more likely to fire during the same gamma wave. All connections are symmetric in this network. I consider three different patterns of recurrent wiring: no recurrent connectivity, local recurrent connectivity, and harmonic recurrent connectivity. These recurrent connections are determined using a Hebbian learning scheme (except for the case with no recurrent connections). This means that the connection strength is determined by average co-activity given some set of ‘typical sounds (both typical ‘local’ sounds to determine the local recurrent connectivity and typical ‘harmonic’ sounds for the harmonic connectivity). This involved first constructing sets of such ‘typical sounds’, each sound has a random fundamental frequency between 250 Hz 102 and 1 kHz. Such ‘local’ sounds only contain energy at the fundamental frequency, whereas ‘harmonic’ sounds have energy at all integer multiples of the fundamental frequency. These energy distributions are then smoothed by convolving with a Gaussian function (σ =10 Hz). The recurrent excitatory weights are then calculated as Wij = [1− exp(−m(ρij − ρmin))] (4.1) where the correlation (ρij) is calculated using a set of 10 6 ‘typical’ sounds, m = 5 and ρmin = 0.1. Under this training system, the harmonic connections will also include connections between neurons which represent similar frequencies. When combined with the local inhibitory network, both the local and harmonic recurrent connections form a center-surround feedback pattern between the excita- tory neurons - activity from an excitatory neuron excites its closest neighbors, but indirectly inhibits more distant neighbors. 4.3.2 Stimulus Pre-Processing Before being fed into the three models, any auditory stimulus is pre-processed in a manner which is broadly faithful to cochlear processing (Figure 4.2). 
The cochlea is known to decompose the input waveform, measuring the amount of energy at each frequency; I mimic this using a discrete fourier transform, which is then smoothed using the aforementioned ’Difference of Gaussians’ filter. 103 Figure 4.2: Cochlear pre-processing. Stages involved in pre-processing stimulus, similar to calculations performed in the cochlea. Sound waves from multiple sources are combined into a single waveform, which is then decomposed into frequency space using a discrete fourier transform (using a hamming filter 2 ms wide, phase information discarded). This is then passed through a ’Difference of Gaussians’ filter to create the input current for each excitatory group of neurons. 4.3.3 Stimuli from Musical Instruments I start by considering how each model responds when presented with the rela- tively static harmonic sounds produced by musical instruments. Recordings cho- sen were recorded by the London Philharmonic Orchestra (London Philharmonia, 2016). Recordings from four instruments (trombone, cello, clarinet, flute) were used that represent the main categories of non-percussive instruments (brass, string, reed and non-reed woodwind) and for which the database was most complete. Inputs are generated using 12 notes that form an octave within the range of all four instru- ments (C4 to B4 in International Pitch Notation, also known as the octave directly above ‘Middle C’). This makes for a total of 48 samples with fundamental frequen- 104 cies between 250 Hz and 500 Hz. An auditory stimulus is created by choosing two samples. I start with an in-depth analysis of the networks’ behavior in response to one particular auditory stimulus (Cello C4 and Flute D4) (Fig 4.3A). There are several similarities in neural activity across all models: all exhibit gamma wave dynamics of similar frequency and inten- sity, and in all cases activity is concentrated in frequencies associated with inputs. There are also significant qualitative differences between the networks’ behavior. The causes of these can be understood by examining the recurrent connectivity. Neurons (which represent individual bands of frequency) which are strongly coupled are highly likely to be simultaneously represented. This means that local connec- tions cause neurons representing similar frequencies to co-activate, and harmonic connections cause entire harmonic stacks to co-activate. These harmonic stacks match well with the harmonic input from the individual sources. I next established two different measures to quantify this network behavior. The first involves processing the network activity to separate it into different gamma waves, and then measuring how well each gamma wave correlates with the two different sources (over the interval in which the gamma wave occurred). These cor- relations may be plotted against one another, and the distance from the diagonal then used to estimate how much each gamma wave ‘specializes’ in either source (Figure 4.4A). These results confirm the above qualitative findings. In particular, individual gamma waves in the harmonic network represent individual sources with 105 Figure 4.3: Gamma partitioning applied to two instruments. Network ac- tivity for three different patterns of recurrent connectivity (representative 100 ms shown of total 3 second simulation, activity smoothed over 1 ms to increase visibil- ity). Network without any recurrent connections (top) has activity in every gamma cycle distributed uniformly across all harmonics. 
In contrast, each gamma wave in the network with local recurrent connections (middle) is concentrated in relatively few harmonics, but with no clear pattern between gamma waves. Finally, in the network with harmonic recurrent connections (bottom), each gamma wave matches fairly well with one source or the other, with the source represented alternating between subsequent cycles. 106 Figure 4.4: Quantifying source separation. Analysis of stimulus displayed in Figure 4.3. A Absolute value of correlation between individual gamma waves and two sources. Average distance from diagonal below. Note that the optimal score is 1/ √ 2 ≈ 0.7. Left: No recurrent connectivity shows all gamma waves representing both sources equally. Middle: Local connections cause significant variation between gamma waves, but none significantly match with either source. Right: Harmonic connections cause all gamma waves to correlate very well with one source, but not with the other. B. Principal component analysis and correlation between Varimax rotations of first two PCs and sources. Left: No recurrent connectivity has one significant PC, and varimax components do not correlate well with sources. Middle: Local connections cause several important PCs which collectively explain most net- work behavior, and first two PCs (rotated) do not match well with sources. Right: Harmonic connections leads to 2 highly significant PCs, which when rotated match well with two sources. 107 a very high level of fidelity. The second measure attempts to test for a more general ‘perceptual object’. Syn- chronicity theory need not dictate that only one object is represented in each gamma wave as the above analysis would suggest, but rather that the features attributed to each object follow a common temporal pattern. We tested for such general ‘percep- tual objects’ using Principal Component Analysis (PCA); the number of principal components (PCs) required to explain most of the variance can measure the number of objects perceived. Analysis using PCA agrees with above findings (Figure 4.4B); no recurrent connections leads to a single object, local connections lead to several objects (the number of which is approximately equal to the number of harmonics present), and harmonic connections lead to two perceptual objects. Finally, we at- tempt to determine how these detected objects might match with the underlying sources. We do this using a Varimax rotation (Kaiser, 1958) on the first two PCs. The Varimax rotation is an orthogonal rotation which maximizes sparsity in indi- vidual components. The sounds from natural sources are often sparse in frequency, meaning that applying a Varimax rotation might allow the extraction of objects which are similar to stimuli. This was indeed the case for the harmonic network. Finally, all of these metrics were applied for all possible pairings of instrument stim- uli (excluding those in which both were playing the same note) (Figure 4.5). These results suggest that the above findings hold for just about all possible pairings of instrumental stimuli; no recurrent connectivity leads to a single perceptual object 108 Figure 4.5: Gamma partitioning across all samples. A. Degree of ‘specializa- tion’ as in Figure 4.4A. B. Number of ‘perceptual objects’ using PCA. C. How well first two PCs (rotated) match with two input sources. 
which slightly correlates with both sources, local recurrent connectivity leads to multiple objects which don’t correlate well with the sources, and harmonic recur- rent connectivity leads to the detection of two perceptual objects which match well with the input sources. This suggests that gamma partitioning may separate many different combinations of harmonic stimuli. 4.3.4 Dynamic Vocal Stimuli Thus far it has been shown how gamma partitioning might work to disentangle com- binations of harmonic sounds in a network with harmonic recurrent connectivity. To simplify analysis, I have only used stimuli which have an approximately constant harmonic structure. The majority of auditory stimuli, however, change significantly on the time scale of hundreds of milliseconds or less. We therefore now consider how the auditory partitioning provided by the network with harmonic recurrent connec- 109 tivity might assist with the streaming of such dynamic sounds. To test this method, I have chosen the first four bars of Queen’s ‘Bohemian Rhap- sody’ (Mercury, F, 1975). This song was selected due to the existence of a multi- track recording with multiple vocal parts, not as an ill-conceived attempt to win favor with my thesis committee. Two distinct vocal parts were selected, combined into a single waveform, and any silent periods were removed. Finally, the frequency was shifted slightly so that the whole sample would fall between 250 Hz and 500 Hz; this was done by rescaling time, speeding up by 40%. Due to the constantly evolving nature of both the underlying sources and the percep- tual segments, we use the correlation between each gamma wave and each (instan- taneous) source to determine how gamma partitioning has processed the stimulus (Figure 4.6). As before, it can be seen that most gamma waves correlate strongly with one of the sources (as they existed at the moment of each gamma wave). This is a strong indication that gamma partitioning can successfully handle dynamic stim- uli. It is interesting to ask how easily these segments can be joined together into coherent auditory streams. This can be done using the continuous representation hypothesis (Section 1.7.3); each segment is paired with the preceding segments with which it was most similar. To do this we turn to the literature for the online clustering of data streams (Venkatasubramanian, 2009). A sliding k-means clustering (k=2) al- 110 Figure 4.6: Gamma partitioning of ‘Bohemian Rhapsody’. A. Input stimu- lus composed of two vocal parts of introduction. B. Parts labelled using individual recordings C. Correlation between individual gamma waves and each individual source reveals most gamma waves bear a strong resemblance to one of the sources. Average distance from diagonal = 0.57. Smoothed heat map (reflected at bound- aries) displayed rather than raw data points due to quantity of data points. D. Auditory streaming can be performed by combining gamma partitioning with the sliding k-means algorithm to join segments together. Stream identities swap when sounds are discontinuous or significantly overlapping. 111 gorithm was applied both due to its simplicity and because k-means clustering can be performed by some forms of neural networks (Murtagh and Herna´ndez-Pajares, 1995). Windows are 50 ms long (neurally plausible timescales), neighboring win- dows overlap by 90%, and clustering is performed on timepoints with a significant amount of neural activity (>0.01% average). 
Stream continuity was achieved by initializing each cluster center with those found in the previous window. This complete streaming algorithm shows some success at forming coherent streams (Figure 4.6D). This algorithm does encounter issues when the underlying auditory objects are discontinuous in frequency space, or when they significantly overlap one another. In theory, this could be remedied by including a wider variety of audi- tory features (influenced by factors such as timbre or spatial location) which change slowly in time. One final example of auditory streaming via gamma partitioning has been included (Figure 4.3.4). This was done on the opening phrase of the ‘Flower Duet from Delibes’ Lakme´ (Delibes, date unknown). Unfortunately, I was not able to obtain individual recordings of each singers part, and so I was not able to quantitatively confirm the network performance. However, it is possible to see that the network segregates the input into individual streams, and that all harmonics within each stream undergo similar modulations in frequency (trills). 112 Figure 4.7: Gamma partitioning of ‘Flower Duet’. A. Input stimulus composed of two vocal parts in the chorus. B. Gamma partitioning and streaming performed using gamma partitioning and k-means streaming. 4.4 Discussion In this chapter I have considered how auditory partitioning might occur in a neural network using both recurrent excitatory and inhibitory connections. Partitioning occurs due to gamma wav oscillations, with different sets of perceptual objects rep- resented in different gamma cycles. It has been shown how this partitioning works for a variety of auditory stimuli, including those which are relatively complex. Fi- nally, I considered how this partitioning might be combined with a simple clustering algorithm to form continuous perceptual streams. This partitioning of auditory stimuli is only possible because information about the stimulus is preserved in the same population of neurons. The recurrent excitatory 113 connections provide a feedback loop within each perceptual object, causing one object to be disproportionately represented in each cycle. The inhibitory recurrent connections aid in this, causing a ‘winner-takes-all’ environment which can only rep- resent a small part of the stimulus. Inhibitory feedback is also crucial in resetting the cycle, so the network does not continuously represent one sole object. Finally, synaptic depression and neural refractory periods serve to penalize objects which have just been active, allowing the network to represent multiple successive objects over successive cycles. This also causes some phase-locking between neurons repre- senting the same object, causing them to be more likely to fire simultaneously in future cycles. These processes all depend on the population of neurons maintaining a continued neural percept of the stimulus, continued over timescales longer than individual neural processing. They are therefore dependent on recurrent connectiv- ity within the network. The model presented shares several features with the LEGION network (Wang and Terman, 1995)(Section 1.7.4.2). The foremost amongst these is how segmentation is calculated using pre-learned features; this is well supported by psychoacoustic experiments on auditory streaming, which demonstrate how streaming is influenced by factors such as harmonicity, timbre and location. 
For example, the importance of harmonicity to auditory streaming can be gauged by considering whispered speech (which lacks harmonics); such speech can only be understood while it is relatively loud, and is quickly lost amongst background noise if the volume drops. These pre-learned features are represented in the recurrent connections, a premise broadly 114 supported by theories of long term plasticity (Section 1.3). Finally, like the earlier LEGION network, synchronicity between oscillators is used to distinguish different streams. However, there are also several key differences between this network and earlier work. Some of these differences are caused by the different mechanisms underlying the network oscillations. In LEGION, oscillations are generated individually within each oscillating ‘unit’. In contrast, oscillations are driven collectively by the global inhibitory neuron in the network presented above. Explicitly modeling all neural processes as standard LIF neurons does make the network presented above more biologically plausible. More importantly, however, global synchronization of oscil- lations means that synchrony is achieved much faster. In LEGION, all oscillators start out of phase and are slowly shifted in phase space until segments match, a process that takes at least 5-10 oscillations. In contrast, in the gamma wave net- work some level of synchrony is automatically imposed (global gamma waves), and clear partitioning is achieved within one or two oscillations. This allows the gamma wave network to segregate quickly evolving signals (Figure 4.6), a task that was not attempted with the earlier neural network streaming models. The network presented in this chapter also expands on the concept of ‘similar’ neu- rons. Neurons can be considered to be similar if they are typically co-active in the presence of natural stimuli, or alternatively if there is a significant excitatory connection between them. Gamma partitioning essentially functions by clustering 115 together groups of ‘similar’ neurons. In the above network I consider two neurons to be ‘similar’ if they represent frequencies which might both occur in the same har- monic stack. Such a calculation is non-trivial; the definition of ‘similar can no longer be thought to depend on a distance function (a metric) as the underlying function is not monotonic (200 Hz and 400 Hz are ‘similar in harmonic space, whereas neither is ‘similar’ to 311 Hz). This level of complexity was not considered in the LEGION network, or in the paper which introduced the concept of gamma partitioning in the visual system (Miconi and Vanrullen, 2010). Indeed, when the LEGION model was later extended to consider harmonic stimuli this was achieved via extensive pre- processing to detect and gather frequencies belonging to harmonic stacks (Brown and Wang, 1997). In biological neural networks, each individual neuron may be sensitive to several different features. This means that two neurons might be fairly similar in some respects, but not in others. Consequently, the ability to cluster inputs using more complicated relationships between neurons is of great interest. We also briefly considered how segments might be combined together into auditory streams. In particular, we applied a sliding k-means clustering algorithm. This algorithm was chosen as it is relatively simple, and has been linked to clustering in neural networks . 
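A minimal MATLAB sketch of this sliding-window clustering is given below; the placeholder activity matrix, the activity threshold, and the use of MATLAB's built-in kmeans (Statistics and Machine Learning Toolbox) are illustrative assumptions rather than the exact analysis code.

% Illustrative sliding k-means (k = 2) over smoothed network activity.
% 'act' stands in for an (nChannels x nTimepoints) matrix sampled every
% 1 ms; window length, overlap and threshold follow the values in the text.
act     = rand(400, 3000);      % placeholder; replace with smoothed network output
winLen  = 50;                   % window length in ms
hop     = 5;                    % 90% overlap -> advance by 5 ms per window
k       = 2;
centers = [];                   % warm-started from the previous window
labels  = {};
for start = 1:hop:(size(act, 2) - winLen + 1)
    seg  = act(:, start:start+winLen-1);
    keep = mean(seg, 1) > 1e-4;            % timepoints with significant activity
    pts  = seg(:, keep)';                  % each retained timepoint is one observation
    if size(pts, 1) < k, continue; end
    if isempty(centers)
        [idx, centers] = kmeans(pts, k);
    else
        % stream continuity: initialize with the previous window's centers
        [idx, centers] = kmeans(pts, k, 'Start', centers);
    end
    labels{end+1} = idx;                   % per-timepoint stream assignment
end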
This algorithm has trouble following streams when they undergo a sudden jump in features; this is not surprising given its basis in temporal continuity. We expect stronger performance with the inclusion of more features, and with a larger neural representation; this model only considers 400 different receptive fields (albeit with 100 neurons for each set of inputs), far fewer than the roughly 10^8 neurons in the auditory cortex. These features need not play an equal role in stream formation; some features may remain more stable within any given stream, and it is reasonable to assume that a neural network could weight features appropriately (and learn these weights through exposure to natural stimuli).

One interesting extension that could be explored is how this process of auditory streaming might be informed by my earlier work on transient attractors. It is highly plausible that the strengths of the recurrent connections which dictate ‘similarity’ might depend both on ‘hard-wired’ correlations and on correlations in the recent history. An attempt was made to incorporate transient attractors into the above auditory streaming model, although significant strengthening of connections was found to lead to runaway excitation within each gamma cycle. This might in turn be remedied through plastic inhibitory networks (Vogels and Abbott, 2009). This will hopefully prove to be an interesting area for future research.

While the gamma wave partitioning network presented in this chapter certainly falls short of a complete explanation of auditory streaming, this novel approach shows promise in its ability to perform complicated segmentation, in its biological realism, and in its potential for the incorporation of several complex features.

Methods

Simulations

All simulations were performed in MATLAB using the implicit Euler method with ∆t = 0.1 ms.

Neuron Model

In all simulations I used spiking, noisy, leaky integrate-and-fire neurons with cell voltage, V_j(t), depending on the weighted sum of recurrent excitatory, inhibitory and input currents,

$$C\,\frac{dV_j}{dt} = I^{\mathrm{Leak}}_j(t) + I^{E}_j(t) + I^{I}_j(t) + \alpha_N\, dW_t + \mathrm{In}(t)$$

$$I^{E}_j(t) = \sum_{i=1}^{N_E} g^{E}_{ij}(t)$$

$$I^{I}_j(t) = \sum_{i=1}^{N_E} g^{I}_{ij}(t)\left[E^{I}_{\mathrm{rev}} - V_j(t)\right]$$

$$I^{\mathrm{Leak}}_j(t) = g^{\mathrm{Leak}}\left[V_{\mathrm{rest}} - V_j(t)\right].$$

Here, dW_t is the time derivative of Brownian motion (N(0, √∆t)). In this formulation, the excitatory input currents are independent of the cell membrane potential. This approximation is valid so long as the excitatory reversal potential is far higher than the firing threshold, which is the case for cortical neurons. In contrast, this cannot be assumed for the inhibitory currents because the inhibitory reversal potential is typically close to the cell membrane potential at rest.

Due to the spiking (Dirac-delta) nature of the inputs, we model each of these conductances as the product of a synaptic strength (W^*_ij) and a time course for the conductivity from each incoming spike,

$$g^{*}_{ij}(t) = W^{*}_{ij} \sum_{t_{\mathrm{spk}}} \frac{1}{\tau^{*}_{1} - \tau^{*}_{2}}\left[\exp\!\left(-\frac{t - t_{\mathrm{spk}}}{\tau^{*}_{1}}\right) - \exp\!\left(-\frac{t - t_{\mathrm{spk}}}{\tau^{*}_{2}}\right)\right] H(t - t_{\mathrm{spk}})$$

(Gabbiani et al., 1994). The Heaviside step function (H) is included to ensure that the conductance may only change after each presynaptic spike occurs.

Parameters

Simulation parameters were constant across all simulations, and are listed in Table 4.1.

Other Packages

Spectrograms (DFT) were calculated using the ‘myspectrogram.m’ file by Kamil Wojcicki, Signal Processing Laboratory, Griffith University, Nathan, QLD, Australia, 2007. Accessed 2016.
Parameters

Simulation parameters were constant across all simulations, and are listed in Table 4.1.

Other Packages

Spectrograms (DFT) were calculated using the 'myspectrogram.m' file by Kamil Wojcicki, Signal Processing Laboratory, Griffith University, Nathan, QLD, Australia, 2007. Accessed 2016. Available at https://www.mathworks.com/matlabcentral/fileexchange/29596-speech-spectrogram/content/myspectrogram/myspectrogram.m

Name                             Symbol     Value
Maximum synaptic weights
  Input to Exc                   W^SE       0.02
  Exc to Exc                     W^EE       15
  Exc to Local Inh               W^EI       0.5
  Local Inh to Exc               W^IE       1
  Exc to Global Inh              W^EIG      0.2
  Global Inh to Exc              W^IGE      50
Synaptic timescales
  Excitatory synapse decay       τ^E_1      1 ms
  Excitatory synapse rise        τ^E_2      0.2 ms
  Inhibitory synapse decay       τ^I_1      10 ms
  Inhibitory synapse rise        τ^I_2      2 ms
Other
  Leak conductance               g^Leak     0.1 ms^-1
  Firing threshold               b          1
  Resting potential              V_rest     0
  Inhibitory reversal potential  E^I_rev    -1
  Noise amplitude                α_N        0.5

Table 4.1: Gamma Partitioning Network parameters. The above parameters were used in all simulations in this chapter.

Chapter 5: Conclusions

In this dissertation, I have presented a set of neural network models which investigate the roles that recurrent connections in neural networks might play in processing sensory stimuli. These models all revolve around the idea of keeping an ongoing percept of recent stimuli within the sensory cortices, and using this to alter how the neural network processes new incoming stimuli. This task is particularly important since the objects being processed in the sensory cortices typically persist for far longer than the timescales of neural processing, but far shorter than the timescales of long-term synaptic modifications.

I started by investigating potential mechanisms behind short-term memory. This is one area in which recurrent neural connections have often been discussed: recurrent connections allow for stable attractors, and information may be stored by activating one such attractor. It is doubtful, however, that such a persistent-activity model explains all the capabilities of short-term memory. Instead, I investigated the role that temporary modifications to the recurrent network connectivity might play in storing memory. This plastic connectivity may not only emphasize pre-existing attractors, but also temporarily create attractors, which may then be sensed using an appropriate prompt. I then went on to demonstrate how this concept of transient attractors could help in other tasks associated with sensory processing.

Next, I investigated the mechanisms which might allow for temporary modifications to the network's connectivity. Previous work has assumed some transient associative mechanism to store this information, even though no such mechanism has been observed. Instead, I investigated how this information might be stored by facilitating features within the network, making use of the process of synaptic facilitation and the hierarchical nature of stimulus processing. This alternative means of storing associative information might allow many different processes which depend on associative modifications to occur on a wider variety of timescales.

Finally, I considered the challenge of auditory segmentation, as the first step toward auditory streaming. In line with synchronicity theory, I hypothesized that this segmentation might happen over multiple gamma cycles. This mechanism relies on two different recurrent mechanisms: the global inhibitory network creates a 'winner-takes-all' environment, and recurrent excitatory connections cause a clustering of neurons due to their pre-learned 'similarity'.
This mechanism is also highly dependent on the continued representation of the stimulus within the same population of neurons; synaptic depression and refractory periods mean that a cluster of neurons which have just represented some auditory source are less likely to fire in the next gamma cycle.

Much work remains in understanding the role of recurrent connections in neural networks; for example, ideas from all of the above work might be combined into a more comprehensive model of auditory streaming. Nevertheless, the work in this dissertation has demonstrated several new ways in which recurrent connections within the sensory cortices might aid in the processing of complex, natural stimuli.

Bibliography

Abbott, L. F. and Regehr, W. G. Synaptic computation. Nature, 431(7010):796–803, 2004. Aksay, E., Baker, R., Seung, H. S., and Tank, D. W. Anatomy and discharge properties of pre-motor neurons in the goldfish medulla that have eye-position signals during fixations. Journal of neurophysiology, 84(2):1035–1049, 2000. Aksay, E., Olasagasti, I., Mensh, B. D., Baker, R., Goldman, M. S., and Tank, D. W. Functional dissection of circuitry in a neural integrator. Nature neuroscience, 10(4):494–504, 2007. Amit, D. J., Fusi, S., and Yakovlev, V. Paradigmatic working memory (attractor) cell in IT cortex. Neural computation, 9(5):1071–92, 1997. Amit, D. J., Gutfreund, H., and Sompolinsky, H. Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55(14):1530–1533, 1985. Arnal, L. H. and Giraud, A.-L. Cortical oscillations and sensory predictions. Trends in cognitive sciences, 16(7), 2012. Atkinson, R. C. and Shiffrin, R. M. The Control Process of Short-Term Memory. Technical report, Institute for Mathematical Studies in the Social Sciences, 1971. Attwell, D. and Laughlin, S. B. An energy budget for signaling in the grey matter of the brain. Journal of cerebral blood flow and metabolism: official journal of the International Society of Cerebral Blood Flow and Metabolism, 21(10):1133–45, 2001. Barak, O., Tsodyks, M., and Romo, R. Neuronal Population Coding of Parametric Working Memory. Journal of Neuroscience, 30(28):9424–9430, 2010. Barak, O. and Tsodyks, M. Persistent activity in neural networks with dynamic synapses. PLoS Computational Biology, 3(2):0323–0332, 2007. Barak, O. and Tsodyks, M. Working models of working memory. Current Opinion in Neurobiology, 25:20–24, 2014. Becker, S. Learning to categorize objects using temporal coherence. Technical report, The Rotman Research Institute, 1992. Becker, S. and Plumbley, M. Unsupervised Neural Network Learning Procedures For Feature Extraction and Classification. Journal of Applied Intelligence, 6, 1996. Ben-Yishai, R., Bar-Or, R. L., and Sompolinsky, H. Theory of orientation tuning in visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 92(9):3844–3848, 1995. Bendor, D. and Wang, X. The neuronal representation of pitch in primate auditory cortex. Nature, 436(7054):1161–1165, 2005. Bizley, J. K. and Cohen, Y. E. The what, where and how of auditory-object perception. Nature Reviews Neuroscience, 14(10):693–707, 2013. Bragin, A., Jandó, G., Nádasdy, Z., Hetke, J., Wise, K., and Buzsáki, G. Gamma (40-100 Hz) oscillation in the hippocampus of the behaving rat. The Journal of neuroscience: the official journal of the Society for Neuroscience, 15(1 Pt 1): 47–60, 1995. Bregman, A. S. Auditory Scene Analysis. MIT Press, Cambridge, Mass, 1990. Bregman, A. S.
and Campbell, J. Primary Auditory Stream Segregation and Percep- tion of Order in Rapid Sequences of Tones. Journal of Experimental Psychology, 89(2):244–249, 1971. Brenowitz, S. D. and Regehr, W. G. Associative short-term synaptic plasticity mediated by endocannabinoids. Neuron, 45(3):419–431, 2005. Brody, C. D. and Hopfield, J. J. Simple networks for spike-timing-based computa- tion, with application to olfactory processing. Neuron, 37(5):843–52, mar 2003. Brokx, J. P. L. and Nooteboom, S. G. Intonation and the perceptual separation of simultaneons voices. Journal of Phonetics, 10:23–36, 1982. Brown, G. J. and Cooke, M. Computational auditory scene analysis, 1994. Brown, G. J. and Wang, D. Modelling the perceptual segregation of double vowels with a network of neural oscillators. Neural Networks, 10(9):1547–1558, 1997. Brunel, N. Dynamics and Plasticity of Stimulus-selective Persistent Activity in Cortical Network Models. Cerebral Cortex, 13(11):1151–1161, 2003. Brunel, N. and Hakim, V. Fast Global Oscillations in Networks of Integrate-and-Fire Neurons with Low Firing Rates. Neural Computation, 11(7):1621–1671, 1999. 125 Buhmann, J. and Schulten, K. Associative recognition and storage in a model network of physiological neurons. Biological Cybernetics, 54(4-5):319–335, 1986. Burkitt, A. N. A review of the integrate-and-fire neuron model: II. Inhomogeneous synaptic input and network properties. Biological cybernetics, 95(2):97–112, aug 2006a. Burkitt, A. N. A review of the integrate-and-fire neuron model: I. Homogeneous synaptic input. Biological cybernetics, 95(1):1–19, jul 2006b. Carlyon, R. P., Cusack, R., Foxton, J. M., and Robertson, I. H. Effects of attention and unilateral neglect on auditory stream segregation. J. Exp. Psychol.: Human Percept. Perform., 27(1):115–127, 2001. Casey, M. A. and Westner, A. Separation of mixed audio sources by independent subspace analysis. Proceedings of the International Computer Music Conference, pages 154–161, 2000. Cherry, E. C. Some Experiments on the Recognition of Speech, with One and with Two Ears, 1953. Courtney, S. M., Ungerleider, L. G., Keil, K., and Haxby, J. V. Transient and sustained activity in a distributed neural system for human working memory., 1997. Curtis, C. E. and Lee, D. Beyond working memory: the role of persistent activity in decision making. Trends in Cognitive Sciences, 14(5):216–222, 2010. Darwin, C. J. Auditory grouping. Trends in cognitive sciences, 1(9):327–333, 1997. Darwin, C. J. and Bethell-Fox, C. E. Pitch continuity and speech source attribution. Journal of Experimental Psychology: Human Perception and Performance, 3(4): 665–672, 1977. Dayan, P. and Abbott, L. F. Theoretical Neuroscience. MIT Press, Cambridge, Mass, 2001. Delibes, L. Flower Duet from Lakme´. Performed by Anna Netrebko & Elina Garanca. [Video File]. Retrieved from https://www.youtube.com/watch?v=Vf42IP ipw on 06/10/2016. D’Esposito, M. and Postle, B. R. The Cognitive Neuroscience of Working Memory. Annual Review of Psychology, 66(1):115–142, 2015. Deutsch, D. Binaural integration of melodic patterns. Perception & psychophysics, 25(5):399–405, 1979. Ding, N., Chatterjee, M., and Simon, J. Z. Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. NeuroImage, 88: 41–46, 2014. 126 Doumont, J.-L. Magical Numbers : The Secen-plus-or-Minus-Two Myth. Psycho- logical Review, 45(2):123–127, 2002. Dranias, M. R., Ju, H., Rajaram, E., and VanDongen, A. M. J. 
Short-Term Memory in Networks of Dissociated Cortical Neurons. Journal of Neuroscience, 33(5): 1940–1953, 2013. Druckmann, S. and Chklovskii, D. B. Over-complete representations on recurrent neural networks can support persistent percepts. Advances in Neural Information Processing Systems, 23:1–9, 2010. Dubreuil, A. M. and Brunel, N. Storing structured sparse memories in a multi- modular cortical network model. Journal of Computational Neuroscience, 40(2): 157–175, 2016. Dudai, Y. Memory from A to Z. Oxford University Press, Oxford, 2002. Dudek, S. M. and Bear, M. F. Homosynaptic long-term depression in area CA1 of hippocampus and effects of N-methyl-D-aspartate receptor blockade. Proceedings of the National Academy of Sciences of the United States of America, 89(10): 4363–7, 1992. Edin, F., Klingberg, T., Johansson, P., McNab, F., Tegne´r, J., and Compte, A. Mechanism for top-down control of working memory capacity. Proceedings of the National Academy of Sciences, 106(16):6802–6807, 2009. Ehret, G. The auditory cortex. Journal of Comparative Physiology - A Sensory, Neural, and Behavioral Physiology, 181(6):547–557, 1997. Elhilali, M. and Shamma, S. a. A cocktail party with a cortical twist: how cortical mechanisms contribute to sound segregation. The Journal of the Acoustical Society of America, 124(6):3751–3771, 2008. Engel, A. K., Koning, P., Gray, C. M., and Singer, W. Stimulus-dependent neu- ronal oscillations in cat visual cortex: Inter-columnar interaction as determined by cross-correlation analysis, 1990. Engel, A. K. and Singer, W. Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Sciences, 5(1):16–25, 2001. Erickson, M. A., Maramara, L. A., and Lisman, J. A single 2-spike burst induces GluR1-dependent associative short-term potentiation: a potential mechanism for short term memory. Journal of cognitive neuroscience, 22(11):2530–2540, 2011. Erickson, R. Sound Structure in Music. University of California Press, Berkeley and Los Angeles, 1975. Ermentrout, G. B. and Terman, D. H. Mathematical Foundations of Neuroscience, volume 35 of Interdisciplinary Applied Mathematics. Springer New York, New York, NY, 2010. 127 Felleman, D. J. and Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cerebral cortex (New York, N.Y. : 1991), 1(1):1–47, 1991. Fisher, S. A., Fischer, T. M., and Carew, T. J. Multiple overlapping processes underlying short-term synaptic enhancement. Trends in Neurosciences, 20(4): 170–177, 1997. Fontolan, L., Krupa, M., Hyafil, A., and Gutkin, B. Analytical insights on theta- gamma coupled neural oscillators. Journal of mathematical neuroscience, 3:16, 2013. Fourcaud, N. and Brunel, N. Dynamics of the Firing Probability of Noisy Integrate- and-Fire Neurons. Neural computation, 14:2057–2110, 2002. Fries, P., Roelfsema, P. R., Engel, A. K., Ko¨nig, P., and Singer, W. Synchronization of oscillatory responses in visual cortex correlates with perception in interocular rivalry. Proceedings of the National Academy of Sciences of the United States of America, 94(23):12699–12704, 1997. Fuster, J. M. and Alexander, G. E. Neuron Activity Related to Short-Term Memory, 1971. Gabbiani, F., Midtgaard, J., and Knopfel, T. Synaptic Integration in a Model of Cerebellar Granule Cells. Journal Of Neurophysiology, 72(2):999–1009, 1994. Ganguli, S., Huh, D., and Somploinsky, H. Memory traces in dynamical systems. Proceedings of the National Academy of Sciences of the United States of America, 105(48):18970–5, dec 2008. 
Ghose, G. M. and Maunsell, J. Specialized representations in visual cortex: A role for binding? Neuron, 24(1):79–85, 1999. Goldman, M. S. Memory without Feedback in a Neural Network. Neuron, 61(4): 621–634, 2009. Goldman, M. S., Compte, A., and Wang, X.-j. Theoretical and computational neuroscience: Neural integrators: recurrent mechanisms and models. Squire, L.; Albright, T.; Bloom, F, 2007. Gray, C. M., Ko¨nig, P., Engel, A. K., and Singer, W. Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties., 1989. Haykin, S. and Chen, Z. The Cocktail Party Problem. Neural Computation, 17(9): 1875–1902, 2005. Hebb, D. O. The Organization of Behavior. Wiley, New York, 1949. 128 Hennig, M. H. Theoretical models of synaptic short term plasticity. Frontiers in Computational Neuroscience, 7(April):1–10, 2013. Holcman, D. and Tsodyks, M. The emergence of up and down states in cortical networks. PLoS Computational Biology, 2(3):174–181, 2006. Hopfield, J. J. Neural networks and physical systems with emergent collective com- putational abilities. Proceedings of the National Academy of Sciences of the United States of America, 79(April):2554–2558, 1982. Hubel, D. H. and Wiesel, T. N. Receptive Fields, Binocular Interactions and Func- tional Architeture in the Cat’s Visual Cortex. J. Physiol, 160:106–154, 1962. Hubel, D. H., Wiesel, T. Receptive Fields of Single Neurons in the Cat’s Striate Cortex. Physiol, J, 148:574–591, 1959. Itskov, V., Hansel, D., and Tsodyks, M. Short-term facilitation may stabilize parametric working memory trace. Frontiers in Computational Neuroscience, 5 (October):1–19, 2011. Izhikevich, E. M. Dynamical Systems in Neuroscience: The Geometry of Excitability and Burtsing. MIT Press, Cambridge, Mass., 2007. Jensen, O., Kaiser, J., and Lachaux, J. P. Human gamma-frequency oscillations associated with attention and memory. Trends in Neurosciences, 30(7):317–324, 2007. Kaas, J. H., Hackett, T. A., and Tramo, M. J. Auditory processing in primate cerebral cortex. Current Opinion in Neurobiology, 9(2):164–170, 1999. Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psy- chometrika, 23(3), 1958. Kalman, R. E. A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82(1):35, 1960. Kandel, E. R. Cellular Mechanisms of Learning and the Biological Basis of Individu- ality. In Principles of Neural Science, pages 1248–1280. McGraw-Hill Companies, Inc., 2014. Keysers, C. and Perrett, D. I. Demystifying social cognition: A Hebbian perspective. Trends in Cognitive Sciences, 8(11):501–507, 2004. King, P. D., Zylberberg, J., and DeWeese, M. R. Inhibitory Interneurons Decorrelate Excitatory Cells to Drive Sparse Code Formation in a Spiking Model of V1. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33 (13):5475–5485, mar 2013. 129 Ko, H., Hofer, S. B., Pichler, B., Buchanan, K. A. K., Sjo¨stro¨m, P. J., and Mrsic- Flogel, T. D. Functional specificity of local synaptic connections in neocortical networks. Nature, 473(7345):87–91, 2011. Krishnan, L., Elhilali, M., and Shamma, S. Segregating Complex Sound Sources through Temporal Coherence. PLoS computational biology, 10(12):1–10, dec 2014. Kubota, K. and Niki, H. Prefrontal cortical unit activity and delayed alternation performance in monkeys. Journal of neurophysiology, 34(3):337–347, 1971. Lansner, A., Fransen, E., and Sandberg, A. 
Cell Assembly Dynamics in Detailed and Abstract Attractor Models of Cortical Associative Memory. Theory in Bio- sciences, 122(1):19–36, 2003. Larocque, J. J., Lewis-Peacock, J. a., and Postle, B. R. Multiple neural states of representation in short-term memory? It’s a matter of attention. Frontiers in human neuroscience, 8(January):5, 2014. Lewis-Peacock, J. A., Drysdale, A. T., Oberauer, K., and Postle, B. R. Neural Evi- dence for a Distinction Between Short-Term Memory and the Focus of Attention. Journal of cognitive neuroscience, 24(1):61–79, 2012. Lim, S. and Goldman, M. S. Balanced cortical microcircuitry for maintaining short- term memory. Nature neuroscience, 16(9):1306–1314, 2013. Lisman, J. E. and Idiart, M. A. Storage of 7 +/- 2 short-term memories in oscillatory subcycles., 1995. London Philharmonia. Sound Samples. http://www.philharmonia.co.uk/ explore/sound samples, 2016. Lundqvist, M., Rose, J., Herman, P., Brincat, S. L., Buschman, T. J., and Miller, E. K. Gamma and Beta Bursts Underlie Working Memory. Neuron, 90(1):152– 164, 2016. Machens, C. K., Romo, R., and Brody, C. D. Flexible Control of Mutual Inhibition: A Neural Model of Two-Interval Discrimination. Science, 307(5712):1121–1124, 2005. MacNeil, D. and Eliasmith, C. Fine-tuning and the stability of recurrent neural networks. PLoS ONE, 6(9), 2011. Maex, R. and Steuber, V. The first second: Models of short-term memory traces in the brain. Neural Networks, 22:1105–1112, 2009. Malenka, R. C. Postsynaptic factors control the duration of synaptic enhancement in area CA1 of the hippocampus. Neuron, 6(1):53–60, 1991. Masquelier, T. Neural variability, or lack thereof. Frontiers in computational neu- roscience, 7(February):7, 2013. 130 McDougal, R. A. Excitatory-inhibitory interactions as the basis of working memory. PhD thesis, Ohio State University, 2011. Mejias, J. F. and Torres, J. J. Maximum memory capacity on neural networks with short-term synaptic depression and facilitation. Neural computation, 21(3): 851–71, 2009. Mercury, F. Bohemian Rhapsody. Performed by Queen, 1975. [Video File]. Retrieved from https://www.youtube.com/watch?v=lXZhmvGusfs on 06/10/2016. Miconi, T. and Vanrullen, R. The gamma slideshow: object-based perceptual cycles in a model of the visual cortex. Frontiers in human neuroscience, 4:205, jan 2010. Miller, G. A. The Magical Number Number Seven, Plus of Minus Two: Some Limits on Our Capacity for Processing Information. The Psychological Review, 63:81–97, 1956. Miller, G. A. MIller, George A. In Lindzey, G., editor, A History of Psychology in Autobiography, page 401. Stanford University Press, 1989. Miller, G. A. and Heise, G. A. The Trill Threshold. Journal of the Acoustical Society of America, 22(5):637–638, 1950. Mongillo, G., Barak, O., and Tsodyks, M. Synaptic theory of working memory. Science (New York, N.Y.), 319:1543–1546, 2008. Movshon, J. A. Reliability of neuronal responses. Neuron, 27(3):412–414, 2000. Murtagh, F. and Herna´ndez-Pajares, M. The Kohonen self-organizing map method: An assessment. Journal of Classification, 12(2):165–190, 1995. Olivers, C. N. L., Peters, J., Houtkamp, R., and Roelfsema, P. R. Different states in visual working memory: When it guides attention and when it does not. Trends in Cognitive Sciences, 15(7):327–334, 2011. Olshausen, B. A. and Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 1996. Olshausen, B. and Field, D. What is the other 85 % of V1 doing? Problems in Systems . . . 
, pages 1–29, 2004. Orhan, E. The Hopfield Model. Technical report, NYU, 2014. Oswald, J. P., Klug, A., and Park, T. J. Interaural intensity difference processing in auditory midbrain neurons: effects of a transient early inhibitory input. The Journal of neuroscience : the official journal of the Society for Neuroscience, 19 (3):1149–1163, 1999. Pasternak, T. and Greenlee, M. W. Working memory in primate sensory systems. Nature reviews. Neuroscience, 6(2):97–107, 2005. 131 Peters, A., Payne, B. R., and Budd, J. A numerical analysis of the geniculocortical input to striate cortex in the monkey. Cerebral Cortex, 4(3):215–229, 1994. Petrides, M. Dissociable roles of mid-dorsolateral prefrontal and anterior inferotem- poral cortex in visual working memory. J Neurosci, 20(19):7496–7503, 2000. Postle, B. R. Neural Bases of the Short-Term Retention of Visual Information. In Attention & Performance XXV: Mechanisms of Sensory Working Memory, pages 43–58. Elsevier, 2015. Purves, D., Augustine, G. J., Fitzpatrick, D., Hall, W. C., LaMantia, A.-S., and White, L. E. Neuroscience. Sinauer Associates, Inc., Suderland, MA, fifth edition, 2012. Qin, L., Chimoto, S., Sakai, M., Wang, J., and Sato, Y. Comparison Between Offset and Onset Responses of Primary Auditory Cortex. Journal of Neurophysiology, pages 3421–3431, 2007. Qin, M. K. and Oxenham, A. J. Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers. The Journal of the Acoustical Society of America, 114(1):446–454, 2003. Regehr, W. G. Short-term presynaptic plasticity. Cold Spring Harbor perspectives in biology, 4(7):a005702, 2012. Riesenhuber, M. and Poggio, T. Hierarchical models of object recognition in cortex. Nature neuroscience, 2(11):1019–25, 1999. Roelfsema, P. R., Engel, A. K., Ko¨nig, P., and Singer, W. Visuomotor integration is associated with zero time-lag synchronization among cortical areas., 1997. Rolls, E. T., Dempere-Marco, L., and Deco, G. Holding Multiple Items in Short Term Memory: A Neural Mechanism. PLoS ONE, 8(4), 2013. Rose, J. E., Gross, N. B., Geisler, C. D., and Hind, J. E. Some neural mechanisms in the inferior colliculus of the cat which may be relevant to localization of a sound source. Journal of neurophysiology, 29(2):288–314, 1966. Roskies, A. L. The binding problem. Neuron, 24(1):7–9, 1999. Rozell, C. J., Johnson, D. H., Baraniuk, R. G., and Olshausen, B. A. Sparse coding via thresholding and local competition in neural circuits. Neural computation, 20 (10):2526–63, oct 2008. Sandamirskaya, Y., Zibner, S. K. U., Schneegans, S., and Scho¨ner, G. Using Dy- namic Field Theory to extend the embodiment stance toward higher cognition. New Ideas in Psychology, 31(3):322–339, 2013. Sandberg, A., Tegne´r, J., and Lansner, A. A working memory model based on fast Hebbian learning. Network, 14(4):789–802, 2003. 132 Schneegans, S. and Scho¨ner, G. Dynamic field theory as a framework for under- standing embodied cognition. Elsevier Inc., 2008. Schnupp, J., Nelken, I., and King, A. Auditory Neuroscience. MIT Press, Cam- bridge, Mass, 2011. Schultz, W. and Dickinson, A. Neuronal Coding of Prediction Errors. Annual review of neuroscience, 23:473–500, 2000. Serre, T., Wolf, L., and Poggio, T. Object Recognition with Features Inspired by Visual Cortex. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), 2:994–1000, 2005. Seung, H. S. How the brain keeps the eyes still. 
Proceedings of the National Academy of Sciences of the United States of America, 93(23):13339–13344, 1996. Shafi, M., Zhou, Y., Quintana, J., Chow, C., Fuster, J., and Bodner, M. Variability in neuronal activity in primate cortex during working memory tasks. Neuro- science, 146(3):1082–1108, 2007. Shamma, S. A., Elhilali, M., and Micheyl, C. Temporal coherence and attention in auditory scene analysis. Trends in Neurosciences, 34(3):114–123, 2011. Shamma, S. A., Elhilali, M., Ma, L., Micheyl, C., Oxenham, A. J., Pressnitzer, D., Yin, P., and Xu, Y. Temporal Coherence and the Streaming of Complex Sounds. Adv Exp Med Biol., 787:535–543, 2013. Shao, Y. and Wang, D. Sequential organization of speech in computational auditory scene analysis. Speech Communication, 51(8):657–667, 2009. Shatz, C. The developing brain. Scientific American, 1992. Shen, L. Neural Integration by Short Term Potentiation. Biological Cybernetics, 61: 319–325, 1989. Singer, W. and Gray, C. M. Visual feature integration and the temporal correlation hypothesis. Annual review of neuroscience, 18:555–586, 1995. Smaragdis, P. Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs. Technical Report 2, Mistubishi Electric Research Laboratories, 2004. Song, S., Sjostrom, P. J., Reigl, M., Nelson, S., and Chklovskii, D. B. Highly non- random features of synaptic connectivity in local cortical circuits. PLoS Biology, 3(3):0507–0519, 2005. Sreenivasan, K. K., Curtis, C. E., and D’Esposito, M. Revisiting the role of persis- tent neural activity during working memory. Trends in Cognitive Sciences, 18(2): 82–89, 2014. 133 Stokes, M. G. Activity-silent’ working memory in prefrontal cortex: a dynamic coding framework. Trends in Cognitive Sciences, 19(7):394–405, 2015. Stokes, M. G., Kusunoki, M., Sigala, N., Nili, H., Gaffan, D., and Duncan, J. Dynamic coding for cognitive control in prefrontal cortex. Neuron, 78(2):364– 375, 2013. Sugase-Miyamoto, Y., Liu, Z., Wiener, M. C., Optican, L. M., and Richmond, B. J. Short-term memory trace in rapidly adapting synapses of inferior temporal cortex. PLoS computational biology, 4(5):e1000073, 2008. Sussman, E. Integration and segregation in auditory scee analysis. J Acoust Soc Am, 117(3), 2005. Szatma´ry, B. and Izhikevich, E. M. Spike-Timing Theory of Working Memory. PLoS Computational Biology, 6(8):e1000879, 2010. Teki, S., Chait, M., Kumar, S., Shamma, S., and Griffiths, T. D. Segregation of complex acoustic scenes based on temporal coherence. eLife, 2013(2):1–16, 2013. Terman, D. H. and Wang, D. Global competition and local cooperation in a network of neural oscillators. Physica D, 81:148–176, 1995. Tetzlaff, C., Kolodziejski, C., Markelic, I., and Wo¨rgo¨tter, F. Time scales of memory, learning, and plasticity. Biological Cybernetics, 106(11-12):715–726, 2012. Theunissen, F. E. and Elie, J. E. Neural processing of natural sounds. Nature reviews. Neuroscience, 15(6):355–66, 2014. Tsodyks, M., Pawelzik, K., and Markram, H. Neural Networks with Dynamic Synapses. Neural Computation, 10, 1998. Varela, F. J., Thompson, E., and Rosch, E. The Embodied Mind: Cognitive Science and Human Experience. MIT Press, 1991. Venkatasubramanian, S. Clustering on Streams. In Encyclopedia of Database Sys- tems. Springer, 2009. Verduzco-Flores, S., Bodner, M., Ermentrout, G. B., Fuster, J. M., and Zhou, Y. Working memory cells’ behavior may be explained by cross-regional networks with synaptic facilitation. PloS one, 4(8):e6399, 2009. Vogels, T. P. and Abbott, L. F. 
Gating multiple signals through detailed balance of excitation and inhibition in spiking networks. Nature neuroscience, 12(4):483–91, apr 2009. Von der Malsburg, C. and Schneider, W. A neural cocktail-party processor. Biolog- ical Cybernetics, 54(1):29–40, 1986. 134 Von der Malsburg, C. The correlation theory of brain function. Technical report, Max-Planck Institute, 1981. Wang, D. Primitive auditory segregation based on oscillatory correlation. Cognitive Science, 20(3):409–456, 1996. Wang, D. and Chang, P. An oscillatory correlation model of auditory streaming. Cognitive Neurodynamics, 2(1):7–19, 2008. Wang, D. and Terman, D. H. Local Excitatory Global Inhibitory Oscillator Net- works. IEEE Transactions on Neural Networks, 6(1), 1995. Wang, D. L. and Brown, G. J. Separation of speech from interfering sounds based on oscillatory correlation. IEEE Transactions on Neural Networks, 10(3):684–697, 1999. Wang, X.-j., Introduction, I., Synchronization, A., Resonance, B., Subthreshold, C., Rhythms, V. S., Irregular, W., Activity, N., Communication, L.-d., and Learn- ing, C. Neurophysiological and Computational Principles of Cortical Rhythms in Cognition. Physiol Rev, 90:1195–1268, 2010. Watanabe, K. and Funahashi, S. Prefrontal delay-period activity reflects the decision process of a saccade direction during a free-choice ODR task. Cerebral Cortex, 17:i88–i100, 2007. Watt, A. J. and Desai, N. S. Homeostatic Plasticity and STDP: Keeping a Neuron’s Cool in a Fluctuating World. Frontiers in synaptic neuroscience, 2(June):5, jan 2010. Wei, Z., Wang, X.-J., and Wang, D.-H. From Distributed Resources to Limited Slots in Multiple-Item Working Memory: A Spiking Network Model with Nor- malization. The Journal of Neuroscience, 32(33):11228–11240, 2012. Weinberger, N. M. Plasticity in the Primary Auditory Cortex, Not What You Think it is: Implications for Basic and Clinical Auditory Neuroscience. Otolaryngol (Sunnyvale), 4(164):1–19, 2012. Wessel, D. Timbre Space as a Musical Control Structure. Computer Music Journal, 3(2):45–52, 1979. Wimmer, K., Nykamp, D. Q., Constantinidis, C., and Compte, A. Bump attractor dynamics in prefrontal cortex explains behavioral precision in spatial working memory. Nature neuroscience, 17(3):431–9, 2014. Worden, F. G. Hearing and the Neural Detection of Acoustic Patterns. Behavioral Science, 16(1), 1971. Wrigley, S. N. and Brown, G. J. A Computational Model of Auditory Selective Attention. IEEE Transactions on Neural Networks, 15(5):1151–1163, 2004. 135 Wurtz, R. H. Recounting the impact of Hubel and Wiesel. The Journal of physiology, 587(Pt 12):2817–2823, 2009. Zarahn, E., Aguirre, G., and Esposito, M. D. A Trial-Based Experimental Design for fMRI. Neuroimage, 6:122–138, 1997. Zucker, R. S. and Regehr, W. G. Short-Term Synaptic Plasticity. Annual Review of Physiology, 64(1):355–405, 2002. Zylberberg, J., Murphy, J. T., and DeWeese, M. R. A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of V1 simple cell receptive fields. PLoS computational biology, 7(10):e1002250, oct 2011. 136