ABSTRACT

Title of dissertation: BEE-ING THERE: THE SYSTEMATICITY OF HONEYBEE NAVIGATION SUPPORTS A CLASSICAL THEORY OF HONEYBEE COGNITION

Michael J. Tetzlaff, Doctor of Philosophy, 2006

Dissertation directed by: Professor Georges Rey, Department of Philosophy

The Classical theory of cognition proposes that there are cognitive processes that are computations defined over syntactically specified representations, "sentences" in a language of thought, for which the representational-constituency relation is concatenative. The main rival to Classicism is (Nonimplementational, or Radical, Distributed) Connectionism. It proposes that cognitive processes are computations defined over syntactically simple, distributed representations, for which the constituency relation is nonconcatenative. I argue that Connectionism, unlike Classicism, fails to provide an adequate theoretical framework for explaining systematically related cognitive capacities and that this is due to its necessary reliance on nonconcatenative constituency.

There appears to be an interesting divergence of attitude among philosophers of psychology and cognitive scientists regarding Classicism's language of thought hypothesis. On one extreme, there are those who argue that only humans are likely to possess a language of thought (or that we at least have no evidence to the contrary). On the other extreme, there are those who argue that distinctively human thinking is not likely to be explicable in terms of a language of thought. They point to features of human cognition which they claim strongly support the hypothesis that human cognitive-state transition functions are computationally intractable. This implicitly suggests that the cognitive processes of simpler, nonhuman minds might be computationally tractable and thus amenable to Classical computational explanation.

I review much of the recent literature on honeybee navigation.
I argue that many capacities of honeybees to acquire various sorts of navigational information do in fact exhibit systematicity. That conclusion, together with the correctness of the view that Classicism provides a better theoretical framework than does Connectionism for explaining the systematicity of the relevant cognitive capacities, gives one reason in support of the claim that sophisticated navigators like honeybees have a kind of language of thought. At the very least, it provides one reason in support of the claim that the constituency relation for the mental representations of such navigators is concatenative, not nonconcatenative.

BEE-ING THERE: THE SYSTEMATICITY OF HONEYBEE NAVIGATION SUPPORTS A CLASSICAL THEORY OF HONEYBEE COGNITION

by

Michael J. Tetzlaff

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2006

Advisory Committee:
Professor Georges Rey, Chair
Professor Peter Carruthers
Professor Christopher Cherniak
Professor Gary Marcus

© Copyright by Michael J. Tetzlaff 2006

TABLE OF CONTENTS

List of Tables
List of Figures
Chapter 1: Introduction: Systematicity, Navigation, and Cognitive Architecture
  1.1 The Issues
  1.2 Classical and Connectionist Cognitive Architectures
  1.3 Systematicity
  1.4 Why Navigation?
  1.5 Why Honeybees?
  1.6 The Terrain Ahead
Chapter 2: Two Candidate Explanations of Systematicity
  2.1 The Classical Explanation of Systematicity
  2.2 Smolensky's Connectionist Explanation of Systematicity
  2.3 Summary of the Key Features of the Two Explanations of Systematicity
Chapter 3: Systematicity and Causation
  3.1 Vector Constituent Causation
    3.1.1 Superposition
    3.1.2 Criteria for Existence and Causal Efficacy
    3.1.3 Vector Similarity
    3.1.4 Vector Constituents as Causal Precursors
    3.1.5 Causal Efficacy of Information about Constituents
Chapter 4: Acausal Explanation?
  4.1 An Adequate Explanation, but Not of Systematicity
  4.2 Moral of the Argument
Chapter 5: Structure Sensitivity and Principled Explanation
  5.1 Prediction versus Accommodation of Systematicity
  5.2 The Nonarbitrariness of Classical Processes
  5.3 Unprincipledness is Not Structured-Domain Relative
    5.3.1 The Relationship between Content Structure and Representation Structure
    5.3.2 Unprincipledness Rests with Vector Constituency, Not Encoding
  5.4 Representations for Navigation
Chapter 6: Structure of the Honeybee's Navigational Domain
  6.1 Simple Structures
    6.1.1 Distance and Direction Relations
    6.1.2 Solar Compass and Solar Ephemeris
    6.1.3 Updating Previously Learned Relationships
  6.2 Complex Structures: Sequences, Rules, and Maps
    6.2.1 Vector Sequences
    6.2.2 Maze Learning
      6.2.2.1 Configurations and Sequences
      6.2.2.2 Rules
    6.2.3 Novel Shortcuts and Vector Averaging
      6.2.3.1 Novel Shortcuts to the Hive
      6.2.3.2 Explanations of Novel-Shortcut Behavior
    6.2.4 A Kind of Cognitive Map
Chapter 7: The Systematicity of Honeybee Navigational Capacities
  7.1 Systematicity of Information Acquired by Honeybees
    7.1.1 Attributions of Content to Insects
    7.1.2 Some Honeybee Systematicities
  7.2 Weak Systematicity and the Tracking Argument
    7.2.1 The Tracking Argument
  7.3 Systematicity and Semantic Structural Roles
    7.3.1 Distinguishing Systematic Variants
    7.3.2 "What" and "Where"
    7.3.3 Indexicals
  7.4 Operations on Semantic Constituents of Complex Representations
  7.5 Algebraic Rules: An Introduction to Modelling Issues
    7.5.1 Algebraic Rules and Free Generalization
    7.5.2 Free Generalization in Bees
  7.6 Summary and Conclusion
Appendix A: A Limited Representational System which is both Map- and Language-Like
  A.1 Lexicon for Map Legend L
  A.2 Syntax for L
  A.3 Semantics for L
    A.3.1 L-Models
    A.3.2 Truth Conditions for wffs of L
References

LIST OF TABLES

Table 6.1: Stimulus pairs in Giurfa et al.'s (2001) delayed matching-to-sample and delayed non-matching-to-sample experiments
Table 6.2: Courses set in Menzel et al.'s (1998) displacement experiments
Table 6.3: Comparison of explanations of Menzel et al.'s (1998) displacement experiment results

LIST OF FIGURES

Figure 2.1: Vector representations
Figure 2.2: Vector representations are processed as wholes
Figure 6.1: Course configurations in Collett (T. S.) et al.'s (1993) vector sequence experiments
Figure 6.2: Train and test configurations in Collett (M.) et al.'s (2002) channel experiments
Figure 6.3: Mazes used in Zhang et al.'s (1996) maze learning experiments
Figure 6.4: Sample maze configurations in Collett (T. S.) et al.'s (1993) visual-sequence learning experiments
Figure 6.5: Maze configurations for Collett (T. S.) et al.'s (1993) "blue–single exit" sequence learning experiment
Figure 6.6: Mazes learned with color cues (Zhang et al. 1996)
Figure 6.7: Y-maze used in a delayed matching-to-sample experiment (Giurfa et al. 2001)
Figure 6.8: Map of the area chosen by Menzel et al. (1998) for their displacement experiments
Figure 6.9: Distributions of vanishing bearings (Menzel et al. 1998)
Figure 6.10: Histograms of vanishing bearing distributions in Menzel et al.'s (1998) displacement experiments
Figure 7.1: Novel metric shortcuts contrasted with novel complex routes
Figure 7.2: Connectivity structure of Dickinson and Dyer's (1996) model of solar ephemeris learning

Chapter 1
Introduction: Systematicity, Navigation, and Cognitive Architecture

1.1 The Issues

The Classical theory of cognition proposes that there are cognitive processes that are computations defined over syntactically specified representations, "sentences" in a language of thought. Classicism provides a theoretical framework for explaining several features of cognition.
¹ Some of these are the productivity of thought, the compositional unity of particular thoughts, inferential relations among thoughts, structure-sensitive errors in reasoning, the multiplicity of psychological "attitudes" that may be taken toward particular thoughts (we can believe that P, desire that P, etc.), the causal relations that obtain between thoughts in cognitive processes, and the systematicity of thought. Theories of cognitive architecture must be evaluated in light of how well they explain (or explain away) those and other properties of cognition. My focus is on the systematicity of cognitive capacities.

¹ Rey 1997.

The main rival to Classicism is Nonimplementational (or Radical) Distributed Connectionism (hereafter, simply Connectionism). As we will see, Classicism affords a relatively straightforward explanation of the systematicity of thought. And, though other approaches to cognitive architecture might turn out to be viable,² the only worked-out alternative to the Classical explanation of systematicity is a Connectionist one. Thus, one of the two principal questions I address is, Which theoretical framework, Classicism or Connectionism, provides the best explanation of systematicity? I argue that Connectionism, unlike Classicism, fails to provide an adequate framework for explaining systematicity.

There appears to be an interesting divergence of attitude among philosophers of psychology and cognitive scientists regarding the Classicist's language of thought hypothesis. On one extreme, there are those who argue that only humans are likely to possess a language of thought (or that we at least have no evidence to the contrary). Povinelli and colleagues³ favor the view that certain human cognitive capacities require a language of thought.
Some of the capacities they include in that category are the capacities to represent unobservables and counterfactual situations, to distinguish individuals and kinds, to learn new rules that operate on instances of variables, and to use productive and systematic symbolic systems. However, they argue that, in many cases, there is evidence which suggests that nonhumans lack such capacities, while in other cases, there is a lack of evidence that nonhumans have such capacities.

² See Beer 2000; Haugeland 1997; and van Gelder 1995, 1998.
³ Penn and Povinelli (submitted), Povinelli and Bering 2002, Povinelli et al. 2000, Povinelli and Giambrone 2001, Povinelli and Vonk 2003.

On the other extreme, it has been argued that distinctively human thinking is not likely to be explicable in terms of a Classical language of thought. For example, Horgan and Tienson⁴ point to features of human cognition (its open-endedness, the potential relevance of anything to anything, and the holistic character of relevance) which they claim strongly support the hypothesis that human cognitive-state transition functions are computationally intractable. This implicitly suggests that the cognitive processes of simpler, nonhuman minds might be computationally tractable and thus amenable to Classical computational explanation.

⁴ Horgan and Tienson 1996.

Thus, the second of the two principal questions I address is, Do the cognitive capacities of any nonhuman organisms exhibit systematicity? I argue that certain navigational capacities of honeybees do in fact exhibit systematicity. That conclusion, together with the correctness of the view that Classicism provides a better theoretical framework for explaining the systematicity of the relevant navigational capacities than does Connectionism, gives one reason in support of the claim that sophisticated navigators like honeybees have a kind of language of
thought (or, at the very least, a system of mental representation for which the constituency relation is Classical in character [§ 2.1]).

1.2 Classical and Connectionist Cognitive Architectures

The most important tenets (for my purposes) of Classicism and Connectionism will be spelled out in more detail in Chapter 2. Here I provide a brief sketch of how those theories answer two questions: What are the relations among mental representations as vehicles of content? What roles do those vehicles play in cognitive processes?

The Classicist holds that the relations among mental representations include both causal relations and constituency relations. Certain mental representations are complex, in the sense that they have constituents which are themselves representations. Those constituents play causal roles in cognitive processes. That is, cognitive processes are causally sensitive to the constituent structure of mental representations. Moreover, mental representations may share constituents. In other words, two different, complex-representation tokens may share constituent tokens of the same type.

For purposes of illustration, we can think of Classical mental representations as being analogous, in certain respects, to formulae of propositional logic. Thus, suppose that a cognitive system's entokening P → Q causes it to entoken ~Q → ~P. The causal mechanisms responsible for that transition, on the Classical story, are sensitive to the constituent structure, or syntax, of P → Q, ~P, and ~Q. The transition will have been governed by rules that operate on the constituents of those representations.

The Connectionist, unlike the Classicist, holds that the only relations among mental representations are causal relations (though there are constituency relations among the contents of mental representations). The Connectionist hypothesizes that the mind is a kind of network of interconnected nodes.
Its structure, at the cognitive level, is similar to the structure of the brain at the level of neurons and their interconnections. Mental representations are not formulaic; rather, they are patterns of activity levels across sets of nodes. These representational patterns do not have parts that are themselves representations. They are, in that sense, simple rather than complex (though their contents may be complex). Cognitive processes are transformations of representational patterns into other representational patterns.

Suppose, then, that a cognitive system's entokening of the pattern <1, 2, 3, 4> causes it to entoken the pattern <5, 6, 7, 8>. (These representations may have the same respective contents as P → Q and ~Q → ~P; those contents could be, say, [If there's smoke, there's fire] and [If there's no fire, there's no smoke].⁵) The causal mechanisms responsible for that transition, on the Connectionist story, are sensitive to the activity levels of the individual nodes. The strengths of the connections between nodes determine what activity pattern becomes entokened as the result of the entokening of another activity pattern. There are no operations defined over syntactically specified entities.

⁵ I adopt the convention of using boldface square brackets to indicate contents.

For the Connectionist, representations are distributed not only in the sense that they are realized by the activity levels of multiple nodes but also in the sense that a particular set of activity levels may realize many representations at once. This will be the case when a pattern of activity that is a representation is the sum, or superimposition, of multiple patterns that are themselves representations. (Similarly, cognitive processes are distributed in the sense that one and the same set of connection strengths may realize multiple operations at once [§ 3.1.2].)
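
The sense of superimposition just described can be illustrated with a minimal sketch. It assumes, purely for illustration, that an activity pattern is a list of numbers (one activity level per node) and that superimposing patterns is element-wise addition. The particular patterns and the helper `superimpose` are hypothetical and are not drawn from any specific Connectionist model.

```python
# Hypothetical activity patterns over four nodes; the values are illustrative only.
pattern_a = [1.0, 0.0, 2.0, 0.5]
pattern_b = [0.0, 3.0, 1.0, 0.5]


def superimpose(*patterns):
    """Return the element-wise sum of equal-length activity patterns."""
    return [sum(levels) for levels in zip(*patterns)]


combined = superimpose(pattern_a, pattern_b)
print(combined)

# A different pair of patterns can yield exactly the same activity levels,
# so the summed pattern, taken by itself, does not fix which patterns were added.
other_a = [0.5, 1.5, 1.0, 1.0]
other_b = [0.5, 1.5, 2.0, 0.0]
same_sum = superimpose(other_a, other_b) == combined
print(same_sum)
```

The final check is only meant to show why a single set of activity levels can be described as realizing more than one representation at once; whether and how the summed patterns can be recovered depends on further assumptions about the network that this sketch does not model.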
As we will see, the idea of representations in superposition plays an important role in the Connectionist explanation of systematicity.

1.3 Systematicity

There are a number of possible varieties of systematicity. Linguistic capacities may become more systematic over the course of development.⁶ Also, different kinds of cognitive capacities might be systematically related in different ways. For now, a general characterization will do (I argue in Chapter 7 that honeybee navigational capacities exhibit two specific kinds of systematicity). The central idea is that certain, relatively specific cognitive capacities come in clumps. That is, if a mind has certain cognitive capacities, it thereby (by nomological necessity) also has certain other cognitive capacities. Common examples of systematically related capacities are various linguistic ones. Thus, if a person has the capacity to understand the sentence, "John loves Mary," then that person thereby also has the capacity to understand the sentence, "Mary loves John."

⁶ Hadley 1994.

As I'll emphasize in the next chapter, systematicity has an important semantic aspect. That this is so is tied to the fact that cognitive capacities are capacities to acquire, store, and process information. An explanation of systematicity must make clear how causal cognitive processes preserve the appropriate semantic relations among mental representations.

1.4 Why Navigation?

Patricia Churchland once pointed out that "if you root yourself to the ground, you can afford to be stupid."⁷ On the other side of the coin, if your survival depends on long foraging trips to perhaps unfamiliar territory far from home, then you can't afford to be stupid. For the need to navigate over long distances and to find your way back to safety brings with it the distinct possibility that you will become lost. So the abilities to plan your trip in advance and to think about what to do when in fact you do become lost would be very valuable assets.

⁷ Churchland 1986, p. 13.

It's extremely likely that some navigational capacities do not require cognitive capacities. For example, there supposedly is no need to posit thought processes or memories in order to explain chemotaxis, phototaxis, or magnetotaxis⁸ in bacteria. Likewise, although ants have the ability to home toward remembered landmarks, it is plausible that such beacon homing can be explained in terms of recognition-triggered-response mechanisms.⁹

⁸ Blakemore and Frankel 1981.
⁹ Gallistel (1998), however, argues that the image matching mechanism thought by many to underlie beacon (landmark) homing in ants requires symbolic computation.

On the other hand, some navigational capacities would seem to require the capacity to represent various places of interest and certain relations (topological, metric, etc.) among them,¹⁰ as well as the capacity to make inferences involving those representations. Perhaps the clearest example is the capacity to take novel shortcuts. Thus, suppose an organism has learned how to get from Place A to Place B and how to get from Place C to Place A. Suppose further that the organism is unfamiliar with the territory between Places B and C, and that no perceptible features associated with Place B (or with known routes to or from it) are detectable by the organism from Place C. Nonetheless, when at Place C, it takes the direct route from Place C to Place B. Assuming that the organism's finding its way to Place B was not accidental, it must have acquired information about the directed distances between Places A and B and between Places A and C, and it must have used that information to infer the direct route. We know of no other way an organism (or device) could accomplish such a task.

¹⁰ Although I here speak of representing places and relations, I mean to leave open the issue of what contents and extensions such representations actually have, at least in the case of nonhuman animals (see below, § 7.1.1).

Navigation in humans and other animals, including invertebrates, has been studied extensively.¹¹ Despite this, philosophers of mind have devoted relatively little attention to this body of work, certainly much less attention than they have devoted to natural language.¹² In particular, recent philosophical discussions of systematicity have focused on linguistic capacities and sentence parsing.¹³ A colleague once suggested that if the philosophical focus had been on navigation rather than language, the language of thought hypothesis would not have been nearly so influential. I hope to convince you that that suggestion is dubious.

¹¹ See, for example, Healy 1998 and Golledge 1999.
¹² Two noteworthy exceptions are Carruthers 2005 and Robinson 1995.
¹³ Cummins et al. 2001, Hadley 1994, Niklasson and van Gelder 1994.

1.5 Why Honeybees?

The honeybee is a superb model organism for the study of learning and memory. Also, its neurophysiology is being investigated using both electrical and optical techniques.¹⁴ It has "only" about 960,000 neurons, which makes the goal of attaining a comprehensive understanding of its neuroanatomy relatively practical.

The evidence is growing for the idea that the honeybee has genuinely cognitive capacities.¹⁵ That is, it is becoming increasingly difficult to explain honeybees' behavior in nonrepresentational terms. For example, they exhibit multiple stages of memory consolidation;¹⁶ their learning mechanisms go well beyond those of simple association; and they can generalize well beyond the information present in the stimuli used for training.
Some researchers have come to advocate the view that honeybees have goal-specific expectations¹⁷ (cf. § 6.1.3). Especially pertinent is the growing body of evidence that strongly supports the hypothesis that honeybees are capable of taking novel shortcuts (§§ 6.2.3.1, 6.2.3.2, 6.2.4).

¹⁴ Joerges et al. 1997, Menzel and Müller 1996.
¹⁵ Menzel and Giurfa 2001, Menzel et al. 2000b.
¹⁶ Menzel 1999.
¹⁷ Menzel et al. 1996.

1.6 The Terrain Ahead

Chapter 2 revisits the Classical explanation of systematicity and Smolensky's Connectionist explanation.¹⁸ Although these explanations are familiar to many philosophers and cognitive scientists, it will be useful to review them in detail. I focus on Smolensky's explanation, since it is the most often discussed explanation in the literature, and it contains the essentials of any adequate Connectionist explanation.

¹⁸ Many of the key elements of these explanations are developed in articles collected in MacDonald and MacDonald 1995.

Chapter 3 examines the role of representational constituents in the Classical and the Connectionist explanations. As we have seen, Classicism attributes causal roles to the constituents of complex representations. If a Classical representation is tokened, so too must be its constituents, and they will thus be available to play causal roles in mental processes. There is still much confusion in the literature concerning whether the Connectionist explanation attributes causal efficacy, in cognitive processes, to representational constituents. I argue that it does not; that is, it does not attribute to such constituents causal roles in mental operations on the representations of which they are constituents. In that sense, the Connectionist explanation is not a causal one.

Chapters 4 and 5 raise and defend arguments for the claim that we have strong (though defeasible) reasons to prefer the Classical explanation of systematicity over the Connectionist one.
I argue in Chapter 4 that while there is a sense in which the Connectionist explanation is an adequate one, as an "acausal" explanation, it is not adequate as an acausal explanation of systematicity. At best, it is an adequate explanation of how networks can be rigged so as to exhibit the systematicities of which Classical architectures, by their very nature, are capable. Combining the lessons from Chapters 3 and 4, I conclude that since the Connectionist account is neither a causal explanation of systematicity nor an acausal explanation of systematicity, it is not really an explanation of systematicity at all.

I argue in Chapter 5 that the Connectionist explanation is unprincipled in that it appeals to cognitive processes that are arbitrary with respect to Connectionism. The explanation will be shown to have the same form as certain scientific explanations which are clearly unprincipled. The central point is that Classical cognitive systems exhibit systematicity "for free," as it were (by nomological necessity). The systematicity of Classical systems is a product of Classical cognitive architecture alone. If a Classical system doesn't exhibit systematicity, that will have to be because it has been specifically designed out of the system. On the other hand, Connectionist cognitive architectures can just as easily be nonsystematic as systematic. For such architectures, systematicity has to be specifically designed in. (An important part of my argument is a response to an attempt by Cummins and colleagues¹⁹ to shift the issue from systematic relations among thoughts or items of information to law-like psychological effects of acquiring knowledge of various structured domains. As we'll see, if that shift is warranted, it becomes a bit (but just a bit) easier to argue that the Classical explanation is just as unprincipled as the Connectionist account.)

Chapter 6 is a review of much of the literature on honeybee navigation.
I argue that some of the navigational abilities of bees require the learning and storage of semantically complex information. Some, in addition, require learning by means of combining new and previously acquired information in novel ways.

Finally, in Chapter 7, I argue that various capacities of honeybees to acquire information relevant to their navigational tasks exhibit certain systematicities. I conclude by proposing that a complete account of honeybee navigational capacities will be one that posits cognitive processes that are computations defined over syntactically specified representations. At the very least, such an account will be one that posits computations defined over configurationally complex representations. Either way, the account will not be a Connectionist one.

¹⁹ Cummins 1996 and Cummins et al. 2001.

Chapter 2
Two Candidate Explanations of Systematicity

A view widely held among cognitive scientists¹ is that human thought is systematic. Roughly, the idea is that our capacity to think certain thoughts is intrinsically related to our capacity to think certain other thoughts. For example, anyone who is able to think that Seabiscuit was a better racehorse than War Admiral is also able to think that War Admiral was a better racehorse than Seabiscuit. Anyone who can think that there are black cats and brown dogs can also think that there are black dogs and brown cats.

There are many ways to more precisely specify the nature of systematicity.² For present purposes, we may consider two structurally complex thoughts to be systematically related just in case they have the same logical and representational constituents and are formal permutations of each other. Thus, whereas the thought that Fa ∧ Gb is a systematic variant of the thought that Ga ∧ Fb, this is true neither of the thought that Fa ∧ Hb nor the thought that ~(Fa ∧ Gb).
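
The working definition just given can be made concrete with a small sketch. It assumes, for illustration only, that thoughts are encoded as nested tuples, that "same constituents" means the same multiset of atomic symbols, and that "formal permutation" means sharing the same arrangement of syntactic categories. The encoding, the mini-vocabulary, and the function names are hypothetical conveniences, not part of the definition itself.

```python
from collections import Counter

# Hypothetical mini-vocabulary of logical constituents.
CONNECTIVES = {"and", "not"}


def category(symbol):
    """Classify an atomic symbol as a connective, predicate, or individual constant."""
    if symbol in CONNECTIVES:
        return "CONNECTIVE"
    return "PREDICATE" if symbol.isupper() else "CONSTANT"


def leaves(tree):
    """List the atomic symbols of a thought, in order of occurrence."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for part in tree for leaf in leaves(part)]


def skeleton(tree):
    """Replace every atomic symbol by its syntactic category, keeping the nesting."""
    if isinstance(tree, str):
        return category(tree)
    return tuple(skeleton(part) for part in tree)


def systematic_variants(t1, t2):
    """True if the two thoughts share all constituents and the same formal arrangement."""
    return Counter(leaves(t1)) == Counter(leaves(t2)) and skeleton(t1) == skeleton(t2)


fa_and_gb = ("and", ("F", "a"), ("G", "b"))
ga_and_fb = ("and", ("G", "a"), ("F", "b"))
fa_and_hb = ("and", ("F", "a"), ("H", "b"))
not_fa_and_gb = ("not", ("and", ("F", "a"), ("G", "b")))

print(systematic_variants(fa_and_gb, ga_and_fb))      # same constituents, same form
print(systematic_variants(fa_and_gb, fa_and_hb))      # different constituents
print(systematic_variants(fa_and_gb, not_fa_and_gb))  # different constituents and form
```

On this encoding the first comparison comes out true and the other two false, matching the verdicts in the text. Other ways of making "formal permutation" precise are possible, and nothing here is meant to settle which is correct.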
14 1 In addition to the researchers who contributed to the explanations of systematicity presented in this chapter, some others who (at least implicitly) accept that human thought is systematic are Anderson 1995; Barsalou 1992, 1993; Block 1995; Butler 1991; Carruthers 2005; Hadley 1994, 1997; Horgan and Tienson 1996; Hummel and Holyoak 2001; Marcus 2001; Niklasson and van Gelder 1994; Phillips 1998; Phillips and Halford 1997; Pinker 1997; and Sterelny 1990. 2 See Hadley 1994, McLaughlin 1993, and Niklasson and van Gelder 1994. There are two aspects of systematicity particularly important to account for. First, systematicity is supposedly a matter of psychological law. Anyone who is able to think the thought T is thereby also able to think systematic variants of T. Nature, it seems, packages capacities to think various thoughts in bundles. Sec- ond, systematicity has a semantic aspect: the semantic relations among system- atically related thoughts are nonarbitrary. For example, the contents [brown], [black], [cat], and [dog] contribute to the content of both the thought that there are black cats and brown dogs and the thought that there are black dogs and brown cats. A natural place to look for explanations of systematicity, its lawfulness, and its semantic character are theories of cognitive architecture. Fodor and others 3 (hereafter, Fodor) have promoted an explanation that appeals to Classical cogni- tive architecture. Smolensky 4 has offered an explanation that appeals to one type of Connectionist cognitive architecture. The assumptions those explanations have in common include the following (note that the notion of constituency appealed to here is a very broad one, one that allows for the possibility that the constituency relation is an abstract, formal relation, rather than some sort of part ?whole relation): 15 3 Fodor 1998, Fodor and McLaughlin 1995, Fodor and Pylyshyn 1995, and McLaughlin 1993. 4 Smolensky 1995a?c. 
Representationalism: Thinking that P requires having a mental representation that has the content [P].

Complexity of mental representations: Some mental representations are complex in the sense that they have mental representations as constituents.

Structure-sensitive processing: Mental processes are sensitive to the constituent structure of mental representations.

Compositionality for mental representations: The content of some mental representations is determined by the contents of their constituents and by their constituent structure.⁵

⁵ This notion of compositionality, as I intend it to be understood, is weaker than the linguistic notion, since what is meant by "constituent" is left open.

But Fodor's Classical explanation and Smolensky's Connectionist explanation rely on different views about the nature of mental representations, mental processing, and the constituency relation for mental representations.

2.1 The Classical Explanation of Systematicity

Let's begin with Fodor's Classical explanation of systematicity. For the purpose of understanding his account, it is useful to see that he endeavors to explain the systematicity of thought in much the way one might explain the systematicity (what there is of it) present in natural language. For example, anyone who can understand the sentence "Andy loves Betty" is bound to be able to understand the sentence "Betty loves Andy." A plausible explanation of this appeals to these facts: (1) the two sentences have the words "Andy," "loves," and "Betty" as constituents; (2) those constituents have the respective contents [Andy], [loves], and [Betty]; and (3) the two sentences have the same syntactic structure. Furthermore, understanding them requires understanding what their syntactic structures and their constituents contribute to their contents. But if all this is true, then it looks like what it takes to understand one of the sentences is just what it takes to understand the other.
Roughly, what explains the systematicity present in natural language is that the requirements for understanding systematically related sentences are the same. What is necessary for understanding "Andy loves Betty" is necessary and normally sufficient for understanding "Betty loves Andy."

A useful way to bring the key features of the Classical explanation into relief is to first suppose that John is able to think that Andy loves Betty. We may then spell out in detail how Classical hypotheses about mental representation and processing, together with that supposition, explain how John is thereby also able to think that Betty loves Andy.

The first step of the Classical explanation is a hypothesis about the nature of propositional attitudes, such as believing that P, desiring that P, and so on. On the Classical view, to have a certain sort of occurrent propositional attitude toward a thought content is to stand in a specific kind of computational relation to a mental-representation token with that content. For example, for a to occurrently judge that C is for a to have entokened within his cognitive system a representation both having the content [C] and playing the computational role of a judgment. Clearly, then, on the Classical account of propositional attitudes, John is able to think that Andy loves Betty only if he can entoken a mental representation with the content [Andy loves Betty]. Let's say that a token of a mental representation with that content is a token of Φ.⁶ (For ease of exposition in what follows, I'll generally put aside type–token subtleties.)

The Classical view hypothesizes that some mental representations are complex, in the sense that they have representations as constituents. Furthermore, the Classicist proposes that the structure of some complex mental representations is governed by a combinatorial syntax. This means that certain mental representations are of certain formal types (individual constants, variables, etc.)
and that they combine to form more complex representations according to syntactic rules. Thus, the Classicist proposes that Φ is a complex, syntactically structured representation, formally much like a well-formed formula in an artificial language such as first-order predicate logic. Indeed, Φ is part of a system of mental representation, "Mentalese," which is literally a language of thought.

⁶ We could call them "ANDY LOVES BETTY" representations. But at this point in the exposition, that label might be misleading, since it would prematurely suggest that they have language-like constituent structure. The Classical explanation proceeds by hypothesizing that mental representations have language-like constituent structure and then showing that that hypothesis plays a central role in a good explanation of systematicity. To refer to the representations in question as "ANDY LOVES BETTY" representations might make the Classicist's hypothesis seem trivial or question begging (cf. Cummins et al. 2001), when in fact it is neither. In what sense the structure of mental representations is language-like, on the Classical view, is explained below.

What, then, are Φ's constituents? The specific kinds of constituents that mental representations have is a point of contention among Classicists. But the Classical explanation of systematicity doesn't depend on any particular stance on that issue. The important point is that whatever Φ's constituents are, they stand in structural relations governed by syntactic rules. So, for expository purposes, we can keep the discussion at an intuitive level.

Let's assume, then, that Φ's content, [Andy loves Betty], is composed of the contents [Andy], [loves], and [Betty].⁷ Further, since the constituents of a representation are themselves representations, let's suppose that Φ has three constituents, each having one of those three contents. Call the constituents of Φ
which have those contents a, L, and b, respectively,⁸ where a and b are individual constants and L is a 2-place predicate.

Now, Mentalese representational constituency is a co-tokening relation: representation R is a constituent of representation R* just in case it is metaphysically necessary that whenever R* is tokened, so is R.⁹ Call this sort of constituency "concatenative" constituency. Clear examples of representations with concatenative constituency are representationally complex written sentences. The word "Andy" is a concatenative constituent of the sentence "Andy loves Betty," since the latter cannot be tokened unless the former is tokened.

⁷ Again, this supposition is for expository purposes. As McLaughlin (1993) notes, Classicism is not committed to the view that the constituents of a thought content stand in one-to-one correspondence with the words in a public-language sentence that may be used to express it.

⁸ The constituents a, L, and b themselves might be either simple or complex. The Classical account of systematicity does not and need not take a stand on this issue.

⁹ This is explicit in Fodor and McLaughlin 1995, p. 201; see also Fodor 1998. van Gelder (1990) makes it clear that concatenative constituency is a necessary feature of complex Classical representations. According to Classicism, the mind/brain is a syntactically driven physical system that exhibits semantically coherent behavior. This requires that mental processes are causally sensitive to the syntactic structure of mental representations, which in turn requires that their syntactic constituents are physically entokened.

From the Classical characterization of the constituency relation, and given that a, L, and b are Φ's constituents, it follows that tokening Φ requires tokening a, L, and b. Furthermore, John is able to stand in a computational relation to Φ only if his cognitive system can token Φ. Hence, John is able to stand in a computational relation to Φ
only if his cognitive system can token a, L, and b.

The Classicist's story so far is that John is able to think that Andy loves Betty only if his cognitive system can token a, L, and b. What the Classicist still needs to explain is how John's cognitive system can token a, L, and b only if he can think that Betty loves Andy.

The explanation proceeds by appealing to the Classical account of mental processes. That account includes the hypothesis that some mental processes have representational constituents in their domains and are causally sensitive to syntactic structure. Thus, the Classicist claims that there are mental processes that can operate on Φ's constituents so as to construct mental representations which have the same syntactic form as Φ, the very same constituents as Φ, but a different arrangement of those constituents. If there are mental processes that can construct Φ by, as it were, completing the mental predicate "_L_" with "a" in the first slot and "b" in the second, then there are mental processes that can construct other mental representations by completing the same predicate with "b" in the first slot and "a" in the second. So, on the Classical view, if John's cognitive system is capable of tokening a, L, and b (and aLb representations), then his cognitive system is also capable of tokening bLa representations.

What remains to be explained is how John can token bLa representations only if he can think that Betty loves Andy. That is, there is still the question of the content of bLa. The Classicist addresses this question by hypothesizing that the semantics for mental representations is compositional: the content of a complex mental representation is determined by its syntactic structure together with the contents of its constituents, which are context independent. On this hypothesis, Φ
has the content [Andy loves Betty] because, first, its constituents, a, L, and b, have the contents [Andy], [loves], and [Betty], respectively, and second, it has the syntactic form xRy, where x = a, R = L, and y = b. Likewise, bLa has the content [Betty loves Andy] because, first, its constituents, a, L, and b, have the contents they do, and second, it has the form xRy, where x = b, R = L, and y = a. Therefore, John can token bLa representations only if he is able to think that Betty loves Andy. This completes the explanatory chain from the supposition that John can think that Andy loves Betty to the result that he can think that Betty loves Andy.

Note that the Classical account explains why the semantic relations among systematically related thoughts are nonarbitrary. Systematically related mental representations share constituents, and those constituents contribute the same contents to the content of the relevant mental representations. That is why, for example, the content [loves] contributes to the content of both the thought that Andy loves Betty and the thought that Betty loves Andy. Thinking either thought requires tokening a complex mental representation having a constituent with the content [loves].

The Classical account also explains why systematicity is a nomologically necessary feature of thought. Because the systematic variants of a particular mental representation are constructed from the same constituents by means of the same syntactic rules, anyone who can token that mental representation is bound to be able to token its systematic variants. Of course, there could be special circumstances in which systematicity does not hold for certain thoughts. For example, John might suffer a type of brain damage that prevents him from thinking that Betty loves Andy, even if he can think that Andy loves Betty. But the point is that, on the Classical view, such circumstances would have to be out of the ordinary.
In other words, the Classicist may hold that the law that thought is systematic is a ceteris paribus law.

Let's move on to Smolensky's explanation of systematicity.

2.2 Smolensky's Connectionist Explanation of Systematicity

Smolensky accepts representationalism, mental-representation complexity, structure-sensitive mental processing, and compositionality for mental representations. He disagrees with the Classicist, however, on the nature of mental representations and, correlatively, on the nature of the constituency relation. He also disagrees with the Classicist on the nature of mental processes.¹⁰ Smolensky's account of the systematicity of thought takes some setting up, but then is relatively straightforward. A good place to begin is his view on the nature of mental representations.

¹⁰ Actually, Smolensky doesn't argue that Classicism is wrong. His intended conclusion is that there are viable Connectionist alternatives to Classicism.

Unlike Fodor, Smolensky does not attempt to explain systematicity in terms of language-like mental representations. Instead, he appeals to representations that encode both the syntactic structure of language-like representations and their constituents but do not actually have language-like, configurational structure themselves. On his account, all mental representations, or at least those important for issues about systematicity, are patterns of Connectionist-network unit activation levels. They are distributed over many units, which is to say that (1) every mental representation comprises the activity of multiple units, and (2) every unit participates in multiple mental representations. Such activity patterns are readily conceptualized as vectors (ordered sets of numbers), where each number in the vector uniquely corresponds to the activity level of a particular unit (Fig. 2.1). For this reason, following Smolensky and many others, we may simply call mental representations of this sort "vectors."
Figure 2.1. Vector representations. Activity pattern 1: vector $\langle 1, 6, 3, 4, 2 \rangle$, content [Andy loves Betty]. Activity pattern 2: vector $\langle 5, 0, 7, 8, 9 \rangle$, content [Betty loves Andy]. The activity patterns and the contents I've assigned to them were chosen arbitrarily. Note that, although the contents of two vectors may be systematically related, as they are here, this does not require that the vectors have any common elements or subvectors. a–e: Connectionist-network units.

On Smolensky's view, the constituency relation for vector representations is a certain type of vector component¹¹ relation, not a co-tokening relation. Of course, there are many vector component relations: vectors are mathematically decomposable in many ways (in some systems of vector representation, including Smolensky's, infinitely many). For example, just as many different pairs of numbers sum to a given number, many different pairs of vectors sum to a given vector (some vector operations are introduced below). So, some of the components mathematically derivable from a vector representation will not have an appropriate content or will have no content at all. Such components, then, will not be representational constituents of the vector from which they are derivable.

¹¹ On my usage, vector components are not elements or subsets of vectors. Vector components are members of the domains of vector operations such as vector addition and tensor multiplication, which are introduced below.

To address this matter, Smolensky comes up with a system of vector representation in which just those vector components with the appropriate contents are the constituents of mental representations. He achieves this, in part, by providing an algorithm for translating Classical symbol structures into vectors. In particular, he shows that a unique vector translation is derivable from any binary constituent structure tree.¹²
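The point that a vector is decomposable in many ways can be checked directly. The following is a minimal sketch, assuming plain Python lists stand in for activity vectors; the helper name `vadd` and all numeric values are hypothetical and chosen only for illustration. It shows several different pairs of vectors whose element-wise sum is one and the same vector, which is why a bare "component of" relation cannot by itself pick out representational constituents.

```python
def vadd(x, y):
    """Element-wise sum of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of elements")
    return [a + b for a, b in zip(x, y)]


target = [3, 5, 7]

# Four different pairs of "components", each summing to the same target vector.
decompositions = [
    ([1, 2, 3], [2, 3, 4]),
    ([0, 0, 0], [3, 5, 7]),
    ([3, 6, 9], [0, -1, -2]),
    ([1.5, 2.5, 3.5], [1.5, 2.5, 3.5]),
]

all_sum_to_target = all(vadd(p, q) == target for p, q in decompositions)
```

Because the components may be real-valued, the list above could be extended indefinitely; nothing in the arithmetic privileges one decomposition over another.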
In order to understand Smolensky's translation scheme, it is necessary first to understand two vector operations, vector addition and tensor multiplication. To add two vectors, we simply add their corresponding elements. Thus, the vector sum of $\langle 1, 2, 3 \rangle$ and $\langle 2, 3, 4 \rangle$ is $\langle 3, 5, 7 \rangle$. Generalizing to all finite vectors, the sum of the vectors $\langle x_1, x_2, \ldots, x_n \rangle$ and $\langle y_1, y_2, \ldots, y_n \rangle$ is

$$\langle x_1 + y_1,\ x_2 + y_2,\ \ldots,\ x_n + y_n \rangle.$$

(Vector addition is defined only for vectors having the same number of elements.) The tensor product of two vectors is the vector which contains all the separate products of every single element of the first and every single element of the second. For example, the tensor product of $\langle 1, 2 \rangle$ and $\langle 2, 3, 4 \rangle$ is

$$\langle 1(2),\ 1(3),\ 1(4),\ 2(2),\ 2(3),\ 2(4) \rangle = \langle 2, 3, 4, 4, 6, 8 \rangle.$$

Generalizing to all finite vectors, the tensor product of $\langle x_1, x_2, \ldots, x_n \rangle$ and $\langle y_1, y_2, \ldots, y_m \rangle$ is

$$\langle x_1 y_1,\ x_1 y_2,\ \ldots,\ x_1 y_m,\ x_2 y_1,\ x_2 y_2,\ \ldots,\ x_2 y_m,\ \ldots,\ x_n y_1,\ x_n y_2,\ \ldots,\ x_n y_m \rangle.$$

Vectors which are tensor products, or which have tensor products as components, are called "tensor product representations."

¹² Smolensky (1995c) alternately speaks of vectors as being, realizing, and representing Classical symbol structures. He doesn't speak of vectors as translating them. However, with respect to the present issue, I think that seeing vectors as translations (of a sort) most clearly elucidates his view. For the notion of translation brings with it the idea of semantic relations, and that idea is crucial to the explanation of systematicity.

¹³ See Smolensky 1995c, pp. 136–141.

We are now in a position to understand the essentials of Smolensky's tree translation scheme.¹³ Take some constituent structure tree, say, (L (A, B)), having the content [Andy loves Betty]. In Smolensky's system, it has the unique vector translation

$$\mathbf{V} = (\mathbf{r}_0 \otimes \mathbf{L}) + (\mathbf{r}_1 \otimes ((\mathbf{r}_0 \otimes \mathbf{A}) + (\mathbf{r}_1 \otimes \mathbf{B}))),$$

where $\otimes$ is tensor multiplication and $+$ is vector addition.
The tree constituents L, A, and B are assigned the vectors $\mathbf{L}$, $\mathbf{A}$, and $\mathbf{B}$, respectively. That L and A are left branches is encoded by taking the tensor products of $\mathbf{L}$ and $\mathbf{r}_0$ and of $\mathbf{A}$ and $\mathbf{r}_0$, where $\mathbf{r}_0$ is a (constant) vector that encodes the left-branch structural role. That B and (A, B) are right branches is encoded by taking the tensor products of $\mathbf{B}$ and $\mathbf{r}_1$, and of $(\mathbf{r}_0 \otimes \mathbf{A}) + (\mathbf{r}_1 \otimes \mathbf{B})$ and $\mathbf{r}_1$, where $\mathbf{r}_1$ is a (constant) vector that encodes the right-branch structural role. That a certain tree has two particular trees as its immediate subtrees (for example, that (L (A, B)) has L and (A, B) as its immediate subtrees) is encoded by requiring that the vector which translates the higher-level tree is the sum of the vectors which translate the two subtrees.

Given Smolensky's tree translation scheme, just those vector components with the appropriate contents are the constituents of mental representations. Although $\mathbf{V}$ is equal to the sum of many different pairs of vectors, only the sum

$$(\mathbf{r}_0 \otimes \mathbf{L}) + (\mathbf{r}_1 \otimes ((\mathbf{r}_0 \otimes \mathbf{A}) + (\mathbf{r}_1 \otimes \mathbf{B})))$$

gives us $\mathbf{V}$'s constituents, $\mathbf{L}$, $\mathbf{A}$, and $\mathbf{B}$.¹⁴

¹⁴ This works as long as $\mathbf{r}_0$ and $\mathbf{r}_1$ are independent vectors (see Smolensky 1995c, pp. 237 and 283n19).

Smolensky's notion of vector constituency, then, may be stated as follows:

Vector constituency: Vector $\mathbf{V}_n$ is a vector constituent of vector $\mathbf{V}_m$ iff $\mathbf{V}_n$ uniquely translates tree T, $\mathbf{V}_m$ uniquely translates tree T*, and T is a Classical constituent of T*.

Vector constituency, then, is a derivation relation, not a co-tokening relation. It is a vector component relation that presupposes a translation function from trees to vectors, where the vector that translates a particular tree is uniquely derivable from it. Since vector constituency is not a co-tokening relation, one vector can be a constituent of another, tokened vector, without itself ever being tokened.
Accordingly, it is further true that although the representation-level processes in a Smolensky cognitive architecture result in vector-to-vector transformations, they do not operate on any tokened constituents of the vector tokens they transform: vectors are processed as wholes (Fig. 2.2; see also § 3.1.2). This stands in stark contrast to the Classical account, on which there are representation-level processes that transform complex representation tokens by operating on their tokened constituents.

Figure 2.2. Vectors are processed as wholes. Left: activity pattern $\langle 1, 6, 3, 4, 2 \rangle$ with content [Andy loves Betty]; right: activity pattern $\langle 5, 0, 7, 8, 9 \rangle$ with content [Betty loves Andy]. Vector transforming processes in networks with distributed vector representations operate on entire vectors, not on any of their constituents. Here, the vector instantiated at left is directly transformed into an instantiation of one of its systematic semantic variants; and this is accomplished in the absence of any process that operates on any vector with the content [Andy], [loves], or [Betty].

The principal representation-level operation in Connectionist networks is matrix multiplication: the multiplication of a vector by a matrix of connection strengths. Matrix multiplication is implemented by a set of simpler algorithmic processes, each being the multiplication of a single unit's activation value by a single connection strength. But these algorithmic processes operate at a subrepresentational level of description: they do their job at the level of single units and single connections. They do not operate on patterns of activity levels. Hence, they do not operate on mental representations or their constituents (which themselves are patterns of activity levels). Thus, in a Smolensky architecture, neither representation-level processes nor the algorithmic processes that implement them operate on the constituents of the representation tokens they manipulate.

It's important to be clear on the role of trees and tree translation algorithms in Smolensky's account. Neither is to be understood as playing a causal role within cognitive systems. They are, rather, elements of his theory of how cognitive systems can exhibit some of the properties of Classical systems of representation. Trees simply provide a good example of representations having Classical constituent structure, and Smolensky shows that tensor product representations can have a parallel, but non-Classical, constituent structure. The tree translation algorithms describe but do not govern mental processes, in the sense that they are not executed by cognitive systems. They do, though, provide a way of understanding the tensor product representation constituency relation. They also provide a way to show that a Connectionist network with a Smolensky architecture can process tensor product representations in a way that maintains the appropriate semantic relations among systematically related mental representations, as we will see shortly.

We may briefly sum up the key points of the preceding as follows. Consider a mental representation that has the content [Andy loves Betty]. On Smolensky's account, that representation is a tensor product representation,

$$\mathbf{V}_1 = (\mathbf{r}_0 \otimes \mathbf{L}) + (\mathbf{r}_1 \otimes ((\mathbf{r}_0 \otimes \mathbf{A}) + (\mathbf{r}_1 \otimes \mathbf{B}))).$$

Vector $\mathbf{V}_1$ is the unique translation, and encodes the constituent structure, of a tree, (L (A, B)), having the content [Andy loves Betty]. Furthermore, $\mathbf{V}_1$'s component vectors, $\mathbf{A}$, $\mathbf{L}$, and $\mathbf{B}$, have the contents [Andy], [loves], and [Betty], respectively. Those vectors are the representational constituents of $\mathbf{V}_1$. Now, vector $\mathbf{V}_1$ can be transformed into a different vector,

$$\mathbf{V}_2 = (\mathbf{r}_0 \otimes \mathbf{L}) + (\mathbf{r}_1 \otimes ((\mathbf{r}_0 \otimes \mathbf{B}) + (\mathbf{r}_1 \otimes \mathbf{A}))).$$

Note that $\mathbf{V}_2$ has the same constituents as $\mathbf{V}_1$, but their mathematical arrangement is different: the roles of $\mathbf{A}$ and $\mathbf{B}$ are reversed. A key question now is, What is the content of $\mathbf{V}_2$?
Since vectors are translations of trees, an important step in answering that question is to determine which tree $\mathbf{V}_2$ translates. Smolensky, in fact, provides a procedure for deriving from any vector that tree which is its unique translation. He shows not only that there is only one vector that translates a given tree but also that there is only one tree derivable from a given vector. The tree that is uniquely derivable from and uniquely translated by $\mathbf{V}_2$ is (L (B, A)). Hence, assuming compositionality for tensor product representations, $\mathbf{V}_2$ has the content [Betty loves Andy].

The explanation of systematicity is now relatively straightforward. Suppose that John's cognitive system has a Smolensky architecture and can token $\mathbf{V}_1$. Then the vector space for that system contains the vectors $\mathbf{A}$, $\mathbf{L}$, $\mathbf{B}$, $\mathbf{r}_0$, and $\mathbf{r}_1$.¹⁵ Furthermore, the system must (in principle) be capable of building up $\mathbf{V}_1$ by means of processes that both operate on its constituents and implement vector addition and tensor multiplication. But then the vector space for the system also contains $\mathbf{V}_2$. For $\mathbf{V}_2$ has the same constituents and the same mathematical structure as $\mathbf{V}_1$. Finally, if the vector space for the system contains $\mathbf{V}_2$, then the system is capable of tokening $\mathbf{V}_2$. Hence, on Smolensky's account, if John is able to think that Andy loves Betty, he is thereby also able to think that Betty loves Andy. For if John is able to think that Andy loves Betty, then his cognitive system is capable of tokening $\mathbf{V}_1$. And if his cognitive system is capable of tokening $\mathbf{V}_1$, it is capable of tokening $\mathbf{V}_2$. And, finally, if it is capable of tokening $\mathbf{V}_2$, then John is able to think that Betty loves Andy.

¹⁵ These consequences depend on the properties of a Smolensky architecture. One key property is that of having fixed, independent, structural-role vectors ($\mathbf{r}_0$ and $\mathbf{r}_1$). Another is that of having a continuous range of unbounded activation values.
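The translation scheme and the step from the first vector to its systematic variant can be traced numerically. What follows is a hedged sketch, not Smolensky's own implementation: every numeric vector is hypothetical, the helper names (`vadd`, `tensor`, `unbind`, `encode`, `decode`) are invented for illustration, the role vectors are assumed to be mutually orthogonal (a stronger condition than the independence the account requires) so that decoding is a simple projection, and the outer "+" is kept as a direct sum (a pair) because its two summands have different lengths.

```python
def vadd(x, y):
    """Element-wise vector addition (defined only for equal lengths)."""
    if len(x) != len(y):
        raise ValueError("vector addition needs equal-length vectors")
    return [a + b for a, b in zip(x, y)]


def tensor(x, y):
    """Flattened tensor product: every element of x times every element of y."""
    return [a * b for a in x for b in y]


def unbind(role, bound):
    """Recover the filler bound to `role` in a sum of role-filler products.

    Assumes mutually orthogonal role vectors, so projecting onto
    role / |role|^2 cancels the other role's contribution.
    """
    norm_sq = sum(r * r for r in role)
    width = len(bound) // len(role)
    return [
        sum(role[i] * bound[i * width + j] for i in range(len(role))) / norm_sq
        for j in range(width)
    ]


R0, R1 = [1, 1], [1, -1]                    # left- and right-branch roles (hypothetical)
L, A, B = [1, 2, 0], [0, 1, 3], [2, 0, 1]   # fillers for [loves], [Andy], [Betty]


def encode(pred, left, right):
    """Translate the tree (pred (left, right)) into a pair of vectors."""
    inner = vadd(tensor(R0, left), tensor(R1, right))
    return (tensor(R0, pred), tensor(R1, inner))


def decode(v):
    """Derive the (pred, left, right) fillers back from an encoded tree."""
    inner = unbind(R1, v[1])
    return unbind(R0, v[0]), unbind(R0, inner), unbind(R1, inner)


V1 = encode(L, A, B)   # content: [Andy loves Betty]
V2 = encode(L, B, A)   # content: [Betty loves Andy]

# A whole-vector transformation: one fixed diagonal weight matrix applied to the
# second part of V1. No step below constructs or looks up A, B, or L.
signs = [1, 1, 1, -1, -1, -1] * 2
W = [[signs[i] if i == j else 0 for j in range(12)] for i in range(12)]
V2_from_V1 = (V1[0], [sum(w * x for w, x in zip(row, V1[1])) for row in W])
```

In this toy setting, `decode(V1)` recovers the fillers in the order (L, A, B) and `decode(V2)` recovers them in the order (L, B, A), while the single matrix `W` carries `V1` to `V2` by multiplying individual activation values by individual weights. That is one way to picture the claim that the two vectors share constituents even though the transformation between them acts only on the whole pattern.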
As on the Classical account, systematically related tensor product representations share constituents, and those constituents individually contribute the same contents to the content of the relevant mental representations. So Smolensky's account explains why the semantic relations among systematically related thoughts are nonarbitrary.

Smolensky's account also seems to explain why systematicity is a nomologically necessary feature of thought: because a particular tensor product representation and its systematic variants have the same vector constituents and the same mathematical form, anyone who is able to token that representation is bound to be able to token its systematic variants.

2.3 Summary of the Key Features of the Two Explanations of Systematicity

The Classical explanation and Smolensky's explanation both assume, in a broad sense, compositionality for mental representations. But they differ in four important respects. The Classical account posits a cognitive architecture with the following features:

(1) The constituency relation for mental representations is concatenative.
(2) Mental representations have syntactic structure.
(3) Mental processes are causally sensitive to the syntactic properties of mental representations.
(4) The constituents of mental representations play causal roles in mental processes.

On the other hand, Smolensky's account posits a cognitive architecture with these features:

(5) The constituency relation for mental representations is nonconcatenative.
(6) Mental representations have mathematical (vector) structure, of a sort that is not also a kind of syntactic structure.
(7) Mental processes are functionally sensitive to the constituent structure of mental representations.
(8) The constituents of any particular mental-representation token do not have causal roles in any operation on that token.

My claim that (8) is a feature of a Smolensky architecture is controversial.
I show in the next chapter that it is indeed a feature of Connectionist architectures.

Feature (6) might require some clarification. Some defenders of Connectionism, including Smolensky, do speak of vectors as having syntactic structure and do consider mental processes to be sensitive to syntactic structure. For the mathematical structures of the relevant vectors encode the syntactic structures of their corresponding Classical representations, and that permits mental processes to be structure sensitive. But this is a terminological matter. To avoid confusion, I will use terms describing the formal structure of representation tokens only as descriptions of their configurational structure, not as descriptions of their (broadly speaking) constituent structure (though these two kinds of structures may coincide, as they do for Classical representations).

Features (5)–(8) are very plausibly essential features of any Connectionist architecture on which a non-Classical explanation of systematicity could be based. Again, that this is true for (8) is a topic of the next chapter. Feature (7) seems clearly essential for any adequate explanation of systematicity. Regarding (5) and (6), note first that they are features of any Connectionist architecture that employs distributed vector representations, whether or not they are tensor product representations. Furthermore, all Connectionist systems alleged to exhibit some significant kind of systematicity employ distributed vectors. Indeed, as van Gelder¹⁶ argues, it is hard to see how Connectionists could provide a non-Classical explanation of systematicity without appealing to distributed vectors. For Connectionist networks do not have arbitrarily extendable representational resources: they have a finite number of units over which to represent arbitrarily complex structures.

¹⁶ van Gelder 1990, pp. 368–369 and 374–375.
So, in order to represent such structures, Connectionists have turned to representational schemes which permit the various parts of a complex structure to be represented at once over the same set of units; that is, they have turned to distributed vectors.

As I argue in Chapters 4 and 5, the appeal to distributed vectors in explanations of systematicity turns out to be problematic. The force of the difficulties facing Connectionism will be clearer if we first see that the constituents of a vector representation token do not have causal roles in any operation on that token.

Chapter 3

Systematicity and Causation

There is a specific sense in which the Classical explanation of systematicity is a causal explanation. Since Classical constituency is a co-tokening relation, the representational constituents within a cognitive system, on the Classical account, are available to causally interact via rule-governed processes in order to form systematically related mental representations. The causal efficacy of representational constituents is essential to the Classical explanation.

In contrast, Fodor and McLaughlin¹ argue, Smolensky's explanation is not a causal one. That is, his explanation of the capacity to token systematically related vectors does not posit causal laws governing constituents of those vectors. Nothing about a Smolensky architecture guarantees that the vector constituents of tokened vectors are ever themselves tokened within the system. Neither tokening a vector nor performing an operation on a vector requires tokening its vector constituents. So nothing about a Smolensky architecture guarantees that the vector constituents of tokened vectors are available to causally interact in order to form systematically related mental representations. Moreover, neither constituent structure trees nor tree–vector algorithms play any causal roles within Smolensky architectures.

¹ Fodor 1998 and Fodor and McLaughlin 1995.
In this chapter, I'll examine and reject a variety of objections to Fodor and McLaughlin's argument. Note that their argument applies to any cognitive architecture for which the constituency relation is nonconcatenative. So there is good reason to think that it applies to every Connectionist architecture (§ 2.3).

3.1 Vector Constituent Causation

Some defenders of Connectionism have argued that Smolensky's explanation of systematicity is (or could turn out to be) a causal explanation after all. Some of them argue that the vector constituents of tensor product mental representations do (or might) play causal roles at the representational level of description, appealing to either the notion of superposition, criteria for existence and causal efficacy, or similarity relations among vectors. Contrary to first appearances, on this sort of view, vector constituents are (or might be) causally efficacious, even if not severally present within the relevant cognitive system.

Other defenders of Connectionism argue that nonconcatenative constituency is compatible with the architectural requirement that a vector's constituents must have played a causal role in the eventual production of that vector, and that that is enough to guarantee the causal efficacy of those constituents. Still others argue that whether vector constituents themselves are causally efficacious is not the issue; rather, it is whether facts that certain vectors have certain constituents are causally efficacious.

3.1.1 Superposition

Smolensky suggests the possibility that the constituent structure of tensor product representations is analogous to the structure of such phenomena as complex waves.² Thus, when a musical chord is played, the sound waves of its individual notes are in superposition. They are not independently tokened within the resulting complex wave, in the sense that the waves in superposition are not like the separate strands of a string.
Nevertheless, they each have their own causal consequences. For instance, they can be discriminated by the human ear.

² Smolensky 1995c, pp. 241, 284n26.

Or consider the example of a single-trace recording of a chord on magnetic tape.³ The magnetic pattern on the tape, it might be claimed, is a superposition of the patterns that would have been present if the chord's notes had been recorded separately. None of those patterns is actually present on the tape. But if the tape is played on suitable sound processing equipment, each individual note's pattern can have its own causal consequences.

³ The example is from Horgan and Tienson (1996, p. 183, note 3). Horgan and Tienson do not argue that vector constituents are causally efficacious at the representational level. But a defender of Connectionism might be tempted to argue that they are, or could be, on the basis of such examples. Horgan and Tienson's position will be examined below.

I find neither of these analogies persuasive. Let's start with the recording case. As Fodor and McLaughlin have argued, the trouble with such cases is simply that counterfactual causes cannot have actual effects.⁴ The current question is whether the type of magnetic pattern under discussion has "constituents," of the specified sort, with independent causal powers. And the answer is clearly no. A magnetic pattern that would have been there in a counterfactual situation is not in fact there and so cannot have actual causal consequences.

Of course, the magnetic pattern that is in fact on the tape is a kind of encoding of a chord. And the pattern can be decoded so as to more-or-less accurately reproduce the chord. So it might appear that some sort of constituent-structure-sensitive processing is going on. But the fact that the pattern can be decoded doesn't show that it has causally efficacious, single-note encoding constituents. It only shows that it carries information about the chord's structure.
And this it can do, even if it has no such constituents at all. After all, in principle, each distinct chord type could be encoded by a different simple numerical symbol.

⁴ Fodor and McLaughlin 1995, pp. 214–215.

The sound wave case might be different from the magnetic-pattern case. In the magnetic-pattern case, the constituents are only "counterfactually there." If the same is true in the sound wave case, then the same response is called for: counterfactual causes cannot have actual effects. However, it might be thought that in the case of sound waves, any constituent waves are somehow actually there, even though they are not separately tokened. And if they are actually there, then they can have actual effects. We'd then have the kind of case the presently envisioned defender of Connectionism wants: a clear example of non-Classical, nontokened constituents with causal efficacy.

For example, I can imagine someone wanting to claim that in the case of a chord's sound wave, the individual notes' waves could first severally come into being and then superimpose to form the chord's waveform. If that is the case, then clearly each note's wave pattern makes a causal contribution to the character of the complex wave pattern, even though its individual character is lost in the superposition. And since it makes a causal contribution to the character of the complex pattern, it can have further causal consequences through that contribution. Moreover, even if a chord's wave pattern is produced all at once, without its component waves having been produced independently, it still seems to be the case that each component wave's pattern makes a causal contribution to the character of the chord's wave pattern. Thus, it certainly appears that something can be actually present, in some sense, without being separately tokened, and that that is enough for it to be causally efficacious.
Clearly, one problem with the move under consideration is that sense needs to be made of the purported distinction between being actually present and being separately tokened. If to be tokened, "separately" or otherwise, is something other than to have an instance actually present, then what is it?

Furthermore, in the case of wave phenomena, there is in fact no pressure to distinguish between a wave's being actually present and its being tokened. The law of superposition can be stated as follows:

    The existence of one wave does not affect the existence or properties of another wave, even if they are in the same place at the same time. This is equivalent to the statement that waves add algebraically; that is, the displacement of the sum wave A + B is equal to the displacement due to wave A added to the displacement due to wave B at the same point and time. … This clearly distinguishes waves from material things, no two of which can occupy the same place at the same time. Waves can pass through each other without affecting each other.⁵

⁵ Berg and Stork 1995, p. 29.

Given what we know about waves, and contrary to the envisioned view under discussion, component waves do not lose their individual character when in superposition. So there is no reason to regard them as nontokened, without all of their defining properties intact. Of course, we might not be able to tell what the component waves of a complex sound wave are, just by looking at (say) the displacement pattern due to the complex wave. In that sense, waves do lose their "individual character," or appearance, when in superposition. But that's an epistemological problem, not one about the nature of waves.

By now it should be clear that tensor product representations and complex waves are significantly disanalogous. Given what we know about waves, the component waves of a complex wave must be (separately) tokened in order for it to have the properties it has.⁶ That's why each of its component waves can have its own causal consequences. However, the vector constituents of a tokened tensor product representation need not themselves ever be tokened in order for it to have the properties it has. To put this another way, waves and vectors superimpose differently. A complex wave token is a result of physical interactions among its component wave tokens. A tensor product representation, on the other hand, is a result of computations that rely on mathematical relations among its vector constituent types, regardless of whether or not those types are ever tokened. So the fact that waves in superposition can each be causally efficacious provides no reason for thinking that nontokened vector constituents can be causally efficacious.⁷

⁶ Contrast Horgan and Tienson (1996, p. 74): "Sound waves, like all waves, superimpose; so in the chord none of the individual waves that went to make it up is tokened."

⁷ Fodor (1998) also argues that waves and vector constituents are significantly disanalogous. However, his discussion is misleading and confusing because of the way he construes vector constituency: "C is a derived [vector] constituent of vector V iff V (uniquely) encodes C* and C is a constituent of C*. That is, the derived [vector] constituents of a vector V are the constituents tout court of the tree that V encodes" (p. 177). Construing vector constituency in this way makes it all too easy to argue that vector constituents are not like waves in superposition, since nonimplementational connectionist architectures don't support Classical symbol structures. They do, however, support vectors, and vector constituents are vectors, after all.

3.1.2 Criteria for Existence and Causal Efficacy

Matthews argues that, on Fodor's own criteria for existence and causal efficacy, vector constituents appear both to exist and to have causal consequences.⁸

⁸ Matthews 1996, pp. 164–166.
Roughly, on Fodor's view, a science is committed to the existence of those theoretical entities that figure essentially in its explanations and generalizations,⁹ and a scientific theory is committed to the causal efficacy of a property if the theory includes a causal law to the effect that something's having that property (for example, a sail's having the property of being an airfoil, or a bank of nodes' having the property of instantiating a certain vector) is nomologically sufficient for the occurrence of an event of some specific kind (under appropriate conditions). More formally, a theory is committed to the causal efficacy of a property, F, if, according to the theory, an occurrence of an event that has F is nomologically sufficient for the occurrence of an event that has a certain property, G.¹⁰ According to Matthews, vector constituents satisfy both criteria. For on Smolensky's theory, decomposing tensor product representations into their constituents is essential to understanding and explaining the regularities in a network's behavior.¹¹

⁹ Compare Fodor (1998, p. 123): "What kinds of things a theorist says there are sets an upper bound on what taxonomy his explanations and generalizations are allowed to invoke. And what taxonomy his explanations and generalizations invoke sets a lower bound on what kinds of things the theorist is required to say that there are."

¹⁰ See Fodor 1990, chapter 5.

¹¹ See Smolensky 1995a, pp. 188–191.

Matthews is wrong if his claim is that Fodor is committed to the existence and causal efficacy of nontokened vector constituents. Note first that there is no problem at all for Fodor regarding the existence and causal efficacy of tokened vectors. Nor is there a problem for Fodor regarding the existence and causal efficacy of vectors as types. For they can be tokened, and if they are, they can have causal consequences.
The specific issue is whether a vector, as a nontokened vector constituent, can have causal consequences.¹²

¹² The fact that tokened vector constituents are causally efficacious is thus beside the point (see also Section 3.1.4). Nothing prevents a Connectionist system from tokening both a representationally complex vector and one or more of its vector constituents. But whether or not a particular Connectionist system does that is not a matter of cognitive architecture (that is, the system's doing that wouldn't be part of what makes it a Connectionist system). It could be made a matter of cognitive architecture, so that operating on a tokened vector requires decomposition into, and operations on, tokens of its vector constituents. But then the architecture wouldn't be a completely Connectionist one. A good example of a system having such an architecture is Touretzky's (1986) BoltzCONS.

Now, it is certainly true that decomposing tensor product representations into their constituents is essential to understanding and explaining the regularities in the behavior of a Smolensky architecture, including any regularities related to systematicity. But the broad issue here is whether Smolensky's explanation of systematicity is a causal one, and Matthews's objection just presupposes that it is. It is tendentious whether Smolensky's theory requires that a nontokened vector constituent's possessing some specific property is nomologically sufficient for the occurrence of some specific kind of event, and Matthews provides no reason for thinking that this is so.

Still, Matthews's presupposition could be right. So we need to look at how decomposing tensor product representations into their constituents is essential to understanding and explaining the regularities in the behavior of a Smolensky architecture. Smolensky's exposition of this, though not difficult to follow, takes a few pages.¹³ So rather than summing up the entire exposition in terms of general principles, I'll provide a simple example.

¹³ Smolensky 1995c, pp. 245–249.

Suppose we have a Smolensky architecture that computes a simple function, namely, the function whose value is the binary tree (y, x), for any binary tree (x, y) in its domain. The network, we'll assume, computes this function in one step, by multiplying the vector $V_i$, which translates (x, y), by the connection weight matrix $W$, yielding the vector $V_o$, which translates (y, x). So how does the network work? How does it compute the function in the manner it does?

First, as Smolensky shows, there are weight matrices $W_{\text{extract left}}$ and $W_{\text{extract right}}$, such that

$$W_{\text{extract left}} \cdot V_i = V_x \quad \text{and} \quad W_{\text{extract right}} \cdot V_i = V_y,$$

where "$\cdot$" is matrix multiplication, and $V_x$ and $V_y$ translate the trees x and y, respectively. There are also weight matrices $W_{\text{construct left}}$ and $W_{\text{construct right}}$, such that

$$V_o = (W_{\text{construct left}} \cdot V_y) + (W_{\text{construct right}} \cdot V_x).$$

Thus, substituting $W_{\text{extract left}} \cdot V_i$ for $V_x$ and $W_{\text{extract right}} \cdot V_i$ for $V_y$, we obtain

$$V_o = (W_{\text{construct left}} \cdot W_{\text{extract right}} \cdot V_i) + (W_{\text{construct right}} \cdot W_{\text{extract left}} \cdot V_i)$$

$$V_o = \big((W_{\text{construct left}} \cdot W_{\text{extract right}}) + (W_{\text{construct right}} \cdot W_{\text{extract left}})\big) \cdot V_i.$$

Since the products and sums of weight matrices are themselves weight matrices, there is a weight matrix, $W$, such that

$$W = (W_{\text{construct left}} \cdot W_{\text{extract right}}) + (W_{\text{construct right}} \cdot W_{\text{extract left}}).$$

Hence,

$$V_o = W \cdot V_i.$$

This derivation shows how the network computes the function in question by means of a single vector transformation. It can do so because of the mathematical structure of $W$, $V_i$, $V_o$, $V_x$, and $V_y$ and because of the fact that $V_x$ and $V_y$ translate the appropriate trees.

Of course, $V_x$ and $V_y$ are the vector constituents of $V_i$ and $V_o$. And they must be referred to in order to explain how the network works in this case.
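The derivation above can be checked with a small computation. What follows is only a sketch under assumptions that are not Smolensky's: two orthonormal role vectors, three-element filler vectors standing in for Vx and Vy, Kronecker products for role-filler binding, and hypothetical helper names (kron, matmul, and so on). Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction as F

def kron(A, B):
    """Kronecker (tensor) product of matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matmul(A, B):
    """Ordinary matrix product A . B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def column(values):
    """Column vector as an n x 1 matrix of exact fractions."""
    return [[F(v)] for v in values]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

# Assumed (illustrative) role vectors, chosen to be orthonormal,
# and arbitrary filler vectors standing in for Vx and Vy.
r0 = column([F(3, 5), F(4, 5)])
r1 = column([F(-4, 5), F(3, 5)])
Vx = column([1, 2, 0])
Vy = column([0, 1, 5])
I3 = identity(3)

# Vi translates (x, y); Vo_expected translates (y, x).
Vi = matadd(kron(r0, Vx), kron(r1, Vy))
Vo_expected = matadd(kron(r0, Vy), kron(r1, Vx))

# Extraction and construction matrices for these assumed roles.
W_extract_left = kron(transpose(r0), I3)     # 3 x 6
W_extract_right = kron(transpose(r1), I3)    # 3 x 6
W_construct_left = kron(r0, I3)              # 6 x 3
W_construct_right = kron(r1, I3)             # 6 x 3

# W = (W_construct_left . W_extract_right) + (W_construct_right . W_extract_left)
W = matadd(matmul(W_construct_left, W_extract_right),
           matmul(W_construct_right, W_extract_left))

# The network's single step: one matrix-vector multiplication.
Vo = matmul(W, Vi)
```

In this sketch, Vx and Vy are obtained from Vi only if an extraction matrix is actually applied; the single multiplication by W yields Vo without ever computing them as intermediate results. Whether that mathematical fact bears on causal efficacy is the question the text goes on to address.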
This is just one example of the fact that "tensor product constituents play absolutely indispensable roles in the description and explanation of cognitive behavior in [Smolensky architectures]."¹⁴

¹⁴ Smolensky 1995c, p. 249.

So is Matthews right? Is Fodor committed to the causal efficacy of nontokened vector constituents? Clearly not. The explanation relies on mathematical relationships as opposed to lawful relationships between events. Nothing about the explanation we've just gone through requires that $V_x$'s (or $V_y$'s) being a (nontokened) constituent of $V_i$ (or $V_o$) is nomologically sufficient for the occurrence of anything. Nor is there anything about the explanation that requires that something's being an instance of $V_x$ (or $V_y$) is nomologically sufficient for the occurrence of anything, since the explanation simply does not require any vector constituent to be instanced.

Matthews also argues that Fodor and company's rejection of the causal efficacy of vector constituents is incompatible with Fodor's view that causation occurs at macroscopic levels of description, not only at more primitive levels. Matthews claims that Fodor and colleagues' "complaint against tensor product representations … is that they don't actually have constituent structure. They don't have it, because … the normal modes [vector constituents] into which the tensor product vectors are decomposed don't 'correspond' to causal agents in the network."¹⁵ And, Matthews claims, Fodor and his associates think that vector constituents are not causal agents because all the causal work in a Connectionist network is done at the level of individual units and connections. But if, as Fodor maintains, causation occurs at many levels, causation at the level of individual units and connections does not rule out causation at higher levels, in particular, at levels of representation.

¹⁵ Matthews 1996, p. 165.

I find this objection to be quite puzzling.
Certainly Fodor wouldn't deny that tokened vectors are causally efficacious. Fodor would surely agree that they can have causal consequences, even though there are causal processes operating at the level of individual units and connections. What Fodor does deny is that nontokened vectors can have causal consequences. His view on this matter has nothing to do with levels of causation; rather, it simply rests on the extremely plausible assumption that nonexistents can't cause anything. If there aren't any network unit activation patterns that token a certain vector, then there aren't any causal effects due to activation patterns that token that vector.

Matthews seems to be presupposing, contrary to what we concluded earlier, that the vector constituents of tensor product representations do not lose their individual character in superposition, so that if a tensor product representation is tokened, so too must be its vector constituents.¹⁶ That is, he appears to be assuming that vector constituents are like wave components. On that assumption, if Fodor accepts the existence and causal efficacy of wave components, then he should do likewise regarding vector constituents. We've already rejected the idea that vector constituents are like wave components, but let's see what its consequences would be if it were true.

¹⁶ Smolensky at times writes as if he thinks this: "The representation is distributed: since the vectors realizing all the constituents in the structure are superimposed upon each other, each unit participates in the realization of many symbols" (1995c, p. 249).

Suppose the vector constituents of a tensor product representation must be tokened whenever it is tokened. A consequence of this supposition is that tensor product representations would be Classical representations. For, first, vector constituency would be a kind of co-tokening relation. Second, the structure of tensor product representations would be governed by a combinatorial syntax.
Certain vectors would be of certain formal types and would physically combine, or concatenate, to form more complex vectors according to syntactic rules. The representation-forming processes would be sensitive to the causally efficacious syntactic properties of the tokened vectors on which they would operate.

More specifically, the following rules specify a perfectly good syntax, on the present supposition that the constituency relation for tensor product representations is concatenative.

1. There are two sets of atomic vectors, role vectors ($\{r_0, r_1\}$) and filler vectors ($\{A_1, A_2, A_3, \ldots\}$).
2. For any atomic filler vector, $A_i$, the vectors $r_0 \otimes A_i$ and $r_1 \otimes A_i$ are wffs.
3. If the vectors $r_0 \otimes V_i$ and $r_1 \otimes V_k$ are wffs, then the vector $(r_0 \otimes V_i) + (r_1 \otimes V_k)$ is a wff.
4. If the vector $V_i + V_k$ is a wff, then the vectors $r_0 \otimes (V_i + V_k)$ and $r_1 \otimes (V_i + V_k)$ are wffs.
5. There are no other wffs.

Of course, the symbols "$\otimes$", "+", "(", and ")", as they occur in these rules, are not part of the object language. They describe how the relevant vectors are concatenated, rather than designate any of the things that are concatenated.

Tensor product representations having concatenative constituency and a combinatorial syntax might be acceptable to some Connectionists, provided that they could still make a good case for the claim that cognitive processes would nonetheless be non-Classical. However, the suggested reason for thinking that vector constituency is a co-tokening relation (namely, that vector constituents do not lose their individual character in superposition) would seem to apply equally to weight matrices. As we've seen, weight matrices themselves are superpositions of other weight matrices. Like tensor product representations, weight matrices are sums and products of weight matrices. So if vector constituency is a co-tokening relation, so too is matrix constituency.
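Rules 1 through 5 above amount to a recursive definition, and a short recognizer makes the recursion explicit. This is a sketch under assumptions introduced here for illustration only: vectors are represented symbolically by nested tuples rather than numerically, and the names is_wff, "bind", and "sum" are hypothetical labels, not part of any Connectionist model.

```python
# Symbolic stand-ins (an assumed encoding, for illustration only):
#   "A1", "A2", ...       atomic filler vectors
#   ("bind", role, V)     the tensor product of role r0 or r1 (role = 0 or 1) with V
#   ("sum", V, U)         the vector sum V + U

def is_atom(e):
    return isinstance(e, str) and e.startswith("A") and e[1:].isdigit()

def is_bind(e):
    return (isinstance(e, tuple) and len(e) == 3
            and e[0] == "bind" and e[1] in (0, 1))

def is_sum(e):
    return isinstance(e, tuple) and len(e) == 3 and e[0] == "sum"

def is_wff(e):
    if is_bind(e):
        filler = e[2]
        # Rule 2: a role bound to an atomic filler.
        # Rule 4: a role bound to a sum that is itself a wff.
        return is_atom(filler) or (is_sum(filler) and is_wff(filler))
    if is_sum(e):
        left, right = e[1], e[2]
        # Rule 3: an r0-binding plus an r1-binding, both of which are wffs.
        return (is_bind(left) and left[1] == 0 and is_wff(left)
                and is_bind(right) and right[1] == 1 and is_wff(right))
    # Rule 5: nothing else is a wff (in particular, a bare atom is not).
    return False

tree_xy = ("sum", ("bind", 0, "A1"), ("bind", 1, "A2"))
tree_yx = ("sum", ("bind", 0, "A2"), ("bind", 1, "A1"))
nested = ("sum", ("bind", 0, tree_xy), ("bind", 1, "A3"))
```

On this reading, tree_xy, tree_yx, and nested all count as wffs, while a bare filler such as "A1" does not, since rule 2 admits fillers only when they are bound to a role.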
Furthermore, in a Smolensky architecture, fundamental weight matrices encode steps of Classical algorithms for operating on vector constituents. There are weight matrices for extracting vector constituents and weight matrices for combining vector constituents. However, if vector constituency and matrix constituency are co-tokening relations, then it looks very much like Smolensky architectures are nothing more than massively parallel implementations of Classical architectures. For all the steps of the relevant Classical rules, and all the relevant Classical constituents on which they operate, would be encoded, and the encoding weight matrix constituents and vector constituents would be tokened and thus available to play causal roles. We'd end up with an implementation of a Classical system in a wave-like substrate.

By way of illustration, notice that the above derivation that explains how a Smolensky network can compute a function by means of a single vector transformation outlines a Classical algorithm for computing that function solely by means of Classical operations on tokened vectors and their tokened constituents. A Classical machine could implement such an algorithm to compute, in a sequence of steps, what the network computes in one step. Also, the extraction and construction matrices (and matrices for adding vectors and matrices) encode steps of that algorithm. If we assume that all the Classical-constituent encoding vectors and algorithm-step encoding matrices must be actually tokened in the network, then it's hard to see how the network is not merely an implementation, albeit a massively parallel one, of a Classical machine.

3.1.3 Vector Similarity

van Gelder¹⁷ has argued that representation processing in Connectionist networks is causally sensitive to the constituent structure of vectors.
He attempts to derive the causal efficacy of vector constituents from both the causal efficacy of the tokened vectors which have them as constituents and the notion of distance in vector space. As he points out, vectors stand in nonsyntactic, internal-structure similarity relations. These similarities are (or correspond to) the distance relations among vectors in the vector space for the relevant system.¹⁸ Furthermore, systematically related vectors are more similar in this regard than non-systematically related vectors: systematic variants are closer in vector space than non-systematic variants.

¹⁷ van Gelder 1990, pp. 379–380.

¹⁸ For example, vectors with two elements can be classified as more or less similar according to their Cartesian-plane distance relations.

These similarities, van Gelder argues, are of causal significance. First, the behavior of a network causally depends on the precise activation values of its units. And the activation values of particular units instantiate vectors, which, of course, have locations in vector space. So the behavior of a network (the consequences of its vector operations) causally depends on the vector space locations of its currently instantiated vectors. Second, the location of a tokened vector depends upon its constituents and its constituent structure. Any two tokened vectors which differ in constituents or constituent structure will instantiate different vector types and so have different locations. Thus, vector operations must causally depend on the constituents and constituent structure of the relevant vectors.

van Gelder's argument, however, is either invalid or relies upon tendentious assumptions. True, a Connectionist network that exhibits some degree of systematicity will causally process vectors that have the same constituents and constituent structure in similar ways. So such a system would be at least functionally sensitive to constituent structure.
Now, as van Gelder acknowledges, the constituents of tokened vectors are not explicitly available. So the system cannot be sensitive to similarity relations among vectors by directly detecting their constituents. But, according to van Gelder, it can detect similarities with respect to distance in vector space. And the distance in vector space between two vectors depends on what constituents and constituent structures they have. But how is it supposed to follow that the constituents of a vector token have causal roles in operations on that token?

First, in what sense could it be true that a network literally detects similarities with respect to distance in vector space? Such similarities would seem no more explicit than the constituents of a complex vector. Of course, we can use similarity of location in a network's vector space to describe some aspects of its behavior. But the network itself doesn't make use of such descriptions. The only respect in which it seems true that a network "detects" such similarities is that the system is functionally sensitive to them; that is, it can exhibit appropriate systematic behaviors. But whether such sensitivity requires that the constituents of a vector token have causal roles in operations on that token is precisely what's at issue. (The issue of whether such sensitivity requires that vectors carry information about their constituent structure is addressed below in Section 3.1.5.)

Second, for any given network that exhibits some sort of systematicity, it would appear to be an empirical question whether systematically related vectors are closer in vector space than non-systematically related vectors. Servan-Schreiber et al.¹⁹ studied various simple recurrent networks trained to predict legal continuations of symbolic expressions having a simple grammar. The networks varied in number of units.
For networks with a relatively small number of units, the encodings of similarly structured symbolic expressions had similar locations in vector space. However, this correspondence in similarity became weaker as the number of network units increased. This suggests that for networks with very large numbers of units, there might not be such a correspondence at all.

¹⁹ Servan-Schreiber et al. 1991. Servan-Schreiber et al.'s results are briefly discussed in Garson 1997, pp. 350–351.

Given that possibility, it becomes doubtful whether the systematicities exhibited by Connectionist networks require, for their explanation, causal sensitivity to similarity of location in vector space. For a network that exhibits a certain systematicity might encode similar structures with vectors that do not have a correspondingly similar location.

Finally, in what sense does the location of a vector depend upon its constituents and its constituent structure? If the former does not causally depend upon the latter, then van Gelder's argument does not go through. For the argument requires a bridge from the causal efficacy of tokened complex vectors to the causal efficacy of their nontokened constituents. And van Gelder cannot just assume that the location of a tokened vector is causally dependent on its nontokened constituents, for that assumption would presuppose that such constituents are causally efficacious, and that is at issue. Moreover, there is no reason for thinking that a tokened vector's location is causally dependent on its constituents. For a tokened vector would have its location (that is, be an instance of a specific vector type) regardless of its causal history.
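To make the notion of location in vector space concrete, here is a small sketch under assumptions that are neither van Gelder's nor Servan-Schreiber et al.'s: a toy tensor product scheme with role vectors (1, 0) and (0, 1), three arbitrary filler vectors, and hypothetical helper names. It shows only that, for a given encoding, the relevant distances can simply be computed, and that a vector's location is fixed by its coordinates however the vector was arrived at.

```python
def bind_and_sum(left, right):
    """Encode the tree (left, right) with assumed roles r0 = (1, 0), r1 = (0, 1).

    With these particular roles, the sum of the two role-filler tensor
    products is just the concatenation of the two filler vectors.
    """
    return list(left) + list(right)

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Arbitrary illustrative fillers.
x, y, z = (1, 0, 0), (0, 1, 0), (0, 0, 1)

v_xy = bind_and_sum(x, y)   # encodes (x, y)
v_yx = bind_and_sum(y, x)   # encodes (y, x), the systematic variant of (x, y)
v_xz = bind_and_sum(x, z)   # encodes (x, z), which shares only one constituent

d_variant = squared_distance(v_xy, v_yx)
d_other = squared_distance(v_xy, v_xz)

# The same vector type, written down directly rather than built from x and y.
v_direct = [1, 0, 0, 0, 1, 0]
```

In this particular toy encoding, the encoding of (y, x) lies farther from the encoding of (x, y) than the encoding of (x, z) does, and v_direct occupies exactly the location of v_xy. Nothing follows from this about trained networks, which is why the question is treated in the text as an empirical one.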
3.1.4 Vector Constituents as Causal Precursors

Some defenders of Connectionism²⁰ have argued that nonconcatenative constituency is compatible with the adoption of the architectural (representational-level) requirement that a vector's constituents must have played a causal role in the eventual production of that vector. Further, that requirement would be enough to guarantee the causal efficacy of those constituents, even when they are not currently being tokened. Causation is transitive, so if there are causal chains of events from the tokenings of a tokened vector's constituents to the tokening of that vector, then the tokenings of those constituents will play causal roles in any operations on that vector.

²⁰ See Hadley 1997, Butler 1991, and van Gelder 1991.

On this view, the constituency relation for complex vectors remains nonconcatenative. It is not metaphysically necessary that a complex vector is tokened only if its constituents have also been tokened. However, this is nomologically necessary, given the architectural properties of the sort of networks envisioned.

There is a serious problem with the view under consideration: the proposed architectural property would add nothing to Connectionist explanations of systematicity. It should be clear from Smolensky's account of systematicity that a Connectionist system which exhibits certain systematicities with respect to various complex vector representations would exhibit those systematicities regardless of whether or not the constituents of the relevant tokened vectors have ever themselves been tokened. In particular, this is true of Connectionist systems having the architectural property that a vector's constituents (nomologically) must have played a causal role in the eventual production of that vector. To see this, note that, in such a system, it is nonetheless possible that a complex vector be (fortuitously) tokened without any of its constituents ever having been tokened.
And supposing that the system exhibits systematicity, it will nonetheless be the case that, if it is capable of (fortuitously) tokening such a vector, then it will be capable of tokening its systematic variants. An explanation of the system's systematic behavior, in this sort of case, couldn't appeal to the causal efficacy of the appropriate tokened constituents, since, by assumption, there never were any. But if there is an explanation of the systematic behavior of the network in this sort of case, it should apply just as well to cases in which the constituents of the relevant complex vectors have been tokened. So whether or not there have been tokenings of those constituents shouldn't matter; they would add nothing to the explanation.

More generally, what enables Connectionist representational processes to be constituent-structure sensitive is that constituent structure is vector encoded. The only way in which processes that operate on syntactically simple representations can be sensitive to their constituent structure is to have that structure encoded in the representations. Constituent-structure sensitivity, then, needs to be explained in terms of properties of the encodings. But if the properties of the encodings (together with the processing mechanisms) do the explanatory work, then there is no need to appeal further to tokenings of constituents of the encodings.

To put this another way, Connectionist explanations of systematicity turn on the mathematical properties of vectors in relation to a network's (causal) vector operations. But tokened vectors have their mathematical properties independently of their etiology. Rather, they are inherited from the vector types they instantiate. So they have their mathematical properties independently of tokenings of any of their constituents. Of course, a true causal explanation of the tokening of a particular complex vector might advert to tokenings of its constituents.
But an explanation of the mathematical properties of a particular vector doesn't require a causal explanation of its tokening. Indeed, a vector has its mathematical properties regardless of whether it is ever tokened.

In sum, the proposed architectural requirement that a tokened vector's constituents (nomologically) must have played a causal role in the eventual generation of that vector would add nothing to Connectionist explanations of systematicity. Its only consequence would be to give some causal roles to tokenings of the relevant vector constituents. And that alone is not enough to make Connectionist explanations of systematicity reliant upon those causal roles.

3.1.5 Causal Efficacy of Information about Constituents

There is a further issue that needs to be addressed before we may confidently conclude that Connectionist accounts of systematicity which appeal to nonconcatenative distributed representations are not causal explanations. Horgan and Tienson appear to concede that nontokened vector constituents themselves do not have causal consequences. They argue, however, that the fact that a particular tensor product representation has a certain vector constituent can play a causal role in Connectionist architectures.

    The question is not whether constituents can play a causal role. The question is whether the fact that a representation has a particular constituent can play a causal role. And that fact can play a causal role if the representation carries the information that it has that constituent.²¹

Furthermore, they argue that vector representations which encode symbolic structures do in fact carry the information that they have the constituents they do. So if Horgan and Tienson are right, nontokened vector constituents themselves need not be causally efficacious in order for a Connectionist explanation of systematicity to be a causal one.
I think that Horgan and Tienson's attempt to refocus the issue does nothing to further the Connectionist's cause. Let's first examine why they think that a complex vector representation carries the information that it has a particular constituent. As far as I can see, their only reason for thinking this is that a Connectionist architecture can perform what they consider to be constituent-sensitive operations.[22] For example, as Smolensky has shown, networks can process tensor product representations so as to yield their systematic variants. How is this possible if tensor product representations don't carry the information that they have certain constituents?

[21] Horgan and Tienson 1996, p. 79.
[22] See Horgan and Tienson 1996, pp. 80, 184n6.

We may grant that some Connectionist architectures can compute the same functions as certain Classical architectures. The issue is whether the explanation of such facts must appeal to processes that are causally sensitive to the constituent structure of vector representations. Horgan and Tienson appear just to assume that this is so. That is, they appear to just assume that the information that a representation has a particular constituent must play a causal role in mental processing. They need to provide an explanation of how Connectionist systems compute the functions they do, where that explanation adverts to causal roles for such information.

Horgan and Tienson do say that "how tensor product representations carry such information is no miracle; it is explainable mathematically."[23] But we've already seen the form of such mathematical explanations, in our discussion of how decomposing tensor product representations into their constituents is essential to understanding and explaining the regularities in the behavior of a Smolensky architecture (§ 3.1.2). Such explanations do not require that information about vector constituents is causally efficacious.
For instance, nothing about the sample explanation presented earlier requires that there are causal consequences of the fact that the vector Vi (or Vo) has Vx (or Vy) as a constituent. In fact, such explanations seem to show how vector operations can be "constituent sensitive" without being causally sensitive to information about constituents.[24]

[23] Horgan and Tienson 1996, p. 80.
[24] Hadley suggests, but does not insist upon, a line of argument that appears to be either a version of Horgan and Tienson's view or a version of the view, envisaged above, that vector constituents are somehow actually there, even though they are not separately tokened. He writes, "information that is recoverable from complex structures, even when background mechanisms must be assumed, may be regarded as implicit in a special sense. One could argue further that information that is implicit in this sense is not merely imaginary, because the complex structures in question must possess specific properties that reflect the derivable information. In particular, Smolensky's tensor-product representations possess special properties that reveal the identity of their (purported) 'imaginary' atomic constituents. Thus, some trace of the atomic constituents is present even in the complex representations" (1997, p. 148; italics in original).

I should emphasize that I'm not attempting to deny that complex vectors encode information about their constituents. We need to be careful so as to not confuse the idea of "carrying" information with the idea of "encoding" information. Tensor product representations certainly encode their constituent structure. What I deny is that they carry information about their constituents in such a way that that information could play a causal role in operations performed on those representations. Rather, such information, I think, is used only by us in designing Connectionist networks or in understanding how they work. But so far I've argued only that there is no good reason to think that Connectionist architectures causally use such information. Is there a positive argument for my claim that they don't? I think there is. I'll state my case in terms of Smolensky architectures. But the argument applies as well to any Connectionist system for which the representational constituency relation is nonconcatenative.

Assume that there is a tensor product representation, R, that has a vector, C, as a constituent. Also, let's suppose that the fact that R has C as a constituent plays a causal role in the operations which a Smolensky network performs on R. We want to explain how that fact is causally efficacious. An important question to initially ask is, What is the fact that R has C? What makes it true that R has C? Well, given the notion of vector constituency, the answer is the fact that R uniquely translates a constituent structure tree, T, C uniquely translates the tree T*, and T* is a Classical constituent of T.

According to Horgan and Tienson, that fact can play a causal role if R carries the information that that fact obtains. So how could R carry that information? First, notice that the information is about properties of R that are nonlocal and nonphysical (radically physically heterogeneous). Nonlocal, because (1) trees are not tokened, and do not play causal roles, in Smolensky architectures, and (2) the tokening of R does not require the tokening of C. Nonphysical, because translation relations (and, thereby, vector constituency relations) are nonphysical. In Smolensky networks, there are no physical interactions between trees and the vectors that translate them. Moreover, since vector constituency is not a co-tokening relation, the property of having C as a constituent is physically heterogeneous.
It is not the case that if two tensor product representations have C as a constituent, then they must thereby have a specific physical property in common. Having C as a constituent is simply not a physical property, in the required sense.

Second, it is quite plausible, to say the least, that the only way in which a computational system, of any kind, could be systematically sensitive to nonlocal, nonphysical properties is by representing them.[25] If a computational device is to function properly, its mechanical, information-manipulating processes need to be systematically sensitive to various local and physical properties of its information-bearing structures. It could not be expected to function properly if its operations have to be sensitive to whether or not the representations on which they operate possess certain nonlocal or nonphysical properties. For example, we can't expect a computational system to work if its processes have to detect whether or not the representations on which they operate are, say, within 200 yd of a school building, or are numerals, rather than some other sort of symbol.

[25] See Rey 1997, § 4.3, and Rey 2003.

So, since having C as a constituent is a nonlocal, nonphysical property of R, if R is to effectively carry the information that it has C, if that information is to be reliably and mechanically detectable, then R must somehow represent the fact that it has C by means of some of its local, physical properties. Let's call the feature or features of R that instantiate the relevant physical properties the bearer of the information that R has C.

What, then, is the nature of the bearer of that information? In a Smolensky architecture, representations are vectors. So the bearer of the information that R has C must be a vector, V, that represents the fact that R has C. Well, then, which vector is V? V represents the fact that R has C, and so must have constituents, R* and C*, which refer to R and C, respectively.
So R could be V only if it has those same constituents. That is, R could be V only if its correct interpretation is "R has C". (Remember, it's provable just what the constituents of a given tensor product representation are.) But then our explanation of how R carries the information that it has C as a constituent would have the consequence that it could do so only if that's what it meant in the first place. Naturally, what we need is an explanation of how R carries the information that it has C, regardless of what the correct interpretation of R is.

Of course, when we started out, R was supposed to be the vector that carried the information in question. But let's see if some other vector could do the job. Could V be a constituent, or some other nontokened vector component, of R? That won't do, since V has to instantiate those local and physical properties of R which bear the information that R has C. That is something V could do only if it is tokened along with R. Remember, we're looking for a causal explanation. So whatever carries the information that R has C has to really be there.

The only remaining option is that V is a subvector of R. That is, if R is the vector <1, 2, 3, 4>, perhaps V is the vector <1, 2>. However, if that's the case, then R has a representational constituent, V, that must be tokened whenever R is. This option, then, gives up nonconcatenative constituency. It also has another problem. For V, too, would have to carry information about its constituents via its subvectors (which in turn would have to have their own constituents, in that they would attribute properties to V). But vectors are finite. So eventually there would have to be a vector, V*, that either did not carry information about its constituents (since it wouldn't have a subvector to do the job) or did carry information about its constituents by some other means.
If the former, then our explanation would have to allow that some vectors don't carry information about their constituents, in which case one would wonder why any would have to. If the latter, then we'd need another explanation of how vectors like V* carry information about their constituents.

We thus have a kind of reductio of the supposition that the information that R has C is causally employed by a Smolensky network. For if that supposition were true, then there should be a causal explanation of it. There should be an explanation of how that information plays a causal role. But it appears that such an explanation is not to be had. Therefore, that information is not causally used by such systems.

A defender of Horgan and Tienson might point out that, contrary to what my argument appears to assume, on their view, R alone does not carry the information that it has C. Rather, it carries that information relative to the entire system:

    In classical systems ... representations ... have constituents only in the context of the whole system. The structure of the system as a whole determines that representations have a causal role that is sensitive to their constituent structure. And it is only by virtue of their having such a causal role that it makes sense to say that certain physical items are constituents. In connectionist systems ... the information that representation R has constituent C is [sic] carried by the representation R – relative to the whole system, even though constituent C is not physically present.[26]

My argument, however, does not assume that R carries the relevant information independently of the entire system. It's fine with me if R carries that information only relative to the system as a whole. It would still remain the case that if R is to carry the information, it must somehow represent it by means of some of its physical properties, regardless of whether or not those properties represent that information in and of themselves.
What Horgan and Tienson need is a causal explanation of how R could carry (relative to the system) the information that it has C, and I've argued that one cannot be had.

[26] Horgan and Tienson 1996, pp. 79–80.

Perhaps Horgan and Tienson are drawing attention to the distinction between explicit information and implicit information. In a Classical system, a representation like Fa explicitly means whatever it means, but only implicitly carries the information that it has F as a constituent. So maybe my argument errs by construing the information that R has C as information which is explicitly, rather than implicitly, carried by R. But this sort of response to my argument would miss its point. Classical representations can implicitly carry information about their constituents because those constituents are right there, instantiating all their local and physical properties. In contrast, representations for which the constituency relation is nonconcatenative can't implicitly carry information about their constituents in that way, since their constituents aren't there. So if they do carry that information, they have to do it in some other way. Since the information is about nonlocal, nonphysical properties of the representations, it must be carried by being represented.

Still, one might wonder, Isn't the fact that nonconcatenative representations encode their constituent structure enough to show that they implicitly carry information about what constituents they have? Well, no. That just takes us back to where we started. We have an explanation of how systems which employ nonconcatenative representations exhibit systematicity. That explanation appeals to properties of the encodings. The issue then arises whether the explanation is a causal one. Hence, this chapter.

Again, my argument applies to any Connectionist system for which the representational constituency relation is nonconcatenative.
We need only replace the specific version of vector constituency that applies in the case of Smolensky architectures with a more general version: a vector, Vn, is a vector constituent of another vector, Vm, only if Vn uniquely encodes a symbolic structure, S, Vm uniquely encodes another symbolic structure, S*, and S is a (concatenative) constituent of S*.

But one might question whether my argument goes through when applied to Connectionist systems which employ the architectural requirement that a tokened vector's constituents (nomologically) must have played a causal role in the eventual generation of that vector. For it might not be clear that, in such systems, the information that a particular complex vector has certain constituents is about properties of that vector which are nonlocal and nonphysical.

However, having a certain constituent as a causal precursor doesn't make having that constituent a local, physical property. It seems easy enough to imagine two tokenings of the same vector having the same nomologically possible effects, where one has a tokening of one of its constituents as a causal precursor but the other does not. At the least, I find it hard to imagine a non-question-begging way of arguing that having a certain constituent as a causal precursor makes having that constituent a local, physical property. Moreover, having a certain constituent as a causal precursor isn't enough to make having that constituent a physical property. For the property of having a certain constituent as a causal precursor is itself physically realizable in a very wide variety of ways.

Based on the arguments presented in this chapter, I think we may confidently conclude that Connectionist explanations of systematicity are not causal explanations. Of course, this conclusion does not in itself pose an immediate difficulty for Connectionism unless the only sort of acceptable explanation in cognitive science is causal explanation.
However, the next two chapters do present more direct problems for Connectionism; and one of those problems arises once it is seen that Connectionist explanations of systematicity are not causal ones.

Chapter 4
Acausal Explanation?

Defenders of Smolensky could concede that his explanation of systematicity is not a causal one in that it does not advert to causal laws governing the constituents of systematically related vectors. For they could deny that the only acceptable form of explanation in cognitive science is causal explanation. In particular, they could argue that Smolensky's explanation is a good one, even though it does not take the form of a causal explanation.

If Smolensky's explanation is not a causal one, then what kind of explanation is it? Well, presumably it is supposed to work in the following way. We understand the systematicity of Classical systems of representation, such as constituent-structure trees. The tree–vector algorithms show that there is a one-to-one mapping between trees and the vectors which translate them. Furthermore, systems with Smolensky architectures are designed so that vector processing is carried out in a way that maintains that one-to-one mapping.

One difficulty with this sort of explanation, according to Fodor, is that explanatory adequacy is not in general preserved under one-to-one correspondence.[1] So that there is a one-to-one correspondence between trees and tensor product vectors does not show that vector constituents and the notion of vector constituency are doing any explanatory work. In fact, Fodor maintains, the explanatory burden seems to be carried exclusively by Classical trees and the notion of concatenative constituency. On his view, all that tree encoding/deriving algorithms and the notion of vector constituency do for Smolensky is allow him to completely parasitize the Classical explanation without adding anything of substance to it.
If his explanation appears to be adequate, that is only because it is merely the Classical explanation in disguise.

I think Fodor's conclusion might be too strong. I'm not sure that it is, and I won't attempt to convincingly establish that it is. Suffice it to say that the explanation presented in Section 3.1.2 is of the same kind as Smolensky's explanation of systematicity; and the former appears to be an adequate and illuminating explanation. Moreover, it is hard to see how the burden of that explanation is supposed to be carried exclusively by Classical representations and the notion of concatenative constituency.

Rather than take a definitive stand on whether Fodor is right, I want to recast the issue somewhat. I want to claim that, insofar as Smolensky's explanation is a good one, it fails to explain what it sets out to explain. (Perhaps, in the end, my objection amounts to the same point as Fodor's, viewed in a different light.) The best way I can think of to clarify all this is just to get on with it.

[1] Fodor 1998, p. 120.

4.1 An Adequate Explanation, but Not of Systematicity

One way to show that an explanatory strategy is a good one is to provide a case in which it clearly succeeds. And one might appeal to Smolensky's "Visa Box" example, as he himself does, in order to show that some "acausal" explanations, as he calls them, are in fact good ones.[2] I'll agree with Smolensky that the explanation of how the Visa Box works is a good one, but I'll claim that the explanatory strategy is inadequate with respect to systematicity.

The Visa Box is a device that assists in restaurant bill tip calculation, when the bill is not itemized. Its inputs are the bill subtotal (food total, plus tax), the local food tax percentage, and the chosen tip percentage. Its output is the bill total (food total, plus tax, plus tip).
One would naturally surmise that the device works by sequencing through the following calculations, or some very similar to them:

    $FOOD = SUBTOTAL/(1 + x/100) [3]
    $TIP = $FOOD (p/100)
    TOTAL = SUBTOTAL + $TIP,

where x and p are the tax and tip percentages, respectively. Thus, it is natural to suppose that the Visa Box employs $FOOD representations in its calculations. But, in fact, the device works by calculating a number, w, and then multiplying w by the subtotal to obtain its output:

    w = (100 + x + p)/(100 + x)
    TOTAL = w (SUBTOTAL)

[2] Smolensky 1995c, pp. 244–245.
[3] This equation is derivable as follows:
    SUBTOTAL = $FOOD + $FOOD (x/100)
    SUBTOTAL = $FOOD (1 + x/100)
    SUBTOTAL/$FOOD = (1 + x/100)
    1/$FOOD = (1 + x/100)/SUBTOTAL
    $FOOD = SUBTOTAL/(1 + x/100)

How does the Visa Box, without tokening $FOOD representations, compute the correct TOTAL for a given set of inputs? Here's a derivation that provides an explanation:

    TOTAL = w (SUBTOTAL)
          = [(100 + x + p)/(100 + x)] SUBTOTAL                        (Substitution)
          = [(1/100)(100 + x + p) / (1/100)(100 + x)] SUBTOTAL        (Multiplication by (1/100)/(1/100))
          = [(1 + x/100 + p/100)/(1 + x/100)] SUBTOTAL                (Distribution)
          = (1 + x/100 + p/100) [SUBTOTAL/(1 + x/100)]                (Association: (m/n)s = m(s/n))
          = (1 + x/100 + p/100) $FOOD                                 (Substitution)
          = $FOOD + $FOOD (x/100) + $FOOD (p/100)                     (Distribution)
          = $FOOD + $TAX + $TIP                                       (Substitution)
          = SUBTOTAL + $TIP

This derivation provides an explanation of how the Visa Box works without its employing $FOOD representations.
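The equivalence that the derivation establishes can also be checked numerically. The following is a minimal Python sketch, offered only as an illustration: the function names total_via_food and total_via_w and the sample bill values are assumptions made for the example, and w is taken to be (100 + x + p)/(100 + x), as in the derivation. The first function sequences through the calculations one would naturally expect, computing an intermediate food price; the second never computes a food price at all.

```python
import math

def total_via_food(subtotal, x, p):
    """The 'natural' algorithm: recover the food price, then add the tip.

    x is the tax percentage and p the tip percentage, as in the text.
    """
    food = subtotal / (1 + x / 100)   # $FOOD = SUBTOTAL/(1 + x/100)
    tip = food * (p / 100)            # $TIP  = $FOOD (p/100)
    return subtotal + tip             # TOTAL = SUBTOTAL + $TIP

def total_via_w(subtotal, x, p):
    """The Visa Box algorithm: a single multiplier, no food-price variable."""
    w = (100 + x + p) / (100 + x)
    return w * subtotal               # TOTAL = w (SUBTOTAL)

if __name__ == "__main__":
    # Hypothetical bill: 108.00 subtotal, 8% tax, 15% tip.
    a = total_via_food(108.0, 8.0, 15.0)
    b = total_via_w(108.0, 8.0, 15.0)
    print("algorithms agree (up to rounding):", math.isclose(a, b, rel_tol=1e-9))
```

The two functions agree on every input for which both are defined, up to floating-point rounding, although only the first ever holds a value for the food price.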
We see how the algorithm it uses and one we would naturally expect it to use each compute the same function. The explanation appears perfectly adequate. So, although the Visa Box does not employ $FOOD representations, an adequate explanation of how it works may nonetheless appeal to $FOOD representations.

On Smolensky's view, this should not be surprising. For, first, the content [food price] is a constituent of each of the contents expressed by x and p, since x expresses the content [local tax percentage on food price] and p expresses the content [chosen tip percentage on the food price] for the relevant bill. Second, [food price] is also a constituent of the content expressed by TOTAL, since TOTAL = SUBTOTAL + $TIP = $FOOD + $TAX + $TIP.

It is useful, then, for [food price] to enter into both the semantic characterization of the function the Visa Box computes and the explanation of how the device works. Of course, it is given that the Visa Box does not use $FOOD representations. But the fact that we may appeal to [food price] in order to explain how the device operates shows that an adequate explanation may (perhaps must) use representations that express that content. Regarding the particular explanation under consideration, the appeal to the representation $FOOD is explanatorily adequate, despite the fact that the explanation does not posit a causal mechanism that involves the tokening of $FOOD representations. Thus, we may say that $FOOD, together with the above derivation, is acausally explanatory or, perhaps more clearly, mathematically explanatory.

Let's relate the above explanation to Smolensky's explanation of systematicity. Representations containing $FOOD are to be taken as analogous to constituent structure trees. And the equality

    [(100 + x + p)/(100 + x)] SUBTOTAL = $FOOD + $TAX + $TIP

is to be taken as analogous to the bi-unique derivation relations between constituent structure trees and tensor product representations.
The conclusion is that Classical trees, together with Smolensky's tree–vector algorithms, provide the basis of an adequate acausal explanation of systematicity, just as $FOOD and the aforementioned equality provide the basis of an adequate acausal explanation of the operations of the Visa Box.

So what's the problem? The problem is that, insofar as the Visa Box explanation is adequate, it is not really an acausal one. Moreover, insofar as it is an acausal explanation, it does not explain how the Visa Box operates. In order to see why this is the case, it is necessary to distinguish two senses in which an explanation could be an explanation of "how something works."

The question, "How does it work?" is quite vague. It could mean, to mention just two possibilities, (1) How did its inventor get it to work in the way it does?, or (2) What operations does it perform? Clearly these are two very different questions, requiring two very different kinds of answers. Now, the Visa Box explanation would be a natural and adequate explanation of how someone was able to make the device work as it does.[4] But then it would not really be an acausal explanation. Food-price representations and representations of the above equality would be attributed causal roles, since the inventor of the device would have made use of them in designing it.

[4] Compare Smolensky (1995c, p. 245), who "marvels" at the "ingenuity" of the person who made the device.

Similarly, consider a "Swamp Visa Box" (I assume you are familiar with Swampman).[5] We might learn what it does with numeric inputs. And once we learn that, we might also discover that we can use the object to calculate tips, realizing that

    [(100 + x + p)/(100 + x)] SUBTOTAL = $FOOD + $TAX + $TIP.

But the explanation of that discovery would also attribute causal roles to food-price representations and that equality, since they would have been employed by those who discovered that the object could be used for said purpose.
In the absence of any such representations, we could never discover that we could use the object as a tip calculator.

In contrast, it is not the case that either $FOOD or the equality is useful for explaining how the Visa Box or Swamp Visa Box operates on its inputs, or for explaining the nuts and bolts of its operation. For neither $FOOD nor the equality has a causal role in the objects themselves.

So what we have in the case of the Visa Box is an (implicit) adequate causal explanation of why it can be used to calculate tips, along with an adequate representational-level, causal explanation of the operations it performs on its numeric inputs. But what we don't seem to have is an explanation that appeals to $FOOD representations but does not require that they have causal roles. It might appear to someone that we do have such an explanation only if he or she fails to keep distinct the different senses of "How does it work?"

[5] For those unfamiliar with Swampman thought experiments, the Swamp Visa Box is a molecule-for-molecule duplicate of the genuine Visa Box, but it is not the creation of a minded being. It "popped" into existence (appropriately enough, somewhere in a remote swamp with an aura of deep mystery) as the result of, say, astronomically improbable quantum events. The idea is that, since the Swamp Visa Box is not an artifact, created for a particular purpose, it cannot be characterized in intentional terms (in particular, as computing over representations) unless it is used as a device having semantically interpretable states.

Now, since the Visa Box explanation and Smolensky's explanation of systematicity are of the same type, what we have regarding the latter is an (implicit) adequate causal explanation of how a Connectionist network could be designed to compute the same functions as certain Classical architectures.
We also have, as part of that explanation, an adequate representational-level, causal explanation of the operations such a network could perform on tokened activity patterns. But what we don't seem to have is an explanation that appeals to Classical representations or tree–vector algorithms but does not require that they have causal roles. In short, we don't seem to have an acausal explanation of systematicity.

Perhaps someone might think that my argument relies too much on the Visa Box's being an invention, a tool, without intrinsic content. But my claim is that the sort of acausal explanation at issue works only for such devices. My point can be illustrated by means of the following hypothetical example. Suppose there is an organism whose systematic behavior can be explained by attributing to it a Classical architecture. We eventually discover, however, that it does not have a Classical architecture; rather, it has a Connectionist architecture. Thus, the Classical explanation of the relevant behaviors is simply false. Nonetheless, we may suppose that nature, not us, designed the organism, so that the actual contents of its representational states are independent of our purposes and of how we think about those states. However, nature presumably doesn't have available any system of representation that it can use for purposes of designing Connectionist minds. How, then, could nature have designed the organism to work as it does? How could nature have bestowed upon it systematically related cognitive capacities? The answer to that question, I submit, would remain a mystery. In particular, what would not be forthcoming is a Connectionist explanation that appeals to Classical representations and requires that they have causal roles.

Look at the matter from a slightly different angle. Clearly, the organism's vector representations wouldn't encode, in the sense of having been translated from, Classical representations and their structures.
But presumably they would encode their own semantic structures. So we'd be able to see how the organism appears to have a Classical architecture without actually having one. However, what we wouldn't be able to see is how its vectors could have come to encode their own semantic structures in the first place. Where could such structures have been instantiated? Not in the organism's architecture, since Connectionist networks don't support structured vehicles of content. Nor could they have been instantiated by anything in the organism's environment; again, nature presumably doesn't have available any system of representation that it can use for purposes of designing Connectionist minds. The only remaining alternative is that they could have been instantiated in minds like our own. So it would be clear how we might have been able to design such an organism. What would be not at all clear is how nature could have.

In the previous chapter, I concluded that Smolensky's explanation of systematicity is not a causal one. A few words need to be said about how that conclusion relates to the present argument. His explanation is not causal in that (1) it explicitly rejects causal roles for Classical representations and tree–vector algorithms, and (2) neither vectors, as nontokened vector constituents, nor information that certain vectors have particular constituents have causal roles. On the other hand, the explanation is causal in the sense that it is an (implicit) causal explanation of how a Connectionist network could be designed to compute the same functions as certain Classical architectures do, including functions from representations to their systematic variants. As such, but only as such, it is an adequate explanation. To put the point a bit cursorily, the explanation is an adequate one only if it attributes causal roles to Classical representations; but it doesn't, so it isn't.
What we end up with, then, is neither an adequate causal explanation of systematicity nor an adequate acausal explanation of systematicity. In short, we don't have an adequate explanation of systematicity at all.

4.2 Moral of the Argument

The argument of the preceding section applies to the explanatory adequacy of any Connectionist account for which the constituency relation is nonconcatenative, and hence it applies to any Connectionist account that is an alternative to the Classical picture. The problem systematicity poses for Connectionism is to show how Connectionist-network operations defined over syntactically simple representations nomologically must be sensitive to representational-constituent structure. Since vectors are syntactically simple, constituent structure must be encoded. Moreover, encoding of constituent structure requires computation of a function from constituent structures to encodings.

Now, the representational structures encoded are not the formal, configurational structures of representations supported by the relevant Connectionist networks; such networks don't support representations with concatenative constituency. So they must be structures of representations instantiated outside such networks. But, as I've argued, a Connectionist (purportedly adequate) explanation of systematicity that adverts to such representations could at best provide an adequate explanation of how we could design a network to compute the same functions as certain Classical architectures do, including functions from representations to their systematic variants. And one that does not advert to such representations could at best provide an adequate causal explanation of the operations a network could perform on tokened activity patterns. But we don't get a Connectionist explanation of systematicity per se.

In this chapter, I've argued that a Connectionist explanation of systematicity would not be an adequate explanation of systematicity.
What I show next is that if we nevertheless construe Connectionist explanations as explanations of systematicity, the result, not surprisingly, is that they become unprincipled in a rather serious way.

Chapter 5
Structure Sensitivity and Principled Explanation

Another difficulty for Connectionist explanations of systematicity is that they appear to be unprincipled, arbitrary, or ad hoc in a rather serious way. [1] Cummins et al. (who defend Connectionist explanations of systematicity) introduce this objection as the claim "that classical representational schemes predict systematicity, whereas connectionist schemes at best accommodate it." [2] Our first task with regard to this objection is to see just what it amounts to.

[1] The source of this objection is Fodor and McLaughlin (1995, p. 216). The objection is part of (or perhaps just is) their argument that Smolensky architectures don't provide an adequate basis for explaining the lawfulness of systematicity. I don't address the lawfulness issue independently of the "principledness" issue. For I regard a principled explanation as a necessary condition of an adequate explanation of lawfulness. Cummins et al. (2001) discuss the principledness and lawfulness issues separately (the latter as somewhat of an afterthought), but they provide no reason at all for thinking that the two issues are separable. In any event, I'm willing to grant that if a Connectionist explanation of systematicity is principled, then it is a principled explanation of the lawfulness of systematicity.

[2] Cummins et al. 2001, p. 172. See also Cummins 1996, p. 605. Cummins et al. (2001) explicate this objection in terms of how a Classical parser would parse sentences as opposed to how a Connectionist parser would parse sentences. I instead present the objection in terms of how Classical and Connectionist cognitive systems are supposed to be able to think systematically related thoughts.
5.1 Prediction versus Accommodation of Systematicity

Aizawa provides two cases from the history of science which illustrate well the nature of the objection under consideration. [3] One case concerns Darwin's and the Creationist's explanations of why the close resemblance between blind subterranean forms of organisms and their sighted, surface counterparts is tied to their geographical location. The other case concerns the Copernican and Ptolemaic explanations of the fact that Mercury and Venus, unlike the other planets, are never found in opposition to the Sun.

Darwin [4] notes that the blind forms of insects that live in limestone caverns in the United States resemble their sighted counterparts on the surface, and that the same is true regarding blind and sighted forms in Europe. However, the relevant European and American blind insects don't bear a close resemblance to each other, despite the close similarity of their environments. On the evolutionary account, this is easily explained by the hypothesis that the blind forms and sighted forms, in their respective regions, evolved by natural selection from a common ancestor. For if that is true, the observed similarities and dissimilarities would be just what you would expect.

[3] Aizawa 1997. Aizawa argues that both the Classical and the Connectionist explanations of systematicity are unprincipled. However, his argument works only if the kinds of representational-level processes required by each account are arbitrary with respect to the kinds of mental representations each account posits. I argue that this is not the case for Classicism, but that it is the case for Connectionism. The objection, as I present it, makes use of Aizawa's cases but follows the outline of the objection as presented by Cummins 1996, pp. 605–608.

[4] Darwin 1985, chapter 5, pp. 178–179.

On the Creationist's account, the similarities and dissimilarities are due to the Creator's plan.
But given the close similarity of the environments of the American and European caverns, the Creator could just as easily have placed similar blind insects in the two habitats. Indeed, the Creator could just as easily have made the blind forms in Europe similar to the sighted forms in America, and vice versa. Nothing about Creationism alone precludes this. Creationism alone does not explain the facts. In order to cover the data, then, the Creationist account must invoke an arbitrary assumption to the effect that the Creator did one thing, when he or she could just as easily have done something else. This gives us a defeasible reason to prefer the Evolutionist explanation to the Creationist one.

Turning to the Copernicus–Ptolemy case, the position of Mercury, as seen from Earth, never deviates from that of the Sun by more than about 28° of arc. Venus is never farther from the Sun than about 45°. On the other hand, the positions of the other planets can deviate from that of the Sun by 180°. Now, the Copernican and Ptolemaic theories of the solar system both advert to deferents and epicycles. But the Copernican hypothesis that the planets orbit the Sun in the order Mercury, Venus, Earth, Mars, Jupiter, and Saturn provides an immediate explanation of the observation that Mercury and Venus never stray far from the Sun. No further assumptions are required. In contrast, the Ptolemaic theory proposes that the solar bodies orbit the Earth in the order Mercury, Venus, the Sun, Mars, Jupiter, and Saturn. That theory alone, however, does not explain the planetary movements. Another hypothesis is required, namely, that the deferents of Mercury, Venus, and the Sun are "locked" together, so that the centers of the epicycles of Mercury and Venus are always in line with the Sun (while none of the deferents of the remaining planets are locked with any other).
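
The Copernican half of this contrast can be checked with a short calculation. The sketch below assumes circular, coplanar heliocentric orbits and uses approximate modern orbital radii in astronomical units; neither the radii nor the function name `max_elongation_deg` comes from the text. Under those assumptions, a planet orbiting inside Earth's orbit can never appear farther from the Sun than arcsin(r), whereas a planet orbiting outside it can reach opposition.

```python
import math

# Approximate mean orbital radii in astronomical units (Earth = 1.0).
# These figures are assumptions supplied for illustration only.
ORBIT_RADIUS_AU = {
    "Mercury": 0.387,
    "Venus": 0.723,
    "Mars": 1.524,
    "Jupiter": 5.203,
    "Saturn": 9.537,
}


def max_elongation_deg(radius_au: float) -> float:
    """Greatest angular distance from the Sun as seen from Earth,
    assuming circular, coplanar orbits around the Sun."""
    if radius_au < 1.0:
        # Inferior planet: the sight line from Earth is tangent to the orbit.
        return math.degrees(math.asin(radius_au))
    # Superior planet: it can stand opposite the Sun.
    return 180.0


for planet, radius in ORBIT_RADIUS_AU.items():
    print(f"{planet}: {max_elongation_deg(radius):.1f} degrees")
```

The circular approximation yields roughly 23° for Mercury and 46° for Venus. Mercury's orbit is noticeably eccentric, which is why its observed greatest elongation can reach the larger figure of about 28° cited above. No locking hypothesis appears anywhere in the calculation; the bound falls out of the ordering of the orbits.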
Thus, unlike the Copernican account, in order to cover the data, the Ptolemaic explanation must invoke an arbitrary assumption. The Ptolemaic theory alone is insufficient. For while geocentrism allows the deferents of Mercury, Venus, and the Sun to be locked together, it also allows them to be independent of each other. This gives us a defeasible reason to prefer the Copernican explanation to the Ptolemaic one.

With respect to the above cases, the objection that the Connectionist explanation of systematicity is unprincipled likens Classicism to Evolutionism and Copernican theory, and likens Connectionism to Creationism and Ptolemaic theory. As we've seen (§ 2.1), Classicism explains systematicity by hypothesizing that mental representations have a combinatorial syntax and semantics and that mental operations are sensitive to the syntactic properties of mental representations. It will be useful here to revisit the essentials of that explanation (and the Connectionist style of explanation) in light of the preceding discussion.

Thus, consider a complex Classical mental representation, aLb. It is composed of the simpler representational constituents a, L, and b. By virtue of some of their local, physical properties, a and b have the syntactic role of designator, while L has the syntactic role of 2-place predicate. The mental processes that contribute to the formation of aLb are sensitive to those syntactic properties. From this it should be clear that if the relevant cognitive system can form aLb, it is to be expected that it can just as easily form bLa. The very same mental processes which can construct the former can construct the latter. No additional operations are required. For a is placeable in the subject or object slot of L, as it were, by virtue of being a designator; and the same is true of b.
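
The point about a single formation process can be made concrete with a minimal Python sketch. The names `Symbol` and `form`, and the role labels "designator" and "predicate2", are hypothetical choices made only for illustration; nothing here is offered as a model of actual mental processes. The formation rule inspects syntactic role alone and never the particular shape of a constituent, so whatever lets it build aLb lets it build bLa.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    shape: str  # the constituent's particular identity, e.g. "a"
    role: str   # its syntactic category: "designator" or "predicate2"


def form(subject: Symbol, predicate: Symbol, obj: Symbol) -> str:
    """Concatenate three constituents into a complex representation.

    The rule checks only syntactic roles, never which designator it
    has been handed.
    """
    if subject.role != "designator" or obj.role != "designator":
        raise ValueError("subject and object slots take designators")
    if predicate.role != "predicate2":
        raise ValueError("the middle slot takes a 2-place predicate")
    return subject.shape + predicate.shape + obj.shape


a = Symbol("a", "designator")
b = Symbol("b", "designator")
L = Symbol("L", "predicate2")

print(form(a, L, b))  # aLb
print(form(b, L, a))  # bLa
```

Blocking one of the two outputs would require an extra test on `shape`, which is to say a rule that goes beyond sensitivity to syntactic role.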
Of course, when the constituents of aLb stand in construction so as to form that representation, a acquires the syntactic role of subject, and b acquires the syntactic role of object, and vice versa for bLa. But these further syntactic roles are consequences of the representation-forming process. The formation of tokens of aLb and bLa employs the very same types of mental operations.

On Connectionism, representations are vectors, and mental processes are vector operations. Vectors are syntactically atomic, so a Connectionist explanation of systematicity cannot appeal to processes that are sensitive to their syntactic structure. But vectors are capable of encoding, by virtue of their local, physical properties, representational-constituent structure. So vector operations can be sensitive to constituent structure through their sensitivity to the relevant physical properties of vectors.

Thus, consider a vector, V_aLb, where a, L, and b are its vector constituents. One way this vector can be tokened in a Connectionist system is by means of a vector operation on its vector constituents. But (in contrast with Classicism) the system must do more than just combine those constituents. It can't merely superimpose them, say, for that wouldn't account for the different structural roles of a, L, and b in V_aLb. The Connectionist solution is to posit operations that bind constituents to the appropriate structural roles. [5] (As we've seen, Smolensky architectures bind a particular vector to a particular structural role by taking the tensor product of that vector and the vector that "represents" that structural role [§ 2.2].) So any process that constructs V_aLb must be different from one that constructs V_bLa. For their constituents must be bound to the appropriate structural roles, and the structural roles of a and b differ in the two vectors.
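
The difference between superimposing constituents and binding them to roles can be sketched numerically. The filler vectors, the role vectors, and the helper names `superimpose`, `outer`, `add`, and `unbind` below are all hypothetical, chosen only to keep the arithmetic small; the orthonormal role vectors are a simplifying assumption, and the sketch is not a reconstruction of any particular Smolensky architecture.

```python
def superimpose(*vectors):
    """Element-wise sum of vectors: combines constituents with no roles."""
    return [sum(components) for components in zip(*vectors)]


def outer(filler, role):
    """Tensor (outer) product binding a filler vector to a role vector."""
    return [[f * r for r in role] for f in filler]


def add(*matrices):
    """Element-wise sum of equally sized matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]


def unbind(tensor, role):
    """Recover the filler bound to `role` (exact for orthonormal roles)."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]


# Hypothetical filler vectors for the constituents a, L, and b.
a, L, b = [1, 0, 0], [0, 0, 1], [0, 1, 0]
# Hypothetical orthonormal role vectors for subject, predicate, object.
SUBJ, PRED, OBJ = [1, 0, 0], [0, 1, 0], [0, 0, 1]

# Mere superposition cannot tell aLb from bLa.
print(superimpose(a, L, b) == superimpose(b, L, a))  # True

# Role binding can, but a and b must be bound to different roles in each.
V_aLb = add(outer(a, SUBJ), outer(L, PRED), outer(b, OBJ))
V_bLa = add(outer(b, SUBJ), outer(L, PRED), outer(a, OBJ))
print(V_aLb == V_bLa)              # False
print(unbind(V_aLb, SUBJ) == a)    # True
print(unbind(V_bLa, SUBJ) == b)    # True
```

The role vectors and the binding step are additions to the bare vector arithmetic: `superimpose` alone is a perfectly good vector operation, and it erases exactly the information that distinguishes the two representations.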
With role binding operations in place, systematicity is then explained in terms of the sensitivity of vector operations to the local, physical properties of complex vectors that encode all of the structural roles of their constituents.

[5] Computer scientists call this variable binding. Notice that Connectionists also need to distinguish vectors representing individuals from vectors representing attributes of individuals (cf. Marcus 2001) and thus need to posit operations that bind the former to designator structural roles and the latter to predicate structural roles. The principal ways Connectionists have attempted to achieve variable binding are reviewed in Browne and Sun 1999.

This sort of explanation, however, like the Creationist and Ptolemaic explanations above, is unprincipled in that it requires an arbitrary assumption. It is not a tenet of Connectionism that networks have operations that bind vectors to structural roles. To employ such operations is not part of what makes a system a Connectionist network. What makes a system such a network is that its representations are syntactically simple vectors, and its operations are vector operations, such as matrix multiplication. A Connectionist system could just as easily have structural-role binding operations as not have them. Therefore, Connectionism by itself fails to explain systematicity. More hypotheses are required. [6] This gives us a defeasible reason to prefer the Classical explanation to Connectionist explanations. [7]

[6] Compare Phillips (1998, p. 157): an "architecture based on a network of units coupled with a learning algorithm ... is attractive. It makes fewer commitments to the design of specific mechanisms that realize cognitive behaviours .... Nevertheless, if one accepts the requirements of systematicity, then those requirements are not met by just this type of architecture. Either additional properties are necessary to explain why networks are configured in a particular way so as to exhibit systematicity or additional subnetworks are required to preprocess potential components into similarity-based representations, for which it may be possible to demonstrate ... systematicity. Either way, the standard approach will not suffice."

[7] One could stipulate that the additional requisite hypotheses are part of the theory. But, as Aizawa (1997) observes, this move clearly wouldn't help in the cases of Creationism and Ptolemaic Geocentrism. So it shouldn't help in the case of Connectionism either.

This objection is at times misunderstood. For example, Hadley complains that

    ... on the classical account, the systematicity of representations arises only in the presence of assumed algorithmic processes. ... It follows, then, that when the ... characteristics of a connectionist architecture are considered, we must permit the connectionist to assume that correspondingly general processing mechanisms are in place. ... Yet [Fodor and McLaughlin] seem unwilling to allow Smolensky the connectionist mechanisms that would permit a network to process his tensor-product representations ... in a manner that would engender systematic relations between those representations. [8]

But the unprincipledness objection does allow the Connectionist correspondingly general processing mechanisms that permit a network to do the job. The point of the objection is that such mechanisms don't guarantee that every network employing them can do the job. Further mechanisms, not essential to or definitive of Connectionism, are needed. Hadley fails to see that the Connectionist mechanisms of a Smolensky architecture are of the latter sort, not of the former.

Still, though I've already taken some pains to do so, I might not have made it sufficiently clear that the unprincipledness objection attributes correspondingly general processing mechanisms to Classicism and Connectionism.
Perhaps I've failed to attribute to Classicism all the processing mechanisms the Classical explanation of systematicity requires, in which case I should attribute further (and correspondingly general) processing mechanisms to Connectionism. It's important, then, to say a bit more on this issue.

[8] Hadley 1997, p. 143 (emphasis in original).

5.2 The Nonarbitrariness of Classical Processes

Aizawa [9] argues that the two hypotheses that mental representations have a combinatorial syntax and semantics and that mental processes are causally sensitive to syntactic structure do not explain systematicity. His argument amounts to the observation that cognitive architectures of which those hypotheses hold can just as easily be nonsystematic as systematic. For one could easily program a system having such an architecture so that it can token, say, aLb, but not bLa. If that's right, it looks like the Classicist can explain systematicity only by hypothesizing mental processes specifically designed to capture it. And that would mean either that the Classicist's explanation is just as unprincipled as the Connectionist's or that the Classicist must allow the Connectionist to appeal to correspondingly general mental processes.

The easiest way to make it clear that Aizawa's argument fails is to consider what it would take to program a Classical system that is capable of tokening aLb but incapable of tokening bLa. The representation-forming mechanisms would have to be sensitive to more than just the syntactic properties of a, L, and b. Otherwise they could just as easily produce bLa as aLb. They would also have to be sensitive to the nonsyntactic "shapes" of a and b. That is, a rule to the effect that "If x = b and yLz, then x ≠ y" would be required. Such a rule, however, is completely arbitrary with respect to Classicism. So within the Classical framework, it is asystematicity, not systematicity, that has to be specifically designed into Classical systems. Although it is clearly possible to have a Classical system that can token aLb but not bLa, such a system would have to be specially so crafted.

[9] Aizawa 1997, pp. 127–135.

For Connectionism, the situation is reversed. A Connectionist architecture that is able to token both V_aLb and V_bLa will not be able to do so because they are systematic variants. Rather, a Connectionist architecture could do so only if that capacity has been specifically built into the system. It could just as easily have been built out of the system.

There is here a connection with the argument of the previous chapter. Recall the hypothetical organism discovered to have a Connectionist cognitive architecture. I argued that it's clear how we might have been able to design such an organism, whereas it's not at all clear how nature could have. Now it looks like we have a reason why that is so. For, insofar as the organism has a Connectionist cognitive architecture, it seems that nature could just as easily have made the organism's mind nonsystematic as systematic.

5.3 Unprincipledness is Not Structured-Domain Relative

Cummins and colleagues [10] (hereafter, Cummins) argue that the Classicist must either concede that the unprincipledness objection is not all that serious or admit that some, perhaps a great deal, of mental representation is non-Classical.

Cummins begins by pointing out that acquiring knowledge about some domains requires acquiring knowledge about their underlying structure. Acquisition of a language requires acquisition of its grammar. Learning which direction from a novel location is homeward requires learning the relationships between various directional cues and certain places you've been. Likewise, learning the layout of one's environment requires learning the relationships among various locations within it. Some domains are not like this.
Learning the state capitals does not require learning about any structural properties of states or their capitals, other than simply what capital is situated within what state.

According to Cummins, the fact that a cognitive system has learned about the structure of a certain domain will manifest itself in various psychological effects. That is, the system will become subject to certain psychological laws, the specific nature of which will depend on the structure of the relevant domain as well as on various properties of the system's cognitive architecture and physical organization. Cummins calls such effects "systematicity" effects. I call them "structural" effects, so as to avoid the tendentious suggestion that this sort of systematicity is to be identified with the systematicity of thought.

[10] Cummins et al. 2001, Cummins 1996.

Cummins distinguishes primary structural effects from incidental structural effects. Primary structural effects are laws relating a cognitive system's inputs to its outputs. If, for example, Andy has learned how to multiply integers, his cognitive system will be governed by a psychological law stating (more-or-less) that if a mathematically well-educated cognitive system s is asked on an exam to multiply two integers, n and m, s will, ceteris paribus, provide the answer nm.

Incidental structural effects are the result of not only what a system computes but also a number of other factors, including what algorithms the system uses to perform its computations, the kind of hardware on which those algorithms are implemented, and the effects of external or internal environmental conditions on the system's operation. Thus, two systems can exhibit the same primary effect while exhibiting different incidental effects. Andy and Betty could exhibit the same primary multiplication effect, but they could nonetheless exhibit different incidental multiplication effects if they use different procedures to multiply.
Important to Cummins' argument is the distinction between structural representations, structural encodings, and pure encodings (all of which are representations). Structural representations and what they represent are isomorphs. They have constituents which represent (at least in the context of the representation as a whole) parts of the relevant domain; and how the constituents of such a representation are structurally related represents how the represented parts of the domain are structurally related. An accurate map of Boston is a structural representation of Boston. Classical binary trees can serve as structural representations of sentences.

The constituency relation for structural representations is a part–whole relation and thus a kind of co-tokening relation, as it is on the Classical account. However, it is not the case that all structural representations are Classical representations. For some structural representations (such as standard maps, photographs, and scale models) do not have a combinatorial syntax and semantics. The content of a representational part of a structural representation need not be context independent. Nor must such a part, independently of its representational context, represent anything at all.

Structural encodings, on the other hand, do not share structure with what they represent. However, the structure of what a structural encoding represents is systematically recoverable from it, by means of a general/productive algorithm. As we saw in Section 2.2, tensor product representations can serve as structural encodings of binary trees. Gödel number representations can serve as structural encodings of sentences.

Finally, pure (or arbitrary) encodings do not share structure with what they represent, nor is the structure of what a pure encoding represents systematically recoverable from it.
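
A small Python sketch may help fix the notion of a structural encoding. It uses the familiar prime-power form of Gödel numbering; the three-symbol alphabet, the symbol codes, and the function names `encode` and `decode` are assumptions introduced only for illustration. The resulting integer has no parts corresponding to the symbols of the sentence, yet a general algorithm (repeated division by successive primes) recovers the sentence from it.

```python
# First few primes; enough for the short strings used here (an assumption).
PRIMES = [2, 3, 5, 7, 11, 13]
# Hypothetical symbol codes for a three-symbol alphabet.
CODE = {"a": 1, "L": 2, "b": 3}
SYMBOL = {number: symbol for symbol, number in CODE.items()}


def encode(sentence: str) -> int:
    """Map the i-th symbol to the i-th prime raised to that symbol's code."""
    if len(sentence) > len(PRIMES):
        raise ValueError("sentence too long for this small prime table")
    number = 1
    for prime, symbol in zip(PRIMES, sentence):
        number *= prime ** CODE[symbol]
    return number


def decode(number: int) -> str:
    """Recover the sentence by reading off the exponent of each prime."""
    symbols = []
    for prime in PRIMES:
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        if exponent == 0:
            break  # no symbol at this position: end of sentence
        symbols.append(SYMBOL[exponent])
    return "".join(symbols)


print(encode("aLb"))   # 2250  (2**1 * 3**2 * 5**3)
print(encode("bLa"))   # 360   (2**3 * 3**2 * 5**1)
print(decode(2250))    # aLb
```

By contrast, a pure encoding in the above sense would be like assigning 17 to "aLb" and 4 to "bLa" from a lookup table: nothing short of the table itself would recover the sentences.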
According to Cummins, an adequate argument from the systematicity of thought to the conclusion that mental representations are Classical requires the assumption that the systematicity of thought is an incidental rather than a primary structural effect of having acquired knowledge about certain domains. For primary structural effects don't provide us with evidence about how a cognitive system represents a domain or processes information about it. And that's what's at issue.

So, for Cummins, an adequate Classical explanation of systematicity, as an argument for Classical mental representations, should have the following form:

1. There are incidental structural effects of having acquired knowledge of domain D.
2. If there are such effects, then mental representations somehow preserve information about D's structure.
3. D's structure is sentence-like.
4. Assuming that the structure of mental representations is sentence-like provides the best explanation of the fact that mental representations preserve information about D's structure.
5. Therefore, mental representations have sentence-like, that is, Classical structure.

Cummins regards steps 1 and 2 as uncontroversial. He also thinks that positing structural representations is the most natural way to explain the various incidental structural effects associated with the acquisition of knowledge of different domains. So he considers step 4 to be very plausible. The trouble with the argument, on his view, is that step 3 is clearly not true for every domain. Different domains have different structures; in particular, many domains have non-Classical, that is, non-sentence-like, structures.

Cummins' case in point involves the perception of objects in space. [11] If a cognitive system has learned about the structure of visual scenes containing distinct objects, it will exhibit certain incidental structural effects.
For example, anyone who can perceive (imagine) a scene in which two objects are situated at completely distinct locations can also perceive (imagine) a scene in which the locations of those two objects are switched. Such structural effects would be naturally explainable in terms of multidimensional-graph-like, structural representations having representations of objects among their constituents. However, Cummins maintains, such structural effects would not be naturally explainable in terms of Classical representations. For Classical representations have sentence-like structure (a combinatorial syntax and semantics), not graph-like structure. Thus, whereas they could serve as structural encodings of visual scenes containing objects, they could not serve as structural representations of them.

[11] Cummins 1996, p. 604.

On Cummins' account, then, any incidental structural effects of having acquired knowledge of a non-Classically structured domain provide the basis of an argument, of the above form, for the existence of non-Classical, structural mental representations. The result is (perhaps massive) representational pluralism: for every differently structured domain, we have an argument for the existence of a structurally distinct kind of structural mental representation. This result shouldn't sit well with Classicists who believe that all thought is grounded in Classical representation.

The Classicist can avoid this sort of representational pluralism, according to Cummins, by arguing that some incidental structural effects can be best explained by appeal to Classical representations that are structural encodings of what they represent (that is, by rejecting premise 4 of the above argument). But, Cummins argues, if structural encoding is allowed, then the objection that the Connectionist explanation of systematicity is seriously unprincipled must be given up.
For, according to Cummins, if the Classicist concedes that certain incidental structural effects can be given an adequate explanation by appealing to (Classical) structural encodings, then the incidental structural effects associated with the systematicity of thought can likewise be given an adequate explanation by appealing to non-Classical structural encodings (say, tensor product representations). Such an explanation would be no more unprincipled than one that appeals to structural encodings that are Classical representations.

So if Cummins is right, it looks like the Classicist must either give up the objection that the Connectionist explanation of systematicity is seriously unprincipled or admit that a great deal of mental representation is non-Classical.

That's Cummins' argument. The rest of this chapter is about why it fails. Before I set out my main responses, though, it is worth noting that it is not necessarily incompatible with Classicism if some kinds of mental representations are non-Classical. The important part of Classicism is that a significant part of a complete account of the mind will have to advert to Classical representations and Classical operations defined over them. The Classicist could allow that part of that complete account will have to appeal to non-Classical representations and processes. [12] In any event, Cummins hasn't come close to showing that the Classicist must either concede that the unprincipledness objection is not all that serious or admit that some, perhaps a great deal, of mental representation is non-Classical.

[12] See, for instance, Fodor 2000.

5.3.1 The Relationship between Content Structure and Representation Structure

First, it is not at all clear that an adequate argument from systematicity to the conclusion that mental representations are Classical requires the assumption that systematicity is a structural effect of having acquired knowledge about certain domains.
The Classical explanation of systematicity (§ 2.1) makes no appeal whatsoever to knowledge about any particular structured domain. It's an argument from the systematicity of thought, and thought (fortunately) is not domain specific. Systematicity is neither a primary nor an incidental structural effect of having acquired knowledge about any specific domain. Rather, it's a psychological effect of having acquired the ability to think or reason, regardless of domain.

Cummins does have a reply to this objection. [13] The Classical explanation of systematicity appeals to the contents of mental representations. On Cummins' view, the problem with that approach is that there is no nontendentious way of identifying the systematic variants of a content. He first points out that an adequate Classical explanation of systematicity must not depend on the assumption that contents have the structure of Classical representations. Contents might be structurally atomic or have a different kind of structure. So we should not construe the notion of systematicity in this way:

    Systematicity 1: Anyone who can think a content of the form aRb can think a content of the form bRa.

This problem can be avoided by construing systematicity somewhat as follows:

    Systematicity 2: Anyone who can think the content c can also think the systematic variants of c.

But then the question arises, How are the systematic variants of a content to be identified?

[13] See Cummins 1996, pp. 594–599.

As it turns out, Cummins argues, what contents appear systematically related, and what structure contents appear to have, depends upon the structure of the representations we use to refer to them. Thus, consider the claim

    (1) Anyone who can think the content [Andy loves Betty] can think the content [Betty loves Andy].

According to Cummins, the intuitive force of (1) is due entirely to the systematicity present in natural language, and in particular to the fact that the sentences "Andy loves Betty"
and "Betty loves Andy" are systematic variants of each other. This should be clear, he says, from the fact that (1) would lose some or all of its intuitive force if we substituted atomic or differently structured (Classical or non-Classical) representations for those sentences. He asks us to consider claims such as the following:

    (2) Anyone who can think Betty's favorite content can think the content [Betty loves Andy].

Even assuming that Betty's favorite content is [Andy loves Betty], Cummins maintains, (2) fails to elicit systematicity intuitions.

Cummins also asks us to compare the following claims:

    (3) Anyone who can think that a face is smiling can think that a face is frowning.
    (4) Anyone who can image a smiling face can image a frowning face.

Claim (3) is dubious. But, Cummins says, given an appropriate scheme of imagistic representation (say, one which builds images from a palette of circles, lines, and arcs), claim (4) is quite plausible. For such a scheme would permit an image of a frowning face that is a permutation of an image of a smiling face. Thus, if our preferred scheme for representing contents were one of just that sort, then a suitable counterpart of (3), say,

    (3*) Anyone who can think that A can think that K,

would become plausible as well. [14]

[14] Cummins doesn't appeal to any statement like (3*), but it seems to me that doing so further clarifies his point.

What these examples are supposed to show is that:

    Absent some representation-independent access to the structure of propositions, which propositions seem to be systematic variants of each other will depend on one's preferred scheme for representing propositions. If you linguistically represent the contents to be thought, then you will want mental representation to be linguistic, since then the systematicities in thought that are visible from your perspective will be exactly the ones your mental scheme can explain. [15]

[15] Cummins 1996, p. 597.

In short, an explanation of systematicity that identifies contents linguistically is one that covertly assumes that contents have the structure of Classical representations, and that's cheating.

According to Cummins, then, what we need is a way of identifying systematicities in thought that is independent of any assumptions about the structure of thought contents. This can be done if we focus on how we acquire knowledge of structured domains.

I don't find this reply to the first problem I see with Cummins' argument to be all that forceful. To begin with, it's not clear to me that the Classical explanation of systematicity makes any commitment to the structure of contents. [16] Suppose for the sake of argument that contents are atomic. That wouldn't change the fact that they have many properties. For example, the contents [Andy loves Betty] and [Betty loves Andy] have different truth conditions and thus might have different truth values. The first is true iff Andy stands in the loving relation to Betty, while the second is true iff Betty stands in the loving relation to Andy. So even if those contents are atomic, they stand in different relations to Andy, Betty, and the loving relation. And how one of those contents is related to those three things (whatever their ontological status) is quite plausibly systematically related to how the other content is related to them.

[16] As McLaughlin (1993, p. 186) notes, "classicism is not committed to what Clark (1988) calls 'the transparency thesis', namely the thesis that there is a one-to-one correspondence between the concepts exercised in a thought and the (public language) words used to specify the content of the thought. The relationship between such words and concepts might prove to be quite complicated."

My line of reasoning might seem to commit me to the possibility that the simplicity of mental representations is compatible with systematicity.
For even if mental representations are atomic, they still have their contents and associated truth conditions. So perhaps thought is systematic just because truth conditions are. This move, however, wouldn't work. We'd still have to explain why we can think thoughts with systematically related truth conditions. That is, we'd still have to explain why anyone who can think a thought, T, can think those thoughts the truth conditions of which are systematic variants of the truth conditions of T. And that appears to be difficult to do without positing a language of thought.

I suspect that a defender of Cummins might respond by rerunning his argument, substituting "truth conditions" for "contents"; that is, by arguing that which truth conditions appear systematically related, and what structure they appear to have, depends upon the kinds of representations that we use to refer to them. What I would do then is rerun his argument, substituting "structured domains" for "contents." The trouble is that we have to use representations to identify the relevant entities at some point. Otherwise, it's hard to see how we could state our theories or theorize at all.

In some circles, that might be seen as a problem for both the Classical view and Cummins' way of arguing from structural effects to the structure of mental representations. But it's not a problem for either. And that's because Cummins hasn't shown that which contents appear systematically related, and what structure they appear to have, depends upon the structure of the representations we use to refer to them. Consider again claims (1) and (2):

(1) Anyone who can think the content [Andy loves Betty] can think the content [Betty loves Andy].

(2) Anyone who can think Betty's favorite content can think the content [Betty loves Andy].

Cummins does show that if we want to illustrate systematicity, we'd be advised (under normal circumstances) to use (1).
But I don't see why this is anything more than a matter of pragmatics. If Betty's favorite content is in fact [Andy loves Betty], given an appropriate context (say, one in which it's common knowledge what Betty's favorite content is), (2) could be used to provide an example of systematically related contents. Furthermore, and more to the point, if the phrase "Betty's favorite content" were an idiom that had the content [Andy loves Betty], then (2) would serve as well as (1) as an illustration of systematicity. Compare:

(5) Anyone who can think that Andy has been shanghaied can think that Andy has shanghaied some person or persons.

Even though the sentences "Andy has been shanghaied" and "Andy has shanghaied some person or persons" are not syntactic permutations of each other, (5) seems to provide an acceptable illustration of systematicity. And (5) seems to do so because those two sentences have the contents they do. When providing an example of systematicity, the identification of the relevant contents does require the mediation of appropriate representations. But Cummins doesn't show that it's the structure (syntactic or nonsyntactic) of the mediating representations, as opposed to their contents, which determines whether or not we see systematicity in thoughts with those contents. After all, we use the words "dog" and "god" in that-clauses to refer to the contents [dog] and [god]. And those words are systematic variants of each other. But that they are so related doesn't in the least incline us to believe that anyone who can token the concept having the content [dog] can token the concept having the content [god].

It may be worthwhile to note, regarding (5), that the sentences "Andy has been shanghaied" and "Andy has shanghaied some person or persons" arguably are syntactic permutations of each other. For syntactic structure need not neatly correspond to surface structure.
However, any argument for the claim that the two sentences have the same syntactic structure would quite plausibly have to appeal to their respective contents. And that would be to admit that syntactic structure and (internal or relational) content structure are intimately related, which is precisely what Cummins questions.

Cummins' treatment of claims (3) and (4) is also flawed.

(3) Anyone who can think that a face is smiling can think that a face is frowning.

(4) Anyone who can image a smiling face can image a frowning face.

The problem here is that Cummins' argument would appear to prove too much. Perhaps if our preferred scheme for representing contents were one of the imagistic sort Cummins envisages, then (4) and

(3*) Anyone who can think that A can think that K,

would gain intuitive force. However, if (3*) would gain intuitive force, so too should

(6) Anyone who can think that A can think that [image].

For "A" and "[image]" would be structural permutations of each other. But that presents a problem, since the content of "[image]" could be merely arbitrarily relatable to the content of "A", or it might have no content at all (even assuming the two images are "image-grammatically" well formed). Its content could be, say, [The sculpture in the square is outré]. And I strongly suspect that we wouldn't find the claim

Anyone who can think that a face is smiling can think that the sculpture in the square is outré,

to be very plausible. And we certainly wouldn't find a nonsensical claim to be plausible. Thus, even assuming a suitable system of representation, (6) might not serve as a plausible illustration of systematicity.

Another of Cummins' examples is similar to the image case. Cummins claims that if you think mental representations are activation vectors, then you are entitled to

Anyone who can think a thought of the form <…a…b…> can think a thought of the form <…b…a…>. 17

17 Cummins 1996, p. 598.

Well, perhaps if you can token one of those vectors, you can token the other. But again, the contents of those vectors could be merely arbitrarily related to each other, and the latter might not pick out a thought, even if the former does.

In light of all this, it's not too hard to see why (3*) has some intuitive force (within Cummins' scenario). It's because the semantic relations between "a face smiling" and "a face frowning" are nonarbitrary. So it looks like we are free to argue that because our intuitions about systematicity depend on semantic relations, it can't be right that it's the (syntactic or nonsyntactic) structure of the mediating representations, as opposed to their contents, which determines whether or not we see systematicity in thoughts having those contents.

5.3.2 Unprincipledness Rests with Vector Constituency, Not Encoding

Let's move on to a second serious problem with Cummins' argument. It doesn't appear to be true that if Classicists grant that certain structural effects can be adequately explained by appealing to (concatenative) structural encodings, then they should also grant that the systematicity of thought can likewise be adequately explained by appealing to (nonconcatenative) structural encodings. For the objection that the Connectionist explanation of systematicity is unprincipled doesn't turn simply on the fact that vectors are structural encodings. Rather, it turns on the fact that the constituency relation for vectors is nonconcatenative. So it's hard to see how an explanation of certain structural effects in terms of structural encodings that are Classical representations would be inadequate (because unprincipled).

This should be fairly easy to show. Suppose we have a Classical, structural encoding scheme for representing maps. Suppose further that we have a Classical system which can represent, in that scheme, any structural ("systematic") variant of any map it can represent.
Would any Classical explanation of this be unprincipled in the way I have argued Connectionist explanations of systematicity are unprincipled? I don't see why it should be.

My point is easily made in terms of a relatively concrete example. Maps of Earth which illustrate plate tectonics provide a relatively good case of maps which are structural transformations of each other. 18 Consider two such maps, one that represents South America as being n miles from Africa, and another that represents South America as being m miles from Africa. Part of the explanation of the fact that Classical systems can represent one of these maps if they can represent the other would presumably appeal to facts like this:

The Classical encodings of these maps each have constituents representing map-distance representations, Dxy = z. But they differ in that the encoding of the first map has the representation Dsa = n (but not Dsa = m) as a constituent, whereas the encoding of the second map has Dsa = m as a constituent (but not Dsa = n) (where the contents of these constituents are the obvious candidates). 19

18 For present purposes, I'll assume that to structurally permute a map is to rearrange (and possibly reshape) its representational parts in such a way that they retain their contents. Different ideas of what counts as a structural permutation of a map would merely require a different sort of example.

19 It's true that any artificial, Classical system of structural encoding will have to be tailored to the structural permutations one is interested in capturing. But the same will be true for any artificial, distributed-vector system of representation. Furthermore, as I am in the process of arguing, this sort of arbitrariness is not what would make an explanation of domain-structure sensitivity that appeals to such a representational system unprincipled.

The explanation would also appeal to the syntax sensitivity of the operations which construct the relevant representations, showing that if a Classical system can construct a representation of one map, then it can construct a representation of a structural permutation of that map, proceeding along the lines of the Classical explanation of systematicity. Given a representation of a particular map, a Classical system's syntax-sensitive operations allow it to construct a representation of a different map, where the second has as parts the same representations as the original map but standing in a different arrangement.

Now imagine a Connectionist system that captures the very same structural transformations that the above Classical system does. How does it do that? Well, it presumably employs an encoding scheme that represents map parts and map structural relations. And presumably its vector operations correspond to the "legal" permutations 20 that can be performed on the map structures of interest; perhaps its vector representations and operations are related to those permutations in just the way that the vector representations and operations of a Smolensky architecture are related to Classical trees and rules for extracting and combining tree constituents.

20 I don't have in mind any particular technical notion of legal permutation for maps. The intuitive idea is that a legal permutation is one which yields a well-formed map.

But now it looks like we can run the unprincipled-explanation objection against the claim that such Connectionist systems could serve as the basis of a good explanation of the cognitive capacity (assuming there is one) to represent systematic map variants. It's important to emphasize that there are two distinct kinds of structural encodings. There are structural encodings having concatenative constituents and structural encodings with nonconcatenative constituents.
The envisaged Classical system uses structural encodings of the former sort. With such representations (to echo a point Cummins makes regarding structural representations), "the theorist is constrained by the form of the representations: you can only write permutation rules when there are permutable constituents." 21 However, Connectionist models have no way of enforcing a comparably principled constraint. For such models, it's not the case that you can only have certain vector operations when there are permutable representational constituents, for there simply aren't any such constituents. Such models, then, need structural-role binding operations, which are arbitrary with respect to Connectionism. So it looks like, for any structural map variants M and M*, a Connectionist system that is able to represent both maps will be able to do so, but not because they are structural variants. Rather, if a Connectionist system can represent certain map structural variants, it will be able to do so only because that capacity has been specifically built into the system. It could just as easily have been built out of the system.

21 Cummins 1996, p. 607.

What misleads Cummins is his view, which we've rejected, that arguments from systematicity to the structure of mental representations ought to focus on the cognition of structured domains. From that perspective, a Classical explanation of the structural effects for, say, the linguistic domain would be principled because the structure of Classical representations is isomorphic with the structure of that domain. On the other hand, a Connectionist explanation of those effects would be unprincipled, because the structure of vectors does not mirror the structure of that domain. Likewise, a Classical explanation of the structural effects for a domain having map-like structure would be equally unprincipled: map-like domains do not have syntactic structure.
If the issue is couched in these terms, then the distinction between syntactically complex structural encodings and syntactically simple structural encodings is bound to seem irrelevant. But, as I've argued, it's not.

5.4 Representations for Navigation

I conclude this chapter by stating its bottom line in terms of the explanation of navigational capacities and by briefly noting a minor point relevant to representational pluralism.

I argued in this and the previous chapter that Connectionist explanations of systematicity are inadequate. This chapter yields a further conclusion. I will argue that certain navigational capacities are best explained by appeal to mental representations for which the constituency relation is concatenative. If that is right, then, regardless of whether those representations are structural representations or structural encodings, and regardless of whether they are Classical representations, Connectionist explanations of the same capacities will be inadequate. Any such explanation will be subject to at least two strong objections: the explanation (if a good one) doesn't explain what it sets out to explain (Chapter 4), and it is unprincipled.

Finally, a relatively minor point relevant to representational pluralism. Even if adequately explaining certain navigational capacities requires positing map-like structural representations, that in itself is not logically incompatible with the possibility that the required explanation is a Classical one. For although ordinary maps lack a combinatorial syntax and semantics, this need not be true for formal maps (or for cognitive maps, if there are such things). That is, it is possible to devise limited, formal systems of representation that are both language-like and map-like. For illustrative purposes only, I've provided a simple, artificial example in Appendix A. 22 The expressive power of all such systems of representation very well might be quite limited.
But that's not a problem if adequately representing facts in the relevant domain doesn't require all that much expressive power to begin with. A system of representation devoted to or useful for navigation might not need to be productive, say.

22 Compare Casati and Varzi 1999, chapter 11. Their system is more complex and less artificial than mine. The only reason I provide an alternative is that it is much easier to state formally.

Chapter 6

Structure of the Honeybee's Navigational Domain

Although complex representations such as charts and maps are quite useful for purposes of navigation, in part because of the structural similarities they bear to the regions they represent, that does not imply that any animal capable of accomplishing even fairly sophisticated navigational tasks represents some features of its environment with mental representations having similarly complex contents or configurations. Through careful behavioral or neurophysiological experimentation, however, it is possible to discern the structural features, of a certain domain, to which an organism is sensitive. The nature of such features, together with details about the organism's sensitivities to them, can provide clues regarding the semantic simplicity or complexity of its mental representations. And, if it turns out to be probable that they are semantically complex, then further considerations can be marshaled to address the question of the configurational structure of such representations.

In this chapter, I review a number of recent behavioral studies which reveal various structures that honeybees, as navigators, are capable of learning. I also review those and other studies that show what bees are able to do with some of that information. Based on the nature of the navigational capacities exhibited, I argue, in the next (and final) chapter, that certain classes of information acquired by honeybees exhibit systematicity.
I conclude that we thus have at least one good reason to prefer Classical theories of honeybee navigational capacities over Connectionist ones.

Note that while I speak of honeybees acquiring information about various distances and directions, relying on their solar compass, and so on, I remain neutral on the issue of what the contents and extensions of the representations in question actually are (I discuss this matter further in § 7.1.1). Moreover, I want to avoid commitment to any particular theory of content or reference.

6.1 Simple Structures

Honeybees are capable of acquiring information about a number of relationships between various places of interest. My focus is on distance and direction relations. The following three sections are especially pertinent to the arguments for the systematicity of bee navigational capacities presented in Sections 7.1.2 and 7.2.1. There I argue that the general capacities described below require certain more-specific capacities to acquire systematically related information. One of my main points will be that the capacities of bees to coherently track locations of interest (including their own current location) require that the semantic relations among the items of information they acquire be nonarbitrary.

6.1.1 Distance and Direction Relations

A honeybee learns the distance and direction, from the hive, of a foraging site it discovers. During the bee's outbound flight, it continually updates the information it has about its location in relation to the hive by a process known as dead reckoning, or path integration: the bee continually integrates its most recent flight segment, or vector of travel, with the sum of its previous flight vectors. 1 The result is a single vector that informs it of its current direction and distance from the hive.
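
For concreteness, the bookkeeping involved in path integration, as just described, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a claim about how bees actually carry out the computation: the function names, the use of compass bearings in degrees (0 = north, 90 = east), and the representation of flight segments as (bearing, distance) pairs are all introduced here for illustration.

```python
import math

def integrate_path(segments):
    """Accumulate flight segments into a single hive-to-bee vector.

    `segments` is a sequence of (bearing_deg, distance) pairs, one per
    flight segment (hypothetical format). Bearings are compass bearings:
    0 = north, 90 = east. Returns (bearing_deg, distance) of the bee's
    current position as seen from the hive.
    """
    east = 0.0
    north = 0.0
    for bearing_deg, distance in segments:
        theta = math.radians(bearing_deg)
        # Add the most recent segment to the running sum of earlier ones.
        east += distance * math.sin(theta)
        north += distance * math.cos(theta)
    total_distance = math.hypot(east, north)
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    return bearing, total_distance

def home_vector(segments):
    """Direct course back to the hive: same distance, opposite bearing."""
    bearing, distance = integrate_path(segments)
    return (bearing + 180.0) % 360.0, distance
```

For a circuitous outbound path of 100 units north followed by 100 units east, `home_vector([(0.0, 100.0), (90.0, 100.0)])` yields a bearing of roughly 225 degrees and a distance of roughly 141.4 units, that is, a single direct course home rather than a retracing of the outbound segments.
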
When the bee discovers a foraging site, it stores some kind of representation of the site's location; and when it returns to the hive, it can go there directly, even if its outward path was circuitous, and even if no landmarks near the hive are visible from the site. 2

1 Two terminological matters. First, I use the terms "integrate" and "sum" here somewhat figuratively, for convenience. At this point I'm not making any specific claims about the nature of honeybee mental representations or processes. Second, also for convenience, I will often rely on context to disambiguate terms such as "vector": whether they refer to some feature of the environment (or the bee's own behavior) or to the bee's information about such a feature.

2 For recent discussions of path integration in insects, see Collett (M.) and Collett 2000; Collett (M.) et al. 1998; Collett (T. S.) and Collett 2000, 2002; Giurfa and Capaldi 1999; Schmidt et al. 1992; Wehner et al. 1996, 2002; and Wohlgemuth et al. 2001.

The waggle run, or dance, is the means by which a honeybee informs other colony members of the approximate distance and direction to a foraging site or a potential nest site. 3 Some honeybee species, such as Apis mellifera and Apis cerana, orient their waggle runs with respect to gravity. They perform their runs in darkness on the vertical surface of a comb within the hive. The duration (and other properties) of the run corresponds to the distance to the site, and the angle of the run with respect to gravity indicates its current solar bearing (a vertical, upward run indicates that the direction to the site is toward the sun).

3 von Frisch 1967, Riley et al. 2005. See Dyer 2002 and Michelson 1999 for recent reviews.

Honeybees can use remembered hive-to-site vectors to return directly to previously visited locations. They can also use remembered site-to-hive vectors to return directly to the hive after having been displaced by an experimenter to a familiar or unfamiliar site. 4 But they can learn vectors other than those relating the hive to other important locations. They can also learn "local" vectors: those connecting a landmark (or other visual cue) to another landmark, or connecting a landmark to a goal site. 5

4 Menzel et al. 1998, 2000a, 2005; Riley et al. 2003; Schöne et al. 1998.

5 Chittka et al. 1995a,b; Collett (M.) et al. 2002; Collett (T. S.) and Baron 1994; Collett (T. S.) et al. 1993, 1996; Srinivasan et al. 1997; von Frisch 1967. Desert ants also learn local vectors, as shown by Collett (M.) et al. 1998.

Honeybees estimate distance flown by monitoring optic flow, or image movement across the retina. 6 They estimate direction principally by means of their solar compass. 7 The next section presents features of the solar compass mechanism that are important for issues of cognitive architecture.

6 Esch and Burns 1996, Si et al. 2003, Srinivasan et al. 2000. For optic-flow-based distance estimation by ants, see Ronacher and Wehner 1995.

7 Dickinson and Dyer 1996; Dyer 1987, 2002; Dyer and Dickinson 1994, 1996; Wehner 1983, 1984; Wehner et al. 1996.

6.1.2 Solar Compass and Solar Ephemeris

Honeybees are able to use the sun (as well as the pattern of polarized sunlight) in order to set and hold a compass course. Because the sun moves in relation to the landscape, a bee's returning to a familiar site at different times of day requires its flying at different angles in relation to the sun's compass direction, or azimuth. In order to do so, it must be able to estimate how much the solar azimuth changes during the relevant time spans. This in turn requires the organism to be informed about the time of day (information provided by its circadian clock) and to have a record of the solar azimuth as a function of time of day. Such a record is called a solar ephemeris. The solar ephemeris varies with time of year and latitude. Hence, the current ephemeris for a particular locale must be learned.
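
The time compensation described above can be illustrated with a small Python sketch. It is a toy model under explicit assumptions, not a description of the bee's actual mechanism: the ephemeris is a hypothetical lookup table whose azimuth values are invented for illustration (they are not measured data), azimuths and bearings are taken to be compass angles measured clockwise from north, and values between table entries are filled in by simple linear interpolation.

```python
from bisect import bisect_right

# Hypothetical ephemeris: (hour of day, solar azimuth in compass degrees).
# The values are invented for illustration only.
EPHEMERIS = [(6.0, 90.0), (9.0, 110.0), (12.0, 180.0), (15.0, 250.0), (18.0, 270.0)]

def solar_azimuth(hour, table=EPHEMERIS):
    """Estimate the solar azimuth at `hour` by linear interpolation."""
    hours = [h for h, _ in table]
    if not hours[0] <= hour <= hours[-1]:
        raise ValueError("hour outside the tabulated range")
    # Index of the table entry that closes the interval containing `hour`.
    i = min(bisect_right(hours, hour), len(table) - 1)
    (h0, a0), (h1, a1) = table[i - 1], table[i]
    return a0 + (a1 - a0) * (hour - h0) / (h1 - h0)

def solar_bearing(site_compass_bearing, hour):
    """Angle of the site measured clockwise from the sun's azimuth, in [0, 360)."""
    return (site_compass_bearing - solar_azimuth(hour)) % 360.0
```

On these made-up values, a site due south of the hive (compass bearing 180) has a solar bearing of 0 degrees at noon, 70 degrees at 9:00, and 290 degrees at 15:00. The site's compass bearing stays fixed while the angle to be flown relative to the sun changes with the time of day, which is why a record of azimuth against time is needed. On the convention reported earlier for gravity-referenced waggle runs, a solar bearing of 0 would correspond to a run directed straight up.
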
Complicating matters is the fact that the rate of change of the sun's azimuth varies with the time of day, being slowest in the morning and evening and fastest at midday. Nonetheless, it has been shown that bees raised in an incubator and exposed to the sun only during a limited part of the day (for example, for a couple of hours in the afternoon) learn the current solar ephemeris for their locale. 8

8 Dyer and Dickinson 1994.

Honeybees learn how the solar azimuth varies in relation to the position or orientation of certain landscape features over the course of the day. Thus, on heavily overcast days, when neither the sun nor the pattern of polarized sunlight is visible, honeybees can estimate the direction of the sun by means of familiar landscape features in conjunction with their internal solar ephemeris. 9 Remarkably, this ability allows bees to estimate the solar azimuth on moonlit nights, provided that the necessary landmarks are visible. 10

9 Dyer 1987. Schöne et al. (1998) report that, on sunny days, displaced bees tend to be initially but briefly disoriented upon release, if their vision of the surrounding terrain is blocked until the time of release. Their perception of the sun and its associated skylight patterns alone, under such conditions, appears to be insufficient for them to set a course. In light of this, Schöne et al. suggest that solar compass course setting involves integration of terrestrial and celestial cues. If "terrestrial" is construed broadly, the truth of their suggestion is actually knowable a priori. For if what your compass "needle" points at is in continual motion, it must be related to a stable reference direction if it is to be reliable. Consider how useful a standard compass would be in a possible world where the position of magnetic north varied predictably but rapidly over, say, 180°. Under such circumstances, a standard compass would be useless without a "magnetic-pole ephemeris" chart. In the case of the bee, stabilizing cues are provided by its circadian clock and internal solar ephemeris. If "terrestrial cue" is construed to mean "landscape visual cue," then Schöne et al.'s suggestion is clearly wrong. Route-trained bees do not need such cues to set a course and typically ignore them when released at an unexpected location (Riley et al. 2003). On the other hand, terrestrial cues are essential for setting a correct course. For example, knowing which way is east or west doesn't help much if you don't know whether you are east or west of your destination. Schöne et al.'s observation, then, is probably related to the bees' attempting to locate themselves in relation to familiar landscape features.

10 Dyer 1985b.

The solar ephemeris learning mechanism, then, produces a record that allows bees to estimate the position of the sun at times when they do not see it (due to heavy overcast), have not ever seen it (due to controlled, limited exposure), or never will see it (due to the time of night).

This strongly suggests that bees are capable of freely generalizing their solar ephemeris function to novel inputs. That is, for times of day for which a bee has never experienced the corresponding solar azimuth, it nonetheless is able to fairly accurately map those times to azimuthal positions. As we'll see (§ 7.5), the ability to freely generalize certain universally quantified functions appears to require the exercise of rules that operate on instances of variables.

Honeybees are capable of relating the solar ephemeris to different groups of landscape features. 11 This is necessary, for example, if the colony moves to a distant new nest site. And there are further complications. The landscape features visible at (or along the way to) one foraging site will often be different from those associated with another site.
Terrain features visible from two different sites will most likely be seen from those places from different perspectives. And the landscape features visible from the hive will not be the same as those visible from relatively distant foraging sites. This fact is especially pertinent in the case of the Asian honeybee Apis florea. On overcast days, members of this species orient their horizontal waggle runs in relation to the panorama of vegetation near the nest; but that panorama is not available to foragers when away from the nest. 12

11 Dyer 1987.

12 Dyer 1985a.

It is worth noting that members of A. mellifera can be trained to orient their waggle runs in relation to landscape features, even though they normally use only gravity or celestial cues as a reference. In fact, they are just as good at this as members of A. florea. 13 As I point out later (§ 7.2.1), the presence of such unexercised capacities is just what one should expect if related capacities exhibit a certain form of systematicity.

13 Capaldi and Dyer 1995.

On the basis of the data reviewed above and in the previous section, it is clear that honeybees can acquire information about a large number of relations important for navigation. These include (among others):

- The distance between the nest and a particular goal site.
- The distance from the bee's current location to the hive.
- For each time of day and location of the bee, the solar bearing of a particular goal site (nest, foraging site, etc.) at that time and place.
- For each time of day, the location of the solar azimuth in relation to the surrounding landscape. Also, there are different sets of these relations for (sufficiently) different places.
- For each time of day, the bearing of a particular goal site with respect to the landscape feature(s) associated with the solar azimuth at that time. This information is necessary in order to learn the location of a foraging site on the basis of gravity-referenced waggle runs on heavily overcast days. 14

14 That the waggle dance communicates the distance and direction to a goal site, with a fair degree of precision, has been recently confirmed by Riley et al. 2005.

So what a honeybee can learn about, say, a particular foraging site consists of information about quite a number of relations involving that site. This is evident, even though we have yet to examine other navigational capacities; for instance, how honeybees can employ landmarks to locate a goal. Nor have we considered certain capacities which are not strictly navigational, such as the capacity to learn the type of a certain foraging site (resin, pollen, nectar, water) and that site's current value to the colony.

6.1.3 Updating Previously Learned Relationships

Naturally, bees are sensitive to changes in their environment. What isn't obvious is what they learn when they acquire information about such changes. How is the new information related to the old information? What information is updated and what information remains the same?

Some light is shed on these issues by the results of station shift experiments. In such a study, the hive and feeder are placed along an extended, straight landmark, such as a tree line, an edge of a field, or a row of artificial markers. The bees are trained to the feeder under sunny conditions. If a natural landmark is used, a test site that is just like the training site, except with respect to the compass orientation of the extended landmark, will have been chosen. After training, the hive and feeder arrangement is displaced to the test site. Since the bees will be unaware of their change in venue, they will be faced with conflicting information. Their memory of the landmark's compass orientation, acquired at the training site, will differ from their experience of the landmark's compass orientation at the testing site. The experimenter can gain insight into how the bees respond to that conflict by observing their waggle dances when they've returned to the hive from the feeder under sunny or overcast conditions.

Gould 15 performed a station shift experiment in an open field, in which a row of artificial markers led from the hive to the feeder. Under sunny conditions, the arrangement was suddenly rotated by about 30°. The bees located the feeder by flying along the row of markers. Over a period of about 40 min, the direction indicated by their waggle dances gradually shifted from the solar bearing of the feeder as learned during training to the new solar bearing. That the shift was gradual suggests that the permutation of the training setup resulted in a corresponding permutation in how the bees represented that setup. In other words, it suggests that the bees altered their previously acquired information about the orientation of the extended landmark and the direction of the feeder. They did not simply acquire new, additional information.

15 Gould 1984.

Dyer 16 performed a series of station shift experiments in which the orientation, at the test site, of a field's edge differed by 90° from the orientation, at the training site, of the corresponding field edge. Tests began with the hive being opened under heavy overcast. The bees found the feeder by flying along the field's edge, as they would have done during training. As long as the sky remained overcast, their waggle dances indicated where the solar azimuth would have been if the hive had not been displaced, and thus they were off by about 90°.

16 Dyer 1987.

Once the sky opened enough for the bees to use their solar compass, Dyer observed five distinct types of responses. Most bees immediately adjusted their waggle dances to indicate the correct solar bearing of the feeder and continued to indicate the correct bearing when the sky again became heavily overcast.
These bees disregarded the orientation of the training site field edge and learned the orientation of the test site field edge. Their doing so is compatible with their acquiring new, additional information as opposed to their altering previously acquired information.

Some bees ignored the compass bearing of the visible sun for at least one trip. Sometimes many foraging excursions under bright sunlight were required before the bees' dances indicated the correct solar bearing of the feeding station.

Some bees, like those in Gould's study, exhibited gradual reorientation. The solar bearing indicated by their dances progressively shifted from the one correct for the training site to the one correct for their current location. Again, this sort of response suggests that the permutation of the training setup resulted in a corresponding permutation in how the bees represented that setup.

Some bees indicated the correct solar bearing in the presence of sunlight but indicated the incorrect, previously acquired solar bearing when the sky again became overcast. Also, some bees exhibited bimodal dances once the sun became visible. That is, they indicated both the old and new solar bearings on alternate waggle runs. The bees that exhibited one of these two responses must have stored their information about the original and new solar bearings separately. Further, the new and the old information could independently direct behavior.

Bimodal waggle dances are particularly intriguing. One possibility is that the mechanism responsible for producing the angle of the waggle run with respect to gravity has simultaneous access to conflicting compass information but does not resolve the conflict. Another possibility is that the mechanism alternately accesses the conflicting information but does not detect the conflict. In either case, the ability of bees to perform bimodal dances would seem to require the ability to acquire respective items of information that "predicate"
two conflicting attributes to one and the same thing.

One way to investigate how bees deal with environmental changes on a smaller scale is to study how they modulate their learning flights under various conditions. To learn the precise locations of newly discovered foraging sites, bees will perform specialized learning flights on their departure. While facing the site, the bee slowly backs away, flying side-to-side in increasingly larger arcs roughly centered on the place of interest. Such a flight pattern is ideal for learning, via motion cues, the position of a site relative to nearby landmarks. The duration of a bee's learning flight declines over subsequent visits to the same location, until the flight pattern is no longer performed.

Wei et al. [17] (hereafter, Wei) examined the factors that influence changes in learning flight duration. He introduced bees to an inconspicuous feeder situated near a tetrad of black cylinders, which served as proximal landmarks. The feeder and cylinders were contained within an oblong arena having 0.5-m-high walls, which blocked most external landmarks from the bees' view while they were inside the apparatus.

In one experiment, Wei measured the durations of the learning flights of individual bees on repeat visits, beginning with their initial departure from the feeder. The duration of learning flights gradually decreased until the amount of time from when a bee left the feeder to when it left the arena stabilized. Once departure flight duration for an individual bee had stabilized, Wei imposed delays of various lengths between the time the bee arrived in the arena and the time it found the feeder. This was done by removing the feeder before the bee arrived and replacing it once the intended delay had been effected. Under natural conditions, an increase in search time might be the result of changes in the appearance or location of the local landmarks used to pinpoint the goal.

[17] Wei et al. 2002.
Wei found that the bees increased the duration of their learning flights after an enforced delay. The longer the delay between arrival at the arena and location of the feeder, the greater the increase in learning flight duration. Post-delay learning flights were briefer and exhibited a more rapid duration decay rate than post-initial departure learning flights. Some possible influences on the modulation of learning flight duration, other than prior learning, were ruled out.

In another experiment, the entire formation of the cylinders and feeder was moved to a different place within the arena for each visit by an individual bee, beginning after their first departure flight. Consequently, the relationship between arena-external and -internal cues was altered. The longest learning flights of the bees tested under these conditions tended to occur on the second or third departure, whereas the longest learning flights of the bees tested under stable landmark conditions occurred on the first departure. This suggests that the bees that encountered a change in scene upon their return performed a learning flight longer in duration than the one they would have performed if the scene had remained stable.

Wei also showed that learning flight durations increased when bees were introduced to a new, qualitatively similar feeding site having a higher sucrose concentration. The bees modulated their learning flight durations in accordance with the difference between the new concentration and the old one, rather than in accordance with the absolute value of the concentration.

Barring the influence of unknown factors, Wei's results indicate that bees update or acquire their information about feeding sites in light of their past experience. Their behavior in this case does not follow a rigid, inflexible pattern. They do not mindlessly repeat the original learning process in response to perceived differences, as if learning about the site for the very first time.
The delay and moving configuration experiments indicate that bees integrate remembered and current information in response to certain changes. Further, comparison of the learning flight durations in those experiments suggests that bees are capable of modifying their information about a particular place in a way that corresponds to the changes that occur at that very place.

I should also note that Wei's experiments provide examples of behavior that are difficult to explain without appealing to a notion like "expectation" or "prediction". The behavior thus reveals the operation of learning mechanisms that go beyond those of simple association. For it would seem that the bees modified their learning flights when they encountered a delay that was longer than expected or a location of the landmark array that was different than expected. This is just one example of why it is becoming increasingly difficult to explain honeybees' behavior in nonrepresentational terms (§ 1.5).

Chittka et al. [18] have provided a somewhat different sort of example with similar implications. They trained bees to a feeder in a flat, open area devoid of prominent landmarks, except for a car parked near the feeder. They found that when the compass direction, from the hive, of the landmark-feeder configuration was slightly changed from its direction during training, but the relative positions of the landmark and feeder remained unchanged, bees did not correct their return-to-hive flight vectors. They flew along a vector that would have taken them directly to the hive during training. When the landmark remained in its training position, but the feeder was moved, the bees did correct their return-to-hive flight vectors. As in Wei's experiments, learning appeared to be influenced by the occurrence of something unanticipated, either an increase in search time, a change in the location of the feeder in relation to the landmark, or both. [19]

[18] Chittka et al. 1995b.
[19] Some experiments with desert ants have similar implications. See Wehner et al. 2002.

6.2 Complex Structures: Sequences, Rules, and Maps

Honeybees are capable of learning about structures more complex than distance and direction relations between locations. They can learn vector sequences: ordered lists of flight segments, each of which specifies a certain direction and distance of travel. They can learn the correct path through a maze, their learning of which at least sometimes involves learning part of the maze's structure. They can learn rules for negotiating mazes. There is even recent evidence that they can learn (derive) novel routes on the basis of stored place information and local cues, suggesting that some of the information they acquire can at least serve as a kind of map.

6.2.1 Vector Sequences

Honeybees and other insects are able to learn a specific, landmark-based route from the nest to a familiar foraging site and to learn a specific route from the foraging site back to the nest (the two routes might differ). [20] The same insect typically takes the same routes each trip, while different insects may take different routes. Such learned routes are often complex, consisting of multiple segments of various distances and directions. Thus, a typical route might consist of segments such as: (1) a flight from the nest to a prominent landmark; (2) a flight from the vicinity of that landmark to another near the feeding site; and (3) a flight from the vicinity of the latter landmark to others very close to the site, with respect to which its location can be pinpointed. [21]

[20] Collett (T. S.) and Zeil 1998.
[21] Cartwright and Collett 1983; Chittka et al. 1992; Collett (T. S.) 1992, 1996.

Also, it is not unusual for a honeybee to visit more than one foraging site on a single excursion. Bees that do so can learn a specific route connecting them.
When they return to the sites, they will visit them in the same order along the same "trapline" route. [22]

[22] Heinrich 1976, Janzen 1971, Kratzsch et al. 1998, Manning 1956.

What do honeybees learn that enables them to follow complex routes? They do learn route segments; but how are the memories about different route segments related to each other? In particular, are they stored completely independently of one another, or are they somehow combined into a memory of the entire route? If the latter, then bees do not simply acquire sets of memories that happen to get triggered in the appropriate sequence; rather, they acquire memories of sequences. That would suggest that the content of such memories is complex, having as constituents memories of individual route segments.

The experiments I'm about to discuss suggest that bees do in fact acquire memories of sequences. [23] I'll argue (§§ 7.1.2 and 7.2.1) that bees' capacities to acquire information about the different sorts of sequences discussed in this section and below (§ 6.2.2.1) exhibit systematicity. This section and others (§§ 6.2.3.2 and 6.2.4) also serve as the basis for an argument that some bee representational constituents play a semantic role akin to that of indexicals (§ 7.3.3).

[23] See also Chittka et al. 1995b.

Collett et al. [24] (hereafter, Collett) performed a series of experiments designed to provide insight into what honeybees learn when they learn complex routes. One set of experiments specifically addressed the issue of whether bees learn an ordered list of flight vectors, that is, whether they learn something roughly like the following: first, fly n distance units in direction d; second, fly m distance units in direction d*; and so on.

[24] Collett (T. S.) et al. 1993.

Collett trained bees to fly an obstacle course contained within a box (Fig. 6.1). The course required the bees to fly in a zig-zag pattern, through holes in transparent plexiglas partitions, in order to reach a sucrose reward.
The holes were very difficult for the bees to see, so they were sometimes marked with either small, black disks (just above them) or black rings (around them). The markers, however, were periodically removed during training to prevent the bees from becoming too reliant upon them. The walls and floor of the box were white with random dark marks. The marks provided stabilizing visual cues (input for optic flow and perhaps some distance-to-wall information).

Figure 6.1. Plan view of the principal train and test course configurations employed in Collett's vector sequence experiments. The entrance and partition holes were 15 cm in diameter. The feeder entrance hole was 2 cm in diameter. Coordinates of the holes for training: entrance hole, (130, 30); first partition hole, (160, 60); second partition hole, (90, 120); feeder hole, (160, 180). Coordinates of the entrance hole for displacement tests: (0, 90). Axes units are centimeters. e, entrance hole. f, feeder hole. Redrawn from Journal of Comparative Physiology A, vol. 172, 1993, pp. 693–706, "Sequence learning by honeybees," Collett, T. S., Fry, S. N., and Wehner, R., Figure 1, © Springer-Verlag 1993, with kind permission of Springer Science and Business Media.

Tests were conducted with the plexiglas partitions removed. In some tests (standard tests), the entrance hole remained at its training location. In other tests (displacement tests), the position of the entrance hole was shifted, enough so that the bees might be able to detect some of the resulting differences in their location as they flew through the box (Fig. 6.1). The displacement tests were performed to see whether the bees would fly to specific places in the box, rather than principally rely on remembered vectors.

The standard-test trajectories turned out to be significantly different from the displacement-test trajectories. In standard tests, the locations of the bees'
first and second turns inside the box were approximately the same as they had been during training. In displacement tests, the resultant shift in each turn's location, along each axis, was approximately the same as the amount of displacement of the entrance hole. Furthermore, in each sort of test, when the position of the first turn in an individual bee's flight path differed from the correct location, there was a slight tendency for the position of the second turn to differ from the correct location by the same amount. The second flight segment, then, did not appear to correct for any inaccuracies in the first.

Thus, the results were consistent with the hypothesis that the bees learned an ordered list of vectors, and they were significantly different from what they would have been had the bees learned only to fly to specific locations in the box. In other words, the only apparent cause of the bees' adopting the second flight segment in the training sequence was the playing out of the first flight vector. So it would seem that the vector memories for the two flight segments must have been directly linked in some way. Similar results were obtained with a different zig-zag route and with a route consisting of two turns in the same direction.

Although the above experiments suggest that the bees relied principally on their memories of the appropriate flight vector sequence, Collett acknowledges that there probably were some relatively weak effects of place information on the bees' trajectories. Those effects could be explained in terms of slight differences between the visual scenes at the two partition hole locations. For although the panorama within the box was fairly uniform, a bee relatively close to one wall would quite likely experience a visual scene that was, in detail, a bit different from the one it would experience if it were close to the opposite wall.
The black marks on the close wall would likely appear to be larger and more distinct than those on the opposite wall.

Collett [25] recently investigated the effects of panoramic context on honeybees' performance of route flight segments. His results confirmed the hypothesis that bees are capable of learning vector sequences. He trained bees to a food reward situated within a channel that contained two landmarks between its entrance and the feeder's entrance. The experiments made use of two types of landmarks, boundary landmarks and isolated landmarks. Boundary landmarks are sharp transitions between two perceptibly different panoramic contexts. An example of such a landmark would be anywhere along the border between an open meadow and a forest. Isolated landmarks are prominent, localized landscape features for which the panoramic context encountered before the feature (relative to a line of travel) appears similar to the context encountered after the feature. An example of this type of landmark would be a solitary tree in an area of grassland.

[25] Collett (M.) et al. 2002.

Different groups of bees were trained with, respectively, two different types of channels (Fig. 6.2A). One type of channel contained two boundary landmarks, each of which was an abrupt change in wall pattern from randomly distributed black and white squares to alternating black and white vertical stripes. The first boundary occurred 1 m beyond the channel entrance, and the second occurred 1 m beyond the first. The feeder entrance (a 10-mm round hole in one of the walls) was positioned 1 m beyond the second boundary. The other type of channel contained one boundary landmark and one isolated landmark, a baffle through which the bees flew. The boundary occurred 1 m beyond the channel entrance, and the baffle was situated 1 m beyond the boundary. The feeder entrance was another meter past the baffle.
Throughout training, the channels extended at least 4 m beyond the feeder entrance (the entrance–landmark–feeder configuration was regularly moved along the channel walls in order to control for various cues).

Figure 6.2. Train and test configurations in Collett's channel experiments. Wall patterns are shown together with the locations of boundary landmarks (open triangles), baffles (filled triangles), and the feeder entrance (filled arrows; open arrows indicate the training position of the feeder in relation to the last landmark). (A) Training configurations for bees trained with two boundary landmarks (top) and with one boundary landmark and one baffle (bottom). (B) Test configurations for boundary-only-trained bees with the distance from the channel entrance to the first boundary increased by 1 m from the training distance (top) and for baffle-trained bees with the distance from the entrance to the boundary increased by 2 m from the training distance (bottom). (C) Test configurations for boundary-only-trained bees (top) and baffle-trained bees (bottom) with the distance between the landmarks increased by 1 m from the training distance. (D) Test configurations for boundary-only-trained bees (top) and baffle-trained bees (bottom) with the final landmark removed. Adapted from various figures in Collett et al. 2002.

Collett performed three series of tests. For each test, the wall segment that, during training, contained the feeder entrance was replaced with an identical segment that did not contain a hole. In one series of tests, the relative positions of the landmarks remained as they were during training (Fig. 6.2B). The distance from the channel entrance to the first landmark either was the same as in training or was increased. For all tests in this series, and regardless of the types of landmarks employed, bees searched at the training distance from the final landmark.
That they did so, regardless of the distance from the channel entrance to the first landmark, confirmed earlier findings [26] that bees' searches are sometimes controlled by a local vector extending from a particular landmark to the place, relative to that landmark, where the goal had been.

[26] Srinivasan et al. 1997.

In the second series of tests, the second landmark was placed 2 m (rather than 1 m) beyond the first landmark, where the feeder entrance had been relative to the first landmark during training (Fig. 6.2C). Bees trained with two boundary landmarks searched at the training distance from the final landmark, as they had done in the previous series of tests. Bees trained with the baffle, however, exhibited a search pattern centered at the baffle. The baffle, then, did not activate a baffle-to-goal vector. Still, like the bees trained with two boundary landmarks, they did search at the appropriate location in relation to the final (and only) boundary landmark.

In the final series of tests, the second landmark was removed (Fig. 6.2D). Bees trained with the baffle searched about 2 m after the boundary landmark (at the appropriate location in relation to that landmark), as they had done in the second series of tests. Their search, however, was a bit broader, and its focus was a bit less well defined, than the searches of baffle-trained bees in the first series of tests. Adding 2 m to the distance between the channel entrance and the boundary shifted the focus of the search farther into the channel, well beyond where it should have been had the bees been guided by an estimate of the distance of the feeder from the hive or from the channel entrance (rather than an estimate or estimates in some way related to the boundary). Bees trained with only boundary landmarks exhibited a search which was much less constrained than that of the baffle-trained bees. It also lacked a well-defined focal point.
Clearly, though, the focus of their searches was well past the correct location. Most of the bees flew until they were close to the end of the channel before turning back.

So how are the above results pertinent to the issue of whether the bees learned a sequence of vectors? The results for boundary-only-trained bees, by themselves, are consistent with their having learned independent flight segments, each associated with one of the boundary landmarks. [27] They might have learned, in effect, merely to fly to the second boundary upon encountering the first, and to fly to the feeder upon encountering the second. Perhaps their flight segment memories were triggered by only their having seen the landmarks associated with them, with the role of active local vectors having been only to suppress the vector recall system until they were played out. In short, their performance, taken alone, could be explained in terms of their having relied upon memories recalled in sequence as opposed to a recalled memory of a sequence.

[27] Collett does not explicitly make this point, though I assume he would accept it. He says nothing to the contrary.

The performance of the baffle-trained bees, however, cannot be explained in quite the same fashion. [28] For, unlike the boundary-only-trained bees, they did not use a landmark-to-feeder vector upon encountering the second landmark (for them, the baffle) in tests in which it was moved (Fig. 6.2C). Nor did they prematurely search near the training location of the baffle (relative to the first landmark) in tests in which it was moved or removed, even though (i) the available visual stimuli at that location were more consistent with the feeder's training location than the baffle's training location (Fig. 6.2C,D), (ii) the learned landmark-to-feeder vector couldn't have been triggered by the baffle (which wasn't there), and (iii) there consequently wasn't a baffle-triggered active vector that could have suppressed a search for the goal. Moreover, it is unlikely that the baffle-trained bees employed either only a baffle-to-feeder vector or only a boundary-to-baffle vector. Otherwise, in one or more tests, they presumably would have searched at a much shorter distance from the boundary than they in fact did.

[28] This claim is not explicit in Collett (M.) et al. 2002. However, its truth, or the truth of another claim to the same effect, appears to be necessary in order for their argument to go through.

What remains to be determined, then, is whether the baffle-trained bees relied upon a memory of a single boundary-to-feeder vector or upon a memory of a sequence of vectors (from the landmark to the location of the baffle, then from the location of the baffle to the location of the feeder entrance). The former alternative is compatible with the bees' having learned the difference between the global coordinates of the boundary and those of the feeder entrance.

Collett argues that the baffle-trained bees relied upon a remembered sequence of vectors. First, when the test configuration of landmarks was the same as in training (the first series of tests), the searches for the two groups of bees were quite similarly focused. This suggests that each group relied upon a vector from the second landmark to the feeder location. For spread of search is positively correlated with local-vector length. [29] That the baffle-trained bees relied upon a baffle (location)-to-feeder vector is further supported by the fact that their searches were much more focused when the baffle was at its training location (in relation to the boundary) than when it was removed.

[29] Srinivasan et al. 1997.

Second, the searches of bees trained with only boundary landmarks were not controlled by a single, remembered local vector from the first boundary landmark to the feeder entrance.
Otherwise, in tests in which the second boundary landmark was shifted to the location (relative to the first landmark) of the feeder entrance during training, they would have searched at the second boundary landmark. Instead, they searched at the trained distance from that landmark. The absence, in boundary-only-trained bees, of the operation of a single vector from the first landmark to the feeder strongly suggests the absence of the operation of such a vector in baffle-trained bees. The only remaining alternative is that their searches were produced by a recalled sequence of local vectors.

Collett considers his major finding to be that, in every test, the two groups of bees searched at the trained distance along the panoramic context that contained the feeder. Furthermore, the only case in which bees did not search at the trained distance from the last-encountered boundary landmark was when boundary-only-trained bees were tested with the final training landmark removed. In that test, the panoramic context of the feeder occurred nowhere along the channel. As Collett points out, this suggests that the correct panoramic context is necessary for activation of the appropriate local vector.

Panoramic contexts, though, cannot be relied upon to precisely specify locations. They are, by definition, much the same over a wide area. And for baffle-trained bees, in every test, the panoramic context was the same from just past the boundary landmark to near the end of the channel. (Recall also that, in Collett's vector sequence experiments, the panorama was fairly uniform throughout the box.) Furthermore, there is no reason to think that any other sensory information relevant to where the baffle should have been in tests was acquired by the bees at or near that point. It is highly likely, then, that the principal cause of the activation of the baffle-trained bees' baffle (location)-to-feeder vector was the playing out of their boundary-to-baffle vector.
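The contrast between the two hypotheses for the baffle-trained bees can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a model proposed by Collett: positions are reduced to a single coordinate (meters from the channel entrance), the function names are hypothetical, and the two training distances of 1 m are the ones reported above.

```python
from typing import Optional

# Training distances for the baffle-trained group, in meters (from the text).
BOUNDARY_TO_BAFFLE = 1.0  # boundary landmark -> baffle location
BAFFLE_TO_FEEDER = 1.0    # baffle location -> feeder entrance


def search_point_linked(boundary_pos: float) -> float:
    """Linked-sequence hypothesis (illustrative assumption).

    Playing out the boundary-to-baffle vector is what triggers the
    baffle-to-feeder vector, so the predicted search point depends only
    on where the boundary is, not on where (or whether) the baffle is seen.
    """
    baffle_location = boundary_pos + BOUNDARY_TO_BAFFLE
    return baffle_location + BAFFLE_TO_FEEDER


def search_point_independent(baffle_pos: Optional[float]) -> Optional[float]:
    """Independent-memories hypothesis (illustrative assumption).

    The baffle-to-feeder vector is triggered only by seeing the baffle,
    so there is no prediction at all when the baffle is absent (None).
    """
    if baffle_pos is None:
        return None
    return baffle_pos + BAFFLE_TO_FEEDER


boundary = 1.0  # boundary 1 m beyond the channel entrance, as in training

# Second test series: baffle moved to 2 m beyond the boundary.
linked_moved = search_point_linked(boundary)
independent_moved = search_point_independent(boundary + 2.0)

# Final test series: baffle removed.
linked_removed = search_point_linked(boundary)
independent_removed = search_point_independent(None)
```

On this toy rendering, the linked-sequence hypothesis places the search 2 m past the boundary in both tests, which is where the baffle-trained bees are reported to have searched (at the moved baffle in the second series, and about 2 m after the boundary in the final series). The independent-memories hypothesis instead places the search 1 m past the moved baffle and yields no prediction once the baffle is removed.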
Again, it appears that those vectors must have been connected in memory.

6.2.2 Maze Learning

The capacity of honeybees to learn to correctly negotiate various sorts of mazes has implications regarding what they are able to represent. Several studies suggest that bees are able to represent maze configurations, or sequences of sensory stimuli or motor commands. Other studies suggest that bees can represent rules for navigating mazes.

6.2.2.1 Configurations and Sequences

Honeybees can learn to fly the correct path through a maze containing several decision points, without the help of markers to guide them. Zhang et al. [30] (hereafter, Zhang) successfully trained bees to follow the correct path through either of two such mazes (Fig. 6.3).

[30] Zhang et al. 1996.

Figure 6.3. Two mazes used by Zhang in his maze learning experiments. The width and height of each box were 30 cm. Every box had four 4-cm diameter holes, one in the center of each wall, with one or two of the exit holes blocked. The interior walls were effectively textureless. Boxes with two exit holes (decision boxes) are numbered. A solid line indicates the correct path through the maze; a broken line indicates an incorrect path. e, maze entrance; f, feeder. Reprinted from Neurobiology of Learning and Memory, vol. 66, Zhang, S. W., Bartsch, K., and Srinivasan, M. V., "Maze learning by honeybees," 267–282, © Copyright 1996, with permission from Elsevier.

Zhang took his results to suggest that the bees learned either the spatial layout of the maze or the sequence of the correct turns through it. Unfortunately, for my purposes, the results in question can be explained without appealing to the bees' having learned either sort of structure.

First, consider maze 1 [31] (Fig. 6.3). Decision boxes 1 and 3 have the same compass orientation and the same exit hole locations. Also, the correct turn is to the left in decision box 1 and to the right in decision box 3.
So it is true that the bees' repeatedly navigating the maze without making errors (at above-chance levels) would require their having information which enables them to treat the two boxes differently. As we are about to see, however, the required information could be in the form of a sequence of memories, rather than a memory of either a sequence or a spatial layout.

Notice that the entrance to the maze, which is also the entrance to decision box 1, could be taken, not implausibly, to be a boundary landmark, marking the transition from the panorama of the lab to the panorama of the box's interior. On the other hand, the entrance to decision box 3 does not mark a transition of distinct panoramas. Furthermore, decision box 2 is in detail visually distinct from the other boxes, in that one of its exit holes is to the left and the other is straight ahead. So the bees' performance could be explained by their having learned to do the following:

    To turn left just after encountering the lab–maze boundary landmark.
    Upon entering a box having a single exit hole, to fly through it.
    Upon entering a box having an exit hole to the left and one straight ahead, to fly through the one straight ahead.
    Upon entering (without encountering a boundary landmark) a box having an exit to the left and one to the right, to turn right.

[31] What I call maze 1 is called path 2 in Zhang et al. 1996.

Moreover, it would seem that the above acquired information need not be linked in memory in order for the bees to correctly navigate the maze.

Second, consider maze 2 [32] (Fig. 6.3). Decision box 3 is in detail visually distinct from the other boxes in that one of its exit holes is to the left and the other is straight ahead. Also, each of the remaining decision boxes differs in compass orientation.
Since it is known that honeybees learn the compass orientation (relative to their line of flight) of landmarks along a route, 33 it is not implausible that the bees in Zhang?s experiment learned the respective compass orientations of the relevant boxes in the maze. So the bees? performance could be explained by their simply having associated the appropriate behaviors with the relevant visual stimuli and compass information. They need not have acquired a memory of a sequence or of a spatial layout. For one of the experiments with maze 1, Zhang did control for compass in- formation by frequently rotating it during training. However, as we?ve seen, 146 32 What I call maze 2 is called path 9 in Zhang et al. 1996. 33 Cartwright and Collett 1983, Collett (T. S.) and Baron 1994, Dickinson 1994, Dyer 1987, Gould 1984. compass information is not required to reliably negotiate maze 1. Compass in- formation could be crucial for learning maze 2. But Zhang did not control for compass information with maze 2; he controlled only for odors by exchanging some of the boxes for one test. I?ve assumed that, for the bees in Zhang?s experiments, open holes were visually distinct from blocked holes. This assumption could be questioned, since it appears that blocked holes were covered on only the exterior surfaces of the boxes. Zhang, though, does not address this issue. Nor does he relate how often, if ever, bees attempted to fly through blocked holes. If, however, my assumption is incorrect, then his claim that his results suggest that the bees learned either the spatial layout of the maze or the sequence of the correct turns through it would become more plausible. I would welcome that outcome, for we would then have a plausible case of bees? having learned another kind of complex structure, in addition to vector sequences. In fact, Collett 34 has provided such a case, one in which honeybees appear to have acquired information about a maze?s configuration. 
[35] His experiments not only support the claim that bees can acquire semantically complex information (and that the relevant capacities exhibit systematicity) but also raise the possibility that bees are capable of transitive reasoning (§ 7.4).

[34] Collett (T. S.) et al. 1993.
[35] Pastergue-Ruiz and Beugnon (1994) obtained similar results with ants.

Figure 6.4. Plan views of sample train (top) and test (bottom) configurations of the apparatus in Collett's visual-sequence learning experiments. Bees flew from left to right. Each box was 40 cm high, 60 cm wide, and 50 cm long. The entrance to the first box was 5 cm in diameter. All patterns were 25 by 25 cm. The walls and floor of each box were white, with random dark marks. W, white; Blk, black; Y, yellow; Blu, blue; H, black–white horizontal stripes; V, black–white vertical stripes; +, positive stimulus; −, negative stimulus. Training configuration redrawn from Journal of Comparative Physiology A, vol. 172, 1993, pp. 693–706, "Sequence learning by honeybees," Collett, T. S., Fry, S. N., and Wehner, R., Figure 10, © Springer-Verlag 1993, with kind permission of Springer Science and Business Media.

Collett trained honeybees to negotiate a relatively simple, maze-like apparatus comprised of three boxes, placed end to end (Fig. 6.4). For most of the experiments, the training configuration was as follows. Two distinct patterns were fixed to the back wall of each box, one on the left and one on the right. A hole 2 cm in diameter occurred in the center of each pattern. The hole in one of the two patterns (the positive stimulus) led either to the next box (if any) or to a sucrose reward. The hole in the other pattern (the negative stimulus) led to a small, blocked-off compartment. The left–right positions of the patterns were frequently switched, whereas the same two patterns always occurred in the same boxes (and no pattern occurred in more than one box). Thus, a bee could learn to fly
directly through the apparatus to the reward only if it learned, for each box, which of the two patterns was the positive stimulus.

In tests, the negative pattern in one of the boxes was replaced with the positive pattern from one of the other boxes, resulting in a box having two positive stimuli. The remaining patterns were left unchanged (Fig. 6.4). As in training, the left–right positions of the patterns were frequently switched, so that each pattern was on a particular side of the back wall for half of the trials.

In one experiment, the pairs of training patterns in the front, middle, and back boxes were, respectively, white paper (positive) and black paper (negative), blue paper (positive) and yellow paper (negative), and black–white vertical stripes (positive) and black–white horizontal stripes (negative). After the bees had learned which pattern of each pair identified the way to the reward, they were tested with both the white pattern and the vertical pattern (positive for the front box and back box, respectively) in either the front box or the back box. The bees preferred the white pattern in the front box, and they preferred the vertical pattern in the back box. Similar results were obtained in four other tests that paired the positive stimuli from the front and back boxes.

Results were different when bees were tested in the middle box, with the positive stimulus of that box set beside the positive stimulus from one of the other boxes. For all such tests, the bees either preferred the positive stimulus from the front box or the back box or showed no preference. Nonetheless, the way they treated a pair of test stimuli in the middle box was different from the way they treated the same pair of test stimuli in one of the other boxes.
In three of six experiments, the preference for the positive stimulus of either the front box or the back box, when bees were tested in one of those boxes, was significantly stronger than it was when bees were tested in the middle box.

The results of Collett's experiments, then, suggest that the bees learned the order in which they encountered the relevant positive stimuli. They certainly did not learn merely to fly through the opening in any positive pattern. Furthermore, Collett attempted to gain insight into what cues told the bees where they were in the sequence. A "box swapping" experiment ruled out the possibility that the bees discovered or created differences among the boxes themselves, independent of their position in the series. And the following experiment told against the possibility that the bees simply associated the positive stimulus in (or the appearance of) one box with the positive stimulus in the next.

Bees were trained with yellow paper marking the entrance to the boxes (which was always on the left), white (positive) and black (negative) in the first box, blue (marking the only exit and always on the right) in the second, and vertical (positive) and horizontal (negative) in the third (Fig. 6.5). As in the other experiments, the left–right positions of the positive and negative stimuli were frequently switched. In two respective tests, bees chose between white and vertical in the front box and the back box. As expected, they preferred white in the front box and vertical in the back box. In a further test, bees chose between white and vertical in the middle box (Fig. 6.5). The back box remained the same as in training, whereas the front box was made to look as similar as possible to the middle box in training, with blue on the right marking the only exit. Nonetheless, the bees preferred white in the middle box and vertical in the back box.
They did not, then, simply associate the perceived characteristics of the middle box in training with the succeeding, vertical positive stimulus.

Figure 6.5. Plan views of the train (top) and one of the test (bottom) configurations of the apparatus for Collett's "blue–single exit" sequence learning experiment. For further details, see the caption to Figure 6.4. W, white; Blk, black; Y, yellow; Blu, blue; H, black–white horizontal stripes; V, black–white vertical stripes; +, positive stimulus; −, negative stimulus. Adapted from Journal of Comparative Physiology A, vol. 172, 1993, pp. 693–706, "Sequence learning by honeybees," Collett, T. S., Fry, S. N., and Wehner, R., Figure 10, © Springer-Verlag 1993, with kind permission of Springer Science and Business Media.

Collett did not discuss the possibility that the bees (in this experiment) learned to correctly navigate the apparatus by their having associated only the global or local positions of the boxes with the appropriate positive stimuli. That is, one possible explanation of the results is that the bees associated certain ranges of distance (their estimates of their distance from, say, the entrance of the apparatus) with the respective correct choices. In other words, they might have associated the location of the middle box in training with the succeeding, vertical positive stimulus.

The results, however, count against this sort of explanatory hypothesis as well. It has four possible versions.

  (1) The bees associated their (local or global path integration) coordinates with the pattern positive for the currently occupied box, and,
      (a) upon entering the front box (testing), they reset their coordinates to those appropriate to the middle box (training), or,
      (b) they did not reset their coordinates.
  (2) The bees associated their coordinates with the pattern positive for the box (if any) which came just after the currently occupied box, and,
      (a) upon entering the front box (testing), they reset their coordinates to those appropriate to the middle box (training), or,
      (b) they did not reset their coordinates.

Every version is consistent with the bees' having flown through the blue-marked opening in the front box, it having been the only available alternative. Every version is consistent as well with the bees' having chosen vertical over white in the back box. But it is difficult to reconcile any version with the bees' having preferred white over vertical in the middle box.

Hypotheses (1a) and (2a) maintain that the bees reset their coordinates upon entering the front box. Thus, on (1a), when the bees were in the middle box (testing), their coordinates would have been appropriate to the back box (training). Since (1a) requires the bees to have chosen the positive pattern for the box they took themselves currently to be in, they should have chosen, in the middle box, the positive pattern for the back box. That is, they should have preferred vertical over white, contrary to their actually having preferred white. On hypothesis (2a), when the bees were in the front box (testing), their coordinates would have been appropriate to the middle box (training). Bees in the front box, then, would have taken the subsequent box (in reality, the middle box) to be the back box. Since (2a) requires the bees to have chosen, in what they took to be the subsequent box, the positive pattern for that box, they should have chosen, in the middle box, the positive pattern for the back box. Again, they should have preferred vertical over white, contrary to their actually having preferred white.

Hypotheses (1b) and (2b) maintain that the bees did not reset their coordinates upon entering the first box.
On either hypothesis, then, the bees were highly likely to have been correct about which box they were in. Thus, on (1b), bees in the middle box would have taken themselves to be in a location intermediate with respect to that appropriate for choosing white (the front box) and that appropriate for choosing vertical (the back box). In that case, they should not have shown any significant middle-box pattern preference. Hypothesis (2b) would have predicted the same result. The bees would have associated their first-box location with exiting the second box through the blue pattern. In tests, when they arrived in the middle box, no blue pattern was present. So, considering just the hypothesis in question, it should not have made any difference to them which middle-box pattern to choose.

I hope to have established the plausibility of the possibility that the bees in Collett's experiments acquired a memory of a sequence (the box-to-box sequence of positive stimuli) rather than behaved in accordance with sequentially recalled memories. For example, in the experiment just examined, they may have stored a representation having a content somewhat analogous to [white, then blue, then vertical] or [white before blue and blue before vertical]. [36] We thus have another plausible case of bees' having learned a kind of complex structure.

[36] If in fact this is correct, then there is a possibility that the bees' having preferred white when tested in the middle box was a result of a kind of reasoning process. From "white before blue and blue before vertical," say, the bees might have derived "white before vertical." Of course, it is also possible that the bees independently learned "white before vertical." More on the possibility of reasoning in honeybees will be presented below (§ 7.4).

6.2.2.2 Rules

Zhang [37] performed maze experiments in addition to those described in the preceding section.
For many experiments, he trained honeybees to correctly negotiate mazes (such as those shown in figure 6.3) by following marks of a particular color. He found, for example, that bees trained on one maze with one color are able to accurately navigate a differently configured maze by following either marks of the same color or marks of a different color. He also found that bees so trained are able to negotiate an identically configured maze without marks. Their performance is less accurate than it is in the case of marked mazes, but it is still significantly more accurate than the performance of controls.

My intent is not to evaluate or examine the implications of the experiments just mentioned. Instead, I focus on another of Zhang's maze experiments. Using maze 3 (Fig. 6.6), [38] he trained bees to turn right when the wall opposite the entrance to a compartment was blue and to turn left when that wall was green. The only nonmarked compartments were those both having a single exit hole and not requiring a turn. Tests were carried out with mazes 3, 4, and 5 (Fig. 6.6). The test with maze 4 was performed immediately after the test with maze 3, and the test with maze 5 was performed immediately after the test with maze 4. The bees performed very well in every test, and their levels of performance in the three tests did not significantly differ from one another. The percentages of error-free trials were 92.2% for maze 3, 97.7% for maze 4, and 93.2% for maze 5.

[37] Zhang et al. 1996.
[38] What I call mazes 3–5 are called paths 6–8, respectively, in Zhang et al. 1996.

Two explanations of these results readily come to mind. One is that the bees simply associated turning left with green and turning right with blue (and continuing straight ahead with the color of the bare walls).
The other is that the bees learned a (nonassociative) rule that caused them to go through the hole right of the colored wall when it was blue and to go through the hole left of the colored wall when it was green. Zhang's results do not help us to decide between these alternatives.

Figure 6.6. The maze configurations used by Zhang to test the ability of bees to turn left or right in response to color cues. A solid line indicates the correct path through the maze; a broken line indicates an incorrect path. For further details, see the caption to Figure 6.3. Reprinted from Neurobiology of Learning and Memory, vol. 66, Zhang, S. W., Bartsch, K., and Srinivasan, M. V., "Maze learning by honeybees," 267–282, © Copyright 1996, with permission from Elsevier.

A key feature of a rule, as I here employ the notion, is that it allows its possessor to generalize over a broad range of different stimuli, where that range includes stimuli that bear no apparent resemblance to those included in the training set. Rule-based generalization, then, is different from association-based generalization, in that the latter involves generalizing only over stimuli that are similar to those used in the training set. As remarked in the first paragraph of this section, bees trained to negotiate mazes by following marks of a single color did at least appear to exhibit some ability to generalize beyond the training conditions. But Zhang did not perform experiments designed to assess whether or not the bees had the ability to generalize beyond the training conditions of the experiment currently in question.

On the other hand, Giurfa et al. [39] (hereafter, Giurfa) did perform simple-maze experiments which showed that honeybees are indeed able to generalize a learned task to novel, dissimilar stimuli. Thus, it is likely that they acquired a rule, rather than an association.
As we'll see in the next chapter, Giurfa's results are relevant to the issue of whether different types of bee representations have different semantic roles (§ 7.3.2). They also bear on whether bees implement rules that operate on the values of variables (§ 7.5) and on whether some honeybee cognitive processes are sensitive to the constituent-structure of the representations on which they operate (§§ 7.4 and 7.5).

[39] Giurfa et al. 2001.

In the first stage of Giurfa's experiments, he successfully trained six respective groups of bees to solve four delayed matching-to-sample tasks and two delayed non-matching-to-sample tasks. A Y-maze served as the experimental apparatus (the configuration of the maze for one experiment is shown in Figure 6.7). In the delayed matching-to-sample experiments, the bees encountered one of a pair of stimuli (the sample stimulus) at the maze entrance. (Which of the two served as the sample was varied.) The entrance arm of the maze ended at a chamber in which the bees had to decide between the two remaining arms. One arm contained the sample, or matching, stimulus, while the other arm contained the nonmatching stimulus. The bees were rewarded only if they chose the arm which contained the matching stimulus. (Which arm served as the "matching" arm also was varied.) The training procedure for the delayed non-matching-to-sample experiments was the same, except the bees were rewarded only if they chose the arm which contained the nonmatching stimulus.

Figure 6.7. Configuration of the Y-maze used by Giurfa in a delayed matching-to-sample experiment in which bees were trained with odors and tested with colors. The odors were presented by means of odorant-soaked tissues in perforated vials. Exhaust fans prevented odor mixing in the decision chamber and removed feeder odors. Baffles prevented the bees from experiencing the stimuli present in a chamber until they had entered it. In the transfer test, the scented vials were replaced with visually identical, odorless vials. b, baffles; c, colors; d, dummy vials; e, entrance; o, odor vials; f, feeder; x, exhaust fan. (Adapted by permission from Macmillan Publishers Ltd: Nature, vol. 410, pp. 930–933, Giurfa, M., Zhang, S., Jenett, A., Menzel, R., and Srinivasan, M. V., "The concepts of 'sameness' and 'difference' in an insect," © Copyright 2001.)

  Experiment     Train stimulus pair                      Test stimulus pair
  Experiment 1   Blue / Yellow                            Vertical grating / Horizontal grating
  Experiment 2   Vertical grating / Horizontal grating    Blue / Yellow
  Experiment 3   Radial grating / Circular grating        Oriented (45°) linear grating / Oriented (−45°) linear grating
  Experiment 4   Lemon odor / Mango odor                  Blue / Yellow
  Experiment 5   Blue / Yellow                            Vertical grating / Horizontal grating
  Experiment 6   Vertical grating / Horizontal grating    Blue / Yellow

Table 6.1. Stimulus pairs used in Giurfa et al.'s delayed matching-to-sample (1–4) and delayed non-matching-to-sample experiments (5 and 6). All gratings were black–white.

In each experiment, after the bees had learned the relevant discrimination, Giurfa performed a test to determine whether or not the bees would transfer what they had learned to a pair of novel stimuli. The pairs of train and transfer test stimuli used in the experiments are given in Table 6.1. The levels of performance of the bees in transfer tests were about the same as the respective levels of performance they had achieved in training. [40]

[40] The one exception was experiment 3, in which bees were trained on radial and circular gratings and tested on oriented (45° and −45°) linear gratings. Nonetheless, the bees' preference for the appropriate test grating was highly significant (P < .001).

Thus, the bees not only learned the matching and nonmatching tasks but also transferred what they learned to novel stimuli.
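The difference between a stimulus-specific association and something that transfers to novel stimuli can be pictured with a toy sketch. The Python below is only an illustration under stated assumptions: the function names (`association_choice`, `matching_rule_choice`), the dictionary of learned associations, and the representation of the two maze arms as a list are hypothetical and appear nowhere in Giurfa's study, and no claim is made that bees implement anything like this code. The stimuli are those of experiment 4 in Table 6.1 (odors in training, colors in the transfer test).

```python
def association_choice(sample, arms, learned):
    """Stimulus-specific association: a response is stored only for trained samples.

    Returns the index of the chosen arm, or None when nothing has been
    learned about the sample (so there is no basis for a preference).
    """
    target = learned.get(sample)
    if target is None or target not in arms:
        return None
    return arms.index(target)


def matching_rule_choice(sample, arms):
    """Rule over a variable x: choose the arm marked with x if x was at the entrance.

    The rule mentions no particular stimulus, so it applies to any sample.
    """
    if sample not in arms:
        return None
    return arms.index(sample)


# Hypothetical associations formed during training on the two odors.
learned = {"lemon": "lemon", "mango": "mango"}

# Training trial: lemon at the entrance, arms marked (mango, lemon).
trained_by_association = association_choice("lemon", ["mango", "lemon"], learned)
trained_by_rule = matching_rule_choice("lemon", ["mango", "lemon"])

# Transfer trial: blue at the entrance, arms marked (blue, yellow).
transfer_by_association = association_choice("blue", ["blue", "yellow"], learned)
transfer_by_rule = matching_rule_choice("blue", ["blue", "yellow"])
```

In this sketch both learners pick the matching arm on the training trial, but only the rule yields a choice on the transfer trial; the association table simply has no entry for blue.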
Furthermore, they exhibited transference not only between different sorts of visual stimuli but also from olfactory stimuli to visual stimuli.

Giurfa took his results to strongly suggest that honeybees have the capacity to acquire (or make use of) sameness and difference concepts. Depending on what one means by "concept," that may or may not be the case. For example, it's not clear that the bees could have solved the tasks only if they had made use of representations with the content [same] or [different]. What Giurfa's results more clearly suggest is that the bees acquired a rule that operates on a variable. In particular, in the "matching" transfer tests, they seem to have made use of a rule something like, "Choose the x-marked arm if x was at the entrance," where x ranges over (at least) colors, patterns, and odors. It's possible that neither the acquisition nor the execution of such a rule requires explicit judgments about whether what is now present is the same as what was present at the entrance.

The bees' exhibited capacity to generalize to novel stimuli is key. They certainly did not simply associate which of two specific stimuli occurred at the maze entrance with the reward arm of the maze. Otherwise, they would not have been able to transfer their learning across different sorts of visual stimuli, much less across different sensory modalities. One might be tempted to suggest that the bees associated "whatever stimulus" was (or was not) at the entrance with the reward arm. But it should take at most only a bit of reflection to see that "whatever stimulus" is a variable. The question then arises what it could be to "associate" a variable with something, if not to acquire a rule that operates on a variable.

6.2.3 Novel Shortcuts and Vector Averaging

Displacement experiments are those in which bees are captured at the hive or a foraging site, transported to a familiar or unfamiliar location, and then released.
Their subsequent course is then recorded. Such experiments have proven useful for revealing what bees are capable of learning about the layout of their foraging territory. They have also proven useful for illuminating some of the ways in which bees' current motivations interact with their current sensory information and their recalled and stored locational information.

6.2.3.1 Novel Shortcuts to the Hive

Menzel et al. [41] (hereafter, Menzel) performed a series of experiments that demonstrated (among other things) the capacity of honeybees to take a novel route when displaced to an unfamiliar location. Menzel accounts for his results by arguing that the novel-shortcut bees averaged known site-to-hive vectors to obtain a novel site-to-hive vector. I argue that Menzel's hypothesis is indeed the best explanation of his results (§ 6.2.3.2). If that's the case, then it appears that bees are capable of performing operations defined over certain semantic constituents of complex representations (§ 7.4). Menzel's results also have implications regarding the circumstance- and motivation-independence of certain bee representational constituents (§ 7.1.2). The Classicist can explain such independence by positing context-independent syntactic constituents.

[41] Menzel et al. 1998.

Menzel trained bees to forage at two feeding stations, one in the morning and the other in the afternoon (Fig. 6.8). The area chosen for the experiments was unfamiliar to the bees, and there were no natural food sources in the regions around or between the hive, the feeding sites, and the two release-only sites. Consequently, the trained routes were the only routes established by the bees in the experimental area.

Figure 6.8. A map of the area chosen by Menzel for his displacement experiments. The landscape was dominated by a large, cone-shaped hill, surrounded by flat farmland. Sm, morning site; Sa, afternoon site; S3, Site 3; S4, Site 4. Reprinted from Animal Behaviour, vol. 55, Menzel, R., Geiger, K., Joerges, J., Müller, U., and Chittka, L., "Bees travel novel homeward routes by integrating separately acquired vector memories," pp. 139–152, © Copyright 1998, with permission from Elsevier and Randolf Menzel.

The morning site was situated within an area of harvested agricultural fields, with no apparent local landmarks within a 150-m radius. The afternoon site was about 60 m from a low bush, which was visible to the bees. Site 3, a release-only location, was situated within an area of uniform grassland. A clump of trees and a few scattered trees should have been just visible to bees at the spot. Site 4, another release-only location, was in a pasture. A row of bushes along a creek and some scattered tall trees were nearby landmarks.

In experiment 1, bees were captured at the hive upon arrival from one of the feeding sites. The bees were expected to be motivated to get back to the hive to discharge their foraging load. Bees arriving from the morning site were displaced to either the afternoon site or Site 3. Bees arriving from the afternoon site were transferred to either the morning site, Site 3, or Site 4. Bees (controls) that had visited only the afternoon site were transported to Site 3. Sites 3 and 4 were very unlikely to have been visited by the bees. As in all of the experiments, all bees were released within 20 min of capture. The direction in which a bee departed from a release site was estimated by recording its vanishing bearing, or the compass direction of its flight at the point at which it disappeared from view.

The bees which had learned the locations of both feeding sites and were displaced to the morning site, the afternoon site, or Site 3 flew toward the hive upon release (Fig. 6.9).
The bees displaced from the afternoon site to Site 4 flew in the direction that would have taken them from the afternoon site to the hive. The bees which had visited only the afternoon site, when released at Site 3, also flew in the direction that would have taken them from the afternoon site to the hive.

Figure 6.9. Distributions of vanishing bearings of bees captured at the hive upon arrival from the afternoon site (open circles) and from the morning site (filled circles), in Menzel's vanishing bearing, displacement study. Thick arrows indicate the means of the distributions. Thin arrows indicate specific headings that the bees might have adopted. H, hive; Sa, afternoon site; Sm, morning site; S3, Site 3; S4, Site 4. Reprinted from Animal Behaviour, vol. 55, Menzel, R., Geiger, K., Joerges, J., Müller, U., and Chittka, L., "Bees travel novel homeward routes by integrating separately acquired vector memories," pp. 139–152, © Copyright 1998, with permission from Elsevier and Randolf Menzel.

The results showed that bees familiar with the two feeding sites could recall the homeward vector, learned at a different time of day, appropriate to the feeding site to which they had been transported. Also, the group of such bees released at Site 3 took a novel shortcut from there to the hive. Landmarks near the hive were thought to be imperceptible to bees at Site 3 (based on what is known about their visual resolution). The fact that the bees which had visited only the afternoon site adopted the afternoon-site-to-hive direction when released at Site 3 controlled for both the possibility that the novel-route bees steered toward a beacon near the hive and the possibility that they relied on learned-route-associated landscape features. It also suggests that having learned the two feeder–hive routes was necessary for having been able to take the novel shortcut. The novel-route bees, then, must have somehow combined the two respective route memories.
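Menzel's proposal that the novel-route bees combined their two route memories by averaging can be illustrated with a short calculation. The sketch below is a minimal illustration, not a reconstruction of Menzel's analysis: the coordinates are invented, the function names are hypothetical, and it assumes for simplicity that the release site lies midway between the two feeding sites, which is the case in which a plain average of the two site-to-hive vectors equals the vector from the release site to the hive.

```python
import math


def subtract(p, q):
    """Vector from point q to point p, with points given as (east, north) in metres."""
    return (p[0] - q[0], p[1] - q[1])


def average(v, w):
    """Component-wise mean of two vectors (or midpoint of two points)."""
    return ((v[0] + w[0]) / 2.0, (v[1] + w[1]) / 2.0)


def compass_bearing(v):
    """Direction of an (east, north) vector in degrees clockwise from north."""
    return math.degrees(math.atan2(v[0], v[1])) % 360.0


# Invented layout (assumption): hive at the origin, feeding sites placed arbitrarily.
hive = (0.0, 0.0)
morning_site = (300.0, 100.0)
afternoon_site = (-100.0, 300.0)

# The two homeward vectors a trained bee is supposed to have stored.
morning_to_hive = subtract(hive, morning_site)
afternoon_to_hive = subtract(hive, afternoon_site)

# Averaging the stored vectors yields a heading never flown in training.
novel_vector = average(morning_to_hive, afternoon_to_hive)

# Assumed release site: the midpoint between the two feeding sites.
release_site = average(morning_site, afternoon_site)
true_homeward = subtract(hive, release_site)

novel_bearing = compass_bearing(novel_vector)
```

With these made-up numbers the averaged vector coincides with the true release-site-to-hive vector. For a release site that is not midway between the feeding sites, the plain average would not in general coincide with the true homeward vector.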
In addition, the fact that the bees released at Site 4 adopted the afternoon-site-to-hive direction suggests that bees don't combine route or vector memories whenever they are released at an unfamiliar location.

In experiment 2, bees were captured at the hive when they were about to depart to one of the feeding stations. The bees were expected to be motivated to get to the feeding site appropriate to the period of time, morning or afternoon, during which they were captured. They were displaced to either the morning site (if captured in the afternoon), the afternoon site (if captured in the morning), Site 3, or Site 4. Bees (controls) that had visited only the morning site were transported to Site 3.

The bees displaced to the afternoon site in the morning flew toward the hive; they did not adopt the course that would have taken them to the morning site (their original destination) had they not been displaced, nor did they set off in the actual direction of the morning site. As in experiment 1, they recalled the homeward vector appropriate to the site to which they had been transported.

The bees displaced to the morning site in the afternoon adopted one of two headings. About half of them flew toward the hive, whereas the other half headed in the direction that would have taken them from the hive to the afternoon site, their original target. They did not take the actual course to the afternoon site.

Menzel explains the difference in behavior between the bees displaced to the morning site and those displaced to the afternoon site in terms of differences in the local cues available at the two locations. The landmarks visible from the afternoon site were more prominent than those visible from the morning site. The former was characterized by a nearby bush, and it was much closer to the large hill than the latter. Thus, the bees transferred to the afternoon site were more likely to recognize their location than those transferred to the morning site.
By the same token, they were also more likely to change their original feeder-directed motivation (and corresponding flight vector) to a hive-directed one.

The vanishing bearings of the hive departing bees transported to Site 3 exhibited a bimodal distribution. This was the case for both the morning-displaced bees and the afternoon-displaced bees. Most of the bees in each group departed Site 3 on a course that would have taken them from the hive to the time-appropriate feeder. A significant proportion of the bees in each group took the novel course toward the hive. However, none of the peaks in the distribution of vanishing bearings corresponded to the actual direction from Site 3 to the time-appropriate feeder.

The bees transferred to Site 4 in the morning chose the hive-to-morning-site compass direction and hence behaved as if they had not been displaced. The bees which had visited only the morning site, when released at Site 3, also behaved as if they had not been displaced; they, too, opted for the hive-to-morning-site direction.

These results reaffirm the implications of experiment 1. The possibility that the novel-route bees homed toward a beacon near the hive, as well as the possibility that they were attracted to familiar-route-associated landscape features, was excluded. Also, learning the two feeder–hive routes appears to have been necessary for taking the novel shortcut. So it again appears that the novel-route bees somehow combined the two respective route memories. Furthermore, the fact that the bees released at Site 4 behaved as if they had not been displaced suggests that bees don't automatically combine route or vector memories whenever they are released at an unfamiliar location.

In experiment 3, bees were captured at the feeders, some upon arrival and some upon departure. Bees arriving at or departing from the morning site were displaced to either the afternoon site or Site 3.
Bees arriving at or departing from the afternoon site were displaced to the morning site. Feeder arriving bees were expected to be motivated to feed and thus to return to the feeding site at which they were captured, if able to do so. Feeder departing bees were expected to be motivated to return to the hive.

All of the bees taken from the morning site set a course that, in the absence of displacement, would have taken them back to the hive from the morning site. This was the case, regardless of whether they were captured upon arrival or departure, and regardless of whether they were displaced to the afternoon site or Site 3. Menzel found this result surprising, since hive arriving and hive departing bees were able to fly directly home from the feeding sites and Site 3, and since feeder departing bees should have been at least as motivated to return to the hive as bees from either of those two groups. Menzel suggests both that the homeward vector is loaded into working memory upon arrival at a feeding site and that it is strong enough to override local landmark information. (I'll suggest a different explanation shortly.)

Bees captured at the afternoon site and released at the morning site showed a bimodal distribution of vanishing bearings, presumably uncorrelated with whether the bees were captured upon arrival or departure. About half of them oriented their flights in the afternoon-site-to-hive direction, whereas the remainder oriented their flights toward the hive.

The bees captured at the afternoon site, then, behaved in a different manner than those captured at the morning site. Menzel's explanation of the behavior of the bees captured at the morning site doesn't explain the behavior of the bees captured at the afternoon site.
If the homeward vector in working memory was strong enough to override local landmark information in the former case, then it should have been strong enough to override local landmark information in the latter case as well, since the landmarks at the morning site were less prominent than those at the afternoon site. The bees released at the morning site would have had less local information to override than the bees released at the afternoon site.

An alternative explanation focuses on the prominence of local cues at the capture site, rather than the strength of the homeward vector and the prominence of local cues at the release site. Since the morning site was characterized by relatively few local cues, the bees captured at that site were likely to be disposed to rely upon compass information, rather than local cues, to set their course upon departure. This explains their failure to notice that they had been displaced to either Site 3 or the afternoon site. The afternoon site, on the other hand, did have at least one prominent local feature, a nearby bush. So it is reasonable to suppose that bees captured there were more disposed than morning-captured bees to rely upon local cues to set their course upon departure. Consequently, they should have been more likely than morning-captured bees to recognize where they were once released and to set the correct homeward course. This explanatory hypothesis predicts that, if the experiment is repeated under the same conditions, then a significant portion of bees captured at a location corresponding to the afternoon site will fly directly to the hive if displaced to a location corresponding to Site 3.

Note that the just-offered explanation comports well with the idea that bees learn sequences of route segments (§§ 6.2.1–6.2.2.1). For if that idea is correct, it would explain how bees could already be disposed to fly in the feeding-site-to-hive direction upon arrival at that site.
A route from the hive to a feeding site and back to the hive can be viewed as one journey with multiple route segments, just as well as a route from the hive to a foraging site, or one from the hive to a foraging site and then to another foraging site.

From his results (summarized in Table 6.2), Menzel infers that the course a displaced bee sets upon release depends upon both its motivation when captured and the information it acquires at the release site. It does appear likely that the bees linked their memories of the feeder-to-hive vectors to cues available at the feeding sites.⁴² This explains why hive departing bees and hive arriving bees were able to set the correct course from either feeding site to the hive. It explains why hive departing bees transported to the afternoon site were more likely than those transported to the morning site to choose the correct course home. And, as we've just seen, the behavior of feeder arriving bees and feeder departing bees can be explained in terms of the relative prominence of local cues at the two feeding sites.

⁴² This finding corroborates Wehner et al. 1990.

Columns: Release site; Hive arriving (Morn, Aft); Hive departing (Morn, Aft); Feeder arriving (Morn, Aft); Feeder departing (Morn, Aft)
Sm: Sm-to-H, Sm-to-H, H-to-Sa, Sa-to-H, Sm-to-H, Sa-to-H, Sm-to-H
Sa: Sa-to-H, Sa-to-H, Sm-to-H, Sm-to-H
S3: S3-to-H, S3-to-H, S3-to-H, H-to-Sm, S3-to-H, H-to-Sa, Sm-to-H, Sm-to-H
S4: Sa-to-H, H-to-Sm
S3ᵃ: Sa-to-H, H-to-Sm

Table 6.2. Courses set by the bees in Menzel's vanishing bearing, displacement experiments. Morn, captured in the morning at the hive or the morning feeding site; Aft, captured in the afternoon at the hive or the afternoon feeding site; Sm, morning site; Sa, afternoon site; S3, Site 3; S4, Site 4; H, hive. ᵃ Control experiments in which the bees had visited only the site at which they were captured.
6.2.3.2 Explanations of Novel-Shortcut Behavior

Menzel demonstrated that bees are capable of setting a novel course from an unfamiliar site to the hive, without homing to recognized visual cues near or along the way to the hive. There are several possible explanations of this finding.

Image Matching. The novel route was the result of the bees' traveling so as to match their stored image of distant landscape features, as seen from the hive, with their current image.⁴³

Noninferential Interpolation. The similarity between the distant visual cues at Site 3 and those at each of the two feeding sites directly caused the bees in question to compromise between their established homeward vectors; no inferential processes were involved.⁴⁴

Sequential Memory Referral. The novel route was the result of the bees' alternately relying upon the two remembered feeder-to-hive vectors.⁴⁵

General Landscape Memory. The novel-route bees did not rely upon their feeder-to-hive vectors; rather, they employed their "general landscape" memory, established during their exploration, or orientation, flights.⁴⁶

Cognitive Map. The bees were able to set a novel course by locating themselves on their cognitive map, which encoded the coordinates of the two feeding sites, and other places in the bees' explored territory, in a common frame of reference centered on the hive.⁴⁷

Vector Averaging. The similarity between the distant visual cues at Site 3 and those at each of the two feeding sites caused the bees to recall both of the acquired feeder-to-hive vectors, which they then averaged to obtain a Site 3-to-hive vector.⁴⁸

⁴³ Collett (T. S.) and Collett 2002; Wehner et al. 1996.
⁴⁴ Menzel et al. 1998, p. 149.
⁴⁵ Menzel et al. 2000b.
⁴⁶ I mention this as a possible explanation based on the results of Menzel et al. (2000a, 2005), which I present below.
⁴⁷ Giurfa and Capaldi 1999; Menzel and Giurfa 2001; Menzel et al. 1998, 2000b.

Menzel favors the vector averaging hypothesis.
It does, in fact, currently provide the best explanation of his results, as I will now argue.

There is a great deal of evidence confirming the idea that honeybees pinpoint the location of their goal by matching their stored image(s) of landmarks near the goal with their current image.⁴⁹ One could argue, then, as do Wehner et al.,⁵⁰ that bees might employ such a landmark-based guidance strategy on larger scales, at least when the relevant visual cues do not have to compete with vector information in working memory. It is not at all clear, however, that this large-scale image matching hypothesis could adequately explain Menzel's results. It is prima facie incompatible with several of them.

⁴⁸ Collett (T. S.) and Collett 2002; Giurfa and Capaldi 1999; Menzel et al. 1996, 1998, 2000b.
⁴⁹ Cartwright and Collett 1983; Collett (T. S.) 1992; Wehner 1992.
⁵⁰ Wehner et al. 1996.

First, bees captured upon arrival at the hive have just played out their feeder-to-hive vector. Thus, on the image matching hypothesis, they should be quite capable of employing landmark-based information in order to return to the hive via a novel shortcut. Now, in Menzel's experiment 1, the hive arriving bees which had visited both feeding sites did depart from Site 3 in the direction of the hive. But the hive arriving bees which had visited only the afternoon site departed from Site 3 in the afternoon-site-to-hive direction. This poses a difficulty for advocates of image matching, since the bees which had visited only the afternoon site should have been just as able to take the novel shortcut by means of image matching as the bees which had visited both sites. Moreover, any of the bees which had been visiting the afternoon site should have been quite familiar with the large hill, toward which they flew when traveling to that location (Fig. 6.8).
Similarly, if the image matching hypothesis were correct, the hive arriving bees displaced to Site 4 in the afternoon should have been able to set the correct homeward course upon departure. Instead, they picked the direction that would have taken them from the afternoon site to the hive in the absence of displacement. It is true that the hill's compass direction at Site 4 differed by more than 90° from its compass direction at the afternoon site and at the hive (Fig. 6.8). So the bees displaced to Site 4 might not have treated that prominent visual cue as the hill with which they were familiar. However, this possibility runs counter to at least the spirit of the image matching hypothesis under consideration, since it purports to explain the ability of insects to take novel shortcuts, even when the relevant landmarks are viewed from very different perspectives.⁵¹

⁵¹ Wehner et al. 1996, pp. 133–134, 137–138.

Further, morning and afternoon hive departing bees were able to take the novel route back to the hive. But hive departing bees which had visited only the morning site did not orient toward the hive when released at Site 3. Again, this is a difficulty for the image matching hypothesis. The bees which had visited only the morning site should have been just as able to take the novel route by means of image matching as the bees which had visited both sites. One could claim that the morning-site only bees were somewhat less familiar with the position of the hill than bees which had visited the afternoon site, since the hill was not situated in line with their hive-to-morning-site route. This, then, might account for their failure to take the novel shortcut. But this response is unsupported, given that hive arriving, afternoon-site only bees also failed to take the novel route. That is, the bees' degree of familiarity with the hill doesn't account for any differences in navigational performance between the two groups.
Relatedly, an appeal to large-scale image matching would have to account for the fact that the hive departing bees released at Site 3 did not depart toward either of the feeders. Rather, they set off either on a heading toward the hive or on their original hive-to-site heading. The idea that image matching is not relied upon in the presence of a vector in working memory does not help here, since, to reiterate, a significant proportion of the hive departing bees released at Site 3 headed toward the hive. Thus, hive departing bees were able to disregard their original flight vector (and this is in accord with the results of other studies⁵²). They were also motivated to forage when captured. It would seem, then, that if the novel-route, hive departing bees used image matching in order to return to the hive, then they should have been able to use image matching in order to locate their original destination.

The noninferential-interpolation hypothesis avoids some of the problems faced by the image matching hypothesis, since it attributes the bees' novel-shortcut ability, in part, to their having visited both of the feeding sites. Menzel mentions the view merely as providing a possible explanation of his results. I'm unaware of anyone who actually defends it. Consequently, it's not clear what the claim is, exactly. The idea, I gather, is as follows. The bees associated the visual scenes at the two familiar sites with the respective homeward vectors. The scene at Site 3 resembled those familiar scenes closely enough that when some of the bees released at Site 3 attempted to match the available visual cues with one of the familiar scenes, both of the learned associations became active. The vector memories then somehow competed for control of the bees' behavior, the result being a compromise flight direction.
The interpolation hypothesis, as I understand it, attempts to occupy a middle ground between the image matching account and the vector averaging account. It adds to the former an appeal to vector navigation. However, it stops short of an appeal to computational processes, relying only on associations and association strengths. Vectors come into play, but they are not rule manipulated.

⁵² Dyer 1991, Menzel 1989, Riley et al. 2003, Schöne et al. 1998.

Nonetheless, it is not clear that the hypothesis occupies a stable position. First, what sort of process is supposed to yield a compromise between two vectors that is not a kind of inference? And if it is not a kind of inference, how is the process to be distinguished from large-scale image matching? In any event, even if the hypothesis is coherent, it's not likely to be adequate. Before I present my case for that conclusion, I turn to Menzel's.

Menzel finds the interpolation hypothesis unlikely because the bees released at Site 4 left on the heading they would have taken if they had not been displaced, even though the hill was a prominent visual cue there. That assessment suggests that he takes the hypothesis to be a variant of the image matching account. But his response is not clearly adequate, if emphasis is placed on the proposal's requirement that the bees recalled both learned vectors. The hill's compass direction at Site 4 differed from that at each feeding site by 100–110° (Fig. 6.8). It could be argued, then, that the bees did not treat Site 4 as sufficiently similar to either of the familiar sites. Furthermore, Menzel's objection undercuts his own position. For if it is the case that the distant visual cues at Site 4 were similar to those available at the familiar sites, enough so that the interpolation account should have applied to bees at Site 4, then those cues should have been similar enough for Menzel's vector averaging thesis to have applied as well.
(Note that the hill's orientation with respect to the various sites poses a problem for image matching, but not clearly for interpolation, since the former, but not necessarily the latter, purports to explain novel shortcuts, even when the relevant landmarks are viewed from very different perspectives.)

Since interpolation and vector averaging each appeal to vector navigation, it might be thought that they should account for Menzel's data equally well. But I hope to convince you that this is not the case. Again, the interpolation account appeals to only associations and association strengths. It explains the behavior of the novel-route bees in terms of the strengths of their site–vector associations and the degree to which the visual cues at Site 3 stimulate each familiar-site-associated vector memory. But association strengths and degrees of stimulation can be highly variable factors. Moreover, it is more likely than not that two vector memories would differ in their associative influence on a bee's course (perhaps with one dominating). Consequently, on the interpolation view, it seems that the vanishing-bearing distribution at Site 3, for experiment 1 (hive arriving bees), should have been relatively broad, perhaps also with multiple peaks: one between the afternoon-site-to-hive and Site 3-to-hive directions, and one between the morning-site-to-hive and Site 3-to-hive directions, possibly with additional peaks at each of the feeder-to-hive directions.

However, no such peaks are discernible (Figs. 6.9 and 6.10). Furthermore, histograms of Menzel's vanishing bearing data appear to show that the distribution of vanishing bearings for hive arriving bees released at Site 3 is not significantly different from the distributions for hive arriving bees released at Site 4 and afternoon-site only bees released at Site 3 (controls).
In fact, the similarity of the distributions is fairly close (the Site 3 distribution for hive arriving bees also bears some resemblance to the Site 4 distribution for hive departing bees) (Fig. 6.10). This suggests that similar mechanisms were operative in the three groups of bees. We may infer, then, that since interpolation did not occur in the Site 4 or control group, it did not occur in the Site 3 group either. Also, since the Site 4 bees and the control bees showed a tendency to rely upon a single vector after displacement, the same is probably also true of the Site 3 bees.

The general-landscape-memory, cognitive-map, and vector averaging accounts each attribute novel-course setting to a single, flight controlling vector. The sequential-memory-referral hypothesis, to which I now turn, does not.

Figure 6.10. Histograms showing the distribution of vanishing bearings for four groups of bees in Menzel's displacement experiments. The distributions shown in panels B–D are normalized so that their maximum values are equal to the maximum value of the distribution shown in panel A. (A) Hive arriving bees, Site 3. The data for morning- and afternoon-captured bees are combined (see Figure 6.9; n = 210). The superimposed black line approximates the distribution curve. (B) Hive arriving bees, Site 4 (n = 37). The black line is identical to the one in panel A. (C) Hive departing bees, Site 4 (n = 32). The black line is the left-right mirror of the line in panel A. (D) Hive arriving bees that had visited only the afternoon site, Site 3 (n = 55). The black line is identical to the one in panels A and B.

As he does in the case of the interpolation hypothesis, Menzel mentions the sequential-memory-referral account merely as providing a possible explanation of his results. Again, I'm unaware of anyone who actually defends it. In any case, it is easy to dispense with.
For if the novel-route bees alternately relied upon their two route vector memories, their flights should have exhibited a zig-zag pattern, with an alternation frequency high enough to allow a vanishing bearing distribution directed toward the hive. Otherwise, the distributions would have been bimodal, with one peak for each of the two vectors. A difference in flight behavior, then, between novel-shortcut bees and others should have been observable from the release site. However, Menzel reports that the departure flight characteristics for novel-route bees did not differ in any discernible way from those of any other group of bees. Thus, the hypothesis in question is unsupported. (Notice also that the view leaves for further investigation the question of why the distributions of vanishing bearings are similar across the different experimental groups.)

The hypothesis that appeals to the general landscape memory of bees, unlike the hypothesis just examined, is based on known bee navigational abilities. In order to provide a clear enough statement of the idea, I first contrast bees' general landscape memory with both their landmark-based route memory and their vector memory.

Prior to foraging for the first time, or for the first time in a new area, honeybees will make a series of exploration, or orientation, flights. Individual bees typically will explore multiple regions around the hive, though each flight is usually limited to a particular sector.⁵³ On these excursions, bees learn the local solar ephemeris. They also learn the distance and direction, from the hive, of various landscape features. The sum of such stored distance and direction information is referred to as the bees' general landscape memory.

Once bees begin to forage, they learn routes to and from foraging sites. Experienced foragers rely on landmark-based route memories and flight vector memories as their primary means of navigation.
That is why, when such bees are released after displacement to a location with which they are unfamiliar, they tend to depart either along the flight vector they would have adopted had they not been displaced or in the direction of their original destination, having recognized landmarks that lie along the relevant established route.

Bees which have flown only orientation flights are able to return rapidly to the hive after displacement, about as rapidly as they would have returned if they had learned a direct route connecting the hive and the place of release.⁵⁴ They can recognize landscape features near the release point and recall the associated homeward vector acquired during exploration.⁵⁵ Because experienced foragers primarily rely on established-route memories, they might take significantly longer to return to the hive after displacement, depending on where they are released in relation to the hive and familiar routes. For example, a bee trained to a feeder 200 m north of the hive and displaced from the feeder to a location 200 m south of the hive will initially fly the learned southward feeder-to-hive vector, taking it farther away from the nest.

⁵³ Capaldi et al. 2000.
⁵⁴ Menzel et al. 2000a.
⁵⁵ Menzel et al. 2005.

But experienced foragers, too, can recall locational information acquired during their orientation flights. They are most likely to do so when familiar-route vector information is absent from working memory. That is the case for bees which have just played out a particular recalled vector. And that occurs when they have arrived at the hive or when they have flown the entire length of the vector without encountering their destination.
⁵⁶

Since experienced foragers have access to their general landscape memory, the possibility arises that the novel-route bees in Menzel's displacement experiments were able to take a shortcut to the hive because they recalled a Site 3-associated homeward vector, which they learned during their orientation flights.

⁵⁶ Menzel et al. 2005.

Although an appeal to general landscape memory could account for some novel-shortcut behavior, it is doubtful that such an account could explain Menzel's results. The fact that the bees which had visited only the morning site or only the afternoon site failed to orient toward the hive from Site 3 is a problem for the idea. The hive arriving bees which had visited only the afternoon site failed to exhibit novel-shortcut ability. Also, whereas about half of the hive departing bees displaced to Site 3 adopted the novel homeward course, those which had visited only the morning site did not. Again, this suggests that experience with both trained routes was necessary in order to be able to take the novel route. That would not have been necessary on the general-landscape account, which would require only that the bees became acquainted with Site 3's vicinity on at least one of their orientation flights. Moreover, it is highly unlikely that the bees which had learned both routes could access their general landscape memory after displacement to Site 3, but both the morning-only-site bees and the afternoon-only-site bees could not.

As we have seen (§ 6.2.3.1), the behavior of bees that had learned only one of the trained routes controlled for the possibility that the novel-route bees homed toward landscape features near the hive as well as for the possibility that they relied on route-associated landscape features. We see now that it also controlled for the possibility that those bees employed a single orientation-flight-acquired vector associated with Site 3 local cues.
What could explain why single-route bees did not activate general-landscape vector memories when released at Site 3? The landscape features associated with homeward vectors during orientation excursions are not unlikely to be relatively local features along the bee's line of flight. They certainly need to be more localized than distant panorama features, since distant cues appear much the same over a broad area and hence are not useful for accurate assessment of position. Site 3 was situated within a uniform expanse of grassland. It's plausible, then, that the site's local scene was not distinctive enough for exploring bees to have associated the location with a homeward vector in the first place.

It remains to evaluate the vector averaging and cognitive-map hypotheses. According to the former, the similarity between the distant visual cues at Site 3 and those at each of the two feeding sites caused the bees to recall the two feeder-to-hive vectors, which they then averaged to obtain a Site 3-to-hive vector. The first thing to notice about the account is that it's no worse off than the interpolation view with regard to explaining the difference in behavior between the groups of bees that did take the novel shortcut and the groups that did not. For example, advocates of either claim may appeal to the fact that the hill's compass direction at Site 4 differed from that at each feeding site by 100–110°, in order to explain why the bees released at Site 4 failed to orient toward the hive.

Second, the vector-averaging account provides an explanation of the similarity among the vanishing bearing distributions for the different groups of bees displaced to unfamiliar sites. For, first, the values of the vectors which are proposed to have been averaged are unlikely to have differed significantly among individual bees.⁵⁷ Second, there is no reason to suppose that the bees would have weighted the vectors differently.
Third, the result of an averaging process depends on the values of the vectors averaged, not on their "strengths." On the other hand, it's at least quite unclear whether or not noninferential interpolation would result in such similarities. In fact, as I've argued, we should expect some discernible differences.

⁵⁷ Menzel et al. 2005, Riley et al. 2003.

Since each group of control bees had learned only one route, vector averaging explains why neither of them oriented toward the hive from Site 3. It should also be clear that vector averaging doesn't predict unusual departure flight patterns.

What about the lack of novel routes to feeders? The vector averaging account does not imply that the bees in Menzel's experiments should have been able to take a novel shortcut from Site 3 to the time-appropriate feeding location. The proposed computation operates on two hive-directed vectors, neither one of which has the bee's actual location as its point of origin. It requires only that the bee recall and average those vectors. By contrast, a computation of the heading and distance from Site 3 to a feeding site would require that the bee first compute a Site 3-to-hive vector, maintain that vector in working memory while recalling a hive-to-feeder vector, and then sum them. Clearly, then, the ability to average two vectors does not bring with it an ability to compute a course from an unfamiliar location to a familiar location other than the hive.

It's important to recall that the bees in Menzel's experiments either did not store a Site 3-to-hive vector in their general landscape memory or could not recall such a vector after displacement to that site. For if they had access to such a vector, then it appears that they could have summed it with the relevant hive-to-feeder vector in order to set a course toward a feeder location. In fact, as we will see below (§ 6.2.4), there is evidence that suggests that bees do have this ability.
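The difference between averaging two recalled vectors and computing a course to a feeder can be made concrete with a minimal sketch. The Python below is an illustration only, under loudly stated assumptions: the coordinates are hypothetical (they are not Menzel's site positions), the release site is assumed to lie midway between the two feeders so that the equally weighted average of the two homeward vectors points at the hive, and the function names are illustrative labels, not a model of any proposed neural mechanism. It shows that averaging needs only the two recalled feeder-to-hive vectors, whereas a course to a feeder requires the further step of holding the computed vector and summing it with a recalled hive-to-feeder vector.

```python
import math

# Vectors are (east, north) displacements in metres.

def average(v, w):
    """Equally weighted, component-wise mean of two vectors."""
    return ((v[0] + w[0]) / 2.0, (v[1] + w[1]) / 2.0)

def add(v, w):
    """Component-wise sum of two vectors."""
    return (v[0] + w[0], v[1] + w[1])

def compass_bearing(v):
    """Heading in degrees clockwise from north."""
    return math.degrees(math.atan2(v[0], v[1])) % 360.0

# Hypothetical learned vectors (assumption: hive at the origin, morning
# feeder 300 m west and 300 m north of it, afternoon feeder 300 m east
# and 300 m north of it).
morning_to_hive = (300.0, -300.0)
afternoon_to_hive = (-300.0, -300.0)
hive_to_morning = (-300.0, 300.0)

# Vector averaging: one step, using only the two recalled homeward vectors.
site3_to_hive = average(morning_to_hive, afternoon_to_hive)

# A course to a feeder needs a second step: keep the averaged vector in
# memory, recall a hive-to-feeder vector, and sum the two.
site3_to_morning = add(site3_to_hive, hive_to_morning)

print("Site 3 to hive:", site3_to_hive, compass_bearing(site3_to_hive))
print("Site 3 to morning feeder:", site3_to_morning,
      compass_bearing(site3_to_morning))
```

In this toy layout the averaged vector points due south toward the hive, and the summed vector points due west toward the morning feeder. The second result depends on an operation that the averaging account does not attribute to the bees.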
What about the fact that hive departing bees released at the morning site in the afternoon, or at the afternoon site in the morning, did not choose a shortcut to the relevant feeder? Couldn't they have summed the feeder-to-hive vector for their release site with the hive-to-feeder vector for the other site? Perhaps. But this isn't a serious worry, given the strong tendency of experienced foragers to give primacy to acquired route information. Also, hive departing bees have limited energy resources.⁵⁸ Consequently, when they find themselves at an unexpected location, they are apt to return to the hive, along a familiar route, rather than set out on a riskier course that would take them over unfamiliar territory.

⁵⁸ Menzel et al. 2005.

In sum, none of Menzel's results poses any special problem for his vector averaging hypothesis. Let's now turn to the cognitive-map account and see how well it fares.

On the cognitive-map thesis, the bees in Menzel's displacement experiments recorded the coordinates of the two feeder locations, and other places in their explored territory, in a common frame of reference centered on the hive. The bees acquired this information over the course of their exploration and foraging trips. The sum of this information functions as a map, since it enables bees to set a direct course between any two recorded locations. Thus, the bees were able to set a novel course from Site 3 to the hive by first (somehow) locating themselves on their mental map. Once they did so, they were able to compute a homeward flight vector with the help of information provided by their solar compass.

However, there is a difficulty for the cognitive-map approach. The trouble is that if the bees
in Menzel's experiments did in fact construct a cognitive map of their foraging territory (and their having done so was responsible for their novel shortcuts), then the hive departing bees, at least, should have been able to take a shortcut from their place of release to their original foraging site destination. Again, hive departing bees (at least when captured) are motivated to fly to a particular foraging place. Moreover, they have been shown to be able to set a novel course, from the place of their release, to either the hive or, in certain circumstances, their original destination.⁵⁹ Since a significant proportion of the hive departing bees in Menzel's experiments were able to take a shortcut from Site 3 back to the hive, the cognitive-map view requires that they must have been able to estimate the position of their place of displacement in relation to the hive. Also, they were informed about the location, with respect to the hive, of the time-appropriate feeder. Nonetheless, they failed to demonstrate an ability to set a Site 3-to-feeder course.

It might be thought that if the novel-route bees navigated using a cognitive map, then hive departing bees released at the morning site in the afternoon, or at the afternoon site in the morning, also should have been able to choose a shortcut to the relevant feeder. For they possessed information about the hive-relative position of both the release site and the time-appropriate feeding site (this was not the case for the control group). But, as in the case of vector averaging, the strong tendency of experienced foragers to give primacy to acquired route information allays this worry.

Table 6.3 summarizes the conclusions of this section. Menzel's vector averaging hypothesis is judged to provide the best explanation of his results. Each of the alternatives fails to deal adequately with at least one of them.

⁵⁹ Dyer 1991, Gould 1986.
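The prediction that the cognitive-map account makes, and that the data fail to bear out, can be illustrated with a short sketch. The Python below is a toy rendering of a map in the sense described above: places recorded as coordinates in a common, hive-centered frame. Everything specific in it is an assumption made for illustration: the place names, the coordinates, and the function `course` are hypothetical and are not drawn from Menzel's experiments. What it shows is that, once a bee is assumed to have located itself on such a map, a course to a feeder is computed in exactly the same way as a course to the hive.

```python
import math

# Hypothetical hive-centered coordinates, (east, north) in metres.
# These values are illustrative assumptions, not measured site positions.
HYPOTHETICAL_MAP = {
    "hive": (0.0, 0.0),
    "morning_feeder": (-300.0, 300.0),
    "afternoon_feeder": (300.0, 300.0),
    "site3": (0.0, 300.0),
}

def course(origin, goal, places=HYPOTHETICAL_MAP):
    """Return (distance in metres, compass bearing in degrees) from origin to goal.

    Because every place shares one frame of reference, the course between
    any two recorded places is just the difference of their coordinates.
    """
    origin_east, origin_north = places[origin]
    goal_east, goal_north = places[goal]
    east = goal_east - origin_east
    north = goal_north - origin_north
    distance = math.hypot(east, north)
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    return distance, bearing

# The same operation serves for a homeward course and a feeder-directed course.
print("Site 3 to hive:", course("site3", "hive"))
print("Site 3 to morning feeder:", course("site3", "morning_feeder"))
```

On this picture nothing distinguishes the homeward computation from the feeder-directed one, which is why the absence of Site 3-to-feeder courses counts against the account.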
Explanandum; Hypothesis columns: IM, SMR, GLM, NI, VA, CM
Shapes of vanishing bearing distributions: ✓ ✓ ✓ ✓
No shortcuts to hive from Site 4: ✓ ✓ ✓ ✓ ✓
Controlsᵃ: No shortcuts to hive from Site 3: ✓ ✓ ✓ ✓
No novel routes to feeders: ✓ ✓ ✓ ✓
Typical departure flight patterns: ✓ ✓ ✓ ✓ ✓

Table 6.3. Comparison of explanations of Menzel's displacement experiment results, based on my evaluations. A checkmark indicates either that the hypothesis explains the result relatively well or that the result poses no apparent difficulty. Abbreviations: NI, noninferential interpolation; CM, cognitive map; GLM, general landscape memory; IM, (large-scale) image matching; SMR, sequential memory referral; VA, vector averaging. ᵃ Control experiments in which the bees had visited only the site at which they were captured.

6.2.4 A Kind of Cognitive Map

The failure to demonstrate a role for a cognitive map in the production of the vanishing bearing distributions in Menzel's displacement experiments does not show that the honeybee does not have a cognitive map. It shows only that the bees in those experiments probably did not rely upon a cognitive map to set their initial course from the release site. As Menzel points out, training bees to specific routes might result in reliance on flight vector information for course setting. The operation of a cognitive map might not become apparent until a route-trained bee finds itself to be lost after a flight vector memory fails to lead it to its destination. In fact, the general landscape memory of bees was not revealed until they were tested in displacement experiments in which route learning was prevented.⁶⁰

The general landscape memory is a kind of map, but it is not as robust as the sort of cognitive map we've been considering. It consists of multiple, hive pointing vectors associated with various respective landscape features. Bees could have this sort of "vector"
map without representing the spatial relations between any places other than certain landscape features and the hive. Furthermore, bees could have this sort of map and yet not be able to integrate any of its vectors with any other. In that sense, a vector map could be fragmented and piecemeal.

Menzel and colleagues,⁶¹ however, claim to have demonstrated the existence of a kind of cognitive map in the honeybee, a map that allows bees to take novel shortcuts between known locations, neither of which is the hive. I later propose that the novel shortcuts flown were the results of novel combinations of flight vector memories and their semantic constituents (§ 7.4).

Using harmonic radar,⁶² Menzel tracked displaced bees over the entire course of their flights. Three groups of bees were tested in the study. One group was trained to a stationary feeder situated 200 m east of the hive. A second group was trained to a feeder that slowly revolved around the hive at a distance of 10 m. A third group consisted only of bees that had not visited the stationary feeder but were recruited to it by a waggle dance. The two groups of feeder-trained bees were captured at the feeder after they had filled their crops. The dance-recruited bees were captured upon departure from the hive. Captured bees were placed in a dark container and transported to one of eight sites, where they were released within 15 min of capture.

⁶⁰ Menzel et al. 2000a, Capaldi and Dyer 1999.

⁶¹ Menzel et al. 2005, p. 3045: "The question now in bee navigation is not so much whether there is a map-like spatial memory but rather what structure this map has and how it is used."

⁶² Riley et al. 1996, 1998.

The experiments were performed in an expanse of flat grassland which contained very few natural food sources. Ground patterns due to different mowing times and soil conditions provided the only natural landmarks. Two groups of tents of various colors served as artificial landmarks.
The height of the skyline as seen from the hive area varied within a range of less than 1.5°. Due to the resolution of the honeybee visual system, no features of the skyline were pronounced enough to guide the bees to the area of the hive. Neither the hive nor the feeder was visible to the bees beyond a range of 60 m. The tents could not be seen by the bees outside a range of 100 m. Of one group of tents, the tent closest to the hive was 110 m distant. Of the other group, the nearest tent was 190 m distant. Hence, the tents were not suitable for purposes of homing by image matching.

The bees were experienced foragers, but the study site was new to them. The bees tested during one study period were allowed to perform orientation flights for 3 days. The bees tested during the other study period were permitted to perform orientation flights for 6 days. Tests were carried out with the two groups of tents either in their original positions, rotated 120° about the hive, or removed. Orientation flights and test flights occurred under conditions well suited for solar-compass navigation (with the exception of some of the test flights of dance-recruited bees).

Irrespective of release site and test conditions, the bees trained to the stationary feeder initially flew their feeder-to-hive flight vector (on the heading and for the distance they would have flown in the absence of displacement). They next performed a search flight, followed by a straight homing flight toward the hive or first toward the feeder and then toward the hive. Hive departing, dance-recruited bees initially flew their hive-to-feeder vector with very good accuracy.⁶³ After a brief search for the feeder, they flew back toward the release site and initiated a search for the nest. That search was followed by a straight homing flight toward the hive. Bees trained to the moving feeder began to search for the hive immediately upon release.
They too eventually performed a straight homing flight toward the hive. Since all groups of bees performed equally well, they must have acquired information sufficient for homing during their orientation flights.

For all groups of bees, search flight speed (12.9 ± 3.5 km/h) was significantly slower than both vector flight speed (19.1 ± 2.4 km/h) and homing speed (19.4 ± 1.8 km/h). Search flight paths were curved and highly variable. Searching bees often returned to the release site multiple times.⁶⁴

With very few exceptions, homing flights were initiated at points well outside the 60-m-radius "visibility zone" around the hive. Patterns of small patches of slightly differing kinds of vegetation were much the same over the entire study area. Also, bees homed toward the hive and approached the point at which they initiated their homing flights from all directions. Furthermore, bees released at the same site multiple times were able to approach the hive from different directions.⁶⁵ So it's very unlikely that any particular pattern of ground patches visible beyond 60 m from the hive was used as a beacon.

⁶³ Riley et al. 2005.

⁶⁴ Bees' search patterns are somewhat reminiscent of the search patterns of desert ants (see Wehner and Srinivasan 1981, and Müller and Wehner 1994). Similarities include looping trajectories out from and back to the search's point of origin and a continual expansion of the area searched. Bees' search patterns, however, are much more irregular than those of desert ants. Bees also appear to be able to move the focus of their search. At the time of this writing, a variety of examples of the entire flight paths of bees in Menzel et al.'s (2005) study are available online (http://www.honeybee.neurobiologie.fu-berlin.de/Menzel-Greggers-Smith-PNAS-2005/supplement.html).

⁶⁵ Examples of this have been provided online (http://www.honeybee.neurobiologie.fu-berlin.de/Menzel-Greggers-Smith-PNAS-2005/supplement.html).
Bees clearly often used the tents as landmarks when the tents remained in their orientation-period locations. The presence of the tents was not essential for accurate homing, since bees homed just as effectively both when the tents were rotated 120° clockwise about the hive and when the tents were removed. Bees also homed just as effectively under heavy overcast, when solar cues were not available. Thus, ground features were sufficient for accurate homing.

A group of 29 stationary-feeder-trained bees were released at one of two sites under sunny skies and with the tents in their original positions. Of those bees, ten performed homing flights toward the feeder prior to returning to the hive. (Some other bees also homed toward the feeder under different conditions.) The homing routes taken by the feeder homing bees were certainly novel, assuming, as Menzel reports, that the stationary-feeder-trained bees never flew outside the direct pathway between the feeder and the hive. Although it's possible that some of the hive homing bees, during exploration, had flown along the path of their homing flight, Menzel maintains that it's very likely that at least a significant proportion of their homing flights were novel shortcuts.

Menzel argues that his results show that the large-scale spatial memory of bees has a map-like organization. The bees took novel shortcuts to the feeder as well as the hive. The shortcuts were not a direct result of path integration, since the bees could not observe anything during transport to the release sites. Finally, the shortcuts were not produced by beacon homing or image matching. Thus, during their orientation flights, the bees must have associated homeward vectors (provided by path integration) with views of various landscape features they encountered. Furthermore, since the bees took novel shortcuts to the feeder as well as the hive, they must have been able to integrate hive–feeder route vectors into that general landscape memory.⁶⁶

As I indicated in the introduction (§ 1.4), the capacity to take novel shortcuts is one that seems to require the capacity to represent various places of interest and certain relations (topological, metric, etc.) among them, as well as the capacity to make inferences involving those representations. Indeed, I'll argue that the novel shortcuts flown to the feeder, in the above study, were the results of novel combinations of flight vector memories and their semantic constituents (§ 7.4).

⁶⁶ There is another possible instance of the integration of vector information into general landscape memory. Seemingly, new recruits do not respond to dancers when they indicate a source of food as being situated where there in fact is only water (Gould and Gould 1988, Tautz et al. 2004). However, it's not clear whether the dance observers don't respond at all, or whether they do respond, but simply can't find the feeder. Likely cases of bees' integration of reward value information into their information about a small-scale layout have been provided by Greggers and Mauelshagen (1997) and Fulop and Menzel (2000).

Chapter 7: The Systematicity of Honeybee Navigational Capacities

In this chapter I argue that various honeybee navigational capacities are systematically related. Insofar as the systematicity hypotheses I propose involve attributions of content, the meaning and explanatory role of such attributions need to be addressed (§ 7.1.1). I spell out some of the semantic roles played by various honeybee representations as constituents of complex representations (§ 7.3). One of these roles is that of an indexical (§ 7.3.3). I argue that some honeybee cognitive processes are sensitive to the constituent-structure of the representations on which they operate (§ 7.4). Relatedly, I argue that honeybees implement operations defined over variables (§ 7.5). Finally, I conclude by tying together the conclusions of Chapters 2–5 with those of the present chapter.
I propose that honeybees have a simple language of thought. I also argue that even if they don't, we have good reason to prefer non-Connectionist explanations of honeybee navigational capacities over Connectionist ones.

7.1 Systematicity of Information Acquired by Honeybees

Much of the previous chapter is pertinent to whether certain classes of information acquired by bees exhibit systematicity. My concern in this section is to propose and defend the following general hypothesis:

For various classes of information, if a honeybee has the capacity to acquire information I, then it also has the capacity to acquire systematic variants of I, where two items of information are systematic variants just in case they have the same informational constituents and the same informational structure, but are formal permutations of each other.

I argue for this general hypothesis by arguing for specific instances of it. The point of restricting the hypothesis to some classes of information is to avoid its having as consequences claims like: if a bee can learn that the sun is directly above the crest of the hill, then it can learn that the crest of the hill is directly above the sun.¹ As I mentioned in Chapter 1, there are a number of possible varieties of systematicity, and different kinds of cognitive capacities might be systematically related in different ways. (See also § 7.2.)

¹ Thus, Dennett's (1989) supposition that systematicity hypotheses, at least as applied to nonhumans, would require that they have the capacity to think ecologically anomalous thoughts is erroneous. Penn and Povinelli (submitted) make the same supposition.

Note that the general sort of systematicity just referred to is the same as that discussed in Chapter 2. There I considered two structurally complex thoughts to be systematically related just in case they have the same logical and representational constituents and are formal permutations of each other.
Thus, whereas the thought that Fa ∧ Gb is a systematic variant of the thought that Ga ∧ Fb, this is true neither of the thought that Fa ∧ Hb nor of the thought that ~(Fa ∧ Gb). But discussions of systematicity are often about one or another somewhat weaker notion, one that does not have a formal-permutation requirement. These weaker notions focus on the nonarbitrariness of the semantic relations among representations. In Section 7.2, I argue that bee navigational capacities also exhibit a particular type of "weak" systematicity.

The first specific hypothesis I propose concerns the capacity of bees to acquire information about distance and solar bearing relations between various places, such as the hive, landmarks, and foraging sites (§§ 6.1.1, 6.1.2, 6.2.1, 6.2.4). As I will argue, that capacity does not come in isolated pieces. That is, the capacity of bees to acquire information about some particular distance and direction relations comes along with capacities to acquire intrinsically related information about other distance and direction relations. In particular,

Systematicity 1
If a honeybee has the capacity to estimate that the solar bearing of a particular foraging site from the hive is, say, 45° west of the sun, then it also has the capacity to estimate that the solar bearing of the hive from that site is 45° west of the sun.

I emphasize that this is a claim about informational content and not a claim about the configurational structure of honeybee mental representations. Hypotheses about the configurational structure and semantics of mental representations contribute to explanations of the truth of hypotheses like Systematicity 1 and are not part of such hypotheses themselves. But since such hypotheses do involve attributions of content to nonhuman organisms, the issue arises as to how such attributions should be understood. It will save us some trouble if I address this issue prior to arguing for specific systematicity hypotheses.
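The strict notion of a systematic variant used in the general hypothesis of this section can be illustrated with a small sketch. It is only an illustration of the definition, under the assumption that an item of information can be modelled as a typed tuple of constituents. The class and function names are hypothetical, and nothing here is a claim about how bees encode such information.

```python
from collections import Counter
from typing import NamedTuple


class SolarBearing(NamedTuple):
    """Hypothetical item of information: the solar bearing of `target` from `origin`."""
    target: str
    origin: str
    degrees_west_of_sun: float


def are_systematic_variants(a: tuple, b: tuple) -> bool:
    """True if a and b share structure and constituents but differ in arrangement."""
    same_structure = type(a) is type(b)           # same informational structure
    same_constituents = Counter(a) == Counter(b)  # same constituents, any order
    differently_arranged = a != b                 # a genuine permutation, not a copy
    return same_structure and same_constituents and differently_arranged


site_from_hive = SolarBearing("site", "hive", 45.0)
hive_from_site = SolarBearing("hive", "site", 45.0)
tree_from_hive = SolarBearing("tree", "hive", 45.0)

print(are_systematic_variants(site_from_hive, hive_from_site))  # shared constituents
print(are_systematic_variants(site_from_hive, tree_from_hive))  # different constituents
```

The check is deliberately crude: it would also count a permutation that put the bearing in a place slot as a variant. Ruling out such cases is the job of restricting the hypothesis to particular classes of information.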
7.1.1 Attributions of Content to Insects

Much work on animal cognition concerns how to best characterize the contents acquired by various organisms. For example, the debates over whether or not various animals, including insects, possess a kind of cognitive map make sense only as issues about content. They are debates over how the spatial information an organism acquires is semantically organized. Little discussion, if any, is devoted to the configurational structure of the bearers of the information.² And claims like,

(1) The bees learned that the feeder is 200 m to the east of the hive.

are common in the literature on insect navigation. But it's a safe bet that those who make such claims would consider the idea that an insect could have a representation with the content [meter] (or [sucrose reward], or [200],³ or [east], or [hive]) to be absurd or baseless. So how are we to understand content attributions such as those involved in claims like (1) and Systematicity 1?

² This is not to say that the distinction between the two issues is never ignored or overlooked.

I won't attempt here to provide a complete, comprehensive answer to that question. For my purposes, it is enough to provide the basic details of a way of understanding such attributions that both conforms with scientific practice and allows them to be confirmable and disconfirmable within currently possible experimental paradigms.

The general issue may be framed in terms of the relationship between the content of the that-clauses in the attributions and the information thought to be actually acquired by the organism. That relationship certainly is not (or is at least extremely unlikely to be) identity. I suspect that any expert on bee cognition would admit that there is a sense in which (1) could be true, even though the content of the bees' representations would not be [the sucrose reward is 200 m to the east of the hive].
Perhaps the extensions of the semantic constituents of the that-clause need to be identical with the extensions of the respective constituents of the bees' information. This suggestion might seem more promising than the first, but it is still off the mark. For it's not currently possible to determine the actual extensions of insect representations. For example, we can't with any confidence claim that the extension of a term like "sucrose reward" or "the hive" is the same as the extension of some piece of information acquired by bees. The extension of "sucrose reward" is very unlikely to be the same as the extension of any bee representation. Similarly, it's possible that bees don't represent the hive per se. Rather, they may represent only various parts of it, or features of it, while lacking a representation of the entire structure. It's even possible that the extensions of many or all bee mental-representational constituents do not include anything external to bees at all.⁴ They could turn out to be "lucky" (though not accidentally successful) hallucinators. This could be the case if the correct theory of content for bee mental representations is an internalist theory, rather than an externalist one. Thus, bees' representations of the "hive" might refer to only the relevant aspects of the snapshots they take on hive-departing learning flights.

³ The case of contents about quantities is somewhat puzzling. It does seem odd to attribute to honeybees representations with, say, the content [200]. Nonetheless, the capacity of bees to perform the computations required for path integration seems to require the capacity to manipulate information about relatively specific quantities. One possible way to remove the tension here would be to argue that the information about distance that is manipulated in path integration is a kind of nonconceptual content.
Consider whether bees can be tricked or caused to be mistaken as a result of various experimental manipulations. As we've seen, bees acquire information pertaining to distance by measuring optic flow. When bees are trained to fly through a tunnel in order to obtain a reward, the close proximity of the tunnel walls may induce more optic flow than the bees would have experienced had they flown to the reward location under normal circumstances.⁵ If that is the case, bees that have returned to the hive will signal to recruits, via the waggle dance, a "distance" that is greater than the actual distance. But would such dancers literally be making a mistake? They would be, if the information they employ in producing the dance refers to distance, for then that information will refer to the wrong distance. But it's possible that the information bees employ in the waggle dance actually refers to the quantity of optic flow that would be experienced during a normal, direct flight to the reward. If that were the case, the dancers would not be making a mistake.

Or consider a case in which bees are stimulated to forage at night, and in which they rely on an artificial light source for orientation. Would such bees be mistaking the light for the sun? They would be, only if the referent of the relevant representations is in fact the sun. But perhaps those representations have an extension that includes any suitable light source. Or perhaps the extension includes only certain illumination intensities. In either of these latter cases, the bees would not be making a mistake.

⁴ Compare Trullier et al. (1997) on neural-network models of navigational capacities.

⁵ Esch et al. 2001, Srinivasan et al. 2000.

Fortunately, for purposes of addressing the issues about systematicity with which we're concerned, we don't have to decide what are the actual contents and extensions of honeybee mental representations.
For the explanatory purpose of attributions of content to bees (and other organisms) can be accomplished prior to settling those issues. For it is reasonable to interpret those attributions as hypotheses about what features of the environment bees are able to track; and such hypotheses can be confirmed or disconfirmed, independently of establishing the specific contents and extensions of the information that allows bees to track those features.

Crucially, evidence about what features of the environment bees are capable of tracking constrains what the contents and extensions of their acquired information could be. Whatever the contents and extensions of bee mental representations are, they must be such as to permit bees to track what they do. Part of the burden of the following arguments for the presence of systematicities in honeybee navigation is to support an additional claim: if bees can track certain structures composed of elements that they can also independently track, then the informational contents by virtue of which they track those structures have semantic constituents by virtue of which they track those elements.

In what follows, then, I'll continue to employ nonliteral content attributions like claim (1) above. My concern is the semantic relations among items of information acquired by bees; and that issue can be addressed without making tendentious assumptions about the actual contents so related.

7.1.2 Some Honeybee Systematicities

Various classes of information acquired by bees exhibit systematicity. The arguments for the systematicity hypotheses I propose here each exhibit the same pattern. Each type of systematicity is shown to be a consequence of bees' having a particular general capacity. As promised, I begin with Systematicity 1:

Systematicity 1
If a honeybee has the capacity to estimate⁶ that the solar bearing of a particular foraging site from the hive is, say, 45°
west of the sun, then it also has the capacity to estimate that the solar bearing of the hive from that site is 45° west of the sun.

⁶ The estimates need not be accurate under all conditions. I'm speaking here of the capacity to estimate at all.

All the evidence at present suggests that bees store information about the hive and individual foraging sites. And the ability of bees to use the sun as a compass is firmly established. Further, it would be quite difficult to explain the navigational abilities of bees if (contrary to overwhelming evidence) they are not capable of estimating the solar bearing of a particular foraging site from the hive, or of the hive from a particular foraging site.

Crucially, the mechanisms which allow bees to estimate hive-to-site solar bearings are the very same mechanisms which allow them to estimate site-to-hive solar bearings. As we've seen, bees employ their internal solar ephemeris to accommodate the pattern of movement of the sun's azimuth. In addition, they are able to estimate the position of the solar azimuth not only during the day but also at night. Moreover, bees are capable of relating their solar ephemeris to different groups of landscape features; in particular, those visible from the hive and those visible from various foraging sites. Thus, for any solar bearing θ, bees have the capacity to estimate that the solar bearing from a particular familiar site to the hive is θ (and that the solar bearing from the hive to that site is θ), regardless of the time of day at which that bearing is θ. And this gives us Systematicity 1. Systematicity 1, then, is a consequence of the capacity of bees to estimate the solar bearing of any familiar place from any other familiar place. That capacity comprises a cluster of systematically related capacities.
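The step from a general capacity to a cluster of specific capacities can be sketched as follows. This is a minimal illustration, assuming flat-plane coordinates for familiar places and a given sun azimuth; the place coordinates, the azimuth value, the function names, and the convention of measuring solar bearing clockwise from the sun's azimuth are all hypothetical choices made for the example. The point is only that one procedure defined over any pair of familiar places delivers both the hive-to-site and the site-to-hive estimate.

```python
import math


def compass_bearing(origin, target):
    """Bearing of target from origin, in degrees clockwise from north."""
    east = target[0] - origin[0]
    north = target[1] - origin[1]
    return math.degrees(math.atan2(east, north)) % 360.0


def solar_bearing(origin, target, sun_azimuth_deg):
    """Bearing of target from origin, in degrees clockwise from the sun's azimuth."""
    return (compass_bearing(origin, target) - sun_azimuth_deg) % 360.0


# Hypothetical familiar places (east, north) in metres, and a hypothetical azimuth.
places = {"hive": (0.0, 0.0), "site": (200.0, 0.0)}
sun_azimuth = 135.0

# The same procedure, with its place arguments swapped, yields both estimates.
site_from_hive = solar_bearing(places["hive"], places["site"], sun_azimuth)
hive_from_site = solar_bearing(places["site"], places["hive"], sun_azimuth)
print(site_from_hive, hive_from_site)
```

Because `solar_bearing` takes its places as arguments rather than having any particular pair built in, there is no way to possess the first estimate's procedure without thereby possessing the second's.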
In light of the discussion of the previous section, the truth of Systematicity 1 does not require that bees can think a thought with the content [the solar bearing of the hive from the foraging site is 45° west of the sun]. Nor does it require that for each representational constituent of the bee's information there is a unique constituent of [the solar bearing of the hive from the foraging site is 45° west of the sun] that has precisely the same extension. Insofar as the example involves direction, its being an example of systematicity requires only that bees are capable of acquiring two distinct items of information that would share a representational constituent that allows them to track a particular solar bearing, whatever the specific content of that constituent. Likewise, insofar as the example involves the hive, its being an example of systematicity requires only that bees are capable of acquiring two distinct items of information that would share a representational constituent that allows them to track the hive, whatever the specific content of that constituent.

A question that might arise at this point is, Why suppose that bees are capable of acquiring two distinct items of information related in that way? Perhaps bees represent places in different ways under different circumstances or different motivational states. Thus, a bee might represent the hive one way when it is using information about the hive's solar bearing from a certain site but represent it in a different way when it is using information about the solar bearing of that site (or another) from the hive. So a capacity to estimate that the solar bearing of Place 1 from Place 2 is θ might bring with it only a capacity to estimate that the solar bearing of Place 3 from Place 4 is θ, even when Place 1 is identical with Place 4 and Place 3 is identical with Place 2. Why think otherwise?
Well, for one thing, there is no evidence that suggests that the envisioned possibility is actually the case. Second, as far as we know, the view would attribute to bees much more information than is necessary to explain their behavior. Systematicity 1 attributes to bees information about two places, whereas the objection's alternative attributes to bees information about four places. Third, as I am about to argue, it would be difficult to explain the actual navigational abilities of bees if we could not assume that the way in which they represent particular places normally doesn't vary with changes in the information they have about their circumstances or with changes in their internal states, such as motivation.

Consider path integration. Suppose that while scouting for a new foraging site, a bee keeps track of its position in relation to the hive, which it represents as Place 1. Suppose further that the bee finds a source of nectar, and fills its crop. In that case, its motivational state (and presumably its information about various particulars of its circumstances) would change. It would become motivated to return to the hive rather than search or forage. But suppose that because of its change in motivation and circumstances, the bee then represents the hive as Place 2. How could the bee's information about its position in relation to Place 1, provided by its path integration system, help it get to Place 2? Or to put it the other way around, How would going to Place 2 help the bee get back to Place 1?

We would either have to reject the supposition that [Place 1] and [Place 2] are distinct ways of representing the hive or maintain that the bee would have to be sensitive to the fact that Place 1 is identical with Place 2. An advocate of the objection under consideration would have to opt for the latter alternative.⁷ However, it would be difficult to explain how the bee could be sensitive to the fact that Place 1 is identical with Place 2 without presupposing that it has a way of representing the hive which is (at least with regard to the sort of case in question) circumstance and motivation independent. In fact, sensitivity to that identity would seem to make both [Place 1] and [Place 2] such ways of representing. For then it would seem that the bee would have the capacity to estimate its position in relation to "either" place, regardless of its motivation or circumstances. In other words, an appeal to sensitivity to identity places the opponent of Systematicity 1 in the position of having to concede the very kinds of capacities the existence of which he or she wants to question.

⁷ He or she could, of course, argue that path integration is a special case. But nothing about my response is essentially tied to path integration. In other words, my point is such as to force him or her to treat a wide range of navigational capacities as special cases and thereby to concede that they are in fact typical, not special, cases.

Consider also some of the results of Menzel's vanishing bearing displacement experiments (Table 6.2). Hive departing and feeder arriving bees which were captured in the afternoon (without having filled their crops) and released at the morning site were able to adopt the morning-site-to-hive compass heading upon release. This suggests that those bees represented the morning site and the hive the same way in which they represented them during previous, morning foraging excursions and after they had filled their crops. Neither their having flown to the morning site nor their having fed there was necessary in order for the bees to call up the appropriate homeward vector. Likewise, the bees which took the novel shortcut from Site 3 must have represented the hive and the two sites in the same way in which they had on previous foraging excursions. Otherwise, it would be hard to see how the bees could treat both (or either) of the site-to-hive vectors as relevant to the task of returning to the hive from Site 3. In short, without coherence in the way bees represent various places under various external and internal conditions, it's hard to see how they could exhibit the coherence in their navigational behavior that in fact they do.

There's another worry about Systematicity 1 that requires attention. Why suppose that the related items of information are complex? Perhaps estimating the solar bearing of a particular foraging site from the hive doesn't require information about that site or the hive. Rather, couldn't the bee just call up the relevant solar bearing? The bee might need to recall only information that we might express as "Go along bearing θ."

First, remember that the present discussion is solely about content. So I'm not assuming that the configurational structure of the vehicles of the information in question is complex. Second, the crucial fact that needs to be explained is a bee's capacity to call up an appropriate vector in a variety of circumstances. For example, displaced bees have the capacity to call up a vector the origin of which is tied to their location prior to their having been displaced and the "tip" of which is tied to their original destination. Also, bees displaced from the hive to any familiar location have the capacity to return directly to the hive from that place. Moreover, bees can return directly to the hive from any type of foraging site (nectar, pollen, etc.), and they can directly return to any type of familiar foraging site from the hive, even if they last visited that site at least one day ago and have not just been recruited to it.
So when bees decide to fly out toward a certain familiar destination (say, to a nectar source, if motivated to obtain nectar), they don't just access any of their many vector memories; rather, they access the one which will lead them from (what they take to be) their present location to another at which a specific type of resource may presently be available. That can be explained, it seems, only if the vector and the connected locations are linked in memory. That's the sense in which the remembered information has to be semantically complex.

Much of what I've said about the attributions involved in Systematicity 1 should be applicable, mutatis mutandis, to the additional cases of systematicity I provide below. They can thus be presented more briefly.

The ability of bees to represent various sorts of complex structures provides us with further examples of clusters of systematically related capacities. Collett's vector sequence experiments (§ 6.2.1) suggest the following hypothesis:

Systematicity 2
If a honeybee has the capacity to learn the flight vector sequence "distance n in direction d, then distance m in direction d*", then it has the capacity to learn the flight vector sequence "distance n in direction d*, then distance m in direction d", as well as the capacity to learn "distance m in direction d*, then distance n in direction d".

Bees presumably have the capacity to represent a great variety of two-segment vector sequences. That bees have that capacity has Systematicity 2 as a consequence, assuming of course that they can represent the distances n and m and the directions d and d*. But that's guaranteed by Systematicity 2's antecedent.

The results of Collett's study on the effects of panoramic context on the performance of route flight segments (§
6.2.1) suggest yet another systematicity hypothesis:

Systematicity 3 If a honeybee has the capacity to learn the route sequence "distance n to landmark L, then distance m to landmark L*", then it has the capacity to learn any of the route sequences (i) "distance m to L, then distance n to L*", (ii) "distance n to L*, then distance m to L", and (iii) "distance m to L*, then distance n to L".

The case for this hypothesis proceeds along the same lines as the justifications for Systematicities 1 and 2. As long as bees can represent the distances n and m and the landmarks L and L*, the consequent of Systematicity 3 follows from the capacity of bees to learn the lengths of a great variety of route segments. And that bees can represent those particular distances and landmarks is guaranteed by the antecedent of the hypothesis.

Here are two more systematicity hypotheses:

Systematicity 4 If a honeybee has the capacity to learn the sequence of positive stimuli "white, then blue, then black–white vertical stripes", then it has the capacity to learn any of the sequences "white, then black–white vertical stripes, then blue", "blue, then white, then black–white vertical stripes", and so on.

Systematicity 5 If a honeybee has the capacity to learn that the sucrose concentration of Feeder 1 is greater than that of Feeder 2, then it has the capacity to learn that the sucrose concentration of Feeder 2 is greater than that of Feeder 1.

Systematicity 4 is based on the results of Collett's visual-sequence learning experiments (§ 6.2.2.1), which strongly suggest that bees can represent arbitrary sequences of visual stimuli. Systematicity 5 is based on Wei's study of learning flight modulation (§ 6.1.3), which suggests that bees can represent arbitrary relative levels of sucrose concentration.⁸ As with Systematicities 1–3, for each of these two hypotheses, the existence of systematically related specific capacities is inferred from the existence of a more general capacity.
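The formal-permutation character of Systematicities 2–4 can be made concrete with a small sketch. It is only an illustration of the combinatorics, not a model of bee learning: the function names and the string labels for distances, directions, and stimuli are hypothetical. Note that for a two-segment sequence the sketch enumerates all three recombinations, which matches the three variants listed in Systematicity 3 and is a superset of the two variants listed in Systematicity 2.

```python
from itertools import permutations


def sequence_variants(sequence):
    """Return every reordering of `sequence` except the original order."""
    original = tuple(sequence)
    return [p for p in permutations(original) if p != original]


def two_segment_variants(distances, directions):
    """Pair every ordering of the distances with every ordering of the
    directions (or landmarks), excluding the originally learned pairing."""
    original = tuple(zip(distances, directions))
    variants = set()
    for ds in permutations(distances):
        for rs in permutations(directions):
            candidate = tuple(zip(ds, rs))
            if candidate != original:
                variants.add(candidate)
    return variants


if __name__ == "__main__":
    # Systematicity 2/3: recombinations of a learned two-segment sequence.
    for variant in sorted(two_segment_variants(("n", "m"), ("d", "d*"))):
        print(variant)
    # Systematicity 4: reorderings of a learned three-stimulus sequence.
    for variant in sequence_variants(("white", "blue", "stripes")):
        print(variant)
```

The point of the sketch is that the variants are fixed by the constituents and their positions alone, which is what makes the strict notion a matter of formal permutation.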
⁸ Bees are also capable of learning the relative flow rates of different feeders as well as the relative amounts of reward available from different feeders. See Greggers and Menzel 1993 and Greggers and Mauelshagen 1997.

7.2 Weak Systematicity and the Tracking Argument

So far I've restricted my discussion to a relatively strict form of systematicity, requiring that systematic variants be formal permutations of each other. But discussions of systematicity are often about one or another somewhat weaker notion, one that does not have a formal-permutation requirement. These weaker notions focus on the nonarbitrariness of the semantic relations among representations. The central idea is that an organism's capacity to acquire information about a certain domain exhibits systematicity if the following is the case:

If the organism has the capacity to acquire the information that a certain individual has a certain property (or stands in a certain relation), then it has both the capacity to acquire the information that that individual has any of a variety of different properties (or stands in any of a variety of different relations) and the capacity to acquire the information that any of a variety of individuals has that property (or stands in that relation).

More formally,

If the organism has the capacity to represent that a has the property (or stands in the relation) F, then there are other properties (or relations), G₁, G₂, …, Gₙ, and other individuals, b₁, b₂, …, bₘ, such that it has the capacity to represent that a is G₁, that a is G₂, …, and that a is Gₙ, and that b₁ is F, that b₂ is F, …, and that bₘ is F.

In short, an organism's capacity to acquire information about a certain domain exhibits systematicity if it comprises specific capacities to acquire any of a plurality of items of information having a common semantic constituent in the same semantic structural role. Call this sort of systematicity "weak" systematicity.
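As a rough illustration of the formal statement above, the following sketch checks the conditional for one individual and one property against a finite set of representable contents. Everything here is an assumption made for illustration: contents are modeled as (individual, property) pairs, and the `minimum` parameter is a hypothetical stand-in for "a variety of" (set to 1 by default, which is the weakest reading).

```python
def weakly_systematic(representable, individual, prop, minimum=1):
    """Check the weak-systematicity conditional for one content.

    `representable` is a set of (individual, property) pairs that the
    organism is assumed to be able to represent. The conditional holds
    vacuously if (individual, prop) is not among them.
    """
    if (individual, prop) not in representable:
        return True
    # Other properties G_i attributable to the same individual a.
    other_props = {p for (a, p) in representable if a == individual and p != prop}
    # Other individuals b_j to which the same property F is attributable.
    other_individuals = {a for (a, p) in representable if p == prop and a != individual}
    return len(other_props) >= minimum and len(other_individuals) >= minimum


if __name__ == "__main__":
    bee = {
        ("Site S", "200 m from the hive"),
        ("Site S", "300 m from the hive"),
        ("Site T", "200 m from the hive"),
    }
    print(weakly_systematic(bee, "Site S", "200 m from the hive"))
    isolated = {("Site S", "200 m from the hive")}
    print(weakly_systematic(isolated, "Site S", "200 m from the hive"))
```

The check only requires that some other properties and some other individuals be available. It does not require that every individual can be paired with every property.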
Note that weak systematicity is not the claim that for any a, b, F, and G, if an individual can represent that a is F and that b is G, then it can also represent that b is F and that a is G. This stronger claim, applied to humans, has the questionable consequence that if someone can think both that John plays guitar and that the number two is an even number, then they can thereby think both that the number two plays guitar and that John is an even number. Weak systematicity, on the other hand, does not require that if a bee can learn that a nectar source is 200 m from the hive, then it can learn that a nectar source is 200 m from the brood chamber. The information that bees use to find their way around the hive might not be accessible to their large-scale navigational systems; but that has no bearing on whether their large-scale navigational capacities are systematically related.

Note that the explanations of systematicity presented in Chapter 2 apply, mutatis mutandis, to weak systematicity as well. For those explanations are fundamentally explanations of how it is that mental representations have various types of constituent structures and of how it is that the semantic relations among them are nonarbitrary.

7.2.1 The Tracking Argument

Horgan and Tienson's tracking argument for a "language" of thought⁹ may be viewed as appealing to weak systematicity. They argue that some organisms have to have at least some representations which are semantically complex. Furthermore, in terms of Cummins' distinction between pure encodings, structural encodings, and structural representations (§ 5.3), they argue, in effect, that such representations cannot be pure encodings but must be either structural encodings or structural representations of what they represent.

⁹ Horgan and Tienson 1996, pp. 81–83. Note that, although Horgan and Tienson mean to show that there must be mental representations having "language-like," or "syntactic," structure, under the rubric "language-like," Horgan and Tienson include non-Classical representations, such as tensor products. Roughly speaking, on their use of the term, a system of representation is language-like if it can be used to encode syntactic structure in a way that allows the encoded structures to be recoverable from the representations (but does not require that they ever be recovered). Again, I reserve the use of terms such as "syntactic" for the actual configuration of representations at the representational level of description.

One of Horgan and Tienson's favorite ways to state the tracking argument is in terms of navigational capacities. Any organism that exhibits complex and flexible navigational behavior must acquire a great deal of information about many particular things and places in its locale, such as landmarks and foraging sites. It must have information about their locations in relation to itself and to certain other objects. It also needs information about many of their other properties, such as appearance and value as a resource. Furthermore, such an organism must be able to acquire new information as circumstances warrant. For resource values change; some landmarks move, become temporarily hidden, or disappear; and the organism itself might move to an altogether different area. So the organism would have to have the capacity to attribute different properties and relations to the same objects at different times. It would also need the capacity to attribute to newly encountered objects the same properties and relations it has attributed to other objects. In addition, every item of acquired information must have a content-appropriate causal role. It does no good to learn that the position of a landmark has changed if that information, in relation to other information possessed by the organism, is not appropriately efficacious in guiding its behavior. Finally, note that the organism must have such capacities not only for the environment it actually inhabits but also for any possible environment it might have found itself in.

Horgan and Tienson maintain that all this is possible only if the mental representations that encode the information have some sort of "language-like," representational-constituent structure, whether it be concatenative or nonconcatenative. The only way for the organism to acquire all the information it needs on an ongoing basis, while reliably maintaining the content-appropriate causal efficacy of its information-bearing states, is to have the corresponding representations be "constructed," as needed, out of representational constituents.

From many of the findings examined in the preceding chapter, it should be clear that the navigational abilities of the honeybee are sophisticated and flexible enough for it to be among the organisms to which the tracking argument applies. Those abilities do indeed depend on weak-systematically related capacities to acquire information relevant to wayfinding. Thus, bees can track the location of a place of interest, even though its solar bearing in relation to the hive continually changes. Also, by means of path integration, bees in flight can keep track of their continually changing location in relation to the hive, a landmark, or the place of their release. They can learn to relate the solar ephemeris for their locale to the different landscape features visible at different locations. Apis mellifera has the capacity to reference its waggle runs to landscape features, though this capacity is exercised, as far as we know, only under experimental conditions. Local, isolated changes in the area of a goal (say, the appearance or location of nearby landmarks) need not prevent bees from searching at the correct location.
Furthermore, as long as they have a means of individuating certain reward sites, bees can track changes in the relative value of those rewards.

The capacity of A. mellifera to learn to reference its waggle runs to landscape features illustrates the fact that current capacities need not match up with current abilities. Without training, A. mellifera presumably is unable to orient its waggle runs to landscape features. Nonetheless, its ability to learn the task shows that it has the prior capacity to do so. Note that such unexercised capacities of an organism are just what one should expect if related capacities of that organism exhibit a certain form of systematicity.

I won't bother to spell out more formally all of the weakly systematic wayfinding capacities of bees. Here are just two:

If a bee has the capacity to learn that a feeding site is at a certain direction and distance from the hive, it also has the capacity to learn that that very site is at a different direction and distance from the hive.

If a bee has the capacity to learn that the sun's azimuth is at one location (in relation to the landscape) at a given time, it also has the capacity to learn that it is at a different location at that time.

Clearly, there are many other plausible hypotheses of this sort.

The station shift experiments of Gould and Dyer provide particularly good support for the weak systematicity of bee navigational capacities (§ 6.1.3). Recall that when Gould changed the compass direction of the feeding station by about 30°, the bees adjusted their waggle dances gradually, until they correctly indicated the new solar bearing. Some of the bees in Dyer's experiments (which employed a 90° shift in the direction of the feeding station) also showed gradual reorientation. This suggests that the bees updated their information about the location of the site by updating their information about the location of what for them was one and the same site.
The bimodal dances reported by Dyer have the same implication. The bees that performed bimodal dances had returned from just the one site. So it's quite likely that their dances communicated what for them was the location of that one site. Yet the dances alternately indicated two very different solar bearings, one presumably based on their memory of the solar bearing of that site in relation to the landscape and the other based on their very recent experience of its actual solar bearing. It's possible that the dances were a result of the bees' memory of the location of the "old" site competing with their newly acquired information about the location of the "new" site. That is, the bees might have been confused about which of two sites (what from their point of view were two sites) they had just visited, rather than about the location of the one site. But I find this possibility to be highly unlikely. For not only the station but also the field edge would have changed in orientation. The bees flew along the landmark that had always led to the station, and they found the station at its usual place in relation to that landmark. Further, there's no other evidence that bees which have just returned from a successful foraging trip ever dance to indicate the location of a site other than the one from which they have just returned.

Wei's learning flight modulation study also provides good support for the weak systematicity of bee navigational capacities. That the learning flights of the bees increased in duration after an imposed increase in search time, and that the decay rate of their learning flights after such increases was significantly faster than the decay rate of their initial learning flights, suggests that the bees updated their information about the location of the feeder in light of their past experience of it.
After an increase in search time or a change in location of the landmark–feeder array, they did not treat the feeder and associated landmarks as if they were situated at a newly discovered site; rather, they behaved as if they integrated remembered and newly acquired information about what for them was one and the same place.

The important point here is that a bee's remembered information about a particular place (or object) and any of its newly acquired information about what we would say is the same place really are, for the bee, two pieces of information about the same place. Which is to say that the semantic relations between such remembered and newly acquired information are nonarbitrary.

To see the force of the tracking argument, just consider how difficult it would be to explain certain behaviors if the semantic relations between remembered information and new information about what is, in reality, one and the same object or place were arbitrary. Suppose I remember that my coffee mug is on my desk. But when I go to get it, I see that it is no longer there. Believing that it was washed and put up, I go to the kitchen and find it in the cupboard. Now suppose that the content of my memory about the location of the mug was [my mug is on my desk], but that the content of my newly acquired information about (what is in reality) the mug, when I found that it was no longer on my desk, was [Paul's copy of The Last of the Mohicans is probably somewhere in Australia]. If that's the case, then it would appear to be a bit difficult to explain why I went to the kitchen to look for my mug, rather than to Australia to look for Paul's book. Clearly, it would be quite difficult for someone to consistently find their way to important resources or places if the semantic relations among their items of information about important locations were arbitrary.
But how does the need for flexible navigational capacities to be weakly systematic support the claim that the organism's mental representations need to be structural encodings or structural representations of what they represent, rather than pure encodings? The trouble with pure encodings is that any correspondence between their nonsemantic, physical properties and their contents is purely accidental. So even if the items of informational content acquired by an organism happen to be systematically related, if its mental representations are pure encodings, the presence of that systematicity would also be purely accidental. It could not be explained in terms of the nonsemantic, physical properties of its pure encodings.

To spell this out just a bit more, suppose that my mental representations are pure encodings. Suppose further that the bearer of the content of my belief that my mug is on my desk is , and that the bearer of the content of my belief that my mug is in the kitchen is . How could my cognitive system know which belief to act on, or even that they conflict? For, by hypothesis, those two representations need not share any cognitively efficacious, nonsemantic, physical properties. Thus, there need not be any way for my cognitive system to detect that those two representations share a representational constituent. The semantic relations between and might as well be arbitrary, even if they are not.

7.3 Systematicity and Semantic Structural Roles

The "strong" systematicity of honeybees' capacities to acquire various sorts of navigation-related information is possible only if the mental representations that encode the information have some sort of representational-constituent structure, whether it be concatenative or nonconcatenative. The same is the case for weak systematicity.
As Horgan and Tienson maintain, the only way for the organism to acquire all the information it needs on an ongoing basis, while reliably maintaining the content-appropriate causal efficacy of its information-bearing states, is to have the corresponding representations be "constructed," as needed, out of representational constituents.

Complex semantic structure requires that representational constituents have certain semantic structural roles. This should be relatively noncontroversial, though it's worth emphasizing in order to see some of the sorts of structural roles bee representational constituents need to play. In Section 7.4, I argue that honeybee information processing is sensitive to those structural roles.

7.3.1 Distinguishing Systematic Variants

Consider Systematicities 1 and 5:

Systematicity 1 If a honeybee has the capacity to estimate that the solar bearing of a particular foraging site from the hive is, say, 45° west of the sun, then it also has the capacity to estimate that the solar bearing of the hive from that site is 45° west of the sun.

Systematicity 5 If a honeybee has the capacity to learn that the sucrose concentration of Feeder 1 is greater than that of Feeder 2, then it has the capacity to learn that the sucrose concentration of Feeder 2 is greater than that of Feeder 1.

It should be clear that the semantic structure of representations that are systematically related in either of the above two ways must be something other than the structure of a non-ordered set, such as {hive, Site S, 45°, west, sun}. For such a structure wouldn't allow the bee to distinguish [The solar bearing of Site S from the hive is 45° west of the sun] from [The solar bearing of the hive from Site S is 45° west of the sun]. Since solar bearing is an asymmetrical relation, the constituents [hive] and [Site S] must play different structural roles in those contents.
Since having greater sucrose concentration is also an asymmetrical relation, the constituents [Feeder 1] and [Feeder 2] must also play different structural roles in [Feeder 1 has a greater sucrose concentration than Feeder 2].

The need to distinguish weakly systematic variants also requires that representational constituents have certain structural roles. Suppose that a bee acquires the information [The bearing of Site S from the hive at time t is 45° west of the sun]. The bee must be sensitive to the fact that that information is distinct from both [The bearing of the hive from Site S at time t is 225° west of the sun] and [The bearing of Site S from the hive at time t is 225° west of the sun]. For only the second of the three can guide the bee back to the hive from Site S.

Consider also Systematicities 2–4:

Systematicity 2 If a honeybee has the capacity to learn the flight vector sequence "distance n in direction d, then distance m in direction d*," then it has the capacity to learn the flight vector sequence "distance n in direction d*, then distance m in direction d", as well as the capacity to learn "distance m in direction d*, then distance n in direction d".

Systematicity 3 If a honeybee has the capacity to learn the route sequence "distance n to landmark L, then distance m to landmark L*", then it has the capacity to learn any of the route sequences (i) "distance m to L, then distance n to L*", (ii) "distance n to L*, then distance m to L", and (iii) "distance m to L*, then distance n to L".

Systematicity 4 If a honeybee has the capacity to learn the sequence of positive stimuli "white, then blue, then black–white vertical stripes", then it has the capacity to learn any of the sequences "white, then black–white vertical stripes, then blue", "blue, then white, then black–white vertical stripes", and so on.

Each of these systematicities concerns a capacity to acquire information about a certain kind of sequence. For sequences, order is crucial.
The bee needs to be sensitive to which element of the sequence is first, second, or third, and so on. And that could be the case only if each constituent of the relevant information plays a certain place-in-the-sequence structural role.

7.3.2 "What" and "Where"

There are other sorts of structural roles for bee representational constituents. For example, the representation-forming processes responsible for producing information about the location of a particular place in relation to another must combine two constituents about those two respective places with a constituent about a certain direction and a constituent about a certain distance. Those processes must combine, as it were, two "what" constituents with two "where" constituents, rather than two what constituents with two more what constituents, or one what constituent with three where constituents, and so on.

Likewise, that there are certain bee psychological processes dedicated to manipulating information about direction (and not about distance, resource value, color, and so on) suggests that different bits of information about different directions share a special property to which those processes are sensitive. In order to be reliable, such processes must be able to distinguish information about direction from other kinds of spatial information as well as from non-spatial information.

Consider further the ability of bees to solve matching- (and non-matching-) to-sample tasks (§ 6.2.2.2). A rule such as, "Choose the x-marked arm if x was at the entrance," plausibly could not operate on, say, information strictly about distance. For example, the variable in such a rule is quite unlikely to be replaceable by the content [200 m].

7.3.3 Indexicals

In the case of humans, the contents of mental representations that we express through the use of proper names or indexicals have a different sort of semantic structural role than the contents we express through the use of predicates.
Might there be anything like this distinction in the case of bee mental representations? It's plausible that there is. In this section I propose that some bee representations have an indexical-like element as a semantic constituent.

In the last chapter we saw that bees have the capacity to learn a variety of route segments. They can learn vector sequences as well as landmark-to-landmark and landmark-to-foraging-site route segments (§ 6.2.1). They can learn the distance and direction from the hive of various local landmarks: their general landscape memory (§ 6.2.3.2). Also, when released at an unfamiliar location, they are able to track their location with respect to it by means of path integration, allowing them to periodically return there during their search flight (§ 6.2.4). Together, all this evidence clearly indicates that bees are capable of keeping track of their location with respect to an arbitrarily broad range of types of places.

It's perhaps universally acknowledged that path integration requires an accumulator that tracks a foraging or exploring bee's distance and direction from the hive. However, in light of the sort of evidence just mentioned, there's also a need for one or more local accumulators that work in tandem with the main, global accumulator.¹⁰ A local accumulator might work just like a global accumulator, except that its origin can be set at a variety of locations, rather than just at the current hive location. Alternatively, local vector information could be the product of a system that monitors the global accumulator, comparing its values at different places along a route, and deriving the distances and directions between them. Whatever the case, local vector information needs to be tied to various specific locations, such as a salient local landmark or the place of release after displacement.
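The two alternatives just described, a resettable local accumulator versus a system that compares values of the global accumulator at two places, can be sketched as follows. This is a minimal illustration under stated assumptions, not a model of bee neurobiology: the `Accumulator` class, the function names, the fixed compass frame (bearings in degrees clockwise from north, rather than solar bearings), and the leg lengths are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Accumulator:
    """Running displacement from an origin, in metres east and north."""
    east: float = 0.0
    north: float = 0.0

    def update(self, distance, bearing_deg):
        """Add one flight leg; bearing is clockwise from north (assumed)."""
        angle = math.radians(bearing_deg)
        self.east += distance * math.sin(angle)
        self.north += distance * math.cos(angle)

    def reset(self):
        """Tie the origin to the current location."""
        self.east = 0.0
        self.north = 0.0

    def snapshot(self):
        """Return the current displacement as an (east, north) pair."""
        return (self.east, self.north)


def as_vector(east, north):
    """Convert a displacement into (distance, bearing in degrees)."""
    distance = math.hypot(east, north)
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    return distance, bearing


def local_vector_both_ways():
    """Derive the vector between two landmarks by both hypothesized routes."""
    global_acc, local_acc = Accumulator(), Accumulator()
    # Leg 1: hive to first landmark (100 m due east, illustrative values).
    for acc in (global_acc, local_acc):
        acc.update(100.0, 90.0)
    at_first = global_acc.snapshot()
    local_acc.reset()  # alternative 1: origin re-set at the first landmark
    # Leg 2: first landmark to second landmark (50 m due north).
    for acc in (global_acc, local_acc):
        acc.update(50.0, 0.0)
    at_second = global_acc.snapshot()
    from_local = as_vector(*local_acc.snapshot())
    # Alternative 2: compare the global accumulator's values at the two places.
    from_global = as_vector(at_second[0] - at_first[0], at_second[1] - at_first[1])
    return from_local, from_global


if __name__ == "__main__":
    print(local_vector_both_ways())
```

Both alternatives yield the same landmark-to-landmark vector, which is why the text can remain neutral between them: either way, the resulting vector information has to be tied to a specific place of origin.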
¹⁰ Collett (T. S.) and Zeil 1998, Collett (M.) et al. 2002.

Now, consider a bee that is learning a multisegment route, one that takes it from the hive to a solitary tree in a clearing, then to a large boulder, and then to a landmark array that marks the foraging site. While learning this route, the bee also learns the tree-to-boulder flight vector and the boulder-to-site flight vector. That is, in addition to learning to fly to the tree, then to the boulder, and then to the site, it also learns the distance and direction of the tree from the hive, the boulder from the tree, and the site from the boulder. In each case, the origin of its local accumulator is tied to a different place. Since the bee is learning the flight vectors in question, it would appear that it needs to explicitly represent information such as [100 m and 45° east of the sun from tree] while in flight. That is, local vector information must be tied to specific place information. The learning mechanism in question, then, would appear to require representations that provide distance and direction information in relation to the value of a variable whose instances are representations of places, representations of the (semantic) form: [distance n and direction d from place x].

It is not unreasonable to suppose that the value of the place variable in such a representation sometimes has an indexical-like semantic role. For it's possible for a bee's local-vector learning mechanism to be active without its being tied to any specific place features. That might occur if the bee is released at a featureless, uniform, unfamiliar location. Or that might occur when a displaced bee, after playing out its (say) feeder-to-hive vector, arrives at a featureless, uniform location that would have been the location of the hive in the absence of displacement.¹¹ (Another possible occurrence is presented in Section 7.4.)
It seems to be a live hypothesis, then, that some bee representations are of the form [distance n and direction d from there].

¹¹ Compare the search behavior of desert ants in just such circumstances (Wehner and Srinivasan 1981, Müller and Wehner 1994).

Indeed, one might well wonder how vector navigation is possible without (semantic) indexicals. Information about the distances and directions between various places is not going to be useful to you unless you know where you are. Thus, a bee might have stored the information [Site S to hive: 200 m and 30° west of the sun]. But if the bee, upon departing from the hive for Site S, is displaced to Site S, that information won't help it get back to the hive unless it can also acquire the information [here is at Site S]. Moreover, its not acquiring that information, but retaining the information [here is at hive], would explain its setting a course, upon release at Site S, that would have taken it from the hive to Site S in the absence of displacement.

7.4 Operations on Semantic Constituents of Complex Representations

As I argued in the previous section, bee representational constituents have various sorts of semantic structural roles. There is a corollary to this claim regarding information processing in the honeybee, namely, that some of those processes must be structure sensitive. They must be sensitive to the structural roles of representational constituents. In this section, I provide what I take to be specific examples of such processes.

Recall that Menzel has shown that bees are capable of adopting novel routes to a feeder upon determining their location in relation to the hive (§ 6.2.4). A significant fraction of the novel flight trajectories to the feeder were straight, whereas a majority consisted of two flight segments (Fig. 7.1). The initial segment of two-segment flights resembled the trained hive-to-feeder vector.
The second segment resembled the vector that would have led the bee to the hive from the homing flight's point of origin.

Straight shortcuts to the feeder are explicable by the hypothesis that the bees summed their present-location-to-hive vector with their hive-to-feeder vector. Two-segment novel routes are explicable by the hypothesis that the bees flew those two vectors rather than summed them. What's particularly intriguing about the latter possibility is that the bees would have first flown their hive-to-feeder vector from a place that was not the location of the hive to a place that was not the location of the feeder (Fig. 7.1). Furthermore, they would then have flown along a vector that was originally hive directed but was now feeder directed. So, as I am about to propose in more detail, not only was the route flown a novel shortcut, it was, on the present hypothesis, a result of a novel combination of flight vector memories and their semantic constituents.¹²

¹² Results of earlier experiments by Collett (T. S.) et al. (1993) hinted at the possibility that bees have the capacity to combine memories of route segments in novel ways.

Figure 7.1. Novel metric shortcuts contrasted with novel complex routes. (Left) A straight shortcut (solid arrow) from a recognized landmark (L) to the feeder (F) is the sum of the landmark-to-hive (H) vector (V₁) and the hive-to-feeder vector (V₂). (Right) The first leg of a two-segment novel route, from a recognized landmark to the feeder, is the original hive-to-feeder vector (V₂). Since the bee starts at the landmark rather than the hive, the first leg leads the bee to a place (x) that is neither the hive nor the feeder. The second leg is the vector that would have led from the landmark to the hive (V₁).
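The geometry described in the caption of Figure 7.1 can be checked with a short sketch. The coordinates are invented for illustration and the function names are hypothetical; positions and vectors are (east, north) pairs in metres. The sketch only shows why both hypothesized routes end at the feeder (vector addition is commutative) while the two-segment route passes through a place x that is neither the hive nor the feeder.

```python
def add(a, b):
    """Add two (east, north) pairs component-wise."""
    return (a[0] + b[0], a[1] + b[1])


def novel_routes(landmark, v1, v2):
    """Compare the straight shortcut with the two-segment route.

    v1 is the landmark-to-hive vector and v2 the hive-to-feeder vector,
    following the labels in Figure 7.1.
    """
    hive = add(landmark, v1)
    feeder = add(hive, v2)
    shortcut = add(v1, v2)                    # left panel: fly V1 + V2 in one leg
    shortcut_end = add(landmark, shortcut)
    x = add(landmark, v2)                     # right panel: fly V2 first ...
    two_segment_end = add(x, v1)              # ... then V1
    return {
        "hive": hive,
        "feeder": feeder,
        "shortcut_end": shortcut_end,
        "x": x,
        "two_segment_end": two_segment_end,
    }


if __name__ == "__main__":
    # Illustrative values only: landmark at the origin,
    # V1 = (30, 40) m and V2 = (100, 0) m.
    for name, place in novel_routes((0.0, 0.0), (30.0, 40.0), (100.0, 0.0)).items():
        print(name, place)
```

With these values the shortcut and the two-segment route both end at the feeder, while x differs from both the hive and the feeder, as the caption states.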
Suppose that a bee, while searching for the hive, encounters a landmark the perception of which causes the bee to recall, from its general landscape memory, the vector that leads from that landmark to the hive. Say that the content of that memory is [landmark L-to-hive: 100 m northeast]. However, the bee has become motivated to find the feeder (perhaps because its energy reserves are becoming depleted). So the bee's new motivational state causes it also to recall its hive-to-feeder flight vector, the content of which we may express as [hive-to-feeder: 200 m east]. But the bee doesn't merely fly the hive-to-feeder vector and search for the feeder upon its completion. It flies that vector and then the vector that would have led it to the hive from the recognized landmark. The hypothesis, then, is that from the stored information,

[landmark L-to-hive: 100 m northeast]
[hive-to-feeder: 200 m east]

the bee constructs the "flight plan,"

[landmark L-to-x: 200 m east, then x-to-feeder: 100 m northeast].

That is, the bee learns how to get to the feeder from its location at the landmark by recombining, in a novel way, some of the semantic constituents of information previously acquired. Correlatively, there must be information-manipulating processes that operate on the remembered information in question. Note that if this hypothesis is correct, the bee's flight plan has an indexical-like element as a semantic constituent, in accordance with the possibility, mentioned above (§ 7.3.3), that a bee's local-vector learning mechanism can be active without its being tied to any specific place features. The bee's construction of the flight plan on the basis of its stored information would also seem to require that the bee rely on information such as [here is at landmark L].
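The hypothesized recombination of constituents can be written out as a small sketch. It is offered only as one way of making the structure of the hypothesis explicit: the `VectorMemory` type, the `build_flight_plan` function, and the use of the string "x" for the unnamed intermediate place are illustrative assumptions, not claims about how bees store vectors.

```python
from typing import NamedTuple


class VectorMemory(NamedTuple):
    """A vector content with separate place and metric constituents."""
    origin: str
    goal: str
    distance_m: int
    direction: str


def build_flight_plan(to_hive, hive_to_goal, here):
    """Recombine the constituents of two stored vectors into a two-leg plan.

    The plan keeps the distance and direction constituents of each stored
    vector but attaches them to new origin and goal constituents.
    """
    if to_hive.origin != here:
        raise ValueError("the first memory must start at the current location")
    if to_hive.goal != hive_to_goal.origin:
        raise ValueError("the two memories must share the hive constituent")
    waypoint = "x"  # indexical-like placeholder: a place with no stored features
    first_leg = VectorMemory(here, waypoint,
                             hive_to_goal.distance_m, hive_to_goal.direction)
    second_leg = VectorMemory(waypoint, hive_to_goal.goal,
                              to_hive.distance_m, to_hive.direction)
    return [first_leg, second_leg]


if __name__ == "__main__":
    stored_a = VectorMemory("landmark L", "hive", 100, "northeast")
    stored_b = VectorMemory("hive", "feeder", 200, "east")
    for leg in build_flight_plan(stored_a, stored_b, here="landmark L"):
        print(leg)
```

The `here` argument plays the role of the information [here is at landmark L]: without it, the function has no basis for attaching the hive-to-feeder metric constituents to a new origin.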
Another possibility is that the bee arrives at the feeder by combining constituents of the stored information,

[landmark L-to-hive: 100 m northeast]
[hive-to-feeder: 200 m east]

so as to construct the flight plan,

[200 m east, then 100 m northeast].

But, crucially, even on this weaker hypothesis, the derived vector is a combination of semantic constituents of the stored vectors.

Vector averaging also involves manipulation of vector memory semantic constituents. First, vector information operations such as vector averaging and vector addition (as in, for example, path integration [§ 6.1.1]) require manipulation of the distance and direction semantic constituents of the relevant vectors. For it's only by manipulation of those constituents that the resultant vector can be derived. But vector averaging, as hypothesized to have been performed by the novel-shortcut bees in Menzel's vanishing-bearing displacement study (§ 6.2.3), might also involve further alterations. The bees could have manipulated the two feeder-to-hive vectors so as to obtain a present-location-to-hive vector. Or they could have averaged the two hive-to-feeder vectors and then reversed the direction of the result to obtain a present-location-to-hive vector.

Giurfa's Y-maze experiments provide evidence in support of the claim that bees can acquire constituent-structure-sensitive rules (see also below [§ 7.5.2]). Recall that the bees appeared to acquire rules along the lines of "Choose the x-marked arm if x is at the entrance." If that's correct, it's reasonable to propose that the bees, in performing the delayed matching-to-sample task, relied on a rule and representations with the following contents:

Learned rule          [Choose the x-marked arm if x is at the entrance.]
Current information   [Odor O is at the entrance.]
Instantiated rule     [Choose the O-marked arm if O is at the entrance.]
Motor command         [Choose the O-marked arm.]
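The four contents just listed can be sketched as a substitution over a rule with a variable. The sketch is offered only as an illustration of what operating on instances of a variable amounts to; the tuple encoding, the relation names at_entrance and choose_arm_marked, and the function names are hypothetical and are not drawn from Giurfa's work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule whose condition and action share a variable (here "x")."""
    variable: str
    condition: tuple  # e.g. ("at_entrance", "x")
    action: tuple     # e.g. ("choose_arm_marked", "x")


def instantiate(rule, value):
    """Substitute value for the rule's variable in both condition and action."""
    def substitute(term):
        return tuple(value if part == rule.variable else part for part in term)
    return substitute(rule.condition), substitute(rule.action)


def motor_command(rule, current_info):
    """Return the action licensed by current_info, or None if the rule does not apply."""
    relation, value = current_info
    if relation != rule.condition[0]:
        return None
    condition, action = instantiate(rule, value)
    return action if condition == current_info else None


# Learned rule: [Choose the x-marked arm if x is at the entrance.]
learned_rule = Rule("x", ("at_entrance", "x"), ("choose_arm_marked", "x"))
# Current information: [Odor O is at the entrance.]
command = motor_command(learned_rule, ("at_entrance", "odor O"))
```

Because the substitution is defined over the variable rather than over particular stimuli, the same rule yields a command for a stimulus that was never paired with an arm before, which is the sense in which the processing is sensitive to constituent structure.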
This would be a clear example of structure-sensitive reasoning, regardless of whether the representations having the last three contents are thought of as being processed strictly in sequence or, to some extent, in parallel.

One of Collett's maze experiments, together with other available evidence, makes the possibility that honeybees are capable of transitive reasoning worthy of investigation. Recall that Collett trained bees to negotiate a three-compartment maze by choosing the correct stimulus for each compartment (§ 6.2.2.1). Collett's results strongly suggest that the bees learned the compartment-to-compartment sequence of positive stimuli, rather than behaving in accordance with sequentially recalled memories. Now, recall that, for one set of experiments, bees were trained with yellow paper marking the entrance to the boxes (which was always on the left), white (positive) and black (negative) in the first box, blue (marking the only exit and always on the right) in the second, and vertical (positive) and horizontal (negative) in the third (Fig. 6.5). The test I draw your attention to is the one in which bees chose between white and vertical in the middle box. The back box remained the same as in training, whereas the front box was made to look as similar as possible to the middle box in training, with blue on the right marking the only exit. Nonetheless, the bees preferred white in the middle box and vertical in the back box. They did not, then, simply associate the perceived characteristics of the middle box in training with the succeeding, vertical positive stimulus. Rather, they appear to have stored a representation having a content corresponding to [white before blue and blue before vertical].

If in fact this is correct, then there is a possibility that the bees' having preferred white when tested in the middle box was a result of a kind of transitive reasoning process.
From [white before blue and blue before vertical], the bees might have derived [white before vertical]. Of course, it is also possible that the bees independently learned, rather than derived, [white before vertical]. However, what makes the possibility of transitive reasoning here one to be taken seriously is that, although bees learn route-segment sequences, they appear to learn, and certainly perform, individual route segments independently. For example, in Collett's channel experiments (§ 6.2.1), the bees learned the landmark-to-landmark route segment and the landmark-to-feeder route segment, but didn't appear to learn the first-boundary-to-feeder route segment. Note also that, for all tests in the first series, regardless of the types of landmarks employed, the bees searched at the training distance from the final landmark. That they did so, regardless of the distance from the channel entrance to the first landmark, confirmed earlier findings that bees' searches are sometimes controlled by a local vector extending from a particular landmark to the place, relative to that landmark, where the goal had been.13 And, in Collett's vector sequence experiments (§ 6.2.1), in both standard and displacement tests, when the position of the first turn in an individual bee's flight path differed from the correct location, there was a slight tendency for the position of the second turn to differ from the correct location by the same amount. The second flight segment, then, did not appear to correct for any inaccuracies in the first.

13 Srinivasan et al. 1997.

The following appears to be a clear case of explicit goal information interacting with additional, explicit locational information in order to yield an action. A recruitee reads a dance indicating [200 meters from the hive, at 30° west of the sun]. So it acquires, as an explicit goal, [200 meters from the hive, at 30° west of the sun]. Noncontroversially, this needs to be explicit.
The bee then heads for the stated location, only to find the way blocked, perhaps by a high, steep bluff. It then detours around the obstacle. Its path-integration accumulator coordinates will give it its current position with respect to the hive (also explicit), which must be compared with the explicit goal coordinates, in order to give the bee the necessary heading and direction to take once clear of the obstacle. We thus have current information interacting with explicit goal information to yield an action. We also have another example of a process operating on representational constituents.14

14 For evidence supporting the occurrence of this sort of vector subtraction in hamsters, see Etienne et al. 1998. For such evidence in the case of ants, see Schmidt et al. 1992.

7.5 Algebraic Rules: An Introduction to Modelling Issues

In Chapters 3–5, I argued that Connectionist-style explanations of systematicity do not have an explanation of systematicity per se, and that they are unprincipled in the sense that they appeal to mechanisms that are arbitrary with respect to Connectionism. Smolensky architectures, for example, appeal to structural-role vectors and operations defined over them. Such architectures are, in that sense, nonstandard Connectionist architectures. I've argued that an appeal to nonstandard Connectionist mechanisms is necessary in order to explain systematicity (§§ 4.2 and 5.3). Connectionist theorists, though, will no doubt persist in attempting to capture systematicity with more standard architectures. Whether or not they will succeed without implementing Classical representations or rules is an empirical issue. So far, they have not succeeded;15 and there may be principled explanations for the lack of their success.16 I leave a full discussion of modelling issues for a later occasion.
But it's worth taking a look at one important issue that needs to be addressed, namely, whether standard Connectionist architectures are capable of freely generalizing universally quantified one-to-one mappings. (We'll see what this issue is about shortly.) For, first, the issue of systematicity is related to issues of generalization. In accordance with a point made by Hadley,17 systematically related capacities require (or perhaps are) capacities to generalize previously acquired informational structures to novel informational constituents. For example, if you've acquired the capacity to think that Andy loves Betty, and you later acquire an additional concept with the content [Carol], then you also acquire the capacity to think that Andy loves Carol. Second, as I'll make clear below, honeybees have the capacity to freely generalize certain universally quantified one-to-one mappings.

15 Hadley (2002, 2004) shows that the most successful models (including his own) employ Classical representations or rules.
16 Phillips 1998, Phillips and Halford 1997.
17 Hadley 1994.

7.5.1 Algebraic Rules and Free Generalization

Marcus reminds us that there is much evidence that people can freely generalize universally quantified one-to-one mappings.18 Such a mapping is a function that yields a unique value for every item in its domain. The identity function, f(x) = x, is a clear example. To say that people can freely generalize such a function is to say that they can determine its value for any item in its domain, regardless of whether they have previously encountered that item. For example, English speakers can form the progressive of any English verb stem by suffixing "-ing" to it, even if the verb stem is entirely new to them.

Free generalization of a universal one-to-one function seems to require execution of a rule that operates on instances of variables, what Marcus calls an algebraic rule.
Operations that rely on encoded one-to-one mappings between particulars (such as could be contained in a look-up table, for example) would not suffice. Such operations simply do not permit generalization to novel particulars. For novel particulars, by definition, are just those for which there is no prior encoded mapping.

18 Marcus 2001, pp. 36–39.

On the other hand, free generalization comes naturally to a system that executes algebraic rules. For such a rule is applicable to any input-variable instance, regardless of whether the instance is novel to the system. As long as the rule is a good one, it will yield appropriate outputs for novel inputs.

Bees, it seems, are also able to freely generalize universally quantified one-to-one mappings. We've seen that bees can freely generalize the solar ephemeris for their locale (§ 6.1.2). That is, on the basis of limited exposure to the sun, their solar ephemeris learning mechanism produces a record that allows them to estimate the azimuthal position of the sun at times when they have not seen it or never can see it. Also, Giurfa's Y-maze experiments showed that bees can solve delayed matching-to-sample tasks and delayed non-matching-to-sample tasks, where their solutions allow them to generalize to novel stimuli, even across sensory modalities (§ 6.2.2.2). Again, his results suggest that the bees can acquire rules that operate on instances of a variable. Furthermore, rules such as "Choose the x-marked arm if x was at the entrance" and "Choose the non-x-marked arm if x was at the entrance" are universally quantified one-to-one functions.

Marcus provides a strong case for his thesis that standard connectionist networks (whether local or distributed), trained by standard connectionist learning algorithms, cannot freely generalize universal one-to-one functions unless they implement algebraic rules. He first provides theoretical considerations in support of his thesis.
He then examines various models which attempt to account for experimental results with respect to a variety of human cognitive tasks (such as linguistic inflection), where successful performance appears to require the ability to freely generalize. He argues that the most successful models implement rules for computing universal one-to-one functions, whereas the unsuccessful models do not. Here I present only his theoretical argument. I then show that his argument applies fairly straightforwardly to a network model of solar ephemeris learning proposed by Dickinson and Dyer.19 I also briefly discuss the implications of his argument for modeling Giurfa's Y-maze results.

Marcus's theoretical thesis is that the training independence exhibited by standard connectionist networks entails that a multiple-node-per-input-variable connectionist model can learn to compute a certain universal one-to-one function only if every input node and output node is exposed, during training, to at least some items in that function's domain.20 Roughly, training independence exists when: (1) adjustment of the connection weights (training) for some input nodes occurs independently of adjustment of the connection weights for other input nodes (input independence); and (2) adjustment of the connection weights for some output nodes occurs independently of adjustment of the connection weights for other output nodes (output independence).

19 Dickinson and Dyer 1996.
20 Marcus treats single-node-per-input-variable models separately. He shows that such models are natural candidates as hypotheses about how algebraic rules could be implemented in networks. As such, they do not constitute an alternative to models having Classical architecture. Smolensky makes a similar claim about local connectionist models: "The theory of ... local connectionist networks is so intimately associated with the classical theory of computation and automata that drawing any principled boundary between them may well be impossible" (1995c, p. 231).

Training independence, according to Marcus, is a logical consequence of the nature of the standard connectionist learning algorithms, such as backpropagation and Hebbian algorithms. Learning that occurs through the use of such algorithms is local. During training, the weight of a given connection is altered as a function of information that is locally available to that connection. Connections are not given access to the activation values of nodes to which they do not connect, nor are they given access to the weights of other connections. As a result, successful training adjustment of the connection weights for some subset of a network's input (or output) nodes need not transfer to the connection weights for its other input (or output) nodes. As Marcus puts it, standard connectionist networks are unable to generalize universal one-to-one functions between nodes.

7.5.2 Free Generalization in Bees

Dickinson and Dyer claim to have provided what they consider to be a nonimplementational connectionist model of how bees learn the local solar ephemeris.21 The connectivity structure of the core of Dickinson and Dyer's model is partially illustrated in Figure 7.2. The most active node in the inner ring represents the time of day. The most active node in the outer ring represents the azimuth. The outer ring receives its inputs from the visual system. The inner ring receives its inputs from the circadian clock. Each time node is connected with every azimuth node. There are also connections within each ring.
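The connectivity just described can be pictured with a small data-structure sketch. It is an illustrative reading only, not Dickinson and Dyer's implementation: the ring size of 24 nodes, the weight layout, the class and method names, and the decision to hard-code the 180° (12-h) opposite relation (rather than learn intra-ring weights) are all assumptions made here for brevity.

```python
N_NODES = 24  # assumed ring size: one node per hour and per 15 degrees of azimuth


def opposite(node):
    """Index of the node 180 degrees (12 h) around a ring."""
    return (node + N_NODES // 2) % N_NODES


class TwoRingSketch:
    """Time ring fully connected to azimuth ring; the most active node is the value."""

    def __init__(self):
        # weights[t][a]: strength of the connection from time node t to azimuth node a
        self.weights = [[0.0] * N_NODES for _ in range(N_NODES)]

    def strengthen(self, time_node, azimuth_node, rate=1.0):
        """Hebbian-style step: the co-active time/azimuth pair gains weight."""
        self.weights[time_node][azimuth_node] += rate

    def learned_azimuth(self, time_node):
        """Azimuth node most strongly connected to time_node, if any was trained."""
        row = self.weights[time_node]
        best = max(row)
        return row.index(best) if best > 0.0 else None

    def estimate(self, time_node):
        """Use a direct association, else the built-in 12-h-opposite relation."""
        direct = self.learned_azimuth(time_node)
        if direct is not None:
            return direct
        mirrored = self.learned_azimuth(opposite(time_node))
        return None if mirrored is None else opposite(mirrored)


model = TwoRingSketch()
model.strengthen(time_node=15, azimuth_node=16)  # one afternoon observation
```

In this sketch a single afternoon observation fixes an estimate for that time and, through the assumed opposite relation, for the time twelve hours later. Every other time node is left without an estimate, so whatever fills those gaps has to come from structure supplied in addition to the bare time-azimuth connections.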
During the learning process, the connection between the most active time node and the most active azimuth node is strengthened relative to the other time-azimuth connections (a Hebbian learning algorithm seems sufficient for this purpose). Also, the connection within each ring between its most active node and its 180° (12-h) opposite is strengthened relative to the other connections within that ring. The relative strengthening of intra-ring connections, according to Dickinson and Dyer, allows the network to learn the local ephemeris and to use it to estimate the azimuth for any time of day or night.22

Figure 7.2. Connectivity structure of Dickinson and Dyer's model of solar ephemeris learning (not all connections are shown).

21 Dickinson and Dyer provisionally devised a multilayer-perceptron model of solar-azimuth learning. Unlike bees, the model could not learn to estimate the solar azimuth at night. That is, it could not generalize to times of day that did not occur within its training set. Dickinson and Dyer regarded this as "a fatal flaw of the model, and of any model that requires exposure to examples of complete patterns to be able to recognize incomplete patterns" (1996, p. 200).

Dickinson and Dyer claim that this sort of model can learn any solar ephemeris function. They also claim that it is nonimplementational.23 It may be conceded that a model of the sort proposed by Dickinson and Dyer can learn any particular, local solar ephemeris function. However, it appears that such a model could learn such a function only if it builds in constraints that amount to an implementation of a general function which, via learning (perhaps some sort of parameter setting), yields a particular solar ephemeris.

Consider such a model repeatedly exposed to the local solar azimuth only for the same couple of hours in the afternoon. How can it learn to estimate the complete local solar ephemeris for its locale?
First, as Dickinson and Dyer realize, in order to learn the ephemeris for the corresponding time of night, the time-of-day nodes need to be most strongly connected to the time-of-night nodes that correspond to one-half of a day later. But it should be clear that this is a constraint on weights that partially builds in a general solar ephemeris function. Clearly, this constraint must be built in by the modeller, since weights are simply not determined by connectivity alone. Second, the portion of the solar ephemeris that a network with such connectivity will learn, based on limited exposure, will be consistent with an infinite variety of complete solar ephemeris functions. For example, for all such a network might learn, the sun is never visible outside that part of the sky in which it has been observed. Thus, further constraints on the weights of its connections will be necessary. Again, such constraints must be built in by the modeller. In short, if such a model can learn any local solar ephemeris, that will be possible only if the modeller builds in what he or she already knows about the "shapes" that (a graph of) a local solar ephemeris can actually take, as well as how the entire shape of a particular ephemeris depends on the shapes of certain of its parts.

22 Notice that Dickinson and Dyer get around the problem of training independence by designing out one of its preconditions: independent input and output nodes.
23 Dickinson and Dyer 1996, p. 201.

Dickinson and Dyer's network model, then, won't be able to freely generalize a local solar ephemeris unless it implements a generalized solar ephemeris function that operates on the value of a variable (time of day). Thus, it's not a definitive example of a nonimplementational connectionist model of solar ephemeris learning.
At best, their model shows that if a universally quantified one-to-one mapping has a sufficiently limited domain, then it can be implemented with what amounts to a kind of look-up table. That's something a Classical theorist should have no qualms about.

I now turn to the question of whether network models of the learning of delayed matching-to-sample tasks or delayed non-matching-to-sample tasks could be adequate without implementing an algebraic rule. I won't attempt to provide a complete answer to this question. (Again, I leave a thorough examination of specific modelling issues for future work.24) Rather, I limit my discussion to a recent argument for an affirmative, though qualifiedly affirmative, answer. I'll then say a few words about Giurfa's Y-maze experiments.

Learning the tasks in question involves learning a first-order sameness or difference relation. Penn and Povinelli argue that non-Classical architectures are capable of learning such relations.25 They point to a network model by Gasser and Colunga as a clear example of such a network.26 Their model employs "micro-relational units" to detect, roughly, the similarity or difference between two numeric inputs that encode respective features. However, it's somewhat puzzling that Penn and Povinelli go on to admit that a micro-relational unit can plausibly be interpreted as implementing a rule that operates on the values of variables. Why, then, do they claim that Gasser and Colunga's solution is non-Classical? The principal answer is that Penn and Povinelli require of Classical rules that they be implemented in the form of explicit information. Since micro-relational units do their job without employing explicit information about either the sameness or difference relation, Gasser and Colunga's solution is, on their view, non-Classical.

24 In my preliminary research on this issue, I've yet to find an example of a strictly Connectionist network model of sophisticated navigational capacities. All of the network models reviewed by Trullier et al. (1997) that are capable of anything approaching the flexibility of bee navigation (none have the capability of taking novel shortcuts) implement representations for which the constituency relation is concatenative (typically, configurationally complex maps; they also employ traditional graph-search algorithms). The same is true of the more recent network model proposed by Voicu and Schmajuk (2000). The network model developed by McNaughton and colleagues (McNaughton et al. 1991, 1996; Samsonovich and McNaughton 1997) performs path integration, but does so by implementing a look-up table, and thus can't serve as a definitive example of a non-Classical approach. Their model also implements a configurationally complex map. (A problem with the model is that it is incapable of returning to the coordinates of a stored location, since it has no mechanism for storing such coordinates.) Mittelstaedt (2000) extends their model. Unlike McNaughton et al.'s version, Mittelstaedt's model can return to a previously visited location. But, crucially, it leaves unspecified the mechanism by which locational information is tied to goal information. In effect, the model posits complex information without explaining how it is to be implemented. I should note that McNaughton and Mittelstaedt don't appear to have a Connectionist axe to grind. Their goal is to provide network models of hippocampal function and mammalian navigation, and network models need not have an entirely non-Classical architecture.
25 Penn and Povinelli (submitted).
26 Gasser and Colunga 1999.
Apart from the fact that Classical rules need not be implemented in the form of explicit information (they can be hardwired, for example), there's a distinction between a solution that is not definitively Classical and one that does not implement a rule that operates on the values of a variable. Some ways of implementing such rules are compatible with both Classical architectures and Connectionist ones. Gasser and Colunga's use of micro-relational units appears to be one such way. Thus, insofar as the model employs such units, it cannot serve as a definitive example of a non-Classical implementation of an algebraic rule.

It is also true, by the same token, that insofar as the model employs such units, it cannot serve as a definitive example of a Classical implementation. However, the ability to learn a first-order sameness or difference relation, while perhaps necessary for performing delayed matching-to-sample tasks or delayed non-matching-to-sample tasks, is not sufficient.27 The rules learned by the bees in Giurfa's experiments ("Choose the x-marked arm if x was at the entrance" and "Choose the non-x-marked arm if x was at the entrance") make use of sameness or difference information and thus require more than the implementation of a rule merely for detecting sameness and difference. The bees learned to detect not only sameness or difference but also the sameness or difference between two different kinds of features: the sample stimulus and the matching or nonmatching stimulus. In terms of variables, the information about the sample stimulus had to have been bound to a different variable than the information about either of the later-encountered stimuli. That is another way of saying that the values of the respective variables had to have different semantic roles, and the learned rules had to have been sensitive to those roles.
Thus, it appears that an adequate model of Giurfa's results would have to do more than simply implement algebraic rules. It would have to implement rules that are sensitive to semantic structure.28

27 Penn and Povinelli devote no discussion to the modelling of such tasks.
28 As I noted above, Dickinson and Dyer avoid training independence in part by employing connections between input nodes. But Giurfa's bees generalized across sensory modalities that have independent input channels. So, prima facie, it would appear that connectionist models of the bees' performance would have difficulty generalizing across input modalities, due to training independence.

7.6 Summary and Conclusion

I've argued in this chapter (based on the evidence presented in Chapter 6) that certain navigational capacities of honeybees exhibit what I've called strong systematicity (§ 7.1) and that certain navigational capacities of honeybees exhibit what I've called weak systematicity (§ 7.2). I've also argued that the representational constituents of systematically related honeybee mental representations have various structural roles (§ 7.3). Among these are subject- and object-of-relation roles (§ 7.3.1), place-in-sequence roles (§ 7.3.1), and "what" and "where" roles (§ 7.3.2). Furthermore, a case can be made for the hypothesis that among the constituents of bee representations are indexical-like constituents (§§ 7.3.3, 7.4, and 7.5). Finally, I've argued that honeybee information processing must be sensitive to the structural roles of representational constituents.

The question that connects Chapters 2–5 and Chapters 6 and 7 is, "What kind of theory of honeybee mental representations and processes would best explain the systematicity of the relevant honeybee navigational capacities?" Classical theorists would hypothesize, in light of the evidence (Chapter 6), that honeybees have mental representations that are complex, having representations as constituents.
They would also hypothesize that the constituency relation for the relevant bee mental representations is concatenative and that the configurational structure of those representations is governed by a combinatorial syntax and semantics. As in the case of human thought, the specific kinds of constituents (the specific contents and extensions of atomic and complex constituents) would be left open, for the present.29 Classical theorists would further hypothesize that the relevant honeybee cognitive processes have representational constituents in their domains and are causally sensitive to syntactic structure. As I argued in Chapters 3–5, such an explanation of systematicity would be a good one. On the other hand, a Connectionist explanation would not be a good one, in that (1) it would provide neither a causal explanation of systematicity (Chapter 3) nor an acausal explanation of systematicity (Chapter 4) (and thus would not really explain systematicity at all), and (2) it would be unprincipled if construed as an explanation of systematicity (Chapter 5) (though it would not be unprincipled if construed as an explanation of how a Connectionist system could mimic a Classical system that exhibits systematicity). Therefore, we have good (though defeasible) reasons to prefer Classical theories of certain honeybee navigational capacities over Connectionist theories.

29 That the Classical explanation of systematicity doesn't provide a syntax and semantics for mentalese is not a good reason to regard the explanation as inadequate (contra Matthews 1997). It only points out the fact that the Classical view is not yet confirmed. Connectionism is no better off on this matter.

One objection to the Classical explanation of systematicity is that it's not at all clear whether systematicity requires that the configurational structure of mental representations be syntactic.30 Perhaps positing map-like structure (for example) rather than syntactic structure would work as well. In regard specifically to the systematicity of honeybee navigational capacities, it might seem that map-like representational structure could account for the relevant systematicities. After all, we're talking about certain capacities of honeybees to acquire information about their navigational domain. Furthermore (it might be thought), assuming that the structure of the relevant honeybee mental representations is map-like provides the best explanation of the fact that those representations preserve information about the layout of their environment (which is also map-like).

30 See, for example, Block 1995, Copeland 1993, pp. 200–204, and Penn and Povinelli (submitted).

This objection, I acknowledge, does raise serious issues that would need to be adequately addressed by anyone concerned to defend the Classical language of thought hypothesis, especially by anyone concerned to defend the view that honeybees have a language of thought. Fortunately, however, for my principal purpose, it's not necessary for me to attempt to refute the kind of view under consideration. For one thing, as I pointed out earlier (§ 5.4; Appendix A), a system of mental representation can be both map-like and language-like. Furthermore, and this is the key point, my main conclusion is that we have (defeasible) good reasons to favor explanations of systematicity that posit a system of mental representation for which the constituency relation is concatenative over explanations that do not; and the constituency relation for maps and other sorts of structural representations is in fact concatenative. So even if the objection in question could be worked out (and even if it turns out that the vehicles of honeybee mental representations are map-like), that would be of no solace to a Connectionist. For distributed representations are configurationally simple.
Their contents can be complex, but the Connectionist constituency relation is nonconcatenative. And, as I argued in Chapters 3–5, it's that feature of Connectionism that makes its explanation of systematicity problematic.

Appendix A
A Limited Representational System which is both Map- and Language-Like

Here I demonstrate, by means of a simple, artificial example, the possibility of a system of representation that is map-like, in that its representations have spatial structure, and language-like, in that it has a combinatorial syntax and semantics. I make no claims about the theoretical usefulness of the system.

A.1 Lexicon for Map Legend L

The map legend L consists of the following terms:

A finite set of individual constants, I: a set of 12 unique, uniform patterns.
A finite set of 1-place predicates, P: a set of 12 distinct colors.
A finite set of 3-place predicates, G: a set of 12 grids of two square, nonoverlapping regions having the same area and contiguous along a vertical side ([ | ]n).

On the intended interpretation, the members of G express something like the following: x is that minimal region of the world such that y is situated in the left half of x, and z is situated in the right half of x.

A.2 Syntax for L

All patterns, colors, and grids are to be understood as members of the relevant set of the terms of L.

1. For any uniform (colored or noncolored) pattern α, and for any grid [ | ]n, [α| ]n and [ |α]n are wffs. (Here, "noncolored" means having a color that is distinct from each member of P.)
2. For any uniform (colored or noncolored) patterns α and β, and for any grid [ | ]n, [α|β]n is a wff.
3. If P and Q are wffs by clause 1 or 2, then the stack P/Q (P stacked on Q) is a wff.
4. There are no other wffs.

Regarding 1, when just one of the pattern variables is instanced, the other may be considered bound by an implicit existential quantifier whose domain is I.

A.3 Semantics for L

A.3.1 L-Models

An L-model is an ordered 4-tuple, ⟨D, γ, κ, λ⟩, where

1. D is a square region consisting of 16 contiguous, nonoverlapping, and numbered square regions, arranged in a 4-by-4 grid:

    1   2   3   4
    5   6   7   8
    9  10  11  12
   13  14  15  16

2. γ is a one-to-one mapping of G onto the set of the 12 smallest, horizontally oriented, rectangular regions of D, S: {1-2, 2-3, 3-4, 5-6, …, 15-16}.
3. κ assigns to each member of I one member of P.
4. λ is a one-to-one mapping of I into the set of the 16 numbered subregions of D.

A.3.2 Truth Conditions for wffs of L

1. If α is a noncolored pattern, then [α| ]n is true iff γ([ | ]n) = k-(k+1) and λ(α) = k.
2. If α is a noncolored pattern, then [ |α]n is true iff γ([ | ]n) = k-(k+1) and λ(α) = k+1.
3. If α is a pattern of color c ∈ P, then [α| ]n is true iff γ([ | ]n) = k-(k+1), λ(α) = k, and κ(α) = c.
4. If α is a pattern of color c ∈ P, then [ |α]n is true iff γ([ | ]n) = k-(k+1), λ(α) = k+1, and κ(α) = c.
5. If α and β are (colored or noncolored) patterns, then [α|β]n is true iff [α| ]n and [ |β]n are true.
6. A stack, P/Q, is true iff P is true, Q is true, and the grid constituents of P and Q are mapped by γ onto two members of S, a and b (respectively), such that the bottom side of a is contiguous with the top side of b.

References

Aizawa, K. 1997. "Explaining systematicity." Mind and Language 12: 115–136.
Anderson, J. A. 1995. An Introduction to Neural Networks. MIT Press.
Barsalou, L. W. 1992. "Frames, concepts, and conceptual fields." In Frames, Fields, and Contrasts: New Essays in Semantic and Lexical Organization, ed. E. Kittay and A. Lehrer. Erlbaum.
Barsalou, L. W. 1993. "Flexibility, structure, and linguistic vagary in concepts: Manifestations of a compositional system of perceptual symbols." In Theories of Memories, ed. A. C. Collins, S. E. Gathercole, and M. A. Conway. Erlbaum.
Beer, R. D. 2000. "Dynamical approaches to cognitive science." Trends in Cognitive Sciences 4: 91–99.
Berg, R. E., and Stork, D. G. 1995. The Physics of Sound, second edition. Prentice Hall.
Blakemore, R. P., and Frankel, R. B. 1981. "Magnetic Navigation in Bacteria." Scientific American 245: 58–65.
Block, N. 1995. "The mind as the software of the brain." In An Invitation to Cognitive Science, 2nd ed., vol. 3, Thinking, ed. D. Osherson. MIT Press.

Browne, A., and Sun, R. 1999. "Connectionist variable binding." Expert Systems 16: 189–207.

Butler, K. 1991. "Towards a connectionist cognitive architecture." Mind and Language 6: 252–272.

Capaldi, E. A., and Dyer, F. C. 1995. "Landmarks and dance orientation in the honeybee Apis mellifera." Naturwissenschaften 82: 245–247.

Capaldi, E. A., and Dyer, F. C. 1999. "The role of orientation flights on homing performance in honeybees." The Journal of Experimental Biology 202: 1655–1666.

Capaldi, E. A., Smith, A. D., Osborne, J. L., Fahrbach, S. E., Farris, S. M., Reynolds, D. R., Edwards, A. S., Martin, A., Robinson, G. E., Poppy, G. M., and Riley, J. R. 2000. "Ontogeny of orientation flight in the honeybee revealed by harmonic radar." Nature 403: 537–540.

Carruthers, P. 2005. "On being simple-minded." In Consciousness: Essays from a Higher-Order Perspective. Oxford University Press.

Cartwright, B. A., and Collett, T. S. 1983. "Landmark learning in bees: Experiments and models." Journal of Comparative Physiology 151: 521–543.

Casati, R., and Varzi, A. C. 1999. Parts and Places: The Structures of Spatial Representation. MIT Press.

Chittka, L., Bonn, A., Geiger, K., Hellstern, F., Klein, J., Koch, G., Meuser, S., and Menzel, R. 1992. "Do bees navigate by means of snapshot memory pictures?" In Proceedings of the 20th Göttingen Neurobiology Conference, ed. N. Elsner and D. W. Richter. Georg Thieme Verlag.

Chittka, L., Geiger, K., and Kunze, J. 1995a. "The influence of landmarks on distance estimation of honey bees." Animal Behaviour 50: 23–31.

Chittka, L., Kunze, J., Shipman, C., and Buchmann, S. L. 1995b. "The significance of landmarks for path integration in homing honeybee foragers." Naturwissenschaften 82: 341–343.

Churchland, P. S. 1986. Neurophilosophy: Toward a Unified Science of the Mind-Brain. MIT Press.

Clark, A. 1988. "Thoughts, sentences and cognitive science." Philosophical Psychology 1: 263–278.

Collett, M., and Collett, T. S. 2000. "How do insects use path integration for their navigation?" Biological Cybernetics 83: 245–259.

Collett, M., Collett, T. S., Bisch, S., and Wehner, R. 1998. "Local and global vectors in desert ant navigation." Nature 394: 269–272.

Collett, M., Harland, D., and Collett, T. S. 2002. "The use of landmarks and panoramic context in the performance of local vectors by navigating honeybees." The Journal of Experimental Biology 205: 807–814.

Collett, T. S. 1992. "Landmark learning and guidance in insects." Philosophical Transactions of the Royal Society of London B 337: 295–303.

Collett, T. S. 1996. "Insect navigation en route to the goal: Multiple strategies for the use of landmarks." The Journal of Experimental Biology 199: 227–235.

Collett, T. S., and Baron, J. 1994. "Biological compasses and the coordinate frame of landmark memories in honeybees." Nature 368: 137–140.

Collett, T. S., and Zeil, J. 1998. "Places and landmarks: An arthropod perspective." In Spatial Representation in Animals, ed. S. Healy. Oxford University Press.

Collett, T. S., Fry, S. N., and Wehner, R. 1993. "Sequence learning by honey bees." Journal of Comparative Physiology A 172: 693–706.

Collett, T. S., Baron, J., and Sellen, K. 1996. "On the encoding of movement vectors by honeybees. Are distance and direction represented separately?" Journal of Comparative Physiology A 179: 395–406.

Collett, T. S., and Collett, M. 2000. "Path integration in insects." Current Opinion in Neurobiology 10: 757–762.

Collett, T. S., and Collett, M. 2002. "Memory use in insect visual navigation." Nature Reviews Neuroscience 3: 542–552.

Copeland, J. 1993. Artificial Intelligence: A Philosophical Introduction. Blackwell.

Cummins, R. 1996. "Systematicity." The Journal of Philosophy 93: 591–614.
Cummins, R., Blackmon, J., Byrd, D., Poirier, P., Roth, M., and Schwarz, G. 2001. "Systematicity and the cognition of structured domains." The Journal of Philosophy 98: 167–185.

Darwin, C. 1985. The Origin of Species. Penguin Classics.

Dennett, D. C. 1989. "Mother nature versus the walking encyclopedia." In Philosophy and Connectionist Theory, ed. W. M. Ramsey, S. P. Stich, and D. E. Rumelhart. L. Erlbaum Associates.

Dickinson, J. 1994. "Bees link local landmarks with celestial compass cues." Naturwissenschaften 81: 465–467.

Dickinson, J., and Dyer, F. C. 1996. "How insects learn about the sun's course: Alternative modeling approaches." In From Animals to Animats 4, ed. P. Maes, M. J. Mataric, J.-A. Meyer, J. Pollack, and S. W. Wilson. MIT Press.

Dyer, F. C. 1985a. "Mechanisms of dance orientation by the Asian honey bee Apis florea." Journal of Comparative Physiology A 157: 183–198.

Dyer, F. C. 1985b. "Nocturnal orientation by the Asian honey bee, Apis dorsata." Animal Behaviour 33: 769–774.

Dyer, F. C. 1987. "Memory and sun compensation by honey bees." Journal of Comparative Physiology A 160: 621–633.

Dyer, F. C. 1991. "Bees acquire route-based memories but not cognitive maps in a familiar landscape." Animal Behaviour 41: 239–246.

Dyer, F. C. 2002. "The biology of the dance language." Annual Review of Entomology 47: 917–949.

Dyer, F. C., and Dickinson, J. A. 1994. "Development of sun compensation by honey bees: How partially experienced bees estimate the sun's course." Proceedings of the National Academy of Sciences USA 91: 4471–4474.

Dyer, F. C., and Dickinson, J. A. 1996. "Sun-compass learning in insects: Representation in a simple mind." Current Directions in Psychological Science 5: 67–72.

Esch, H. E., and Burns, J. E. 1996. "Distance estimation by foraging honeybees." The Journal of Experimental Biology 199: 155–162.

Esch, H. E., Zhang, S. W., Srinivasan, M. V., and Tautz, J. 2001. "Honeybee dances communicate distances measured by optic flow." Nature 411: 581–583.

Etienne, A., Maurer, R., Berlie, J., Reverdin, B., Rowe, T., Georgakopoulos, J., and Séguinot, V. 1998. "Navigation through vector addition." Nature 396: 161–164.

Fodor, J. A. 1990. A Theory of Content and Other Essays. MIT Press.

Fodor, J. A. 1998. "Connectionism and the problem of systematicity (continued): Why Smolensky's solution still doesn't work." In J. A. Fodor, In Critical Condition: Polemical Essays on Cognitive Science and the Philosophy of Mind. MIT Press.

Fodor, J. A. 2000. The Mind Doesn't Work That Way: The Scope and Limits of Computational Psychology. MIT Press.

Fodor, J. A., and McLaughlin, B. P. 1995. "Connectionism and the problem of systematicity: Why Smolensky's solution doesn't work." In Connectionism: Debates on Psychological Explanation, ed. C. MacDonald and G. MacDonald. Blackwell.

Fodor, J. A., and Pylyshyn, Z. W. 1995. "Connectionism and cognitive architecture: A critical analysis." In Connectionism: Debates on Psychological Explanation, ed. C. MacDonald and G. MacDonald. Blackwell.

Fülöp, A., and Menzel, R. 2000. "Risk-indifferent foraging behaviour in honeybees." Animal Behaviour 60: 657–666.

Gallistel, C. R. 1998. "Symbolic processes in the brain: the case of insect navigation." In An Invitation to Cognitive Science, 2nd ed., vol. 4, Methods, Models, and Conceptual Issues, ed. D. Osherson. MIT Press.

Garson, J. W. 1997. "Syntax in a dynamic brain." Synthese 110: 343–355.

Giurfa, M., and Capaldi, E. A. 1999. "Vectors, routes and maps: New discoveries about navigation in insects." Trends in Neurosciences 22: 237–242.

Giurfa, M., Zhang, S., Jenett, A., Menzel, R., and Srinivasan, M. V. 2001. "The concepts of 'sameness' and 'difference' in an insect." Nature 410: 930–933.

Golledge, R. G., ed. 1999. Wayfinding Behavior: Cognitive Mapping and Other Spatial Processes. The Johns Hopkins University Press.

Gould, J. L. 1984. "Processing of sun-azimuth information by honey bees." Animal Behaviour 32: 149–152.

Gould, J. L. 1986. "The locale map of honey bees: Do insects have cognitive maps?" Science 232: 861–863.

Gould, J. L., and Gould, C. G. 1988. The Honey Bee. W. H. Freeman.

Greggers, U., and Mauelshagen, J. 1997. "Matching behavior of honeybees in a multiple-choice situation: The differential effect of environmental stimuli on the choice process." Animal Learning and Behavior 25: 458–472.

Greggers, U., and Menzel, R. 1993. "Memory dynamics and foraging strategies of honeybees." Behavioral Ecology and Sociobiology 32: 17–29.

Hadley, R. F. 1994. "Systematicity in connectionist language learning." Mind and Language 9: 247–272.

Hadley, R. F. 1997. "Cognition, systematicity, and nomic necessity." Mind and Language 12: 137–153.

Hadley, R. F. 2002. "Systematicity in Connectionist Generalization." In The Handbook of Brain Theory and Neural Networks, 2nd ed., ed. M. A. Arbib. MIT Press.

Hadley, R. F. 2004. "On the Proper Treatment of Semantic Systematicity." Minds and Machines 14: 145–172.

Haugeland, J., ed. 1997. Mind Design II: Philosophy, Psychology, Artificial Intelligence. MIT Press.

Healy, S., ed. 1998. Spatial Representation in Animals. Oxford University Press.

Heinrich, B. 1976. "Foraging specializations of individual bumblebees." Ecological Monographs 46: 105–128.

Horgan, T., and Tienson, J. 1996. Connectionism and the Philosophy of Psychology. MIT Press.

Hummel, J. E., and Holyoak, K. J. 2001. "A process model of human transitive inference." In Spatial Schemas and Abstract Thought, ed. M. Gattis. MIT Press.

Janzen, D. H. 1971. "Euglossine bees as long-distance pollinators of tropical plants." Science 171: 203–205.

Joerges, J., Küttner, A., Galizia, C. G., and Menzel, R. 1997. "Representation of odours and odour mixtures visualized in the honeybee brain." Nature 387: 285–288.

Kratzsch, D., Giurfa, M., and Menzel, R. 1998. "Sequence learning by honeybees." Abstract 296, Fifth International Congress of Neuroethology, University of California, San Diego.

MacDonald, C., and MacDonald, G., ed. 1995. Connectionism: Debates on Psychological Explanation. Blackwell.

Manning, A. 1956. "Some aspects of the foraging behaviour of bumblebees." Behaviour 9: 164–201.

Marcus, G. 2001. The Algebraic Mind: Integrating Connectionism and Cognitive Science. MIT Press.

Matthews, R. J. 1996. "Can connectionists explain systematicity?" Mind and Language 12: 154–157.

McLaughlin, B. P. 1993. "The connectionism/classicism battle to win souls." Philosophical Studies 71: 163–190.

McNaughton, B. L., Chen, L. L., and Markus, E. J. 1991. "'Dead reckoning', landmark learning, and the sense of direction: A neurophysiological and computational hypothesis." Journal of Cognitive Neuroscience 3: 190–202.

McNaughton, B. L., Barnes, C. A., Gerrard, J. L., Gothard, K., Jung, M. W., Knierim, J. J., Kudrimoti, H., Qin, Y., Skaggs, W. E., Suster, M., and Weaver, K. L. 1996. "Deciphering the hippocampal polyglot: The hippocampus as a path integration system." The Journal of Experimental Biology 199: 173–185.

Menzel, R. 1989. "Bee-havior and the neural systems and behavior course." In Perspectives in Neural Systems and Behavior, ed. T. J. Carew and D. Kelley. Alan R. Liss.

Menzel, R. 1999. "Memory dynamics in the honeybee." Journal of Comparative Physiology A 185: 323–340.

Menzel, R., and Giurfa, M. 2001. "Cognitive architecture of a mini-brain: The honeybee." Trends in Cognitive Sciences 5: 62–71.

Menzel, R., and Müller, U. 1996. "Learning and memory in honeybees: From behavior to neural substrates." Annual Review of Neuroscience 19: 379–404.

Menzel, R., Geiger, K., Chittka, L., Joerges, J., Kunze, J., and Müller, U. 1996. "The knowledge base of bee navigation." The Journal of Experimental Biology 199: 141–146.

Menzel, R., Geiger, K., Joerges, J., Müller, U., and Chittka, L. 1998. "Bees travel novel homeward routes by integrating separately acquired vector memories." Animal Behaviour 55: 139–152.

Menzel, R., Brandt, R., Gumbert, A., Komischke, B., and Kunze, J. 2000a. "Two spatial memories for honeybee navigation." Proceedings of the Royal Society of London B 267: 961–968.

Menzel, R., Giurfa, M., Gerber, B., and Hellstern, F. 2000b. "Cognition in insects: The honeybee as a study case." In Brain Evolution and Cognition, ed. G. Roth and M. F. Wullimann. Wiley.

Menzel, R., Greggers, U., Smith, A., Berger, S., Brandt, R., Brunke, S., Bundrock, G., Hülse, S., Plümpe, T., Schaupp, F., Schüttler, E., Stach, S., Stindt, J., Stollhoff, N., and Watzl, S. 2005. "Honey bees navigate according to a map-like spatial memory." Proceedings of the National Academy of Sciences USA 102: 3040–3045.

Michelsen, A. 1999. "The dance language of honey bees: Recent findings and problems." In The Design of Animal Communication, ed. M. Hauser and M. Konishi. MIT Press.

Mittelstaedt, H. 2000. "Triple-loop model of path control by head direction and place cells." Biological Cybernetics 83: 261–270.

Müller, M., and Wehner, R. 1994. "The hidden spiral: Systematic search and path integration in desert ants, Cataglyphis fortis." Journal of Comparative Physiology A 175: 525–530.

Niklasson, L. F., and van Gelder, T. 1994. "On being systematically connectionist." Mind and Language 9: 288–302.

Pastergue-Ruiz, I., and Beugnon, G. 1994. "Spatial sequential memory in the ant Cataglyphis cursor." In Les Insectes Sociaux. Proceedings of the 12th Congress of the International Union for the Study of Social Insects, ed. A. Lenoir, G. Arnold, and M. Lepage. University Paris Nord, Paris.

Penn, D., and Povinelli, D. J. (submitted). "Do animals really have a language of thought?" Behavioral and Brain Sciences.

Phillips, S. 1998. "Are feedforward and recurrent networks systematic? Analysis and implications for a connectionist cognitive architecture." Connection Science 10: 137–160.
Phillips, S., and Halford, G. S. 1997. "Systematicity: Psychological evidence with connectionist implications." In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society, ed. M. G. Shafto and P. Langley. Stanford University.

Pinker, S. 1997. How the Mind Works. Norton.

Povinelli, D. J., and Bering, J. M. 2002. "The mentality of apes revisited." Current Directions in Psychological Science 11: 115–119.

Povinelli, D. J., Bering, J. M., and Giambrone, S. 2000. "Toward a science of other minds: Escaping the argument by analogy." Cognitive Science 24: 509–541.

Povinelli, D. J., and Giambrone, S. 2001. "Reasoning about beliefs: A human specialization?" Child Development 72: 691–695.

Povinelli, D. J., and Vonk, J. 2003. "Chimpanzee minds: Suspiciously human?" Trends in Cognitive Sciences 7: 157–160.

Rey, G. 1997. Contemporary Philosophy of Mind. Blackwell.

Rey, G. 2003. "Chomsky, Intentionality, and a CRTT." In Chomsky and His Critics, ed. L. M. Antony and N. Hornstein. Blackwell.

Riley, J. R., Smith, A. D., Reynolds, D. R., Edwards, A. S., Osborne, J. L., Williams, I. H., Carreck, N. L., and Poppy, G. M. 1996. "Tracking bees with harmonic radar." Nature 379: 29–30.

Riley, J. R., Valeur, P., Smith, A. D., Reynolds, D. R., Poppy, G. M., and Löfstedt, C. 1998. "Harmonic radar as a means of tracking the pheromone-finding and pheromone-following flight of male moths." Journal of Insect Behavior 11: 287–296.

Riley, J. R., Greggers, U., Smith, A. D., Stach, S., Reynolds, D. R., Stollhoff, N., Brandt, R., Schaupp, F., and Menzel, R. 2003. "The automatic pilot of honeybees." Proceedings of the Royal Society of London B 270: 2421–2424.

Riley, J. R., Greggers, U., Smith, A. D., Reynolds, D. R., and Menzel, R. 2005. "The flight paths of honeybees recruited by the waggle dance." Nature 435: 205–207.

Robinson, W. S. 1995. "Direct representation." Philosophical Studies 80: 305–322.

Ronacher, B., and Wehner, R. 1995. "Desert ants Cataglyphis fortis use self-induced optic flow to measure distances travelled." Journal of Comparative Physiology A 177: 21–27.

Samsonovich, A., and McNaughton, B. L. 1997. "Path integration and cognitive mapping in a continuous attractor neural network model." The Journal of Neuroscience 17: 5900–5920.

Schmidt, I., Collett, T. S., Dillier, F.-X., and Wehner, R. 1992. "How desert ants cope with enforced detours on their way home." Journal of Comparative Physiology A 171: 285–288.

Schöne, H., Westermayr, P., Kühme, D., Kühme, L., Schöne, M., and Schöne, R. 1998. "Searching behaviour and direction finding of differently motivated displaced honeybees – an 'etho-psychological' study of release behaviour." Ethology 104: 1039–1055.

Servan-Schreiber, D., Cleeremans, A., and McClelland, J. 1991. "Graded state machines: The representation of temporal contingencies in simple recurrent networks." In Connectionist Approaches to Language Learning, ed. D. Touretzky. Kluwer.

Si, A., Srinivasan, M. V., and Zhang, S. 2003. "Honeybee navigation: Properties of the visually driven 'odometer'." The Journal of Experimental Biology 206: 1265–1273.

Smolensky, P. 1995a. "Connectionism, constituency, and the language of thought." In Connectionism: Debates on Psychological Explanation, ed. C. MacDonald and G. MacDonald. Blackwell.

Smolensky, P. 1995b. "On the proper treatment of connectionism." In Connectionism: Debates on Psychological Explanation, ed. C. MacDonald and G. MacDonald. Blackwell.

Smolensky, P. 1995c. "Reply: Constituent structure and explanation in an integrated connectionist/symbolic cognitive architecture." In Connectionism: Debates on Psychological Explanation, ed. C. MacDonald and G. MacDonald. Blackwell.

Srinivasan, M. V., Zhang, S. W., and Bidwell, N. J. 1997. "Visually mediated odometry in honeybees." The Journal of Experimental Biology 200: 2513–2522.

Srinivasan, M. V., Zhang, S., Altwein, M., and Tautz, J. 2000. "Honeybee navigation: Nature and calibration of the 'odometer'." Science 287: 851–853.

Sterelny, K. 1990. The Representational Theory of Mind: An Introduction. Blackwell.

Tautz, J., Zhang, S., Spaethe, J., Brockmann, A., Si, A., and Srinivasan, M. 2004. "Honeybee odometry: Performance in varying natural terrain." PLoS Biology 2: 915–922.

Touretzky, D. S. 1986. "BoltzCONS: Reconciling connectionism with the recursive nature of stacks and trees." Proceedings of the Eighth Annual Conference of the Cognitive Science Society. Amherst, Mass.

Trullier, O., Wiener, S. I., Berthoz, A., and Meyer, J.-A. 1997. "Biologically based artificial navigation systems: review and prospects." Progress in Neurobiology 51: 483–544.

van Gelder, T. 1990. "Compositionality: A connectionist variation on a classical theme." Cognitive Science 14: 355–384.

van Gelder, T. 1991. "Classical questions, radical answers: Connectionism and the structure of mental representations." In Connectionism and the Philosophy of Mind, ed. T. Horgan and J. Tienson. Kluwer.

van Gelder, T. 1995. "What might cognition be, if not computation?" The Journal of Philosophy 92: 345–381.

van Gelder, T. 1998. "The dynamical hypothesis in cognitive science." Behavioral and Brain Sciences 21: 615–665.

Voicu, H., and Schmajuk, N. 2000. "Exploration, navigation and cognitive mapping." Adaptive Behavior 8: 207–224.

von Frisch, K. 1967. The Dance Language and Orientation of Bees. Belknap/Harvard.

Wehner, R. 1983. "Celestial and terrestrial navigation: Human strategies – insect strategies." In Neuroethology and Behavioral Physiology, ed. F. Huber and H. Markl. Springer-Verlag.

Wehner, R. 1984. "Astronavigation in insects." Annual Review of Entomology 29: 277–298.

Wehner, R. 1992. "Arthropods." In Animal Homing, ed. F. Papi. Chapman & Hall.

Wehner, R., and Srinivasan, M. V. 1981. "Searching behavior of desert ants, genus Cataglyphis (Formicidae, Hymenoptera)." Journal of Comparative Physiology 142: 315–338.

Wehner, R., Bleuler, S., Nievergelt, C., and Shah, D. 1990. "Bees navigate by using vectors and routes rather than maps." Naturwissenschaften 77: 479–482.

Wehner, R., Michel, B., and Antonsen, P. 1996. "Visual navigation in insects: Coupling of egocentric and geocentric information." The Journal of Experimental Biology 199: 129–140.

Wehner, R., Gallizzi, K., Frei, C., and Vesely, M. 2002. "Calibration processes in desert ant navigation: vector courses and systematic search." Journal of Comparative Physiology A 188: 683–693.

Wei, C. A., Rafalko, S. L., and Dyer, F. C. 2002. "Deciding to learn: Modulation of learning flights in honeybees, Apis mellifera." Journal of Comparative Physiology A 188: 725–737.

Wohlgemuth, S., Ronacher, B., and Wehner, R. 2001. "Ant odometry in the third dimension." Nature 411: 795–798.

Zhang, S. W., Bartsch, K., and Srinivasan, M. V. 1996. "Maze learning by honeybees." Neurobiology of Learning and Memory 66: 267–282.