ABSTRACT
Title of dissertation: BEE-ING THERE: THE SYSTEMATICITY OF
 HONEYBEE NAVIGATION SUPPORTS A CLASSICAL 
 THEORY OF HONEYBEE COGNITION
 Michael J. Tetzlaff, Doctor of Philosophy, 2006
Dissertation directed by: Professor Georges Rey
 Department of Philosophy
The Classical theory of cognition proposes that there are cognitive processes that
are computations defined over syntactically specified representations, “sentences”
in a language of thought, for which the representational-constituency relation is
concatenative. The main rival to Classicism is (Nonimplementational, or Radical,
Distributed) Connectionism. It proposes that cognitive processes are computations
defined over syntactically simple, distributed representations, for which the
constituency relation is nonconcatenative. I argue that Connectionism, unlike
Classicism, fails to provide an adequate theoretical framework for explaining
systematically related cognitive capacities and that this is due to its necessary
reliance on nonconcatenative constituency.
There appears to be an interesting divergence of attitude among philosophers of
psychology and cognitive scientists regarding Classicism’s language of thought
hypothesis. On one extreme, there are those who argue that only humans are likely
to possess a language of thought (or that we at least have no evidence to the
contrary). On the other extreme, there are those who argue that distinctively
human thinking is not likely to be explicable in terms of a language of thought.
They point to features of human cognition which they claim strongly support the
hypothesis that human cognitive-state transition functions are computationally
intractable. This implicitly suggests that the cognitive processes of simpler,
nonhuman minds might be computationally tractable and thus amenable to Classical
computational explanation.
I review much of the recent literature on honeybee navigation. I argue that many
capacities of honeybees to acquire various sorts of navigational information do in
fact exhibit systematicity. That conclusion, together with the correctness of the
view that Classicism provides a better theoretical framework than does
Connectionism for explaining the systematicity of the relevant cognitive
capacities, gives one reason in support of the claim that sophisticated navigators
like honeybees have a kind of language of thought. At the very least, it provides
one reason in support of the claim that the constituency relation for the mental
representations of such navigators is concatenative, not nonconcatenative.
BEE-ING THERE: THE SYSTEMATICITY OF HONEYBEE NAVIGATION
SUPPORTS A CLASSICAL THEORY OF HONEYBEE COGNITION
by
Michael J. Tetzlaff
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2006
Advisory Committee:
Professor Georges Rey, Chair
Professor Peter Carruthers
Professor Christopher Cherniak
Professor Gary Marcus
© Copyright by
Michael J. Tetzlaff
2006
TABLE OF CONTENTS
List of Tables
List of Figures
Chapter 1: Introduction: Systematicity, Navigation, and Cognitive Architecture
    1.1  The Issues
    1.2  Classical and Connectionist Cognitive Architectures
    1.3  Systematicity
    1.4  Why Navigation?
    1.5  Why Honeybees?
    1.6  The Terrain Ahead
Chapter 2: Two Candidate Explanations of Systematicity
    2.1  The Classical Explanation of Systematicity
    2.2  Smolensky’s Connectionist Explanation of Systematicity
    2.3  Summary of the Key Features of the Two Explanations of Systematicity
Chapter 3: Systematicity and Causation
    3.1  Vector Constituent Causation
        3.1.1  Superposition
        3.1.2  Criteria for Existence and Causal Efficacy
        3.1.3  Vector Similarity
        3.1.4  Vector Constituents as Causal Precursors
        3.1.5  Causal Efficacy of Information about Constituents
Chapter 4: Acausal Explanation?
    4.1  An Adequate Explanation, but Not of Systematicity
    4.2  Moral of the Argument
Chapter 5: Structure Sensitivity and Principled Explanation
    5.1  Prediction versus Accommodation of Systematicity
    5.2  The Nonarbitrariness of Classical Processes
    5.3  Unprincipledness is Not Structured-Domain Relative
        5.3.1  The Relationship between Content Structure and Representation Structure
        5.3.2  Unprincipledness Rests with Vector Constituency, Not Encoding
    5.4  Representations for Navigation
Chapter 6: Structure of the Honeybee’s Navigational Domain
    6.1  Simple Structures
        6.1.1  Distance and Direction Relations
        6.1.2  Solar Compass and Solar Ephemeris
        6.1.3  Updating Previously Learned Relationships
    6.2  Complex Structures: Sequences, Rules, and Maps
        6.2.1  Vector Sequences
        6.2.2  Maze Learning
            6.2.2.1  Configurations and Sequences
            6.2.2.2  Rules
        6.2.3  Novel Shortcuts and Vector Averaging
            6.2.3.1  Novel Shortcuts to the Hive
            6.2.3.2  Explanations of Novel-Shortcut Behavior
        6.2.4  A Kind of Cognitive Map
Chapter 7: The Systematicity of Honeybee Navigational Capacities
    7.1  Systematicity of Information Acquired by Honeybees
        7.1.1  Attributions of Content to Insects
        7.1.2  Some Honeybee Systematicities
    7.2  Weak Systematicity and the Tracking Argument
        7.2.1  The Tracking Argument
    7.3  Systematicity and Semantic Structural Roles
        7.3.1  Distinguishing Systematic Variants
        7.3.2  “What” and “Where”
        7.3.3  Indexicals
    7.4  Operations on Semantic Constituents of Complex Representations
    7.5  Algebraic Rules: An Introduction to Modelling Issues
        7.5.1  Algebraic Rules and Free Generalization
        7.5.2  Free Generalization in Bees
    7.6  Summary and Conclusion
Appendix A: A Limited Representational System which is both Map- and Language-Like
    A.1  Lexicon for Map Legend L
    A.2  Syntax for L
    A.3  Semantics for L
        A.3.1  L-Models
        A.3.2  Truth Conditions for wffs of L
References
LIST OF TABLES
Table 6.1: Stimulus pairs in Giurfa et al.’s (2001) delayed matching-to-sample and
    delayed non-matching-to-sample experiments
Table 6.2: Courses set in Menzel et al.’s (1998) displacement experiments
Table 6.3: Comparison of explanations of Menzel et al.’s (1998) displacement
    experiment results
LIST OF FIGURES
Figure 2.1: Vector representations
Figure 2.2: Vector representations are processed as wholes
Figure 6.1: Course configurations in Collett (T. S.) et al.’s (1993) vector sequence
    experiments
Figure 6.2: Train and test configurations in Collett (M.) et al.’s (2002) channel
    experiments
Figure 6.3: Mazes used in Zhang et al.’s (1996) maze learning experiments
Figure 6.4: Sample maze configurations in Collett (T. S.) et al.’s (1993)
    visual-sequence learning experiments
Figure 6.5: Maze configurations for Collett (T. S.) et al.’s (1993) “blue–single exit”
    sequence learning experiment
Figure 6.6: Mazes learned with color cues (Zhang et al. 1996)
Figure 6.7: Y-maze used in a delayed matching-to-sample experiment (Giurfa et al. 2001)
Figure 6.8: Map of the area chosen by Menzel et al. (1998) for their displacement
    experiments
Figure 6.9: Distributions of vanishing bearings (Menzel et al. 1998)
Figure 6.10: Histograms of vanishing bearing distributions in Menzel et al.’s (1998)
    displacement experiments
Figure 7.1: Novel metric shortcuts contrasted with novel complex routes
Figure 7.2: Connectivity structure of Dickinson and Dyer’s (1996) model of solar
    ephemeris learning
Chapter 1
Introduction: Systematicity, Navigation, and Cognitive Architecture
1.1  The Issues
The Classical theory of cognition proposes that there are cognitive processes that
are computations defined over syntactically specified representations, “sentences”
in a language of thought. Classicism provides a theoretical framework for
explaining several features of cognition.¹ Some of these are the productivity of
thought, the compositional unity of particular thoughts, inferential relations
among thoughts, structure-sensitive errors in reasoning, the multiplicity of
psychological “attitudes” that may be taken toward particular thoughts (we can
believe that P, desire that P, etc.), the causal relations that obtain between
thoughts in cognitive processes, and the systematicity of thought. Theories of
cognitive architecture must be evaluated in light of how well they explain (or
explain away) those and other properties of cognition. My focus is on the
systematicity of cognitive capacities.
The main rival to Classicism is Nonimplementational (or Radical) Distributed
Connectionism (hereafter, simply Connectionism). As we will see, Classicism affords
a relatively straightforward explanation of the systematicity of thought. And,
though other approaches to cognitive architecture might turn out to be viable,² the
only worked-out alternative to the Classical explanation of systematicity is a
Connectionist one. Thus, one of the two principal questions I address is, Which
theoretical framework, Classicism or Connectionism, provides the best explanation
of systematicity? I argue that Connectionism, unlike Classicism, fails to provide
an adequate framework for explaining systematicity.
¹ Rey 1997.
There appears to be an interesting divergence of attitude among philosophers of
psychology and cognitive scientists regarding the Classicist’s language of thought
hypothesis. On one extreme, there are those who argue that only humans are likely
to possess a language of thought (or that we at least have no evidence to the
contrary). Povinelli and colleagues³ favor the view that certain human cognitive
capacities require a language of thought. Some of the capacities they include in
that category are the capacities to represent unobservables and counterfactual
situations, to distinguish individuals and kinds, to learn new rules that operate
on instances of variables, and to use productive and systematic symbolic systems.
However, they argue that, in many cases, there is evidence which suggests that
nonhumans lack such capacities, while in other cases, there is a lack of evidence
that nonhumans have such capacities.
² See Beer 2000; Haugeland 1997; and van Gelder 1995, 1998.
³ Penn and Povinelli (submitted), Povinelli and Bering 2002, Povinelli et al. 2000,
Povinelli and Giambrone 2001, Povinelli and Vonk 2003.
On the other extreme, it has been argued that distinctively human thinking is not
likely to be explicable in terms of a Classical language of thought. For example,
Horgan and Tienson⁴ point to features of human cognition (its open-endedness, the
potential relevance of anything to anything, and the holistic character of
relevance) which they claim strongly support the hypothesis that human
cognitive-state transition functions are computationally intractable. This
implicitly suggests that the cognitive processes of simpler, nonhuman minds might
be computationally tractable and thus amenable to Classical computational
explanation.
Thus, the second of the two principal questions I address is, Do the cognitive
capacities of any nonhuman organisms exhibit systematicity? I argue that certain
navigational capacities of honeybees do in fact exhibit systematicity. That
conclusion, together with the correctness of the view that Classicism provides a
better theoretical framework for explaining the systematicity of the relevant
navigational capacities than does Connectionism, gives one reason in support of the
claim that sophisticated navigators like honeybees have a kind of language of
thought (or, at the very least, a system of mental representation for which the
constituency relation is Classical in character [§ 2.1]).
⁴ Horgan and Tienson 1996.
1.2  Classical and Connectionist Cognitive Architectures
The most important tenets (for my purposes) of Classicism and Connectionism will be
spelled out in more detail in Chapter 2. Here I provide a brief sketch of how those
theories answer two questions: What are the relations among mental representations
as vehicles of content? What roles do those vehicles play in cognitive processes?
The Classicist holds that the relations among mental representations include both
causal relations and constituency relations. Certain mental representations are
complex, in the sense that they have constituents which are themselves
representations. Those constituents play causal roles in cognitive processes. That
is, cognitive processes are causally sensitive to the constituent structure of
mental representations. Moreover, mental representations may share constituents. In
other words, two different, complex-representation tokens may share constituent
tokens of the same type.
For purposes of illustration, we can think of Classical mental representations as
being analogous, in certain respects, to formulae of propositional logic. Thus,
suppose that a cognitive system’s entokening P → Q causes it to entoken ~Q → ~P.
The causal mechanisms responsible for that transition, on the Classical story, are
sensitive to the constituent structure, or syntax, of P → Q, ~P, and ~Q. The
transition will have been governed by rules that operate on the constituents of
those representations.
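The structure-sensitive transition just described can be sketched in code. What follows is only an illustration under stated assumptions, not a model proposed in this dissertation: the class names `Atom`, `Not`, and `If` and the function `contrapose` are hypothetical, introduced here for the example. The sketch shows a rule that inspects the constituents of a conditional and reuses those very constituents in its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """A syntactically simple representation, e.g. P or Q."""
    name: str


@dataclass(frozen=True)
class Not:
    """A complex representation whose constituent is `body`."""
    body: object


@dataclass(frozen=True)
class If:
    """A complex representation with two constituents."""
    antecedent: object
    consequent: object


def contrapose(formula):
    """Map a conditional A -> B to ~B -> ~A by operating on its constituents."""
    if not isinstance(formula, If):
        raise TypeError("contrapose applies only to conditionals")
    return If(Not(formula.consequent), Not(formula.antecedent))


p, q = Atom("P"), Atom("Q")
result = contrapose(If(p, q))  # the representation ~Q -> ~P
```

The rule never consults what P or Q are about; it depends only on where they occur in the input, which is the sense of "sensitive to syntax" at issue.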
The Connectionist, unlike the Classicist, holds that the only relations among
mental representations are causal relations (though there are constituency
relations among the contents of mental representations). The Connectionist
hypothesizes that the mind is a kind of network of interconnected nodes. Its
structure, at the cognitive level, is similar to the structure of the brain at the
level of neurons and their interconnections. Mental representations are not
formulaic; rather, they are patterns of activity levels across sets of nodes. These
representational patterns do not have parts that are themselves representations.
They are, in that sense, simple rather than complex (though their contents may be
complex). Cognitive processes are transformations of representational patterns into
other representational patterns.
Suppose, then, that a cognitive system’s entokening of the pattern <1, 2, 3, 4>
causes it to entoken the pattern <5, 6, 7, 8>. (These representations may have the
same respective contents as P → Q and ~Q → ~P; those contents could be, say, [If
there’s smoke, there’s fire] and [If there’s no fire, there’s no smoke].⁵) The
causal mechanisms responsible for that transition, on the Connectionist story, are
sensitive to the activity levels of the individual nodes. The strengths of the
connections between nodes determine what activity pattern becomes entokened as the
result of the entokening of another activity pattern. There are no operations
defined over syntactically specified entities.
⁵ I adopt the convention of using boldface square brackets to indicate contents.
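The pattern-to-pattern transition described above can be illustrated with a toy network. This is a minimal sketch under assumptions that are not in the text: the units are linear (no activation function), and the connection strengths in `WEIGHTS` are hand-picked, hypothetical values chosen only so that <1, 2, 3, 4> is mapped to <5, 6, 7, 8>.

```python
# Hypothetical connection strengths: WEIGHTS[i][j] is the strength of the
# connection from input node j to output node i.
WEIGHTS = [
    [5, 0, 0, 0],
    [4, 1, 0, 0],
    [4, 0, 1, 0],
    [4, 0, 0, 1],
]


def propagate(weights, pattern):
    """Return the output pattern: each output node sums its weighted inputs."""
    return [sum(w * a for w, a in zip(row, pattern)) for row in weights]


output = propagate(WEIGHTS, [1, 2, 3, 4])
```

Nothing in `propagate` refers to constituents of a representation; the output is fixed entirely by individual activity levels and connection strengths.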
For the Connectionist, representations are distributed not only in the sense that
they are realized by the activity levels of multiple nodes but also in the sense
that a particular set of activity levels may realize many representations at once.
This will be the case when a pattern of activity that is a representation is the
sum, or superimposition, of multiple patterns that are themselves representations.
(Similarly, cognitive processes are distributed in the sense that one and the same
set of connection strengths may realize multiple operations at once [§ 3.1.2].) As
we will see, the idea of representations in superposition plays an important role
in the Connectionist explanation of systematicity.
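Superposition in the sense just described amounts to node-by-node addition of activity patterns. The sketch below is illustrative only; the function name `superimpose` and the four-node patterns are hypothetical.

```python
def superimpose(*patterns):
    """Add activity patterns node by node; all patterns must have equal length."""
    if len({len(p) for p in patterns}) != 1:
        raise ValueError("patterns must be non-empty and of equal length")
    return [sum(levels) for levels in zip(*patterns)]


# Hypothetical activity patterns over four nodes.
first = [1, 0, 2, 0]
second = [0, 3, 0, 1]
combined = superimpose(first, second)

# A different pair of patterns yields the same sum, so the summed pattern
# alone does not fix which patterns were superimposed.
other = superimpose([1, 3, 0, 0], [0, 0, 2, 1])
```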
1.3  Systematicity
There are a number of possible varieties of systematicity. Linguistic capacities
may become more systematic over the course of development.⁶ Also, different kinds
of cognitive capacities might be systematically related in different ways. For now,
a general characterization will do (I argue in Chapter 7 that honeybee navigational
capacities exhibit two specific kinds of systematicity). The central idea is that
certain, relatively specific cognitive capacities come in clumps. That is, if a
mind has certain cognitive capacities, it thereby, by nomological necessity, also
has certain other cognitive capacities. Common examples of systematically related
capacities are various linguistic ones. Thus, if a person has the capacity to
understand the sentence, “John loves Mary,” then that person thereby also has the
capacity to understand the sentence, “Mary loves John.”
⁶ Hadley 1994.
As I’ll emphasize in the next chapter, systematicity has an important semantic
aspect. That this is so is tied to the fact that cognitive capacities are
capacities to acquire, store, and process information. An explanation of
systematicity must make clear how causal cognitive processes preserve the
appropriate semantic relations among mental representations.
1.4  Why Navigation?
Patricia Churchland once pointed out that “if you root yourself to the ground, you
can afford to be stupid.”⁷ On the other side of the coin, if your survival depends
on long foraging trips to perhaps unfamiliar territory far from home, then you
can’t afford to be stupid. For the need to navigate over long distances and to find
your way back to safety brings with it the distinct possibility that you will
become lost. So the abilities to plan your trip in advance and to think about what
to do when in fact you do become lost would be very valuable assets.
⁷ Churchland 1986, p. 13.
It’s extremely likely that some navigational capacities do not require cognitive
capacities. For example, there supposedly is no need to posit thought processes or
memories in order to explain chemotaxis, phototaxis, or magnetotaxis⁸ in bacteria.
Likewise, although ants have the ability to home toward remembered landmarks, it is
plausible that such beacon homing can be explained in terms of
recognition-triggered-response mechanisms.⁹
On the other hand, some navigational capacities would seem to require the capacity
to represent various places of interest and certain relations (topological, metric,
etc.) among them,¹⁰ as well as the capacity to make inferences involving those
representations. Perhaps the clearest example is the capacity to take novel
shortcuts. Thus, suppose an organism has learned how to get from Place A to Place B
and how to get from Place C to Place A. Suppose further that the organism is
unfamiliar with the territory between Places B and C, and that no perceptible
features associated with Place B (or with known routes to or from it) are
detectable by the organism from Place C. Nonetheless, when at Place C, it takes the
direct route from Place C to Place B. Assuming that the organism’s finding its way
to Place B was not accidental, it must have acquired information about the directed
distances between Places A and B and between Places A and C, and it must have used
that information to infer the direct route. We know of no other way an organism (or
device) could accomplish such a task.
⁸ Blakemore and Frankel 1981.
⁹ Gallistel (1998), however, argues that the image matching mechanism thought by
many to underlie beacon (landmark) homing in ants requires symbolic computation.
¹⁰ Although I here speak of representing places and relations, I mean to leave open
the issue of what contents and extensions such representations actually have, at
least in the case of nonhuman animals (see below, § 7.1.1).
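The novel-shortcut inference described above can be made concrete as vector addition: the directed distance from C to B is the sum of the directed distances from C to A and from A to B. The sketch below is only an illustration. The coordinate convention (east, north), the distances in metres, and the function names are assumptions introduced for the example, and nothing here is a claim about how any organism actually computes the route.

```python
import math


def add(v, w):
    """Add two displacement vectors given as (east, north) pairs."""
    return (v[0] + w[0], v[1] + w[1])


def distance_and_bearing(v):
    """Return (distance, compass bearing in degrees clockwise from north)."""
    east, north = v
    return math.hypot(east, north), math.degrees(math.atan2(east, north)) % 360


# Hypothetical learned routes, in metres.
a_to_b = (300.0, 400.0)   # B lies 300 m east and 400 m north of A
c_to_a = (-300.0, 0.0)    # A lies 300 m due west of C

# The never-travelled direct route from C to B.
c_to_b = add(c_to_a, a_to_b)
shortcut_distance, shortcut_bearing = distance_and_bearing(c_to_b)
```

With these illustrative numbers the inferred shortcut is 400 m due north of C, a course the organism never flew while learning either route.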
Navigation in humans and other animals, including invertebrates, has been studied
extensively.¹¹ Despite this, philosophers of mind have devoted relatively little
attention to this body of work, certainly much less attention than they have
devoted to natural language.¹² In particular, recent philosophical discussions of
systematicity have focused on linguistic capacities and sentence parsing.¹³ A
colleague once suggested that if the philosophical focus had been on navigation
rather than language, the language of thought hypothesis would not have been nearly
so influential. I hope to convince you that that suggestion is dubious.
¹¹ See, for example, Healy 1998 and Golledge 1999.
¹² Two noteworthy exceptions are Carruthers 2005 and Robinson 1995.
¹³ Cummins et al. 2001, Hadley 1994, Niklasson and van Gelder 1994.
1.5  Why Honeybees?
The honeybee is a superb model organism for the study of learning and memory. Also,
its neurophysiology is being investigated using both electrical and optical
techniques.¹⁴ It has “only” about 960,000 neurons, which makes the goal of
attaining a comprehensive understanding of its neuroanatomy relatively practical.
The evidence is growing for the idea that the honeybee has genuinely cognitive
capacities.¹⁵ That is, it is becoming increasingly difficult to explain honeybees’
behavior in nonrepresentational terms. For example, they exhibit multiple stages of
memory consolidation;¹⁶ their learning mechanisms go well beyond those of simple
association; and they can generalize well beyond the information present in the
stimuli used for training. Some researchers have come to advocate the view that
honeybees have goal-specific expectations¹⁷ (cf. § 6.1.3). Especially pertinent is
the growing body of evidence that strongly supports the hypothesis that honeybees
are capable of taking novel shortcuts (§§ 6.2.3.1, 6.2.3.2, 6.2.4).
¹⁴ Joerges et al. 1997, Menzel and Müller 1996.
¹⁵ Menzel and Giurfa 2001, Menzel et al. 2000b.
¹⁶ Menzel 1999.
¹⁷ Menzel et al. 1996.
1.6  The Terrain Ahead
Chapter 2 revisits the Classical explanation of systematicity and Smolensky’s
Connectionist explanation.¹⁸ Although these explanations are familiar to many
philosophers and cognitive scientists, it will be useful to review them in detail.
I focus on Smolensky’s explanation, since it is the most-often discussed
explanation in the literature, and it contains the essentials of any adequate
Connectionist explanation.
Chapter 3 examines the role of representational constituents in the Classical and
the Connectionist explanations. As we have seen, Classicism attributes causal roles
to the constituents of complex representations. If a Classical representation is
tokened, so too must be its constituents, and they will thus be available to play
causal roles in mental processes. There is still much confusion in the literature
concerning whether the Connectionist explanation attributes causal efficacy, in
cognitive processes, to representational constituents. I argue that it does not: it
does not attribute to such constituents causal roles in mental operations on the
representations of which they are constituents. In that sense, the Connectionist
explanation is not a causal one.
Chapters 4 and 5 raise and defend arguments for the claim that we have strong
(though defeasible) reasons to prefer the Classical explanation of systematicity
over the Connectionist one. I argue in Chapter 4 that while there is a sense in
which the Connectionist explanation is an adequate one, as an “acausal”
explanation, it is not adequate as an acausal explanation of systematicity. At
best, it is an adequate explanation of how networks can be rigged so as to exhibit
the systematicities of which Classical architectures, by their very nature, are
capable. Combining the lessons from Chapters 3 and 4, I conclude that since the
Connectionist account is neither a causal explanation of systematicity nor an
acausal explanation of systematicity, it is not really an explanation of
systematicity at all.
¹⁸ Many of the key elements of these explanations are developed in articles
collected in MacDonald and MacDonald 1995.
I argue in Chapter 5 that the Connectionist explanation is unprincipled in that it
appeals to cognitive processes that are arbitrary with respect to Connectionism.
The explanation will be shown to have the same form as certain scientific
explanations which are clearly unprincipled. The central point is that Classical
cognitive systems exhibit systematicity “for free,” as it were (by nomological
necessity). The systematicity of Classical systems is a product of Classical
cognitive architecture alone. If a Classical system doesn’t exhibit systematicity,
that will have to be because it has been specifically designed out of the system.
On the other hand, Connectionist cognitive architectures can just as easily be
nonsystematic as systematic. For such architectures, systematicity has to be
specifically designed in. (An important part of my argument is a response to an
attempt by Cummins and colleagues¹⁹ to shift the issue from systematic relations
among thoughts or items of information to law-like psychological effects of
acquiring knowledge of various structured domains. As we’ll see, if that shift is
warranted, it becomes a bit (but just a bit) easier to argue that the Classical
explanation is just as unprincipled as the Connectionist account.)
Chapter 6 is a review of much of the literature on honeybee navigation. I 
argue that some of the navigational abilities of bees require the learning and 
storage of semantically complex information. Some, in addition, require learning 
by means of combining new and previously acquired information in novel ways.
Finally, in Chapter 7, I argue that various capacities of honeybees to acquire
information relevant to their navigational tasks exhibit certain systematicities. I
conclude by proposing that a complete account of honeybee navigational capacities
will be one that posits cognitive processes that are computations defined over
syntactically specified representations. At the very least, such an account will be
one that posits computations defined over configurationally complex
representations. Either way, the account will not be a Connectionist one.
¹⁹ Cummins 1996 and Cummins et al. 2001.
Chapter 2
Two Candidate Explanations of Systematicity
A view widely held among cognitive scientists¹ is that human thought is systematic.
Roughly, the idea is that our capacity to think certain thoughts is intrinsically
related to our capacity to think certain other thoughts. For example, anyone who is
able to think that Seabiscuit was a better racehorse than War Admiral is also able
to think that War Admiral was a better racehorse than Seabiscuit. Anyone who can
think that there are black cats and brown dogs can also think that there are black
dogs and brown cats.
There are many ways to more precisely specify the nature of systematicity.
2
 
For present purposes, we may consider two structurally complex thoughts to be 
systematically related just in case they have the same logical and representational 
constituents and are formal permutations of each other. Thus, whereas the 
thought that Fa ? Gb is a systematic variant of the thought that Ga ? Fb, this is 
true neither of the thought that Fa ? Hb nor the thought that ~ (Fa ? Gb).
¹ In addition to the researchers who contributed to the explanations of systematicity presented in
this chapter, some others who (at least implicitly) accept that human thought is systematic are
Anderson 1995; Barsalou 1992, 1993; Block 1995; Butler 1991; Carruthers 2005; Hadley 1994, 1997;
Horgan and Tienson 1996; Hummel and Holyoak 2001; Marcus 2001; Niklasson and van Gelder
1994; Phillips 1998; Phillips and Halford 1997; Pinker 1997; and Sterelny 1990.
² See Hadley 1994, McLaughlin 1993, and Niklasson and van Gelder 1994.
There are two aspects of systematicity particularly important to account for. 
First, systematicity is supposedly a matter of psychological law. Anyone who is 
able to think the thought T is thereby also able to think systematic variants of T. 
Nature, it seems, packages capacities to think various thoughts in bundles. Sec-
ond, systematicity has a semantic aspect: the semantic relations among system-
atically related thoughts are nonarbitrary. For example, the contents [brown], 
[black], [cat], and [dog] contribute to the content of both the thought that there 
are black cats and brown dogs and the thought that there are black dogs and 
brown cats.
A natural place to look for explanations of systematicity, its lawfulness, and
its semantic character is in theories of cognitive architecture. Fodor and others³
(hereafter, Fodor) have promoted an explanation that appeals to Classical cognitive
architecture. Smolensky⁴ has offered an explanation that appeals to one type
of Connectionist cognitive architecture.
The assumptions those explanations have in common include the following
(note that the notion of constituency appealed to here is a very broad one, one
that allows for the possibility that the constituency relation is an abstract, formal
relation, rather than some sort of part–whole relation):
Representationalism: Thinking that P requires having a mental representation
that has the content [P].
Complexity of mental representations: Some mental representations are complex
in the sense that they have mental representations as constituents.
Structure-sensitive processing: Mental processes are sensitive to the constituent
structure of mental representations.
Compositionality for mental representations: The content of some mental
representations is determined by the contents of their constituents and by
their constituent structure.⁵
But Fodor's Classical explanation and Smolensky's Connectionist explanation
rely on different views about the nature of mental representations, mental
processing, and the constituency relation for mental representations.
³ Fodor 1998, Fodor and McLaughlin 1995, Fodor and Pylyshyn 1995, and McLaughlin 1993.
⁴ Smolensky 1995a–c.
⁵ This notion of compositionality, as I intend it to be understood, is weaker than the linguistic
notion, since what is meant by "constituent" is left open.
2.1  The Classical Explanation of Systematicity
Let's begin with Fodor's Classical explanation of systematicity. For the purpose
of understanding his account, it is useful to see that he endeavors to explain the
systematicity of thought in much the way one might explain the systematicity
(what there is of it) present in natural language. For example, anyone who can
understand the sentence "Andy loves Betty" is bound to be able to understand
the sentence "Betty loves Andy." A plausible explanation of this appeals to these
facts: (1) the two sentences have the words "Andy," "loves," and "Betty" as
constituents; (2) those constituents have the respective contents [Andy], [loves], and
[Betty]; and (3) the two sentences have the same syntactic structure. Furthermore,
understanding them requires understanding what their syntactic structures and
their constituents contribute to their contents. But if all this is true, then it looks
like what it takes to understand one of the sentences is just what it takes to
understand the other. Roughly, what explains the systematicity present in natural
language is that the requirements for understanding systematically related
sentences are the same. What is necessary for understanding "Andy loves Betty" is
necessary and normally sufficient for understanding "Betty loves Andy."
A useful way to bring the key features of the Classical explanation into relief 
is to first suppose that John is able to think that Andy loves Betty. We may then 
spell out in detail how Classical hypotheses about mental representation and 
processing, together with that supposition, explain how John is thereby also able 
to think that Betty loves Andy.
The first step of the Classical explanation is a hypothesis about the nature of
propositional attitudes, such as believing that P, desiring that P, and so on. On the
Classical view, to have a certain sort of occurrent propositional attitude toward a
thought content is to stand in a specific kind of computational relation to a
mental-representation token with that content. For example, for a to occurrently
judge that C is for a to have entokened within his cognitive system a representation
both having the content [C] and playing the computational role of a judgment.
Clearly, then, on the Classical account of propositional attitudes, John is
able to think that Andy loves Betty only if he can entoken a mental representation
with the content [Andy loves Betty]. Let's say that a token of a mental
representation with that content is a token of ?.⁶ (For ease of exposition in what
follows, I'll generally put aside type–token subtleties.)
The Classical view hypothesizes that some mental representations are complex,
in the sense that they have representations as constituents. Furthermore, the
Classicist proposes that the structure of some complex mental representations is
governed by a combinatorial syntax. This means that certain mental representations
are of certain formal types (individual constants, variables, etc.) and that
they combine to form more complex representations according to syntactic rules.
Thus, the Classicist proposes that ? is a complex, syntactically structured
representation, formally much like a well-formed formula in an artificial language
such as first-order predicate logic. Indeed, ? is part of a system of mental
representation, "Mentalese," which is literally a language of thought.
⁶ We could call them "ANDY LOVES BETTY" representations. But at this point in the exposition,
that label might be misleading, since it would prematurely suggest that they have language-like
constituent structure. The Classical explanation proceeds by hypothesizing that mental
representations have language-like constituent structure and then showing that that hypothesis plays a
central role in a good explanation of systematicity. To refer to the representations in question as
"ANDY LOVES BETTY" representations might make the Classicist's hypothesis seem trivial or
question begging (cf. Cummins et al. 2001), when in fact it is neither. In what sense the structure
of mental representations is language-like, on the Classical view, is explained below.
What, then, are ?'s constituents? The specific kinds of constituents that
mental representations have is a point of contention among Classicists. But the
Classical explanation of systematicity doesn't depend on any particular stance on
that issue. The important point is that whatever ?'s constituents are, they stand in
structural relations governed by syntactic rules. So, for expository purposes, we
can keep the discussion at an intuitive level.
Let's assume, then, that ?'s content, [Andy loves Betty], is composed of the
contents [Andy], [loves], and [Betty].⁷ Further, since the constituents of a
representation are themselves representations, let's suppose that ? has three
constituents, each having one of those three contents. Call the constituents of ? which
have those contents a, L, and b, respectively,⁸ where a and b are individual
constants and L is a 2-place predicate.
Now, Mentalese representational constituency is a co-tokening relation:
representation R is a constituent of representation R* just in case it is metaphysically
necessary that whenever R* is tokened, so is R.⁹ Call this sort of constituency
"concatenative" constituency. Clear examples of representations with concatenative
constituency are representationally complex written sentences. The word
"Andy" is a concatenative constituent of the sentence "Andy loves Betty," since the
latter cannot be tokened unless the former is tokened.
⁷ Again, this supposition is for expository purposes. As McLaughlin (1993) notes, Classicism is
not committed to the view that the constituents of a thought content stand in one-to-one
correspondence with the words in a public-language sentence that may be used to express it.
⁸ The constituents a, L, and b themselves might be either simple or complex. The Classical account
of systematicity does not and need not take a stand on this issue.
⁹ This is explicit in Fodor and McLaughlin 1995, p. 201; see also Fodor 1998. van Gelder (1990)
makes it clear that concatenative constituency is a necessary feature of complex Classical
representations. According to Classicism, the mind/brain is a syntactically driven physical system that
exhibits semantically coherent behavior. This requires that mental processes are causally sensitive
to the syntactic structure of mental representations, which in turn requires that their syntactic
constituents are physically entokened.
From the Classical characterization of the constituency relation, and given
that a, L, and b are ?'s constituents, it follows that tokening ? requires tokening a,
L, and b. Furthermore, John is able to stand in a computational relation to ? only
if his cognitive system can token ?. Hence, John is able to stand in a computational
relation to ? only if his cognitive system can token a, L, and b.
The Classicist's story so far is that John is able to think that Andy loves Betty
only if his cognitive system can token a, L, and b. What the Classicist still needs to
explain is how John's cognitive system can token a, L, and b only if he can think
that Betty loves Andy. The explanation proceeds by appealing to the Classical
account of mental processes. That account includes the hypothesis that some
mental processes have representational constituents in their domains and are
causally sensitive to syntactic structure. Thus, the Classicist claims that there are
mental processes that can operate on ?'s constituents so as to construct mental
representations which have the same syntactic form as ?, the very same constituents
as ?, but a different arrangement of those constituents. If there are mental
processes that can construct ? by, as it were, completing the mental predicate
"_L_" with "a" in the first slot and "b" in the second, then there are mental processes
that can construct other mental representations by completing the same predicate
with "b" in the first slot and "a" in the second. So, on the Classical view, if John's
cognitive system is capable of tokening a, L, and b (and aLb representations), then
his cognitive system is also capable of tokening bLa representations.
What remains to be explained is how John can token bLa representations
only if he can think that Betty loves Andy. That is, there is still the question of the
content of bLa. The Classicist addresses this question by hypothesizing that the
semantics for mental representations is compositional: the content of a complex
mental representation is determined by its syntactic structure together with the
contents of its constituents, which are context independent. On this hypothesis, ?
has the content [Andy loves Betty] because, first, its constituents, a, L, and b, have
the contents [Andy], [loves], and [Betty], respectively, and second, it has the
syntactic form xRy, where x = a, R = L, and y = b. Likewise, bLa has the content
[Betty loves Andy] because, first, its constituents, a, L, and b, have the contents
they do, and second, it has the form xRy, where x = b, R = L, and y = a. Therefore,
John can token bLa representations only if he is able to think that Betty loves
Andy. This completes the explanatory chain from the supposition that John can
think that Andy loves Betty to the result that he can think that Betty loves Andy.
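The explanatory chain just completed can be pictured with a toy model. The sketch below is an illustration under stated assumptions, not a claim about how Mentalese is realized: a complex representation is modeled as a Python tuple of its constituent tokens (so building the whole requires having the parts), and the names `CONTENT`, `complete_predicate`, and `content` are hypothetical.

```python
# Hypothetical toy model: a complex representation is a tuple of its
# constituent tokens, so tokening the whole requires tokening each part
# (a stand-in for concatenative constituency).

# Context-independent contents of the simple constituents.
CONTENT = {"a": "Andy", "L": "loves", "b": "Betty"}

def complete_predicate(pred, first, second):
    """Fill the two slots of a 2-place predicate, giving a representation of form xRy."""
    return (first, pred, second)

def content(rep):
    """Compositional semantics: syntactic form plus constituent contents fix the content."""
    x, r, y = rep
    return f"[{CONTENT[x]} {CONTENT[r]} {CONTENT[y]}]"

# The same slot-filling process yields both representations; only the
# assignment of constituents to slots differs.
aLb = complete_predicate("L", "a", "b")
bLa = complete_predicate("L", "b", "a")

print(content(aLb))
print(content(bLa))
```

The point of the sketch is that nothing beyond the resources needed for aLb (the three constituents, the slot-filling process, and the compositional semantics) is needed for bLa.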
Note that the Classical account explains why the semantic relations among 
systematically related thoughts are nonarbitrary. Systematically related mental 
representations share constituents, and those constituents contribute the same 
contents to the content of the relevant mental representations. That is why, for 
example, the content [loves] contributes to the content of both the thought that 
Andy loves Betty and the thought that Betty loves Andy. Thinking either thought 
requires tokening a complex mental representation having a constituent with the 
content [loves].
The Classical account also explains why systematicity is a nomologically 
necessary feature of thought. Because the systematic variants of a particular 
mental representation are constructed from the same constituents by means of 
the same syntactic rules, anyone who can token that mental representation is 
bound to be able to token its systematic variants. Of course, there could be spe-
cial circumstances in which systematicity does not hold for certain thoughts. For 
example, John might suffer a type of brain damage that prevents him from 
thinking that Betty loves Andy, even if he can think that Andy loves Betty. But 
the point is that, on the Classical view, such circumstances would have to be out 
of the ordinary. In other words, the Classicist may hold that the law that thought 
is systematic is a ceteris paribus law.
Let's move on to Smolensky's explanation of systematicity.
2.2  Smolensky's Connectionist Explanation of Systematicity
Smolensky accepts representationalism, mental-representation complexity,
structure-sensitive mental processing, and compositionality for mental
representations. He disagrees with the Classicist, however, on the nature of mental
representations and, correlatively, on the nature of the constituency relation. He also
disagrees with the Classicist on the nature of mental processes.¹⁰
Smolensky's account of the systematicity of thought takes some setting up,
but then is relatively straightforward. A good place to begin is his view on the
nature of mental representations.
Unlike Fodor, Smolensky does not attempt to explain systematicity in terms
of language-like mental representations. Instead, he appeals to representations
that encode both the syntactic structure of language-like representations and their
constituents but do not actually have language-like, configurational structure
themselves. On his account, all mental representations, or at least those important
for issues about systematicity, are patterns of Connectionist-network unit
activation levels. They are distributed over many units, which is to say that
(1) every mental representation comprises the activity of multiple units, and
(2) every unit participates in multiple mental representations. Such activity
patterns are readily conceptualized as vectors (ordered sets of numbers), where each
number in the vector uniquely corresponds to the activity level of a particular
unit (Fig. 2.1). For this reason, following Smolensky and many others, we may
simply call mental representations of this sort "vectors."
¹⁰ Actually, Smolensky doesn't argue that Classicism is wrong. His intended conclusion is that
there are viable Connectionist alternatives to Classicism.
On Smolensky's view, the constituency relation for vector representations is
a certain type of vector component¹¹ relation, not a co-tokening relation. Of
course, there are many vector component relations: vectors are mathematically
decomposable in many ways (in some systems of vector representation, including
Smolensky's, infinitely many). For example, just as many different pairs of
numbers sum to a given number, many different pairs of vectors sum to a given
vector (some vector operations are introduced below). So, some of the components
mathematically derivable from a vector representation will not have an appropriate
content or will have no content at all. Such components, then, will not
be representational constituents of the vector from which they are derivable. To
address this matter, Smolensky comes up with a system of vector representation
in which just those vector components with the appropriate contents are the
constituents of mental representations. He achieves this, in part, by providing an
algorithm for translating Classical symbol structures into vectors. In particular,
he shows that a unique vector translation is derivable from any binary
constituent-structure tree.¹²
[Figure 2.1: two activity patterns over the units a–e. Activity pattern 1: vector ⟨1, 6, 3, 4, 2⟩,
content [Andy loves Betty]. Activity pattern 2: vector ⟨5, 0, 7, 8, 9⟩, content [Betty loves Andy].]
Figure 2.1. Vector representations. The activity patterns and the contents I've assigned to them
were chosen arbitrarily. Note that, although the contents of two vectors may be systematically
related, as they are here, this does not require that the vectors have any common elements or
subvectors. a–e: Connectionist-network units.
¹¹ On my usage, vector components are not elements or subsets of vectors. Vector components are
members of the domains of vector operations such as vector addition and tensor multiplication,
which are introduced below.
¹² Smolensky (1995c) alternately speaks of vectors as being, realizing, and representing Classical
symbol structures. He doesn't speak of vectors as translating them. However, with respect to the
present issue, I think that seeing vectors as translations (of a sort) most clearly elucidates his view.
For the notion of translation brings with it the idea of semantic relations, and that idea is crucial
to the explanation of systematicity.
In order to understand Smolensky's translation scheme, it is necessary first
to understand two vector operations, vector addition and tensor multiplication.
To add two vectors, we simply add their corresponding elements. Thus, the
vector sum of ⟨1, 2, 3⟩ and ⟨2, 3, 4⟩ is ⟨3, 5, 7⟩. Generalizing to all finite vectors, the
sum of the vectors
\[ \langle x_1, x_2, \ldots, x_n \rangle \]
and
\[ \langle y_1, y_2, \ldots, y_n \rangle \]
is
\[ \langle x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n \rangle. \]
(Vector addition is defined only for vectors having the same number of elements.)
The tensor product of two vectors is the vector which contains all the
separate products of every single element of the first and every single element of
the second. For example, the tensor product of ⟨1, 2⟩ and ⟨2, 3, 4⟩ is
⟨1(2), 1(3), 1(4), 2(2), 2(3), 2(4)⟩ = ⟨2, 3, 4, 4, 6, 8⟩.
Generalizing to all finite vectors, the tensor product of
\[ \langle x_1, x_2, \ldots, x_n \rangle \]
and
\[ \langle y_1, y_2, \ldots, y_m \rangle \]
is
\[ \langle x_1 y_1, x_1 y_2, \ldots, x_1 y_m, x_2 y_1, x_2 y_2, \ldots, x_2 y_m, \ldots, x_n y_1, x_n y_2, \ldots, x_n y_m \rangle. \]
Vectors which are tensor products, or which have tensor products as components,
are called "tensor product representations."
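The two operations just defined are simple enough to write out directly. The following is a minimal sketch using plain Python lists as vectors; the function names `vector_add` and `tensor_product` are chosen here for illustration. It reproduces the two worked examples from the text.

```python
def vector_add(x, y):
    """Elementwise sum; defined only for vectors with the same number of elements."""
    if len(x) != len(y):
        raise ValueError("vector addition requires vectors of equal length")
    return [xi + yi for xi, yi in zip(x, y)]

def tensor_product(x, y):
    """All products of an element of x with an element of y, ordered as in the text."""
    return [xi * yj for xi in x for yj in y]

print(vector_add([1, 2, 3], [2, 3, 4]))    # [3, 5, 7]
print(tensor_product([1, 2], [2, 3, 4]))   # [2, 3, 4, 4, 6, 8]
```

Note that the tensor product of an n-element vector and an m-element vector has n × m elements, so repeated tensor multiplication quickly increases the number of units a representation occupies.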
We are now in a position to understand the essentials of Smolensky's tree
translation scheme.¹³ Take some constituent structure tree, say, (L (A, B)),
having the content [Andy loves Betty]. In Smolensky's system, it has the unique
vector translation
\[ \mathbf{V} = (\mathbf{r}_0 \otimes \mathbf{L}) + (\mathbf{r}_1 \otimes ((\mathbf{r}_0 \otimes \mathbf{A}) + (\mathbf{r}_1 \otimes \mathbf{B}))), \]
where "⊗" is tensor multiplication and "+" is vector addition. The tree constituents
L, A, and B are assigned the vectors $\mathbf{L}$, $\mathbf{A}$, and $\mathbf{B}$, respectively. That L and A are left
branches is encoded by taking the tensor products of $\mathbf{L}$ and $\mathbf{r}_0$ and of $\mathbf{A}$ and $\mathbf{r}_0$,
where $\mathbf{r}_0$ is a (constant) vector that encodes the left-branch structural role. That B
and (A, B) are right branches is encoded by taking the tensor products of $\mathbf{B}$ and
$\mathbf{r}_1$, and of $(\mathbf{r}_0 \otimes \mathbf{A}) + (\mathbf{r}_1 \otimes \mathbf{B})$ and $\mathbf{r}_1$, where $\mathbf{r}_1$ is a (constant) vector that encodes
the right-branch structural role. That a certain tree has two particular trees as its
immediate subtrees (for example, that (L (A, B)) has L and (A, B) as its
immediate subtrees) is encoded by requiring that the vector which translates the
higher-level tree is the sum of the vectors which translate the two subtrees.
¹³ See Smolensky 1995c, pp. 136–141.
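The translation scheme described above can be sketched in a few lines. Everything numerical here is an assumption made for illustration: the filler vectors for L, A, and B and the role vectors r0 and r1 are arbitrary values, and trees are written as nested Python tuples. One further assumption deserves emphasis. Because role-bound vectors at different tree depths have different lengths, and vector addition is defined only for equal lengths, the sketch keeps the components for each depth separate (in a dictionary keyed by length) rather than literally summing them. This is one way of handling the mismatch and is not taken from the text.

```python
# Illustrative values only; nothing hangs on these particular numbers.
FILLERS = {"L": [2, 0, 1], "A": [1, 2, 0], "B": [0, 1, 3]}
R0 = [1, 1]    # left-branch role vector
R1 = [1, -1]   # right-branch role vector (independent of R0)

def tensor_product(x, y):
    return [xi * yj for xi in x for yj in y]

def bind(role, structure):
    """Tensor-multiply a role vector with every component of a structure."""
    return {len(role) * n: tensor_product(role, v) for n, v in structure.items()}

def combine(s, t):
    """Add same-length components; keep components of different lengths apart."""
    out = dict(s)
    for n, v in t.items():
        out[n] = [p + q for p, q in zip(out[n], v)] if n in out else list(v)
    return out

def translate(tree):
    """Translate a binary tree (nested 2-tuples of filler names) into vector form."""
    if isinstance(tree, str):
        return {len(FILLERS[tree]): list(FILLERS[tree])}
    left, right = tree
    return combine(bind(R0, translate(left)), bind(R1, translate(right)))

V1 = translate(("L", ("A", "B")))   # tree (L (A, B))
V2 = translate(("L", ("B", "A")))   # tree (L (B, A))

print(V1)
print(V2)
```

With these values the two translations share their depth-one component (the binding of L to the left-branch role) and differ only in the component that encodes the (A, B) subtree.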
Given Smolensky's tree translation scheme, just those vector components
with the appropriate contents are the constituents of mental representations.
Although $\mathbf{V}$ is equal to the sum of many different pairs of vectors, only the sum
\[ (\mathbf{r}_0 \otimes \mathbf{L}) + (\mathbf{r}_1 \otimes ((\mathbf{r}_0 \otimes \mathbf{A}) + (\mathbf{r}_1 \otimes \mathbf{B}))) \]
gives us $\mathbf{V}$'s constituents, $\mathbf{L}$, $\mathbf{A}$, and $\mathbf{B}$.¹⁴
¹⁴ This works as long as $\mathbf{r}_0$ and $\mathbf{r}_1$ are independent vectors (see Smolensky 1995c, pp. 237 and
283n19).
Smolensky's notion of vector constituency, then, may be stated as follows:
Vector constituency: Vector $\mathbf{V}_n$ is a vector constituent of vector $\mathbf{V}_m$ iff $\mathbf{V}_n$
uniquely translates tree T, $\mathbf{V}_m$ uniquely translates tree T*, and T is a
Classical constituent of T*.
Vector constituency, then, is a derivation relation, not a co-tokening relation. It is
a vector component relation that presupposes a translation function from trees to
vectors, where the vector that translates a particular tree is uniquely derivable
from it.
Since vector constituency is not a co-tokening relation, one vector can be a
constituent of another, tokened vector, without itself ever being tokened.
Accordingly, it is further true that although the representation-level processes in a
Smolensky cognitive architecture result in vector-to-vector transformations, they
do not operate on any tokened constituents of the vector tokens they
transform: vectors are processed as wholes (Fig. 2.2; see also § 3.1.2). This stands in
stark contrast to the Classical account, on which there are representation-level
processes that transform complex representation tokens by operating on their
tokened constituents.
The principal representation-level operation in Connectionist networks is
matrix multiplication: the multiplication of a vector by a matrix of connection
strengths. Matrix multiplication is implemented by a set of simpler algorithmic
processes, each being the multiplication of a single unit's activation value by a
single connection strength. But these algorithmic processes operate at a
subrepresentational level of description: they do their job at the level of single units and
single connections. They do not operate on patterns of activity levels. Hence, they
do not operate on mental representations or their constituents (which themselves
are patterns of activity levels). Thus, in a Smolensky architecture, neither
representation-level processes nor the algorithmic processes that implement
them operate on the constituents of the representation tokens they manipulate.
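A small sketch may help make the two levels of description vivid. It is restricted, as a simplifying assumption, to the one-level structure (A, B) rather than the full tree, and the filler vectors, role vectors, and weight matrix are illustrative values chosen by hand rather than learned. The matrix multiplication is written as a sum of single products, each of one connection strength and one unit's activation value; at no step is a vector with the content [Andy] or [Betty] extracted or operated on.

```python
def tensor_product(x, y):
    return [xi * yj for xi in x for yj in y]

def matvec(W, v):
    """Multiply vector v by weight matrix W.

    Each term w_ij * v_j is one subrepresentational step: a single
    connection strength times a single unit's activation value.
    """
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

# Illustrative filler and role vectors (assumptions, not from the text).
A, B = [1, 2, 0], [0, 1, 3]
R0, R1 = [1, 1], [1, -1]

# Vector for (A, B) and for its systematic variant (B, A).
v_ab = [p + q for p, q in zip(tensor_product(R0, A), tensor_product(R1, B))]
v_ba = [p + q for p, q in zip(tensor_product(R0, B), tensor_product(R1, A))]

# A hand-picked weight matrix that, for these role vectors, exchanges the
# left-branch and right-branch roles: diag(1, 1, 1, -1, -1, -1).
signs = [1, 1, 1, -1, -1, -1]
W = [[signs[i] if i == j else 0 for j in range(6)] for i in range(6)]

print(v_ab)
print(matvec(W, v_ab))
print(matvec(W, v_ab) == v_ba)
```

With these values neither A nor B occurs as a block of adjacent elements in v_ab, yet a single pass through the weights carries the whole pattern to the whole pattern for the variant structure.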
It's important to be clear on the role of trees and tree translation algorithms
in Smolensky's account. Neither is to be understood as playing a causal role
within cognitive systems. They are, rather, elements of his theory of how cognitive
systems can exhibit some of the properties of Classical systems of representation.
Trees simply provide a good example of representations having Classical
constituent structure, and Smolensky shows that tensor product representations
can have a parallel, but non-Classical, constituent structure. The tree translation
algorithms describe but do not govern mental processes, in the sense that they
are not executed by cognitive systems. They do, though, provide a way of
understanding the tensor product representation constituency relation. They also
provide a way to show that a Connectionist network with a Smolensky architecture
can process tensor product representations in a way that maintains the appropriate
semantic relations among systematically related mental representations, as
we will see shortly.
[Figure 2.2: activity pattern ⟨1, 6, 3, 4, 2⟩, content [Andy loves Betty], at left; activity pattern
⟨5, 0, 7, 8, 9⟩, content [Betty loves Andy], at right.]
Figure 2.2. Vectors are processed as wholes. Vector transforming processes in networks with
distributed vector representations operate on entire vectors, not on any of their constituents. Here,
the vector instantiated at left is directly transformed into an instantiation of one of its systematic
semantic variants; and this is accomplished in the absence of any process that operates on any
vector with the content [Andy], [loves], or [Betty].
We may briefly sum up the key points of the preceding as follows. Consider
a mental representation that has the content [Andy loves Betty]. On Smolensky's
account, that representation is a tensor product representation,
\[ \mathbf{V}_1 = (\mathbf{r}_0 \otimes \mathbf{L}) + (\mathbf{r}_1 \otimes ((\mathbf{r}_0 \otimes \mathbf{A}) + (\mathbf{r}_1 \otimes \mathbf{B}))). \]
Vector $\mathbf{V}_1$ is the unique translation, and encodes the constituent structure, of a
tree, (L (A, B)), having the content [Andy loves Betty]. Furthermore, $\mathbf{V}_1$'s
component vectors, $\mathbf{A}$, $\mathbf{L}$, and $\mathbf{B}$, have the contents [Andy], [loves], and [Betty],
respectively. Those vectors are the representational constituents of $\mathbf{V}_1$.
Now, vector $\mathbf{V}_1$ can be transformed into a different vector,
\[ \mathbf{V}_2 = (\mathbf{r}_0 \otimes \mathbf{L}) + (\mathbf{r}_1 \otimes ((\mathbf{r}_0 \otimes \mathbf{B}) + (\mathbf{r}_1 \otimes \mathbf{A}))). \]
Note that $\mathbf{V}_2$ has the same constituents as $\mathbf{V}_1$, but their mathematical
arrangement is different: the roles of $\mathbf{A}$ and $\mathbf{B}$ are reversed. A key question now is, What
is the content of $\mathbf{V}_2$? Since vectors are translations of trees, an important step in
answering that question is to determine which tree $\mathbf{V}_2$ translates. Smolensky, in
fact, provides a procedure for deriving from any such vector the tree that it
uniquely translates. He shows not only that there is only one vector that translates
a given tree but also that there is only one tree derivable from a given vector. The
tree that is uniquely derivable from and uniquely translated by $\mathbf{V}_2$ is (L (B, A)).
Hence, assuming compositionality for tensor product representations, $\mathbf{V}_2$ has the
content [Betty loves Andy].
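The two-way derivability just described (one vector per tree, one tree per vector) can be illustrated with a round trip. The sketch below is not Smolensky's own procedure; it rests on several assumptions made for the sake of a short example: the filler and role vectors are arbitrary values, the role vectors are orthogonal (stronger than the independence the account requires), components at different tree depths are stored separately by length, and the helper names `translate`, `unbind`, and `derive_tree` are hypothetical.

```python
from fractions import Fraction

# Illustrative values only (assumptions, not from the text).
FILLERS = {"L": [2, 0, 1], "A": [1, 2, 0], "B": [0, 1, 3]}
R0, R1 = [1, 1], [1, -1]   # orthogonal left-branch and right-branch roles

def tensor_product(x, y):
    return [xi * yj for xi in x for yj in y]

def translate(tree):
    """Tree (nested 2-tuples of filler names) -> components keyed by length."""
    if isinstance(tree, str):
        return {len(FILLERS[tree]): list(FILLERS[tree])}
    out = {}
    for role, sub in zip((R0, R1), tree):
        for n, v in translate(sub).items():
            bound = tensor_product(role, v)
            prev = out.get(len(bound), [0] * len(bound))
            out[len(bound)] = [p + q for p, q in zip(prev, bound)]
    return out

def unbind(role, structure):
    """Recover whatever was bound to `role`; valid here because roles are orthogonal."""
    norm = sum(r * r for r in role)
    out = {}
    for n, v in structure.items():
        k = n // len(role)
        part = [sum(Fraction(role[i] * v[i * k + j], norm) for i in range(len(role)))
                for j in range(k)]
        if any(part):          # drop components that held nothing for this role
            out[k] = part
    return out

def derive_tree(structure):
    """Vector form -> the tree it translates (for structures built by translate)."""
    if not structure:
        raise ValueError("no tree is derivable from an empty structure")
    for name, vec in FILLERS.items():
        if structure == {len(vec): vec}:
            return name
    return (derive_tree(unbind(R0, structure)), derive_tree(unbind(R1, structure)))

V1 = translate(("L", ("A", "B")))
V2 = translate(("L", ("B", "A")))
print(derive_tree(V1))
print(derive_tree(V2))
```

In this sketch the constituents are recovered by computation over the whole vector; they are not read off as stored parts of it, which is in keeping with the description of vector constituency as a derivation relation.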
The explanation of systematicity is now relatively straightforward. Suppose
that John's cognitive system has a Smolensky architecture and can token $\mathbf{V}_1$.
Then the vector space for that system contains the vectors $\mathbf{A}$, $\mathbf{L}$, $\mathbf{B}$, $\mathbf{r}_0$, and $\mathbf{r}_1$.¹⁵
Furthermore, the system must (in principle) be capable of building up $\mathbf{V}_1$ by
means of processes that both operate on its constituents and implement vector
addition and tensor multiplication. But then the vector space for the system also
contains $\mathbf{V}_2$. For $\mathbf{V}_2$ has the same constituents and the same mathematical
structure as $\mathbf{V}_1$. Finally, if the vector space for the system contains $\mathbf{V}_2$, then the system
is capable of tokening $\mathbf{V}_2$. Hence, on Smolensky's account, if John is able to think
that Andy loves Betty, he is thereby also able to think that Betty loves Andy. For if
John is able to think that Andy loves Betty, then his cognitive system is capable of
tokening $\mathbf{V}_1$. And if his cognitive system is capable of tokening $\mathbf{V}_1$, it is capable of
tokening $\mathbf{V}_2$. And, finally, if it is capable of tokening $\mathbf{V}_2$, then John is able to think
that Betty loves Andy.
¹⁵ These consequences depend on the properties of a Smolensky architecture. One key property is
that of having fixed, independent, structural-role vectors ($\mathbf{r}_0$ and $\mathbf{r}_1$). Another is that of having a
continuous range of unbounded activation values.
As on the Classical account, systematically related tensor product
representations share constituents, and those constituents individually contribute the
same contents to the content of the relevant mental representations. So
Smolensky's account explains why the semantic relations among systematically related
thoughts are nonarbitrary.
Smolensky's account also seems to explain why systematicity is a nomologically
necessary feature of thought: because a particular tensor product representation
and its systematic variants have the same vector constituents and the same
mathematical form, anyone who is able to token that representation is bound to
be able to token its systematic variants.
2.3  Summary of the Key Features of the Two Explanations of Systematicity
The Classical explanation and Smolensky's explanation both assume, in a broad
sense, compositionality for mental representations. But they differ in four
important respects. The Classical account posits a cognitive architecture with the
following features:
(1) The constituency relation for mental representations is concatenative.
(2) Mental representations have syntactic structure.
(3) Mental processes are causally sensitive to the syntactic properties of 
mental representations.
(4) The constituents of mental representations play causal roles in mental 
processes.
On the other hand, Smolensky's account posits a cognitive architecture with
these features:
(5) The constituency relation for mental representations is nonconcatena-
tive.
(6) Mental representations have mathematical (vector) structure, of a sort 
that is not also a kind of syntactic structure.
(7) Mental processes are functionally sensitive to the constituent structure 
of mental representations.
(8) The constituents of any particular mental-representation token do not 
have causal roles in any operation on that token.
My claim that (8) is a feature of a Smolensky architecture is controversial. I show 
in the next chapter that it is indeed a feature of Connectionist architectures.
Feature (6) might require some clarification. Some defenders of Connection-
ism, including Smolensky, do speak of vectors as having syntactic structure and 
do consider mental processes to be sensitive to syntactic structure. For the math-
ematical structures of the relevant vectors encode the syntactic structures of their 
corresponding Classical representations, and that permits mental processes to be 
structure sensitive. But this is a terminological matter. To avoid confusion, I will 
use terms describing the formal structure of representation tokens only as de-
scriptions of their configurational structure, not as descriptions of their (broadly 
speaking) constituent structure (though these two kinds of structures may coin-
cide, as they do for Classical representations).
Features (5)–(8) are very plausibly essential features of any Connectionist
architecture on which a non-Classical explanation of systematicity could be
based. Again, that this is true for (8) is a topic of the next chapter. Feature (7)
seems clearly essential for any adequate explanation of systematicity. Regarding
(5) and (6), note first that they are features of any Connectionist architecture that
employs distributed vector representations, whether or not they are tensor
product representations. Furthermore, all Connectionist systems alleged to exhibit
some significant kind of systematicity employ distributed vectors. Indeed, as van
Gelder¹⁶ argues, it is hard to see how Connectionists could provide a
non-Classical explanation of systematicity without appealing to distributed vectors.
For Connectionist networks do not have arbitrarily extendable representational
resources: they have a finite number of units over which to represent arbitrarily
complex structures. So, in order to represent such structures, Connectionists have
turned to representational schemes which permit the various parts of a complex
structure to be represented at once over the same set of units; that is, they have
turned to distributed vectors.
¹⁶ van Gelder 1990, pp. 368–369 and 374–375.
As I argue in Chapters 4 and 5, the appeal to distributed vectors in explana-
tions of systematicity turns out to be problematic. The force of the difficulties 
facing Connectionism will be clearer if we first see that the constituents of a vec-
tor representation token do not have causal roles in any operation on that token.
Chapter 3
Systematicity and Causation
There is a specific sense in which the Classical explanation of systematicity is a 
causal explanation. Since Classical constituency is a co-tokening relation, the rep-
resentational constituents within a cognitive system, on the Classical account, are 
available to causally interact via rule-governed processes in order to form sys-
tematically related mental representations. The causal efficacy of representational 
constituents is essential to the Classical explanation.
In contrast, Fodor and McLaughlin[1] argue, Smolensky's explanation is not a 
causal one. That is, his explanation of the capacity to token systematically related 
vectors does not posit causal laws governing constituents of those vectors. 
Nothing about a Smolensky architecture guarantees that the vector constituents 
of tokened vectors are ever themselves tokened within the system. Neither to-
kening a vector nor performing an operation on a vector requires tokening its 
vector constituents. So nothing about a Smolensky architecture guarantees that 
the vector constituents of tokened vectors are available to causally interact in order 
to form systematically related mental representations. Moreover, neither constituent 
structure trees nor tree–vector algorithms play any causal roles within 
Smolensky architectures.
[1] Fodor 1998 and Fodor and McLaughlin 1995.
In this chapter, I'll examine and reject a variety of objections to Fodor and 
McLaughlin's argument. Note that their argument applies to any cognitive architecture 
for which the constituency relation is nonconcatenative. So there is 
good reason to think that it applies to every Connectionist architecture (§ 2.3).
3.1  Vector Constituent Causation
Some defenders of Connectionism have argued that Smolensky's explanation of 
systematicity is (or could turn out to be) a causal explanation after all. Some of 
them argue that the vector constituents of tensor product mental representations 
do (or might) play causal roles at the representational level of description, ap-
pealing to either the notion of superposition, criteria for existence and causal effi-
cacy, or similarity relations among vectors. Contrary to first appearances, on this 
sort of view, vector constituents are (or might be) causally efficacious, even if not 
severally present within the relevant cognitive system.
Other defenders of Connectionism argue that nonconcatenative constituency 
is compatible with the architectural requirement that a vector's constituents must 
have played a causal role in the eventual production of that vector, and that that 
is enough to guarantee the causal efficacy of those constituents. Still others argue 
that whether vector constituents themselves are causally efficacious is not the issue; 
rather, it is whether facts that certain vectors have certain constituents are 
causally efficacious.
3.1.1  Superposition
Smolensky suggests the possibility that the constituent structure of tensor product 
representations is analogous to the structure of such phenomena as complex 
waves.[2] Thus, when a musical chord is played, the sound waves of its individual 
notes are in superposition. They are not independently tokened within the re-
sulting complex wave, in the sense that the waves in superposition are not like 
the separate strands of a string. Nevertheless, they each have their own causal 
consequences. For instance, they can be discriminated by the human ear.
Or consider the example of a single-trace recording of a chord on magnetic 
tape.[3] The magnetic pattern on the tape, it might be claimed, is a superposition of 
the patterns that would have been present if the chord's notes had been recorded 
separately. None of those patterns is actually present on the tape. But if the tape 
is played on suitable sound processing equipment, each individual note's pattern 
can have its own causal consequences.
[2] Smolensky 1995c, pp. 241, 284n26.
[3] The example is from Horgan and Tienson (1996, p. 183, note 3). Horgan and Tienson do not 
argue that vector constituents are causally efficacious at the representational level. But a defender 
of Connectionism might be tempted to argue that they are, or could be, on the basis of such 
examples. Horgan and Tienson's position will be examined below.
I find neither of these analogies persuasive. Let's start with the recording 
case. As Fodor and McLaughlin have argued, the trouble with such cases is simply 
that counterfactual causes cannot have actual effects.[4] The current question is 
whether the type of magnetic pattern under discussion has "constituents," of the 
specified sort, with independent causal powers. And the answer is clearly no. A 
magnetic pattern that would have been there in a counterfactual situation is not 
in fact there and so cannot have actual causal consequences.
Of course, the magnetic pattern that is in fact on the tape is a kind of encod-
ing of a chord. And the pattern can be decoded so as to more-or-less accurately 
reproduce the chord. So it might appear that some sort of constituent-structure-sensitive 
processing is going on. But the fact that the pattern can be decoded 
doesn't show that it has causally efficacious, single-note encoding constituents. It 
only shows that it carries information about the chord's structure. And this it can 
do, even if it has no such constituents at all. After all, in principle, each distinct 
chord type could be encoded by a different simple numerical symbol.
The sound wave case might be different from the magnetic-pattern case. In 
the magnetic-pattern case, the constituents are only "counterfactually there." If 
the same is true in the sound wave case, then the same response is called for: 
counterfactual causes cannot have actual effects. However, it might be thought 
that in the case of sound waves, any constituent waves are somehow actually 
there, even though they are not separately tokened. And if they are actually 
there, then they can have actual effects. We'd then have the kind of case the presently 
envisioned defender of Connectionism wants: a clear example of non-Classical, 
nontokened constituents with causal efficacy.
[4] Fodor and McLaughlin 1995, pp. 214–215.
For example, I can imagine someone wanting to claim that in the case of a 
chord's sound wave, the individual notes' waves could first severally come into 
being and then superimpose to form the chord's waveform. If that is the case, 
then clearly each note's wave pattern makes a causal contribution to the character 
of the complex wave pattern, even though its individual character is lost in 
the superposition. And since it makes a causal contribution to the character of the 
complex pattern, it can have further causal consequences through that contribution. 
Moreover, even if a chord's wave pattern is produced all at once, without its 
component waves having been produced independently, it still seems to be the 
case that each component wave's pattern makes a causal contribution to the 
character of the chord's wave pattern. Thus, it certainly appears that something 
can be actually present, in some sense, without being separately tokened, and 
that that is enough for it to be causally efficacious.
Clearly, one problem with the move under consideration is that sense needs 
to be made of the purported distinction between being actually present and being 
separately tokened. If to be tokened, "separately" or otherwise, is something 
other than to have an instance actually present, then what is it? 
Furthermore, in the case of wave phenomena, there is in fact no pressure to 
distinguish between a wave's being actually present and its being tokened.
The law of superposition can be stated as follows: The existence of one 
wave does not affect the existence or properties of another wave, even if 
they are in the same place at the same time. This is equivalent to the 
statement that waves add algebraically; that is, the displacement of the 
sum wave A + B is equal to the displacement due to wave A added to the 
displacement due to wave B at the same point and time. ... This clearly 
distinguishes waves from material things, no two of which can occupy the 
same place at the same time. Waves can pass through each other without 
affecting each other.[5]
Given what we know about waves, and contrary to the envisioned view under 
discussion, component waves do not lose their individual character when in su-
perposition. So there is no reason to regard them as nontokened, without all of 
their defining properties intact. Of course, we might not be able to tell what the 
component waves of a complex sound wave are, just by looking at (say) the dis-
placement pattern due to the complex wave. In that sense, waves do lose their 
"individual character," or appearance, when in superposition. But that's an 
epistemological problem, not one about the nature of waves.
By now it should be clear that tensor product representations and complex 
waves are significantly disanalogous. Given what we know about waves, the 
component waves of a complex wave must be (separately) tokened in order for it 
to have the properties it has.[6] That's why each of its component waves can have 
its own causal consequences. However, the vector constituents of a tokened tensor 
product representation need not themselves ever be tokened in order for it to 
have the properties it has. To put this another way, waves and vectors superimpose 
differently. A complex wave token is a result of physical interactions among 
its component wave tokens. A tensor product representation, on the other hand, is 
a result of computations that rely on mathematical relations among its vector 
constituent types, regardless of whether or not those types are ever tokened. So the 
fact that waves in superposition can each be causally efficacious provides no reason 
for thinking that nontokened vector constituents can be causally efficacious.[7]
[5] Berg and Stork 1995, p. 29.
3.1.2  Criteria for Existence and Causal Efficacy
Matthews argues that, on Fodor's own criteria for existence and causal efficacy, 
vector constituents appear both to exist and to have causal consequences.[8] 
Roughly, on Fodor's view, a science is committed to the existence of those theoretical 
entities that figure essentially in its explanations and generalizations,[9] and 
a scientific theory is committed to the causal efficacy of a property if the theory 
includes a causal law to the effect that something's having that property (for 
example, a sail's having the property of being an airfoil, or a bank of nodes' having 
the property of instantiating a certain vector) is nomologically sufficient for the 
occurrence of an event of some specific kind (under appropriate conditions). 
More formally, a theory is committed to the causal efficacy of a property, F, if, 
according to the theory, an occurrence of an event that has F is nomologically sufficient 
for the occurrence of an event that has a certain property, G.[10] According to 
Matthews, vector constituents satisfy both criteria. For on Smolensky's theory, 
decomposing tensor product representations into their constituents is essential to 
understanding and explaining the regularities in a network's behavior.[11]
[6] Contrast Horgan and Tienson (1996, p. 74): "Sound waves, like all waves, superimpose; so in the 
chord none of the individual waves that went to make it up is tokened."
[7] Fodor (1998) also argues that waves and vector constituents are significantly disanalogous. 
However, his discussion is misleading and confusing because of the way he construes vector 
constituency: "C is a derived [vector] constituent of vector V iff V (uniquely) encodes C* and C is a 
constituent of C*. That is, the derived [vector] constituents of a vector V are the constituents tout court 
of the tree that V encodes" (p. 177). Construing vector constituency in this way makes it all too 
easy to argue that vector constituents are not like waves in superposition, since nonimplementational 
connectionist architectures don't support Classical symbol structures. They do, however, 
support vectors, and vector constituents are vectors, after all.
[8] Matthews 1996, pp. 164–166.
Matthews is wrong if his claim is that Fodor is committed to the existence 
and causal efficacy of nontokened vector constituents. Note first that there is no 
problem at all for Fodor regarding the existence and causal efficacy of tokened 
vectors. Nor is there a problem for Fodor regarding the existence and causal efficacy 
of vectors as types. For they can be tokened, and if they are, they can have 
causal consequences. The specific issue is whether a vector, as a nontokened vector 
constituent, can have causal consequences.[12]
[9] Compare Fodor (1998, p. 123): "What kinds of things a theorist says there are sets an upper 
bound on what taxonomy his explanations and generalizations are allowed to invoke. And what 
taxonomy his explanations and generalizations invoke sets a lower bound on what kinds of 
things the theorist is required to say that there are."
[10] See Fodor 1990, chapter 5.
[11] See Smolensky 1995a, pp. 188–191.
Now, it is certainly true that decomposing tensor product representations 
into their constituents is essential to understanding and explaining the regularities 
in the behavior of a Smolensky architecture, including any regularities related 
to systematicity. But the broad issue here is whether Smolensky's explanation 
of systematicity is a causal one, and Matthews's objection just presupposes 
that it is. It is contentious whether Smolensky's theory requires that a nontokened 
vector constituent's possessing some specific property is nomologically 
sufficient for the occurrence of some specific kind of event, and Matthews provides 
no reason for thinking that this is so.
Still, Matthews's presupposition could be right. So we need to look at how 
decomposing tensor product representations into their constituents is essential to 
understanding and explaining the regularities in the behavior of a Smolensky 
architecture. Smolensky's exposition of this, though not difficult to follow, takes a 
few pages.[13] So rather than summing up the entire exposition in terms of general 
principles, I'll provide a simple example.
[12] The fact that tokened vector constituents are causally efficacious is thus beside the point (see 
also Section 3.1.4). Nothing prevents a Connectionist system from tokening both a representationally 
complex vector and one or more of its vector constituents. But whether or not a particular 
Connectionist system does that is not a matter of cognitive architecture (that is, the system's doing 
that wouldn't be part of what makes it a Connectionist system). It could be made a matter of 
cognitive architecture, so that operating on a tokened vector requires decomposition into, and 
operations on, tokens of its vector constituents. But then the architecture wouldn't be a completely 
Connectionist one. A good example of a system having such an architecture is Touretzky's 
(1986) BoltzCONS.
Suppose we have a Smolensky architecture that computes a simple function, 
namely, the function whose value is the binary tree (y, x), for any binary tree (x, y) 
in its domain. The network, we'll assume, computes this function in one step, by 
multiplying the vector Vi, which translates (x, y), by the connection weight ma-
trix W, yielding the vector Vo, which translates (y, x). So how does the network 
work? How does it compute the function in the manner it does?
[13] Smolensky 1995c, pp. 245–249.
First, as Smolensky shows, there are weight matrices $W_{\text{extract left}}$ and 
$W_{\text{extract right}}$, such that
$$W_{\text{extract left}} \cdot V_i = V_x$$
and
$$W_{\text{extract right}} \cdot V_i = V_y,$$
where "$\cdot$" is matrix multiplication, and $V_x$ and $V_y$ translate the trees x and y, 
respectively. There are also weight matrices $W_{\text{construct left}}$ and 
$W_{\text{construct right}}$, such that
$$V_o = (W_{\text{construct left}} \cdot V_y) + (W_{\text{construct right}} \cdot V_x).$$
Thus, substituting $W_{\text{extract left}} \cdot V_i$ for $V_x$ and $W_{\text{extract right}} \cdot V_i$ for $V_y$, we obtain
$$V_o = (W_{\text{construct left}} \cdot W_{\text{extract right}} \cdot V_i) + (W_{\text{construct right}} \cdot W_{\text{extract left}} \cdot V_i)$$
$$V_o = ((W_{\text{construct left}} \cdot W_{\text{extract right}}) + (W_{\text{construct right}} \cdot W_{\text{extract left}})) \cdot V_i.$$
Since the products and sums of weight matrices are themselves weight matrices, 
there is a weight matrix, W, such that
$$W = (W_{\text{construct left}} \cdot W_{\text{extract right}}) + (W_{\text{construct right}} \cdot W_{\text{extract left}}).$$
Hence,
$$V_o = W \cdot V_i.$$
This derivation shows how the network computes the function in question by 
means of a single vector transformation. It can do so because of the mathematical 
structure of W, Vi, Vo, Vx, and Vy and because of the fact that Vx and Vy trans-
late the appropriate trees.
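The derivation can be checked numerically. What follows is only a minimal sketch under simplifying assumptions, not Smolensky's own implementation: the role vectors are taken to be the orthonormal pair (1, 0) and (0, 1), x and y are treated as atomic trees with made-up three-element filler vectors, and every function and variable name is introduced here purely for illustration.

```python
# Illustrative sketch only. Assumptions (not from the text): role vectors
# r0 = (1, 0) and r1 = (0, 1); x and y are atomic trees whose filler
# vectors Vx and Vy are arbitrary 3-element lists.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Add two matrices of the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def matvec(m, v):
    """Multiply matrix m by the column vector v (a flat list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def kron(role, filler):
    """Tensor product of a role vector and a filler vector, flattened."""
    return [r * f for r in role for f in filler]

def vadd(u, v):
    """Element-wise sum of two vectors."""
    return [p + q for p, q in zip(u, v)]

n = 3
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
Z = [[0] * n for _ in range(n)]

W_extract_left = [ri + rz for ri, rz in zip(I, Z)]    # n x 2n, reads role-0 block
W_extract_right = [rz + ri for rz, ri in zip(Z, I)]   # n x 2n, reads role-1 block
W_construct_left = I + Z                              # 2n x n, writes role-0 block
W_construct_right = Z + I                             # 2n x n, writes role-1 block

# W = (W_construct_left . W_extract_right) + (W_construct_right . W_extract_left)
W = matadd(matmul(W_construct_left, W_extract_right),
           matmul(W_construct_right, W_extract_left))

r0, r1 = [1, 0], [0, 1]
Vx = [2, 0, 1]                              # hypothetical filler for tree x
Vy = [0, 5, 3]                              # hypothetical filler for tree y
Vi = vadd(kron(r0, Vx), kron(r1, Vy))       # translates (x, y)
expected = vadd(kron(r0, Vy), kron(r1, Vx)) # translates (y, x)

Vo = matvec(W, Vi)                          # the one-step computation
```

On these assumptions W is computed once, from the four matrices alone, and applying it to Vi yields the vector for (y, x) in a single multiplication. The extraction matrices do recover Vx and Vy when applied to Vi, but the one-step computation never calls on them to do so.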
Of course, Vx and Vy are the vector constituents of Vi and Vo. And they 
must be referred to in order to explain how the network works in this case. This 
is just one example of the fact that "tensor product constituents play absolutely 
indispensable roles in the description and explanation of cognitive behavior in 
[Smolensky architectures]."[14]
So is Matthews right? Is Fodor committed to the causal efficacy of nontokened 
vector constituents? Clearly not. The explanation relies on mathematical 
relationships as opposed to lawful relationships between events. Nothing about 
the explanation we've just gone through requires that Vx's (or Vy's) being a 
(nontokened) constituent of Vi (or Vo) is nomologically sufficient for the occurrence 
of anything. Nor is there anything about the explanation that requires that 
something's being an instance of Vx (or Vy) is nomologically sufficient for the 
occurrence of anything, since the explanation simply does not require any vector 
constituent to be instanced.
Matthews also argues that Fodor and company's rejection of the causal efficacy 
of vector constituents is incompatible with Fodor's view that causation occurs 
at macroscopic levels of description, not only at more primitive levels. 
Matthews claims that Fodor and colleagues' "complaint against tensor product 
representations ... is that they don't actually have constituent structure. They 
don't have it, because ... the normal modes [vector constituents] into which the 
tensor product vectors are decomposed don't 'correspond' to causal agents in the 
network."[15] And, Matthews claims, Fodor and his associates think that vector 
constituents are not causal agents because all the causal work in a Connectionist 
network is done at the level of individual units and connections. But if, as Fodor 
maintains, causation occurs at many levels, causation at the level of individual 
units and connections does not rule out causation at higher levels, in particular, 
at levels of representation.
[14] Smolensky 1995c, p. 249.
[15] Matthews 1996, p. 165.
I find this objection to be quite puzzling. Certainly Fodor wouldn't deny that 
tokened vectors are causally efficacious. Fodor would surely agree that they can 
have causal consequences, even though there are causal processes operating at 
the level of individual units and connections. What Fodor does deny is that 
nontokened vectors can have causal consequences. His view on this matter has 
nothing to do with levels of causation; rather, it simply rests on the extremely 
plausible assumption that nonexistents can't cause anything. If there aren't any 
network unit activation patterns that token a certain vector, then there aren't any 
causal effects due to activation patterns that token that vector.
Matthews seems to be presupposing, contrary to what we concluded earlier, 
that the vector constituents of tensor product representations do not lose their 
individual character in superposition, so that if a tensor product representation is 
tokened, so too must be its vector constituents.[16] That is, he appears to be assuming 
that vector constituents are like wave components. On that assumption, 
if Fodor accepts the existence and causal efficacy of wave components, then he 
should do likewise regarding vector constituents. We've already rejected the idea 
that vector constituents are like wave components, but let's see what its consequences 
would be if it were true.
[16] Smolensky at times writes as if he thinks this: "The representation is distributed: since the 
vectors realizing all the constituents in the structure are superimposed upon each other, each unit 
participates in the realization of many symbols" (1995c, p. 249).
Suppose the vector constituents of a tensor product representation must be 
tokened whenever it is tokened. A consequence of this supposition is that tensor 
product representations would be Classical representations. For, first, vector con-
stituency would be a kind of co-tokening relation. Second, the structure of tensor 
product representations would be governed by a combinatorial syntax. Certain 
vectors would be of certain formal types and would physically combine, or con-
catenate, to form more complex vectors according to syntactic rules. The repre-
sentation forming processes would be sensitive to the causally efficacious syntactic 
properties of the tokened vectors on which they would operate.
More specifically, the following rules specify a perfectly good syntax, on the 
present supposition that the constituency relation for tensor product representa-
tions is concatenative.
1. There are two sets of atomic vectors, role vectors ($\{r_0, r_1\}$) and filler 
vectors ($\{A_1, A_2, A_3, \ldots\}$).
2. For any atomic filler vector, $A_i$, the vectors $r_0 \otimes A_i$ and $r_1 \otimes A_i$ are wffs.
3. If the vectors $r_0 \otimes V_i$ and $r_1 \otimes V_k$ are wffs, then the vector 
$(r_0 \otimes V_i) + (r_1 \otimes V_k)$ is a wff.
4. If the vector $V_i + V_k$ is a wff, then the vectors $r_0 \otimes (V_i + V_k)$ and 
$r_1 \otimes (V_i + V_k)$ are wffs.
5. There are no other wffs. 
Of course, the symbols "$\otimes$", "+", "(", and ")", as they occur in these rules, are not 
part of the object language. They describe how the relevant vectors are concatenated, 
rather than designate any of the things that are concatenated.
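The five rules can be read as a recursive recognizer. The sketch below is purely illustrative: it treats the vectors symbolically, as nested tuples, rather than numerically, and the tuple encoding, the finite sample of filler names, and the helper names `bind`, `plus`, and `is_wff` are all assumptions introduced here rather than anything given in the rules themselves.

```python
# Symbolic sketch of rules 1-5. Everything named here is hypothetical.

ROLES = {"r0", "r1"}                   # rule 1: the role vectors
ATOMIC_FILLERS = {"A1", "A2", "A3"}    # rule 1: a finite sample of fillers

def bind(role, filler):
    """Symbolic stand-in for the tensor product of a role and a filler."""
    return ("bind", role, filler)

def plus(left, right):
    """Symbolic stand-in for the sum of two vectors."""
    return ("plus", left, right)

def _is_node(v, tag):
    """True if v is a three-element tuple carrying the given tag."""
    return isinstance(v, tuple) and len(v) == 3 and v[0] == tag

def is_binding_wff(v):
    """Rules 2 and 4: a role bound to an atomic filler or to a wff sum."""
    if not _is_node(v, "bind"):
        return False
    _, role, filler = v
    if role not in ROLES:
        return False
    if isinstance(filler, str):
        return filler in ATOMIC_FILLERS   # rule 2
    return is_sum_wff(filler)             # rule 4

def is_sum_wff(v):
    """Rule 3: an r0 binding plus an r1 binding, both of them wffs."""
    if not _is_node(v, "plus"):
        return False
    _, left, right = v
    return (is_binding_wff(left) and left[1] == "r0"
            and is_binding_wff(right) and right[1] == "r1")

def is_wff(v):
    """Rule 5: nothing counts as a wff except by rules 2-4."""
    return is_binding_wff(v) or is_sum_wff(v)

pair = plus(bind("r0", "A1"), bind("r1", "A2"))   # built by rules 2 and 3
nested = plus(bind("r0", pair), bind("r1", "A3")) # built by rules 4 and 3
```

Here `is_wff(pair)` and `is_wff(nested)` both hold, while an expression such as `plus(bind("r0", "A1"), bind("r0", "A2"))`, which fills the same role twice, is rejected because no rule generates it.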
Tensor product representations having concatenative constituency and a 
combinatorial syntax might be acceptable to some Connectionists, provided that 
they could still make a good case for the claim that cognitive processes would 
nonetheless be non-Classical. However, the suggested reason for thinking that 
vector constituency is a co-tokening relation (vector constituents do not lose 
their individual character in superposition) would seem to apply equally to 
weight matrices. As we've seen, weight matrices themselves are superpositions 
of other weight matrices. Like tensor product representations, weight matrices 
are sums and products of weight matrices. So if vector constituency is a co-
tokening relation, so too is matrix constituency.
Furthermore, in a Smolensky architecture, fundamental weight matrices en-
code steps of Classical algorithms for operating on vector constituents. There are 
weight matrices for extracting vector constituents and weight matrices for combining 
vector constituents. However, if vector constituency and matrix constituency 
are co-tokening relations, then it looks very much like Smolensky architectures 
are nothing more than massively parallel implementations of Classical architectures. 
For all the steps of the relevant Classical rules, and all the relevant 
Classical constituents on which they operate, would be encoded, and the encoding 
weight matrix constituents and vector constituents would be tokened and 
thus available to play causal roles. We'd end up with an implementation of a 
Classical system in a wave-like substrate.
By way of illustration, notice that the above derivation that explains how a 
Smolensky network can compute a function by means of a single vector trans-
formation outlines a Classical algorithm for computing that function solely by 
means of Classical operations on tokened vectors and their tokened constituents. 
A Classical machine could implement such an algorithm to compute, in a se-
quence of steps, what the network computes in one step. Also, the extraction and 
construction matrices (and matrices for adding vectors and matrices) encode 
steps of that algorithm. If we assume that all the Classical-constituent encoding 
vectors and algorithm-step encoding matrices must be actually tokened in the 
network, then it's hard to see how the network is not merely an implementation, 
albeit a massively parallel one, of a Classical machine.
3.1.3  Vector Similarity
van Gelder[17] has argued that representation processing in Connectionist networks 
is causally sensitive to the constituent structure of vectors. He attempts to 
derive the causal efficacy of vector constituents from both the causal efficacy of 
the tokened vectors which have them as constituents and the notion of distance 
in vector space. As he points out, vectors stand in nonsyntactic, internal-structure 
similarity relations. These similarities are (or correspond to) the distance relations 
among vectors in the vector space for the relevant system.[18] Furthermore, 
systematically related vectors are more similar in this regard than non-systematically 
related vectors: systematic variants are closer in vector space than 
non-systematic variants.
These similarities, van Gelder argues, are of causal significance. First, the 
behavior of a network causally depends on the precise activation values of its 
units. And the activation values of particular units instantiate vectors, which, of 
course, have locations in vector space. So the behavior of a network (the consequences 
of its vector operations) causally depends on the vector space locations 
of its currently instantiated vectors. Second, the location of a tokened vector depends 
upon its constituents and its constituent structure. Any two tokened vectors 
which differ in constituents or constituent structure will instantiate different 
vector types and so have different locations. Thus, vector operations must causally 
depend on the constituents and constituent structure of the relevant vectors.
[17] van Gelder 1990, pp. 379–380.
[18] For example, vectors with two elements can be classified as more or less similar according to 
their Cartesian-plane distance relations.
van Gelder's argument, however, is either invalid or relies upon tendentious 
assumptions. True, a Connectionist network that exhibits some degree of syste-
maticity will causally process vectors that have the same constituents and con-
stituent structure in similar ways. So such a system would be at least functionally 
sensitive to constituent structure. Now, as van Gelder acknowledges, the con-
stituents of tokened vectors are not explicitly available. So the system cannot be 
sensitive to similarity relations among vectors by directly detecting their con-
stituents. But, according to van Gelder, it can detect similarities with respect to 
distance in vector space. And the distance in vector space between two vectors 
depends on what constituents and constituent structures they have. But how is it 
supposed to follow that the constituents of a vector token have causal roles in 
operations on that token? 
First, in what sense could it be true that a network literally detects similarities 
with respect to distance in vector space? Such similarities would seem no 
more explicit than the constituents of a complex vector. Of course, we can use 
similarity of location in a network's vector space to describe some aspects of its 
behavior. But the network itself doesn't make use of such descriptions. The only 
respect in which it seems true that a network "detects" such similarities is that 
the system is functionally sensitive to them; that is, it can exhibit appropriate 
systematic behaviors. But whether such sensitivity requires that the constituents 
of a vector token have causal roles in operations on that token is precisely what's 
at issue. (The issue of whether such sensitivity requires that vectors carry information 
about their constituent structure is addressed below in Section 3.1.5.)
Second, for any given network that exhibits some sort of systematicity, it 
would appear to be an empirical question whether systematically related vectors 
are closer in vector space than non-systematically related vectors. Servan-Schreiber 
et al.[19] studied various simple recurrent networks trained to predict 
legal continuations of symbolic expressions having a simple grammar. The net-
works varied in number of units. For networks with a relatively small number of 
units, the encodings of similarly structured symbolic expressions had similar lo-
cations in vector space. However, this correspondence in similarity became 
weaker as the number of network units increased. This suggests that for net-
works with very large numbers of units, there might not be such a correspon-
dence at all.
Given that possibility, it becomes doubtful whether the systematicities exhibited 
by Connectionist networks require, for their explanation, causal sensitivity 
to similarity of location in vector space. For a network that exhibits a certain 
systematicity might encode similar structures with vectors that do not have a 
correspondingly similar location. 
[19] Servan-Schreiber et al. 1991. Servan-Schreiber et al.'s results are briefly discussed in Garson 
1997, pp. 350–351.
Finally, in what sense does the location of a vector depend upon its constitu-
ents and its constituent structure? If the former does not causally depend upon 
the latter, then van Gelder's argument does not go through. For the argument 
requires a bridge from the causal efficacy of tokened complex vectors to the 
causal efficacy of their nontokened constituents. And van Gelder cannot just assume 
that the location of a tokened vector is causally dependent on its nontokened 
constituents, for that assumption would presuppose that such constituents 
are causally efficacious, and that is at issue. Lastly, there is no reason for thinking 
that a tokened vector's location is causally dependent on its constituents. For a 
tokened vector would have its location (that is, be an instance of a specific vector 
type) regardless of its causal history. 
3.1.4  Vector Constituents as Causal Precursors
Some defenders of Connectionism[20] have argued that nonconcatenative constituency 
is compatible with the adoption of the architectural (representational-level) 
requirement that a vector's constituents must have played a causal role in the 
eventual production of that vector. Further, that requirement would be enough to 
guarantee the causal efficacy of those constituents, even when they are not currently 
being tokened. Causation is transitive, so if there are causal chains of 
events from the tokenings of a tokened vector's constituents to the tokening of 
that vector, then the tokenings of those constituents will play causal roles in any 
operations on that vector.
[20] See Hadley 1997, Butler 1991, and van Gelder 1991.
On this view, the constituency relation for complex vectors remains noncon-
catenative. It is not metaphysically necessary that a complex vector is tokened only 
if its constituents have also been tokened. However, this is nomologically neces-
sary, given the architectural properties of the sort of networks envisioned.
There is a serious problem with the view under consideration: the proposed 
architectural property would add nothing to Connectionist explanations of systematicity. 
It should be clear from Smolensky's account of systematicity that a 
Connectionist system which exhibits certain systematicities with respect to various 
complex vector representations would exhibit those systematicities regardless 
of whether or not the constituents of the relevant tokened vectors have ever 
themselves been tokened. In particular, this is true of Connectionist systems 
having the architectural property that a vector's constituents (nomologically) 
must have played a causal role in the eventual production of that vector. To see 
this, note that, in such a system, it is nonetheless possible that a complex vector 
be (fortuitously) tokened without any of its constituents ever having been tokened. 
And supposing that the system exhibits systematicity, it will nonetheless 
be the case that, if it is capable of (fortuitously) tokening such a vector, then it 
will be capable of tokening its systematic variants. An explanation of the system's 
systematic behavior, in this sort of case, couldn't appeal to the causal efficacy 
of the appropriate tokened constituents, since, by assumption, there never 
were any. But if there is an explanation of the systematic behavior of the network 
in this sort of case, it should apply just as well to cases in which the constituents 
of the relevant complex vectors have been tokened. So whether or not there have 
been tokenings of those constituents shouldn't matter; they would add nothing 
to the explanation.
More generally, what enables Connectionist representational processes to be 
constituent-structure sensitive is that constituent structure is vector encoded. The 
only way in which processes that operate on syntactically simple representations 
can be sensitive to their constituent structure is to have that structure encoded in 
the representations. Constituent-structure sensitivity, then, needs to be explained 
in terms of properties of the encodings. But if the properties of the encodings (to-
gether with the processing mechanisms) do the explanatory work, then there is 
no need to appeal further to tokenings of constituents of the encodings.
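As an illustration only, the following Python sketch shows one simple way in which constituent structure can be encoded in a single activity pattern (here, a small matrix) so that a purely numerical operation on the whole pattern yields a systematic variant. It is a toy under stated assumptions, not Smolensky's architecture: the names bind, encode, unbind, and swap_roles, the two role vectors, and the filler vectors are all hypothetical, and the role vectors are assumed to be mutually orthogonal.

```python
# Toy tensor-product encoding (illustrative assumption, not the dissertation's model).

def bind(filler, role):
    """Outer product of a filler vector and a role vector, as a list of rows."""
    return [[f * r for r in role] for f in filler]

def encode(bindings):
    """Superimpose (sum) the filler/role bindings into one matrix."""
    rows, cols = len(bindings[0][0]), len(bindings[0][1])
    total = [[0.0] * cols for _ in range(rows)]
    for filler, role in bindings:
        for i, row in enumerate(bind(filler, role)):
            for j, value in enumerate(row):
                total[i][j] += value
    return total

def unbind(rep, role):
    """Recover the filler bound to `role`; valid only for mutually orthogonal roles."""
    norm_sq = sum(r * r for r in role)
    return [sum(x * r for x, r in zip(row, role)) / norm_sq for row in rep]

def swap_roles(rep):
    """Exchange AGENT and PATIENT by negating the second column.

    This shortcut works only for the particular role vectors chosen below.
    """
    return [[row[0], -row[1]] for row in rep]

AGENT, PATIENT = [1.0, 1.0], [1.0, -1.0]        # orthogonal role vectors (assumed)
JOHN, MARY = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]   # filler vectors (assumed)

john_loves_mary = encode([(JOHN, AGENT), (MARY, PATIENT)])
mary_loves_john = encode([(MARY, AGENT), (JOHN, PATIENT)])

# The variant is produced by an operation on the whole pattern; the fillers are
# superimposed in the matrix rather than stored in separate slots.
variant = swap_roles(john_loves_mary)
agent_of_variant = unbind(variant, AGENT)
```

In this sketch, swap_roles never consults JOHN or MARY; whatever explains its producing the variant lies in the arithmetic of the encoding and of the operation applied to it.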
To put this another way, Connectionist explanations of systematicity turn on
the mathematical properties of vectors in relation to a network's (causal) vector
operations. But tokened vectors have their mathematical properties independently
of their etiology. Rather, they are inherited from the vector types they
instantiate. So they have their mathematical properties independently of tokenings
of any of their constituents. Of course, a true causal explanation of the tokening
of a particular complex vector might advert to tokenings of its constituents. But
an explanation of the mathematical properties of a particular vector doesn't
require a causal explanation of its tokening. Indeed, a vector has its mathematical
properties regardless of whether it is ever tokened.
In sum, the proposed architectural requirement that a tokened vector's
constituents (nomologically) must have played a causal role in the eventual
generation of that vector would add nothing to Connectionist explanations of
systematicity. Its only consequence would be to give some causal roles to tokenings of the 
relevant vector constituents. And that alone is not enough to make Connectionist 
explanations of systematicity reliant upon those causal roles.
3.1.5  Causal Efficacy of Information about Constituents
There is a further issue that needs to be addressed before we may confidently 
conclude that Connectionist accounts of systematicity which appeal to noncon-
catenative distributed representations are not causal explanations. Horgan and 
Tienson appear to concede that nontokened vector constituents themselves do 
not have causal consequences. They argue, however, that the fact that a particular 
tensor product representation has a certain vector constituent can play a causal 
role in Connectionist architectures.
The question is not whether constituents can play a causal role. The ques-
tion is whether the fact that a representation has a particular constituent 
can play a causal role. And that fact can play a causal role if the
representation carries the information that it has that constituent.[21]
Furthermore, they argue that vector representations which encode symbolic 
structures do in fact carry the information that they have the constituents they 
do. So if Horgan and Tienson are right, nontokened vector constituents them-
selves need not be causally efficacious in order for a Connectionist explanation of 
systematicity to be a causal one.
I think that Horgan and Tienson's attempt to refocus the issue does nothing
to further the Connectionist's cause. Let's first examine why they think that a
complex vector representation carries the information that it has a particular
constituent. As far as I can see, their only reason for thinking this is that a
Connectionist architecture can perform what they consider to be constituent-sensitive
operations.[22] For example, as Smolensky has shown, networks can process tensor
product representations so as to yield their systematic variants. How is this
possible if tensor product representations don't carry the information that they have
certain constituents?
[21] Horgan and Tienson 1996, p. 79.
[22] See Horgan and Tienson 1996, pp. 80, 184n6.
We may grant that some Connectionist architectures can compute the same 
functions as certain Classical architectures. The issue is whether the explanation 
of such facts must appeal to processes that are causally sensitive to the constitu-
ent structure of vector representations. Horgan and Tienson appear just to as-
sume that this is so. That is, they appear to just assume that the information that 
a representation has a particular constituent must play a causal role in mental 
processing. They need to provide an explanation of how Connectionist systems 
compute the functions they do, where that explanation adverts to causal roles for 
such information.
Horgan and Tienson do say that "how tensor product representations carry
such information is no miracle; it is explainable mathematically."[23] But we've
already seen the form of such mathematical explanations, in our discussion of how
decomposing tensor product representations into their constituents is essential to
understanding and explaining the regularities in the behavior of a Smolensky
architecture (§ 3.1.2). Such explanations do not require that information about
vector constituents is causally efficacious. For instance, nothing about the sample
explanation presented earlier requires that there are causal consequences of the
fact that the vector Vi (or Vo) has Vx (or Vy) as a constituent. In fact, such
explanations seem to show how vector operations can be "constituent sensitive"
without being causally sensitive to information about constituents.[24]
[23] Horgan and Tienson 1996, p. 80.
I should emphasize that I'm not attempting to deny that complex vectors
encode information about their constituents. We need to be careful so as to not
confuse the idea of "carrying" information with the idea of "encoding" information.
Tensor product representations certainly encode their constituent structure. What
I deny is that they carry information about their constituents in such a way that
that information could play a causal role in operations performed on those
representations. Rather, such information, I think, is used only by us in designing
Connectionist networks or in understanding how they work. But so far I've
argued only that there is no good reason to think that Connectionist architectures
causally use such information. Is there a positive argument for my claim that
they don't? I think there is. I'll state my case in terms of Smolensky architectures.
But the argument applies as well to any Connectionist system for which the
representational constituency relation is nonconcatenative.
[24] Hadley suggests, but does not insist upon, a line of argument that appears to be either a version
of Horgan and Tienson's view or a version of the view, envisaged above, that vector constituents
are somehow actually there, even though they are not separately tokened. He writes, "information
that is recoverable from complex structures, even when background mechanisms must be
assumed, may be regarded as implicit in a special sense. One could argue further that information
that is implicit in this sense is not merely imaginary, because the complex structures in question
must possess specific properties that reflect the derivable information. In particular, Smolensky's
tensor-product representations possess special properties that reveal the identity of their
(purported) 'imaginary' atomic constituents. Thus, some trace of the atomic constituents is present
even in the complex representations" (1997, p. 148; italics in original).
Assume that there is a tensor product representation, R, that has a vector, C, 
as a constituent. Also, let's suppose that the fact that R has C as a constituent 
plays a causal role in the operations which a Smolensky network performs on R. 
We want to explain how that fact is causally efficacious. An important question 
to initially ask is, What is the fact that R has C? What makes it true that R has C? 
Well, given the notion of vector constituency, the answer is the fact that R 
uniquely translates a constituent structure tree, T, C uniquely translates the tree 
T*, and T* is a Classical constituent of T. According to Horgan and Tienson, that 
fact can play a causal role if R carries the information that that fact obtains. 
So how could R carry that information? First, notice that the information is 
about properties of R that are nonlocal and nonphysical (radically physically het-
erogeneous). Nonlocal, because (1) trees are not tokened, and do not play causal 
roles, in Smolensky architectures, and (2) the tokening of R does not require the 
tokening of C. Nonphysical, because translation relations (and, thereby, vector 
constituency relations) are nonphysical. In Smolensky networks, there are no 
physical interactions between trees and the vectors that translate them. More-
over, since vector constituency is not a co-tokening relation, the property of hav-
ing C as a constituent is physically heterogeneous. It is not the case that if two 
tensor product representations have C as a constituent, then they must thereby 
have a specific physical property in common. Having C as a constituent is simply 
not a physical property, in the required sense.
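A small numerical sketch may make the heterogeneity claim more concrete. It assumes a toy superposition encoding with hypothetical role vectors, filler vectors, and function names (encode, unbind), none of which come from the source. It shows only that two encodings which both have C as a constituent, in the encoding sense, need not share the value of any component; it is offered as one way of reading the claim above, not as a proof of it.

```python
# Toy illustration (assumed encoding scheme and values, not taken from the source).

def encode(bindings):
    """Sum of outer products of filler and role vectors, returned as a list of rows."""
    rows, cols = len(bindings[0][0]), len(bindings[0][1])
    total = [[0] * cols for _ in range(rows)]
    for filler, role in bindings:
        for i, f in enumerate(filler):
            for j, r in enumerate(role):
                total[i][j] += f * r
    return total

def unbind(rep, role):
    """Recover the filler bound to `role`; assumes mutually orthogonal roles."""
    norm_sq = sum(r * r for r in role)
    return [sum(x * r for x, r in zip(row, role)) / norm_sq for row in rep]

ROLE_1, ROLE_2 = [1, 1], [1, -1]      # orthogonal role vectors (assumed)
C, D, E = [1, 2], [3, 5], [7, 11]     # filler vectors (assumed)

rep_a = encode([(C, ROLE_1), (D, ROLE_2)])   # C fills role 1
rep_b = encode([(E, ROLE_1), (C, ROLE_2)])   # C fills role 2

# C is mathematically recoverable from both patterns...
c_from_a = unbind(rep_a, ROLE_1)
c_from_b = unbind(rep_b, ROLE_2)

# ...yet we can list the positions at which the two patterns hold the same value.
shared = [(i, j) for i in range(2) for j in range(2) if rep_a[i][j] == rep_b[i][j]]
```

With these particular values the list of shared positions comes out empty, even though C is recoverable from both patterns.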
Second, it is quite plausible, to say the least, that the only way in which a
computational system, of any kind, could be systematically sensitive to nonlocal,
nonphysical properties is by representing them.[25] If a computational device is to
function properly, its mechanical, information-manipulating processes need to be
systematically sensitive to various local and physical properties of its
information-bearing structures. It could not be expected to function properly if its operations
have to be sensitive to whether or not the representations on which they operate
possess certain nonlocal or nonphysical properties. For example, we can't expect
a computational system to work if its processes have to detect whether or not the
representations on which they operate are, say, within 200 yd of a school
building, or are numerals, rather than some other sort of symbol.
[25] See Rey 1997, § 4.3, and Rey 2003.
So, since having C as a constituent is a nonlocal, nonphysical property of R,
if R is to effectively carry the information that it has C, if that information is to be
reliably and mechanically detectable, then R must somehow represent the fact that
it has C by means of some of its local, physical properties. Let's call the feature or
features of R that instantiate the relevant physical properties the bearer of the
information that R has C.
What, then, is the nature of the bearer of that information? In a Smolensky 
architecture, representations are vectors. So the bearer of the information that R 
has C must be a vector, V, that represents the fact that R has C. 
Well, then, which vector is V? V represents the fact that R has C, and so
must have constituents, R* and C*, which refer to R and C, respectively. So R
could be V only if it has those same constituents. That is, R could be V only if its
correct interpretation is "R has C". (Remember, it's provable just what are the
constituents of a given tensor product representation.) But then our explanation of
how R carries the information that it has C as a constituent would have the
consequence that it could do so only if that's what it meant in the first place.
Naturally, what we need is an explanation of how R carries the information that it has
C, regardless of what is the correct interpretation of R. Of course, when we
started out, R was supposed to be the vector that carried the information in
question. But let's see if some other vector could do the job.
Could V be a constituent, or some other nontokened vector component, of
R? That won't do, since V has to instantiate those local and physical properties of
R which bear the information that R has C. That is something V could do only if
it is tokened along with R. Remember, we're looking for a causal explanation. So
whatever carries the information that R has C has to really be there.
The only remaining option is that V is a subvector of R. That is, if R is the
vector <1, 2, 3, 4>, perhaps V is the vector <1, 2>. However, if that's the case, then
R has a representational constituent, V, that must be tokened whenever it is. This
option, then, gives up nonconcatenative constituency. It also has another
problem. For V, too, would have to carry information about its constituents via its
subvectors (which in turn would have to have their own constituents, in that
they would attribute properties to V). But vectors are finite. So eventually there
would have to be a vector, V*, that either did not carry information about its
constituents (since it wouldn't have a subvector to do the job) or did carry
information about its constituents by some other means. If the former, then our
explanation would have to allow that some vectors don't carry information about their
constituents, in which case one would wonder why any would have to. If the
latter, then we'd need another explanation of how vectors like V* carry
information about their constituents.
We thus have a kind of reductio of the supposition that the information that 
R has C is causally employed by a Smolensky network. For if that supposition 
were true, then there should be a causal explanation of it. There should be an ex-
planation of how that information plays a causal role. But it appears that such an 
explanation is not to be had. Therefore, that information is not causally used by 
such systems.
A defender of Horgan and Tienson might point out that, contrary to what
my argument appears to assume, on their view, R alone does not carry the
information that it has C. Rather, it carries that information relative to the entire
system:
In classical systems ... representations ... have constituents only in the
context of the whole system. The structure of the system as a whole
determines that representations have a causal role that is sensitive to their
constituent structure. And it is only by virtue of their having such a causal
role that it makes sense to say that certain physical items are constituents.
In connectionist systems ... the information that representation R has
constituent C is [sic] carried by the representation R -- relative to the whole
system, even though constituent C is not physically present.[26]
My argument, however, does not assume that R carries the relevant information
independently of the entire system. It's fine with me if R carries that information
only relative to the system as a whole. It would still remain the case that if R is
to carry the information, it must somehow represent it by means of some of its
physical properties, regardless of whether or not those properties represent that
information in and of themselves. What Horgan and Tienson need is a causal
explanation of how R could carry (relative to the system) the information that it has
C, and I've argued that one cannot be had.
[26] Horgan and Tienson 1996, pp. 79–80.
Perhaps Horgan and Tienson are drawing attention to the distinction be-
tween explicit information and implicit information. In a Classical system, a rep-
resentation like Fa explicitly means whatever it means, but only implicitly carries 
the information that it has F as a constituent. So maybe my argument errs by 
construing the information that R has C as information which is explicitly, rather 
than implicitly, carried by R. But this sort of response to my argument would 
miss its point. Classical representations can implicitly carry information about 
their constituents because those constituents are right there, instantiating all their 
local and physical properties. In contrast, representations for which the
constituency relation is nonconcatenative can't implicitly carry information about their
constituents in that way, since their constituents aren't there. So if they do carry 
that information, they have to do it in some other way. Since the information is 
about nonlocal, nonphysical properties of the representations, it must be carried 
by being represented.
Still, one might wonder, Isn't the fact that nonconcatenative representations 
encode their constituent structure enough to show that they implicitly carry in-
formation about what constituents they have? Well, no. That just takes us back to 
where we started. We have an explanation of how systems which employ non-
concatenative representations exhibit systematicity. That explanation appeals to 
properties of the encodings. The issue then arises whether the explanation is a 
causal one. Hence, this chapter.
Again, my argument applies to any Connectionist system for which the
representational constituency relation is nonconcatenative. We need only replace the
specific version of vector constituency that applies in the case of Smolensky
architectures with a more general version: a vector, V_n, is a vector constituent of
another vector, V_m, only if V_n uniquely encodes a symbolic structure, S, V_m
uniquely encodes another symbolic structure, S*, and S is a (concatenative)
constituent of S*.
But one might question whether my argument goes through when applied 
to Connectionist systems which employ the architectural requirement that a
tokened vector's constituents (nomologically) must have played a causal role in the 
eventual generation of that vector. For it might not be clear that, in such systems, 
the information that a particular complex vector has certain constituents is about 
properties of that vector which are nonlocal and nonphysical. 
However, having a certain constituent as a causal precursor doesn't make
having that constituent a local, physical property. It seems easy enough to
imagine two tokenings of the same vector having the same nomologically possible
effects, where one has a tokening of one of its constituents as a causal precursor but
the other does not. At the least, I find it hard to imagine a non-question-begging
way of arguing that having a certain constituent as a causal precursor makes
having that constituent a local, physical property.
Moreover, having a certain constituent as a causal precursor isn't enough to 
make having that constituent a physical property. For the property of having a 
certain constituent as a causal precursor is itself physically realizable in a very 
wide variety of ways.
Based on the arguments presented in this chapter, I think we may confi-
dently conclude that Connectionist explanations of systematicity are not causal 
explanations. Of course, this conclusion does not in itself pose an immediate dif-
ficulty for Connectionism unless the only sort of acceptable explanation in cog-
nitive science is causal explanation. However, the next two chapters do present 
more direct problems for Connectionism; and one of those problems arises once 
it is seen that Connectionist explanations of systematicity are not causal ones.
Chapter 4
Acausal Explanation?
Defenders of Smolensky could concede that his explanation of systematicity is 
not a causal one in that it does not advert to causal laws governing the constitu-
ents of systematically related vectors. For they could deny that the only accept-
able form of explanation in cognitive science is causal explanation. In particular, 
they could argue that Smolensky's explanation is a good one, even though it 
does not take the form of a causal explanation.
If Smolensky's explanation is not a causal one, then what kind of explanation
is it? Well, presumably it is supposed to work in the following way. We
understand the systematicity of Classical systems of representation, such as
constituent-structure trees. The tree–vector algorithms show that there is a one-to-one
mapping between trees and the vectors which translate them. Furthermore,
systems with Smolensky architectures are designed so that vector processing is
carried out in a way that maintains that one-to-one mapping.
One difficulty with this sort of explanation, according to Fodor, is that
explanatory adequacy is not in general preserved under one-to-one
correspondence.[1] So that there is a one-to-one correspondence between trees and tensor
product vectors does not show that vector constituents and the notion of vector
constituency are doing any explanatory work. In fact, Fodor maintains, the
explanatory burden seems to be carried exclusively by Classical trees and the
notion of concatenative constituency. On his view, all that tree encoding/deriving
algorithms and the notion of vector constituency do for Smolensky is allow him
to completely parasitize the Classical explanation without adding anything of
substance to it. If his explanation appears to be adequate, that is only because it is
merely the Classical explanation in disguise.
[1] Fodor 1998, p. 120.
I think Fodor's conclusion might be too strong. I'm not sure that it is, and I
won't attempt to convincingly establish that it is. Suffice it to say that the
explanation presented in Section 3.1.2 is of the same kind as Smolensky's explanation
of systematicity; and the former appears to be an adequate and illuminating
explanation. Moreover, it is hard to see how the burden of that explanation is
supposed to be carried exclusively by Classical representations and the notion of
concatenative constituency.
Rather than take a definitive stand on whether Fodor is right, I want to
recast the issue somewhat. I want to claim that, insofar as Smolensky's explanation
is a good one, it fails to explain what it sets out to explain. (Perhaps, in the end,
my objection amounts to the same point as Fodor's, viewed in a different light.)
The best way I can think of to clarify all this is just to get on with it.
4.1  An Adequate Explanation, but Not of Systematicity
One way to show that an explanatory strategy is a good one is to provide a case
in which it clearly succeeds. And one might appeal to Smolensky's "Visa Box"
example, as he himself does, in order to show that some "acausal" explanations,
as he calls them, are in fact good ones.[2] I'll agree with Smolensky that the
explanation of how the Visa Box works is a good one, but I'll claim that the
explanatory strategy is inadequate with respect to systematicity.
[2] Smolensky 1995c, pp. 244–245.
The Visa Box is a device that assists in restaurant bill tip calculation, when
the bill is not itemized. Its inputs are the bill subtotal (food total, plus tax), the
local food tax percentage, and the chosen tip percentage. Its output is the bill
total (food total, plus tax, plus tip). One would naturally surmise that the device
works by sequencing through the following calculations, or some very similar to
them:
$FOOD = SUBTOTAL/(1 + x/100) [3]
$TIP = $FOOD (p/100)
TOTAL = SUBTOTAL + $TIP,
where x and p are the tax and tip percentages, respectively. Thus, it is natural to 
suppose that the Visa Box employs $FOOD representations in its calculations. But, 
in fact, the device works by calculating a number, w, and then multiplying w by 
the subtotal to obtain its output:
w = (100 + x + p)/(100 + x)
TOTAL = w (SUBTOTAL)
How does the Visa Box, without tokening $FOOD representations, compute the 
correct TOTAL for a given set of inputs? Here's a derivation that provides an
explanation:
TOTAL = w (SUBTOTAL)
      = [(100 + x + p)/(100 + x)] SUBTOTAL                      Substitution
      = {[(1/100)(100 + x + p)]/[(1/100)(100 + x)]} SUBTOTAL    Multiplication by (1/100)/(1/100)
      = [(1 + x/100 + p/100)/(1 + x/100)] SUBTOTAL              Distribution
      = (1 + x/100 + p/100) [SUBTOTAL/(1 + x/100)]              Association: (m/n)s = m(s/n)
      = (1 + x/100 + p/100) $FOOD                               Substitution
      = $FOOD + $FOOD (x/100) + $FOOD (p/100)                   Distribution
      = $FOOD + $TAX + $TIP                                     Substitution
      = SUBTOTAL + $TIP
[3] This equation is derivable as follows:
SUBTOTAL = $FOOD + $FOOD(x/100)
SUBTOTAL = $FOOD(1 + x/100)
SUBTOTAL/$FOOD = (1 + x/100)
1/$FOOD = (1 + x/100)/SUBTOTAL
$FOOD = SUBTOTAL/(1 + x/100)
This derivation provides an explanation of how the Visa Box works without its 
employing $FOOD representations. We see how the algorithm it uses and one we 
would naturally expect it to use each compute the same function. The explana-
tion appears perfectly adequate. So, although the Visa Box does not employ 
$FOOD representations, an adequate explanation of how it works may nonethe-
less appeal to $FOOD representations.
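The equivalence that the derivation establishes can also be checked numerically. The following Python sketch is a minimal illustration, not a description of any actual device: the function names total_via_food and total_via_w and the sample bill are hypothetical, and the second function uses the multiplier w = (100 + x + p)/(100 + x) that the derivation substitutes for w.

```python
from fractions import Fraction

def total_via_food(subtotal, tax_pct, tip_pct):
    """The 'expected' algorithm: recover the food price, then add the tip."""
    food = subtotal / (1 + tax_pct / 100)   # $FOOD = SUBTOTAL/(1 + x/100)
    tip = food * (tip_pct / 100)            # $TIP = $FOOD (p/100)
    return subtotal + tip                   # TOTAL = SUBTOTAL + $TIP

def total_via_w(subtotal, tax_pct, tip_pct):
    """The one-multiplier algorithm: no food-price value is ever computed."""
    w = (100 + tax_pct + tip_pct) / (100 + tax_pct)
    return w * subtotal                     # TOTAL = w (SUBTOTAL)

# Hypothetical bill: subtotal 108, tax 8%, tip 15%.
# Fractions keep the arithmetic exact, so the two totals can be compared with ==.
bill = (Fraction(108), Fraction(8), Fraction(15))
expected = total_via_food(*bill)
actual = total_via_w(*bill)
```

Only the first function ever holds a food-price value in a variable; the second returns the same total without one, which is the feature of the Visa Box that the surrounding discussion turns on.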
On Smolensky's view, this should not be surprising. For, first, the content 
[food price] is a constituent of each of the contents expressed by x and p, since x 
expresses the content [local tax percentage on food price] and p expresses the 
content [chosen tip percentage on the food price] for the relevant bill. Second, 
[food price] is also a constituent of the content expressed by TOTAL, since
TOTAL = SUBTOTAL + $TIP = $FOOD + $TAX + $TIP.
It is useful, then, for [food price] to enter into both the semantic characterization 
of the function the Visa Box computes and the explanation of how the device 
works. Of course, it is given that the Visa Box does not use $FOOD representa-
tions. But the fact that we may appeal to [food price] in order to explain how the 
device operates shows that an adequate explanation may (perhaps must) use 
representations that express that content. Regarding the particular explanation 
under consideration, the appeal to the representation $FOOD is explanatorily
adequate, despite the fact that the explanation does not posit a causal mechanism
that involves the tokening of $FOOD representations. Thus, we may say that
$FOOD, together with the above derivation, is acausally explanatory or, perhaps
more clearly, mathematically explanatory.
Let's relate the above explanation to Smolensky's explanation of systematicity.
Representations containing $FOOD are to be taken as analogous to constituent
structure trees. And the equality
[(100 + x + p)/(100 + x)] SUBTOTAL = $FOOD + $TAX + $TIP
is to be taken as analogous to the bi-unique derivation relations between
constituent structure trees and tensor product representations. The conclusion is that
Classical trees, together with Smolensky's tree–vector algorithms, provide the
basis of an adequate acausal explanation of systematicity, just as $FOOD and the
aforementioned equality provide the basis of an adequate acausal explanation of
the operations of the Visa Box.
So what's the problem? The problem is that, insofar as the Visa Box
explanation is adequate, it is not really an acausal one. Moreover, insofar as it is an
acausal explanation, it does not explain how the Visa Box operates. In order to
see why this is the case, it is necessary to distinguish two senses in which an
explanation could be an explanation of "how something works."
The question, "How does it work?" is quite vague. It could mean, to mention
just two possibilities, (1) How did its inventor get it to work in the way it does?,
or (2) What operations does it perform? Clearly these are two very different
questions, requiring two very different kinds of answers. Now, the Visa Box
explanation would be a natural and adequate explanation of how someone was
able to make the device work as it does.[4] But then it would not really be an
acausal explanation. Food-price representations and representations of the above
equality would be attributed causal roles, since the inventor of the device would
have made use of them in designing it.
[4] Compare Smolensky (1995c, p. 245), who "marvels" at the "ingenuity" of the person who made
the device.
Similarly, consider a "Swamp Visa Box" (I assume you are familiar with
Swampman).[5] We might learn what it does with numeric inputs. And once we
learn that, we might also discover that we can use the object to calculate tips,
realizing that
[(100 + x + p)/(100 + x)] SUBTOTAL = $FOOD + $TAX + $TIP.
But the explanation of that discovery would also attribute causal roles to
food-price representations and that equality, since they would have been employed by
those who discovered that the object could be used for said purpose. In the
absence of any such representations, we could never discover that we could use the
object as a tip calculator.
In contrast, it is not the case that either $FOOD or the equality is useful for
explaining how the Visa Box or Swamp Visa Box operates on its inputs, or for
explaining the nuts and bolts of its operation. For neither $FOOD nor the equality
has a causal role in the objects themselves.
So what we have in the case of the Visa Box is an (implicit) adequate causal
explanation of why it can be used to calculate tips, along with an adequate
representational-level, causal explanation of the operations it performs on its
numeric inputs. But what we don't seem to have is an explanation that appeals to
$FOOD representations but does not require that they have causal roles. It might
appear to someone that we do have such an explanation only if he or she fails to
keep distinct the different senses of "How does it work?"
[5] For those unfamiliar with Swampman thought experiments, the Swamp Visa Box is a
molecule-for-molecule duplicate of the genuine Visa Box, but it is not the creation of a minded being. It
"popped" into existence (appropriately enough, somewhere in a remote swamp with an aura of
deep mystery) as the result of, say, astronomically improbable quantum events. The idea is that,
since the Swamp Visa Box is not an artifact, created for a particular purpose, it cannot be
characterized in intentional terms (in particular, as computing over representations) unless it is used as
a device having semantically interpretable states.
Now, since the Visa Box explanation and Smolensky's explanation of
systematicity are of the same type, what we have regarding the latter is an (implicit)
adequate causal explanation of how a Connectionist network could be designed
to compute the same functions as certain Classical architectures. We also have, as
part of that explanation, an adequate representational-level, causal explanation of
the operations such a network could perform on tokened activity patterns. But
what we don't seem to have is an explanation that appeals to Classical
representations or tree–vector algorithms but does not require that they have causal roles. In
short, we don't seem to have an acausal explanation of systematicity.
Perhaps someone might think that my argument relies too much on the Visa 
Box's being an invention, a tool, without intrinsic content. But my claim is that 
the sort of acausal explanation at issue works only for such devices. My point can 
be illustrated by means of the following hypothetical example. Suppose there is 
an organism whose systematic behavior can be explained by attributing to it a 
Classical architecture. We eventually discover, however, that it does not have a 
Classical architecture; rather, it has a Connectionist architecture. Thus, the 
Classical explanation of the relevant behaviors is simply false. Nonetheless, we may 
suppose that nature, not us, designed the organism, so that the actual contents of 
its representational states are independent of our purposes and of how we think 
about those states. However, nature presumably doesn't have available any 
system of representation that it can use for purposes of designing Connectionist 
minds. How, then, could nature have designed the organism to work as it does? 
How could nature have bestowed upon it systematically related cognitive 
capacities? The answer to that question, I submit, would remain a mystery. In 
particular, what would not be forthcoming is a Connectionist explanation that 
appeals to Classical representations and requires that they have causal roles.
Look at the matter from a slightly different angle. Clearly, the organism's 
vector representations wouldn't encode, in the sense of having been translated 
from, Classical representations and their structures. But presumably they would 
encode their own semantic structures. So we'd be able to see how the organism 
appears to have a Classical architecture without actually having one. However, 
what we wouldn't be able to see is how its vectors could have come to encode 
their own semantic structures in the first place. Where could such structures have 
been instantiated? Not in the organism's architecture, since Connectionist 
networks don't support structured vehicles of content. Nor could they have been 
instantiated by anything in the organism's environment (again, nature 
presumably doesn't have available any system of representation that it can use for 
purposes of designing Connectionist minds). The only remaining alternative is 
that they could have been instantiated in minds like our own. So it would be 
clear how we might have been able to design such an organism. What would be 
not at all clear is how nature could have.
In the previous chapter, I concluded that Smolensky's explanation of 
systematicity is not a causal one. A few words need to be said about how that 
conclusion relates to the present argument. His explanation is not causal in that (1) it 
explicitly rejects causal roles for Classical representations and tree–vector 
algorithms, and (2) neither vectors, as nontokened vector constituents, nor 
information that certain vectors have particular constituents have causal roles. On the 
other hand, the explanation is causal in the sense that it is an (implicit) causal 
explanation of how a Connectionist network could be designed to compute the 
same functions as certain Classical architectures do, including functions from 
representations to their systematic variants. As such, but only as such, it is an 
adequate explanation. To put the point a bit cursorily, the explanation is an 
adequate one only if it attributes causal roles to Classical representations; but it 
doesn't, so it isn't. What we end up with, then, is neither an adequate causal 
explanation of systematicity nor an adequate acausal explanation of systematicity. 
In short, we don't have an adequate explanation of systematicity at all.
4.2  Moral of the Argument
The argument of the preceding section applies to the explanatory adequacy of 
any Connectionist account for which the constituency relation is nonconcatena-
tive, and hence it applies to any Connectionist account that is an alternative to 
the Classical picture. The problem systematicity poses for Connectionism is to 
show how Connectionist-network operations defined over syntactically simple 
representations nomologically must be sensitive to representational-constituent 
structure. Since vectors are syntactically simple, constituent structure must be en-
coded. Moreover, encoding of constituent structure requires computation of a 
function from constituent structures to encodings.
Now, the representational structures encoded are not the formal, 
configurational structures of representations supported by the relevant Connectionist 
networks; such networks don't support representations with concatenative 
constituency. So they must be structures of representations instantiated outside such 
networks. But, as I've argued, a Connectionist (purportedly adequate) 
explanation of systematicity that adverts to such representations could at best provide an 
adequate explanation of how we could design a network to compute the same 
functions as certain Classical architectures do, including functions from 
representations to their systematic variants. And one that does not advert to such 
representations could at best provide an adequate causal explanation of the 
operations a network could perform on tokened activity patterns. But we don't get a 
Connectionist explanation of systematicity per se.
In this chapter, I've argued that a Connectionist explanation of systematicity 
would not be an adequate explanation of systematicity. What I show next is that if 
we nevertheless construe Connectionist explanations as explanations of 
systematicity, the result, not surprisingly, is that they become unprincipled in a rather 
serious way.
Chapter 5
Structure Sensitivity and Principled Explanation
Another difficulty for Connectionist explanations of systematicity is that they 
appear to be unprincipled, arbitrary, or ad hoc in a rather serious way.[1] Cummins 
et al. (who defend Connectionist explanations of systematicity) introduce this 
objection as the claim "that classical representational schemes predict 
systematicity, whereas connectionist schemes at best accommodate it."[2] Our first task with 
regard to this objection is to see just what it amounts to.
[1] The source of this objection is Fodor and McLaughlin (1995, p. 216). The objection is part of (or 
perhaps just is) their argument that Smolensky architectures don't provide an adequate basis for 
explaining the lawfulness of systematicity. I don't address the lawfulness issue independently of 
the "principledness" issue. For I regard a principled explanation as a necessary condition of an 
adequate explanation of lawfulness. Cummins et al. (2001) discuss the principledness and 
lawfulness issues separately (the latter as somewhat of an afterthought), but they provide no reason 
at all for thinking that the two issues are separable. In any event, I'm willing to grant that if a 
Connectionist explanation of systematicity is principled, then it is a principled explanation of the 
lawfulness of systematicity.
[2] Cummins et al. 2001, p. 172. See also Cummins 1996, p. 605. Cummins et al. (2001) explicate this 
objection in terms of how a Classical parser would parse sentences as opposed to how a 
Connectionist parser would parse sentences. I instead present the objection in terms of how Classical and 
Connectionist cognitive systems are supposed to be able to think systematically related thoughts.
5.1  Prediction versus Accommodation of Systematicity
Aizawa provides two cases from the history of science which illustrate well the 
nature of the objection under consideration.[3] One case concerns Darwin's and the 
Creationist's explanations of why the close resemblance between blind 
subterranean forms of organisms and their sighted, surface counterparts is tied to their 
geographical location. The other case concerns the Copernican and Ptolemaic 
explanations of the fact that Mercury and Venus, unlike the other planets, are never 
found in opposition to the Sun. 
Darwin[4] notes that the blind forms of insects that live in limestone caverns in 
the United States resemble their sighted counterparts on the surface, and that the 
same is true regarding blind and sighted forms in Europe. However, the relevant 
European and American blind insects don't bear a close resemblance to each 
other, despite the close similarity of their environments. On the evolutionary 
account, this is easily explained by the hypothesis that the blind forms and sighted 
forms, in their respective regions, evolved by natural selection from a common 
ancestor. For if that is true, the observed similarities and dissimilarities would be 
just what you would expect. On the Creationist's account, the similarities and 
dissimilarities are due to the Creator's plan.
[3] Aizawa 1997. Aizawa argues that both the Classical and the Connectionist explanations of 
systematicity are unprincipled. However, his argument works only if the kinds of 
representational-level processes required by each account are arbitrary with respect to the kinds of mental 
representations each account posits. I argue that this is not the case for Classicism, but that it is the case 
for Connectionism. The objection, as I present it, makes use of Aizawa's cases but follows the 
outline of the objection as presented by Cummins 1996, pp. 605–608.
[4] Darwin 1985, chapter 5, pp. 178–179.
But given the close similarity of the environments of the American and 
European caverns, the Creator could just as easily have placed similar blind in-
sects in the two habitats. Indeed, the Creator could just as easily have made the 
blind forms in Europe similar to the sighted forms in America, and vice versa. 
Nothing about Creationism alone precludes this. Creationism alone does not ex-
plain the facts. In order to cover the data, then, the Creationist account must in-
voke an arbitrary assumption to the effect that the Creator did one thing, when 
he or she could just as easily have done something else. This gives us a defeasible 
reason to prefer the Evolutionist explanation to the Creationist one.
Turning to the Copernicus–Ptolemy case, the position of Mercury, as seen 
from Earth, never deviates from that of the Sun by more than about 28° of arc. 
Venus is never farther from the Sun than about 45°. On the other hand, the 
positions of the other planets can deviate from that of the Sun by 180°. Now, the 
Copernican and Ptolemaic theories of the solar system both advert to deferents and 
epicycles. But the Copernican hypothesis that the planets orbit the Sun in the 
order Mercury, Venus, Earth, Mars, Jupiter, and Saturn provides an immediate 
explanation of the observation that Mercury and Venus never stray far from the 
Sun. No further assumptions are required. In contrast, the Ptolemaic theory 
proposes that the solar bodies orbit the Earth in the order Mercury, Venus, the Sun, 
Mars, Jupiter, and Saturn. That theory alone, however, does not explain the 
planetary movements. Another hypothesis is required, namely, that the deferents 
of Mercury, Venus, and the Sun are "locked" together, so that the centers of the 
epicycles of Mercury and Venus are always in line with the Sun (while none of 
the deferents of the remaining planets are locked with any other). 
Thus, unlike the Copernican account, in order to cover the data, the Ptole-
maic explanation must invoke an arbitrary assumption. The Ptolemaic theory 
alone is insufficient. For while geocentrism allows the deferents of Mercury, Ve-
nus, and the Sun to be locked together, it also allows them to be independent of 
each other. This gives us a defeasible reason to prefer the Copernican explanation 
to the Ptolemaic one.
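The contrast can be made concrete with a little trigonometry. The following is only a sketch, assuming circular, coplanar heliocentric orbits and approximate modern mean orbital radii in astronomical units (the radii and the function name are illustrative assumptions, not drawn from Aizawa's discussion). On that idealization, the bound on the apparent separation of Mercury and Venus from the Sun falls out of the orbital ordering alone, while a planet beyond Earth can reach opposition. The idealized bounds differ somewhat from the observed figures because real orbits are eccentric.

```python
import math

# Approximate mean orbital radii in astronomical units (Earth = 1.0).
# Illustrative modern values, assumed for this sketch only.
ORBITAL_RADIUS_AU = {"Mercury": 0.387, "Venus": 0.723, "Mars": 1.524}


def max_elongation_deg(radius_au: float) -> float:
    """Largest Sun-planet angle seen from Earth, for circular coplanar orbits."""
    if radius_au < 1.0:
        # Planet inside Earth's orbit: the line of sight from Earth is at
        # most tangent to the planet's orbit, so the angle is bounded.
        return math.degrees(math.asin(radius_au))
    # Planet outside Earth's orbit: Earth can pass between it and the Sun,
    # which puts the planet in opposition.
    return 180.0


for name, radius in ORBITAL_RADIUS_AU.items():
    print(f"{name}: at most {max_elongation_deg(radius):.1f} degrees from the Sun")
```

Nothing comparable falls out of the geocentric ordering; there the bound has to be added by hand, which is the point of the "locked deferents" hypothesis.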
With respect to the above cases, the objection that the Connectionist 
explanation of systematicity is unprincipled likens Classicism to Evolutionism and 
Copernican theory, and likens Connectionism to Creationism and Ptolemaic 
theory. As we've seen (§ 2.1), Classicism explains systematicity by hypothesizing 
that mental representations have a combinatorial syntax and semantics and that 
mental operations are sensitive to the syntactic properties of mental 
representations. It will be useful here to revisit the essentials of that explanation (and the 
Connectionist style of explanation) in light of the preceding discussion.
Thus, consider a complex Classical mental representation, aLb. It is com-
posed of the simpler representational constituents a, L, and b. By virtue of some 
of their local, physical properties, a and b have the syntactic role of designator, 
while L has the syntactic role of 2-place predicate. The mental processes that 
contribute to the formation of aLb are sensitive to those syntactic properties. 
From this it should be clear that if the relevant cognitive system can form aLb, it 
is to be expected that it can just as easily form bLa. The very same mental proc-
esses which can construct the former can construct the latter. No additional op-
erations are required. For a is placeable in the subject or object slot of L, as it 
were, by virtue of being a designator; and the same is true of b.
Of course, when the constituents of aLb stand in construction so as to form 
that representation, a acquires the syntactic role of subject, and b acquires the 
syntactic role of object, and vice versa for bLa. But these further syntactic roles 
are consequences of the representation-forming process. The formation of tokens 
of aLb and bLa employs the very same types of mental operations.
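A toy model may help fix the point. The sketch below is illustrative only: the names Symbol and form are hypothetical, and nothing here is meant as a claim about how any actual Classical architecture is implemented. What it shows is that a formation operation sensitive only to syntactic category yields bLa by the same means as aLb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    shape: str      # the particular symbol, e.g. "a", "b", or "L"
    category: str   # its syntactic role: "designator" or "predicate2"


def form(subject: Symbol, predicate: Symbol, obj: Symbol) -> tuple:
    """Concatenate three symbols into a complex representation.

    The operation checks syntactic category only; it never looks at shape.
    """
    if subject.category != "designator" or obj.category != "designator":
        raise TypeError("subject and object slots take designators")
    if predicate.category != "predicate2":
        raise TypeError("the middle slot takes a 2-place predicate")
    # The constituents are literally tokened as parts of the result.
    return (subject, predicate, obj)


a = Symbol("a", "designator")
b = Symbol("b", "designator")
L = Symbol("L", "predicate2")

aLb = form(a, L, b)
bLa = form(b, L, a)  # the very same operation; nothing further is needed
```

The subject and object roles are fixed by position in the output, so they are consequences of the formation step rather than inputs to it.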
On Connectionism, representations are vectors, and mental processes are 
vector operations. Vectors are syntactically atomic, so a Connectionist explana-
tion of systematicity cannot appeal to processes that are sensitive to their syntac-
tic structure. But vectors are capable of encoding, by virtue of their local, physical 
properties, representational-constituent structure. So vector operations can be 
sensitive to constituent structure through their sensitivity to the relevant physical 
properties of vectors.
Thus, consider a vector, V_aLb, where a, L, and b are its vector constituents. 
One way this vector can be tokened in a Connectionist system is by means of a 
vector operation on its vector constituents. But (in contrast with Classicism) the 
system must do more than just combine those constituents. It can't merely 
superimpose them, say, for that wouldn't account for the different structural roles 
of a, L, and b in V_aLb. The Connectionist solution is to posit operations that bind 
constituents to the appropriate structural roles.[5] (As we've seen, Smolensky 
architectures bind a particular vector to a particular structural role by taking the 
tensor product of that vector and the vector that "represents" that structural role 
[§ 2.2].) So any process that constructs V_aLb must be different from one that 
constructs V_bLa. For their constituents must be bound to the appropriate structural 
roles, and the structural roles of a and b differ in the two vectors. With 
role-binding operations in place, systematicity is then explained in terms of the 
sensitivity of vector operations to the local, physical properties of complex vectors 
that encode all of the structural roles of their constituents.
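The role-binding idea can be sketched in a few lines. This is a minimal illustration, assuming one-hot filler and role vectors chosen for convenience (the names FILLER, ROLE, bind, unbind, and superimpose are hypothetical); it is not Smolensky's own implementation. It shows that mere superposition erases the difference between the two structures, whereas tensor-product binding preserves it, and that preserving it takes binding operations over and above vector addition.

```python
# Hypothetical one-hot vectors for the constituents and the structural roles.
FILLER = {"a": [1, 0, 0], "b": [0, 1, 0], "L": [0, 0, 1]}
ROLE = {"subject": [1, 0, 0], "relation": [0, 1, 0], "object": [0, 0, 1]}


def outer(u, v):
    """Tensor (outer) product of two vectors, as a list of rows."""
    return [[x * y for y in v] for x in u]


def add(*matrices):
    """Element-wise sum of equally sized matrices."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


def bind(subject, relation, obj):
    """Bind each filler to its role, then superimpose the bindings."""
    return add(
        outer(FILLER[subject], ROLE["subject"]),
        outer(FILLER[relation], ROLE["relation"]),
        outer(FILLER[obj], ROLE["object"]),
    )


def unbind(matrix, role):
    """Recover the filler bound to a role (works because roles are orthonormal)."""
    return [sum(m * r for m, r in zip(row, role)) for row in matrix]


def superimpose(*names):
    """Plain vector addition of fillers, with no role binding at all."""
    return [sum(column) for column in zip(*(FILLER[n] for n in names))]


V_aLb = bind("a", "L", "b")
V_bLa = bind("b", "L", "a")

print(V_aLb != V_bLa)                                         # bindings differ
print(superimpose("a", "L", "b") == superimpose("b", "L", "a"))  # sums do not
print(unbind(V_aLb, ROLE["subject"]) == FILLER["a"])
```

Note that bind and unbind are extra operations stipulated for the purpose; a network restricted to addition alone would have no way to tell the two structures apart.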
[5] Computer scientists call this variable binding. Notice that Connectionists also need to 
distinguish vectors representing individuals from vectors representing attributes of individuals (cf. 
Marcus 2001) and thus need to posit operations that bind the former to designator structural roles 
and the latter to predicate structural roles. The principal ways Connectionists have attempted to 
achieve variable binding are reviewed in Browne and Sun 1999.
This sort of explanation, however, like the Creationist and Ptolemaic 
explanations above, is unprincipled in that it requires an arbitrary assumption. It is not 
a tenet of Connectionism that networks have operations that bind vectors to 
structural roles. To employ such operations is not part of what makes a system a 
Connectionist network. What makes a system such a network is that its 
representations are syntactically simple vectors, and its operations are vector 
operations, such as matrix multiplication. A Connectionist system could just as easily 
have structural-role binding operations as not have them. Therefore, 
Connectionism by itself fails to explain systematicity. More hypotheses are required.[6] 
This gives us a defeasible reason to prefer the Classical explanation to 
Connectionist explanations.[7]
[6] Compare Phillips (1998, p. 157): an "architecture based on a network of units coupled with a 
learning algorithm ... is attractive. It makes fewer commitments to the design of specific 
mechanisms that realize cognitive behaviours .... Nevertheless, if one accepts the requirements of 
systematicity, then those requirements are not met by just this type of architecture. Either additional 
properties are necessary to explain why networks are configured in a particular way so as to 
exhibit systematicity or additional subnetworks are required to preprocess potential components 
into similarity-based representations, for which it may be possible to demonstrate ... 
systematicity. Either way, the standard approach will not suffice."
[7] One could stipulate that the additional requisite hypotheses are part of the theory. But, as 
Aizawa (1997) observes, this move clearly wouldn't help in the cases of Creationism and Ptolemaic 
Geocentrism. So it shouldn't help in the case of Connectionism either.
This objection is at times misunderstood. For example, Hadley complains 
that 
... on the classical account, the systematicity of representations arises only 
in the presence of assumed algorithmic processes. ... It follows, then, that 
when the ... characteristics of a connectionist architecture are considered, 
we must permit the connectionist to assume that correspondingly general 
processing mechanisms are in place. ... Yet [Fodor and McLaughlin] seem 
unwilling to allow Smolensky the connectionist mechanisms that would 
permit a network to process his tensor-product representations ... in a 
manner that would engender systematic relations between those 
representations.[8]
But the unprincipledness objection does allow the Connectionist correspondingly 
general processing mechanisms that permit a network to do the job. The point of 
the objection is that such mechanisms don't guarantee that every network 
employing them can do the job. Further mechanisms, not essential to or definitive of 
Connectionism, are needed. Hadley fails to see that the Connectionist 
mechanisms of a Smolensky architecture are of the latter sort, not of the former.
Still, though I've already taken some pains to do so, I might not have made it 
sufficiently clear that the unprincipledness objection attributes correspondingly 
general processing mechanisms to Classicism and Connectionism. Perhaps I've 
failed to attribute to Classicism all the processing mechanisms the Classical 
explanation of systematicity requires, in which case I should attribute further (and 
correspondingly general) processing mechanisms to Connectionism. It's 
important, then, to say a bit more on this issue.
[8] Hadley 1997, p. 143 (emphasis in original).
5.2  The Nonarbitrariness of Classical Processes
Aizawa[9] argues that the two hypotheses that mental representations have a 
combinatorial syntax and semantics and that mental processes are causally sensitive 
to syntactic structure do not explain systematicity. His argument amounts to the 
observation that cognitive architectures of which those hypotheses hold can just 
as easily be nonsystematic as systematic. For one could easily program a system 
having such an architecture so that it can token, say, aLb, but not bLa. If that's 
right, it looks like the Classicist can explain systematicity only by hypothesizing 
mental processes specifically designed to capture it. And that would mean either 
that the Classicist's explanation is just as unprincipled as the Connectionist's or 
that the Classicist must allow the Connectionist to appeal to correspondingly 
general mental processes.
[9] Aizawa 1997, pp. 127–135.
The easiest way to make it clear that Aizawa's argument fails is to consider 
what it would take to program a Classical system that is capable of tokening aLb 
but incapable of tokening bLa. The representation-forming mechanisms would 
have to be sensitive to more than just the syntactic properties of a, L, and b. 
Otherwise they could just as easily produce bLa as aLb. They would also have to be 
sensitive to the nonsyntactic "shapes" of a and b. That is, a rule to the effect that 
"If x = b and yLz, then x ≠ y" would be required. Such a rule, however, is 
completely arbitrary with respect to Classicism. So within the Classical framework, it 
is asystematicity, not systematicity, that has to be specifically designed into 
Classical systems. Although it is clearly possible to have a Classical system that can 
token aLb but not bLa, such a system would have to be specially so crafted.
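The asymmetry can be illustrated with a toy formation routine. The sketch is hypothetical throughout (the names form and form_asystematic are mine, for illustration): the category-sensitive operation handles both orders, and blocking bLa takes an added rule that looks at the shape of a particular symbol.

```python
DESIGNATORS = {"a", "b"}
PREDICATES = {"L"}


def form(x, predicate, y):
    """Category-sensitive formation: any designator may fill either slot."""
    if x in DESIGNATORS and y in DESIGNATORS and predicate in PREDICATES:
        return (x, predicate, y)
    raise TypeError("ill-formed combination")


def form_asystematic(x, predicate, y):
    """The same operation plus a stipulated, shape-sensitive restriction."""
    # Extra rule, arbitrary with respect to the syntax: the particular
    # symbol b may not occupy the subject slot of L.
    if x == "b" and predicate == "L":
        raise ValueError("blocked by a stipulated rule")
    return form(x, predicate, y)


print(form("a", "L", "b"), form("b", "L", "a"))  # both orders, one operation
print(form_asystematic("a", "L", "b"))           # still available
```

Deleting the two-line restriction restores systematicity; nothing has to be added to form itself to get it.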
For Connectionism, the situation is reversed. A Connectionist architecture 
that is able to token both V_aLb and V_bLa will not be able to do so because they are 
systematic variants. Rather, a Connectionist architecture could do so only if that 
capacity has been specifically built into the system. It could just as easily have 
been built out of the system.
There is here a connection with the argument of the previous chapter. Recall 
the hypothetical organism discovered to have a Connectionist cognitive 
architecture. I argued that it's clear how we might have been able to design such an 
organism, whereas it's not at all clear how nature could have. Now it looks like 
we have a reason why that is so. For, insofar as the organism has a Connectionist 
cognitive architecture, it seems that nature could just as easily have made the 
organism's mind nonsystematic as systematic.
5.3  Unprincipledness is Not Structured-Domain Relative
Cummins and colleagues[10] (hereafter, Cummins) argue that the Classicist must 
either concede that the unprincipledness objection is not all that serious or admit 
that some, perhaps a great deal, of mental representation is non-Classical.
Cummins begins by pointing out that acquiring knowledge about some 
domains requires acquiring knowledge about their underlying structure. 
Acquisition of a language requires acquisition of its grammar. Learning which direction 
from a novel location is homeward requires learning the relationships between 
various directional cues and certain places you've been. Likewise, learning the 
layout of one's environment requires learning the relationships among various 
locations within it. Some domains are not like this. Learning the state capitals 
does not require learning about any structural properties of states or their 
capitals, other than simply what capital is situated within what state.
According to Cummins, the fact that a cognitive system has learned about 
the structure of a certain domain will manifest itself in various psychological 
effects. That is, the system will become subject to certain psychological laws, the 
specific nature of which will depend on the structure of the relevant domain as 
well as on various properties of the system's cognitive architecture and physical 
organization. Cummins calls such effects "systematicity" effects. I call them 
"structural" effects, so as to avoid the tendentious suggestion that this sort of 
systematicity is to be identified with the systematicity of thought.
[10] Cummins et al. 2001, Cummins 1996.
Cummins distinguishes primary structural effects from incidental structural 
effects. Primary structural effects are laws relating a cognitive system's inputs to 
its outputs. If, for example, Andy has learned how to multiply integers, his 
cognitive system will be governed by a psychological law stating (more-or-less) that 
if a mathematically well-educated cognitive system s is asked on an exam to 
multiply two integers, n and m, s will, ceteris paribus, provide the answer nm.
Incidental structural effects are the result of not only what a system 
computes but also a number of other factors, including what algorithms the system 
uses to perform its computations, the kind of hardware on which those 
algorithms are implemented, and the effects of external or internal environmental 
conditions on the system's operation. Thus, two systems can exhibit the same 
primary effect while exhibiting different incidental effects. Andy and Betty could 
exhibit the same primary multiplication effect, but they could nonetheless exhibit 
different incidental multiplication effects if they use different procedures to multiply.
Important to Cummins' argument is the distinction between structural 
representations, structural encodings, and pure encodings (all of which are 
representations). Structural representations and what they represent are isomorphs. 
They have constituents which represent (at least in the context of the 
representation as a whole) parts of the relevant domain; and how the constituents of such a 
representation are structurally related represents how the represented parts of 
the domain are structurally related. An accurate map of Boston is a structural 
representation of Boston. Classical binary trees can serve as structural 
representations of sentences.
The constituency relation for structural representations is a part–whole 
relation and thus a kind of co-tokening relation, as it is on the Classical account. 
However, it is not the case that all structural representations are Classical repre-
sentations. For some structural representations (such as standard maps, photo-
graphs, and scale models) do not have a combinatorial syntax and semantics. The 
content of a representational part of a structural representation need not be con-
text independent. Nor must such a part, independently of its representational 
context, represent anything at all.
Structural encodings, on the other hand, do not share structure with what 
they represent. However, the structure of what a structural encoding represents 
is systematically recoverable from it, by means of a genera?/productive algo-
rithm. As we saw in Section 2.2, tensor product representations can serve as 
structural encodings of binary trees. Gödel number representations can serve as 
structural encodings of sentences.
Finally, pure (or arbitrary) encodings do not share structure with what they 
represent, nor is the structure of what a pure encoding represents systematically 
recoverable from it.
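The difference between structural and pure encodings can be sketched as follows. The example assumes a hypothetical three-symbol alphabet and an arbitrary assignment of code numbers; it illustrates the general idea of Gödel numbering rather than any particular scheme from the literature. The Gödel number of a symbol string shares no structure with the string, yet a general algorithm (prime factorization) recovers it. The lookup table at the end, by contrast, assigns numbers from which nothing is recoverable without the table itself.

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
CODE = {"a": 1, "b": 2, "L": 3}            # hypothetical symbol codes
SYMBOL = {number: symbol for symbol, number in CODE.items()}


def godel_encode(symbols):
    """Structural encoding: the i-th prime raised to the code of the i-th symbol."""
    if len(symbols) > len(PRIMES):
        raise ValueError("sequence too long for this toy prime table")
    number = 1
    for prime, symbol in zip(PRIMES, symbols):
        number *= prime ** CODE[symbol]
    return number


def godel_decode(number):
    """Recover the symbol sequence by reading off the prime exponents in order."""
    symbols = []
    for prime in PRIMES:
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        if exponent == 0:
            break                           # no more positions are encoded
        symbols.append(SYMBOL[exponent])
    return symbols


print(godel_encode(["a", "L", "b"]))        # 2**1 * 3**3 * 5**2 = 1350
print(godel_decode(1350))                   # ['a', 'L', 'b']

# Pure (arbitrary) encoding: numbers assigned by fiat. Nothing about 17 or 4
# indicates which string it stands for; only the table does.
PURE = {("a", "L", "b"): 17, ("b", "L", "a"): 4}
```

The decoder is general in that it works for any string over the alphabet, which is what distinguishes the first scheme from the second.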
According to Cummins, an adequate argument from the systematicity of 
thought to the conclusion that mental representations are Classical requires the 
assumption that the systematicity of thought is an incidental rather than a 
primary structural effect of having acquired knowledge about certain domains. For 
primary structural effects don't provide us with evidence about how a cognitive 
system represents a domain or processes information about it. And that's what's 
at issue. So, for Cummins, an adequate Classical explanation of systematicity, as 
an argument for Classical mental representations, should have the following 
form:
1. There are incidental structural effects of having acquired knowledge of 
domain D.
2. If there are such effects, then mental representations somehow preserve 
information about D's structure.
3. D's structure is sentence-like.
4. Assuming that the structure of mental representations is sentence-like 
provides the best explanation of the fact that mental representations 
preserve information about D's structure.
5. Therefore, mental representations have sentence-like, that is, Classical 
structure.
Cummins regards steps 1 and 2 as uncontroversial. He also thinks that positing 
structural representations is the most natural way to explain the various inci-
dental structural effects associated with the acquisition of knowledge of different 
domains. So he considers step 4 to be very plausible. The trouble with the argu-
ment, on his view, is that step 3 is clearly not true for every domain. Different 
domains have different structures; in particular, many domains have non-Clas-
sical, that is, non-sentence-like, structures.
Cummins' case in point involves the perception of objects in space.[11] If a 
cognitive system has learned about the structure of visual scenes containing 
distinct objects, it will exhibit certain incidental structural effects. For example, 
anyone who can perceive (imagine) a scene in which two objects are situated at 
completely distinct locations can also perceive (imagine) a scene in which the 
locations of those two objects are switched. Such structural effects would be naturally 
explainable in terms of multidimensional-graph-like, structural representations 
having representations of objects among their constituents. However, Cummins 
maintains, such structural effects would not be naturally explainable in terms of 
Classical representations. For Classical representations have sentence-like 
structure (a combinatorial syntax and semantics), not graph-like structure. Thus, 
whereas they could serve as structural encodings of visual scenes containing 
objects, they could not serve as structural representations of them. 
[11] Cummins 1996, p. 604.
On Cummins' account, then, any incidental structural effects of having 
acquired knowledge of a non-Classically structured domain provide the basis of an 
argument, of the above form, for the existence of non-Classical, structural mental 
representations. The result is (perhaps massive) representational pluralism: for 
every differently structured domain, we have an argument for the existence of a 
structurally distinct kind of structural mental representation. This result 
shouldn't sit well with Classicists who believe that all thought is grounded in 
Classical representation.
The Classicist can avoid this sort of representational pluralism, according to 
Cummins, by arguing that some incidental structural effects can be best ex-
plained by appeal to Classical representations that are structural encodings of 
what they represent (that is, by rejecting premise 4 of the above argument). But, 
Cummins argues, if structural encoding is allowed, then the objection that the 
Connectionist explanation of systematicity is seriously unprincipled must be 
given up. For, according to Cummins, if the Classicist concedes that certain inci-
dental structural effects can be given an adequate explanation by appealing to 
(Classical) structural encodings, then the incidental structural effects associated 
with the systematicity of thought can likewise be given an adequate explanation 
by appealing to non-Classical structural encodings (say, tensor product repre-
sentations). Such an explanation would be no more unprincipled than one that 
appeals to structural encodings that are Classical representations.
So if Cummins is right, it looks like the Classicist must either give up the 
objection that the Connectionist explanation of systematicity is seriously unprin-
cipled or admit that a great deal of mental representation is non-Classical.
That's Cummins' argument. The rest of this chapter is about why it fails. 
Before I set out my main responses, though, it is worth noting that it is not 
necessarily incompatible with Classicism if some kinds of mental representations are 
non-Classical. The important part of Classicism is that a significant part of a 
complete account of the mind will have to advert to Classical representations and 
Classical operations defined over them. The Classicist could allow that part of 
that complete account will have to appeal to non-Classical representations and 
processes.[12] In any event, Cummins hasn't come close to showing that the 
Classicist must either concede that the unprincipledness objection is not all that serious 
or admit that some, perhaps a great deal, of mental representation is 
non-Classical.
99
12
 See, for instance, Fodor 2000.
5.3.1  The Relationship between Content Structure and Representation Structure
First, it is not at all clear that an adequate argument from systematicity to the 
conclusion that mental representations are Classical requires the assumption that 
systematicity is a structural effect of having acquired knowledge about certain 
domains. The Classical explanation of systematicity (§ 2.1) makes no appeal 
whatsoever to knowledge about any particular structured domain. It's an argument 
from the systematicity of thought, and thought (fortunately) is not domain 
specific. Systematicity is neither a primary nor an incidental structural effect of 
having acquired knowledge about any specific domain. Rather, it's a psychological 
effect of having acquired the ability to think or reason, regardless of domain.
Cummins does have a reply to this objection.[13] The Classical explanation of 
systematicity appeals to the contents of mental representations. On Cummins' 
view, the problem with that approach is that there is no nontendentious way of 
identifying the systematic variants of a content. He first points out that an 
adequate Classical explanation of systematicity must not depend on the assumption 
that contents have the structure of Classical representations. Contents might be 
structurally atomic or have a different kind of structure. So we should not 
construe the notion of systematicity in this way:
Systematicity 1  Anyone who can think a content of the form aRb can think 
a content of the form bRa.
This problem can be avoided by construing systematicity somewhat as follows:
Systematicity 2  Anyone who can think the content c can also think the 
systematic variants of c.
But then the question arises, How are the systematic variants of a content to be 
identified?
[13] See Cummins 1996, pp. 594–599.
As it turns out, Cummins argues, what contents appear systematically related, 
and what structure contents appear to have, depends upon the structure of 
the representations we use to refer to them. Thus, consider the claim
(1) Anyone who can think the content [Andy loves Betty] can think the 
content [Betty loves Andy].
According to Cummins, the intuitive force of (1) is due entirely to the 
systematicity present in natural language, and in particular to the fact that the 
sentences "Andy loves Betty" and "Betty loves Andy" are systematic variants of each 
other. This should be clear, he says, from the fact that (1) would lose some or all 
of its intuitive force if we substituted atomic or differently structured (Classical 
or non-Classical) representations for those sentences. He asks us to consider claims 
such as the following:
(2) Anyone who can think Betty's favorite content can think the content 
[Betty loves Andy].
Even assuming that Betty's favorite content is [Andy loves Betty], Cummins 
maintains, (2) fails to elicit systematicity intuitions.
Cummins also asks us to compare the following claims:
(3) Anyone who can think that a face is smiling can think that a face is 
frowning.
(4) Anyone who can image a smiling face can image a frowning face.
Claim (3) is dubious. But, Cummins says, given an appropriate scheme of 
imagistic representation (say, one which builds images from a palette of circles, 
lines, and arcs), claim (4) is quite plausible. For such a scheme would permit an 
image of a frowning face that is a permutation of an image of a smiling face. Thus, 
if our preferred scheme for representing contents were one of just that sort, then 
a suitable counterpart of (3), say,
(3*) Anyone who can think that A can think that K,
would become plausible as well.[14]
What these examples are supposed to show is that:
Absent some representation-independent access to the structure of 
propositions, which propositions seem to be systematic variants of each other 
will depend on one's preferred scheme for representing propositions. If 
you linguistically represent the contents to be thought, then you will want 
mental representation to be linguistic, since then the systematicities in 
thought that are visible from your perspective will be exactly the ones 
your mental scheme can explain.[15]
In short, an explanation of systematicity that identifies contents linguistically is 
one that covertly assumes that contents have the structure of Classical 
representations, and that's cheating.
[14] Cummins doesn't appeal to any statement like (3*), but it seems to me that 
doing so further clarifies his point.
According to Cummins, then, what we need is a way of identifying syste-
maticities in thought that is independent of any assumptions about the structure 
of thought contents. This can be done if we focus on how we acquire knowledge 
of structured domains.
I don't find this reply to the first problem I see with Cummins' argument to 
be all that forceful. To begin with, it's not clear to me that the Classical 
explanation of systematicity makes any commitment to the structure of contents.[16] 
Suppose for the sake of argument that contents are atomic. That wouldn't change the 
fact that they have many properties. For example, the contents [Andy loves 
Betty] and [Betty loves Andy] have different truth conditions and thus might 
have different truth values. The first is true iff Andy stands in the loving relation 
to Betty, while the second is true iff Betty stands in the loving relation to Andy. So 
even if those contents are atomic, they stand in different relations to Andy, Betty, 
and the loving relation. And how one of those contents is related to those three 
things (whatever their ontological status) is quite plausibly systematically related 
to how the other content is related to them. 
[15] Cummins 1996, p. 597.
[16] As McLaughlin (1993, p. 186) notes, "classicism is not committed to what Clark 
(1988) calls 'the transparency thesis', namely the thesis that there is a one-to-one 
correspondence between the concepts exercised in a thought and the (public language) 
words used to specify the content of the thought. The relationship between such 
words and concepts might prove to be quite complicated."
My line of reasoning might seem to commit me to the possibility that the 
simplicity of mental representations is compatible with systematicity. For even if 
mental representations are atomic, they still have their contents and associated 
truth conditions. So perhaps thought is systematic just because truth conditions 
are. This move, however, wouldn't work. We'd still have to explain why we can 
think thoughts with systematically related truth conditions. That is, we'd still 
have to explain why anyone who can think a thought, T, can think those 
thoughts the truth conditions of which are systematic variants of the truth 
conditions of T. And that appears to be difficult to do without positing a language 
of thought.
I suspect that a defender of Cummins might respond by rerunning his 
argument, substituting "truth-conditions" for "contents"; that is, by arguing that 
what truth conditions appear systematically related, and what structure they 
appear to have, depends upon the kinds of representations that we use to refer to 
them. What I would do then is rerun his argument, substituting "structured 
domains" for "contents." The trouble is that we have to use representations to 
identify the relevant entities at some point. Otherwise, it's hard to see how we 
could state our theories or theorize at all.
In some circles, that might be seen as a problem for both the Classical view 
and Cummins' way of arguing from structural effects to the structure of mental 
representations. But it's not a problem for either. And that's because Cummins 
hasn't shown that what contents appear systematically related, and what 
structure they appear to have, depends upon the structure of the representations we 
use to refer to them.
Consider again claims (1) and (2):
(1) Anyone who can think the content [Andy loves Betty] can think the 
content [Betty loves Andy].
(2) Anyone who can think Betty's favorite content can think the content 
[Betty loves Andy].
Cummins does show that if we want to illustrate systematicity, we'd be advised 
(under normal circumstances) to use (1). But I don't see why this is anything 
more than a matter of pragmatics. If Betty's favorite content is in fact [Andy loves 
Betty], given an appropriate context (say, one in which it's common knowledge 
what Betty's favorite content is), (2) could be used to provide an example of 
systematically related contents. Furthermore, and more to the point, if the phrase 
"Betty's favorite content" were an idiom that had the content [Andy loves Betty], 
then (2) would serve as well as (1) as an illustration of systematicity. Compare:
(5) Anyone who can think that Andy has been shanghaied can think that 
Andy has shanghaied some person or persons.
Even though the sentences "Andy has been shanghaied" and "Andy has 
shanghaied some person or persons" are not syntactic permutations of each other, (5) 
seems to provide an acceptable illustration of systematicity. And (5) seems to do 
so because those two sentences have the contents they do. When providing an 
example of systematicity, the identification of the relevant contents does require 
the mediation of appropriate representations. But Cummins doesn't show that 
it's the structure (syntactic or nonsyntactic) of the mediating representations, as 
opposed to their contents, which determines whether or not we see systematicity 
in thoughts with those contents. After all, we use the words "dog" and "god" in 
that-clauses to refer to the contents [dog] and [god]. And those words are 
systematic variants of each other. But that they are so related doesn't in the least 
incline us to believe that anyone who can token the concept having the content 
[dog] can token the concept having the content [god].
It may be worthwhile to note, regarding (5), that the sentences "Andy has 
been shanghaied" and "Andy has shanghaied some person or persons" arguably 
are syntactic permutations of each other. For syntactic structure need not neatly 
correspond to surface structure. However, any argument for the claim that the 
two sentences have the same syntactic structure would quite plausibly have to 
appeal to their respective contents. And that would be to admit that syntactic 
structure and (internal or relational) content structure are intimately related, 
which is precisely what Cummins questions.
Cummins' treatment of claims (3) and (4) is also flawed.
(3) Anyone who can think that a face is smiling can think that a face is 
frowning.
(4) Anyone who can image a smiling face can image a frowning face.
The problem here is that Cummins' argument would appear to prove too much. 
Perhaps if our preferred representational scheme for representing contents were 
one of the imagistic sort Cummins envisages, then (4) and
(3*) Anyone who can think that A can think that K,
would gain intuitive force. However, if (3*) would gain intuitive force, so too 
should
(6) Anyone who can think that A can think that [permuted image].
For "A" and "[permuted image]" would be structural permutations of each other. But 
that presents a problem, since the content of "[permuted image]" could be merely 
arbitrarily relatable to the content of "A", or it might have no content at all 
(even assuming the two images are "image-grammatically" well formed). Its content 
could be, say, [The sculpture in the square is outré]. And I strongly suspect that 
we wouldn't find the claim
Anyone who can think that a face is smiling can think that the sculpture in 
the square is outré,
to be very plausible. And we certainly wouldn't find a nonsensical claim to be 
plausible. Thus, even assuming a suitable system of representation, (6) might not 
serve as a plausible illustration of systematicity.
Another of Cummins' examples is similar to the image case. Cummins 
claims that
if you think mental representations are activation vectors, then you are 
entitled to
Anyone who can think a thought of the form <...a...b...> can think a 
thought of the form <...b...a...>.[17]
Well, perhaps if you can token one of those vectors, you can token the other. But 
again, the contents of those vectors could be merely arbitrarily related to each 
other, and the latter might not pick out a thought, even if the former does. 
In light of all this, it's not too hard to see why (3*) has some intuitive force 
(within Cummins' scenario). It's because the semantic relations between "a face 
smiling" and "a face frowning" are nonarbitrary. So it looks like we are free to 
argue that because our intuitions about systematicity depend on semantic 
relations, it can't be right that it's the (syntactic or nonsyntactic) structure of 
the mediating representations, as opposed to their contents, which determines 
whether or not we see systematicity in thoughts having those contents.
[17] Cummins 1996, p. 598.
5.3.2  Unprincipledness Rests with Vector Constituency, Not Encoding
Let's move on to a second serious problem with Cummins' argument. It doesn't 
appear to be true that if Classicists grant that certain structural effects can be 
adequately explained by appealing to (concatenative) structural encodings, then 
they should also grant that the systematicity of thought can likewise be 
adequately explained by appealing to (nonconcatenative) structural encodings. For 
the objection that the Connectionist explanation of systematicity is unprincipled 
doesn't turn simply on the fact that vectors are structural encodings. Rather, it 
turns on the fact that the constituency relation for vectors is nonconcatenative. So 
it's hard to see how an explanation of certain structural effects in terms of 
structural encodings that are Classical representations would be inadequate (because 
unprincipled).
This should be fairly easy to show. Suppose we have a Classical, structural 
encoding scheme for representing maps. Suppose further that we have a Classical 
system which can represent, in that scheme, any structural ("systematic") 
variant of any map it can represent. Would any Classical explanation of this be 
unprincipled in the way I have argued Connectionist explanations of 
systematicity are unprincipled? I don't see why it should be. My point is easily 
made in terms of a relatively concrete example.
Maps of Earth which illustrate plate tectonics provide a relatively good case 
of maps which are structural transformations of each other.[18] Consider two such 
maps, one that represents South America as being n miles from Africa, and 
another that represents South America as being m miles from Africa. Part of the 
explanation of the fact that Classical systems can represent one of these maps if 
they can represent the other would presumably appeal to facts like this:
The Classical encodings of these maps each have constituents representing 
map-distance representations, Dxy = z. But they differ in that the encoding 
of the first map has the representation Dsa = n (but not Dsa = m) as a 
constituent, whereas the encoding of the second map has Dsa = m as a 
constituent (but not Dsa = n) (where the contents of these constituents are 
the obvious candidates).[19]
[18] For present purposes, I'll assume that to structurally permute a map is to 
rearrange its representational parts by rearranging (and possibly reshaping) those 
parts in such a way that they retain their contents. Different ideas of what counts 
as a structural permutation of a map would merely require a different sort of 
example.
[19] It's true that any artificial, Classical system of structural encoding will 
have to be tailored to the structural permutations one is interested to capture. But 
the same will be true for any artificial, distributed-vector system of 
representation. Furthermore, as I am in the process of arguing, this sort of 
arbitrariness is not what would make an explanation of domain-structure sensitivity 
that appeals to such a representational system unprincipled.
The explanation would also appeal to the syntax sensitivity of the operations 
which construct the relevant representations, showing that if a Classical system 
can construct a representation of one map, then it can construct a representation 
of a structural permutation of that map, proceeding along the lines of the 
Classical explanation of systematicity. Given a representation of a particular map, 
a Classical system's syntax-sensitive operations allow it to construct a 
representation of a different map, where the second has as parts the same 
representations as the original map but standing in a different arrangement.
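To make the idea concrete, here is a minimal Python sketch of a concatenative 
structural encoding of the two maps. It is an illustration under stated 
assumptions, not a proposal about any actual cognitive system: the names 
Distance and substitute_distance, the third place label "e", and the mileages 
standing in for n and m are all hypothetical. What it is meant to show is that 
the operation producing the variant map is defined over the encoding's 
constituents, so every constituent other than the one replaced is literally reused.

```python
from typing import NamedTuple, Tuple


class Distance(NamedTuple):
    """One constituent of a map encoding, read as Dxy = z."""
    x: str    # first place label, e.g. "s" for South America
    y: str    # second place label, e.g. "a" for Africa
    z: float  # represented distance in miles (hypothetical values below)


MapEncoding = Tuple[Distance, ...]


def substitute_distance(encoding: MapEncoding, x: str, y: str, new_z: float) -> MapEncoding:
    """Syntax-sensitive operation: build a variant encoding in which the
    Dxy constituent has the value new_z. It applies only when a Dxy
    constituent is present, and it reuses every other constituent as is."""
    if not any(c.x == x and c.y == y for c in encoding):
        raise ValueError(f"no constituent D{x}{y} to operate on")
    return tuple(
        c._replace(z=new_z) if (c.x == x and c.y == y) else c
        for c in encoding
    )


# Hypothetical stand-ins for n and m; "e" is an invented third place label.
n, m = 1600.0, 2800.0
first_map: MapEncoding = (Distance("s", "a", n), Distance("a", "e", 500.0))
second_map = substitute_distance(first_map, "s", "a", m)

print(second_map[0])                  # the Dsa constituent now carries m
print(second_map[1] is first_map[1])  # the other constituent is the same object
```

Because the operation is defined only when there is a Dxy constituent to operate 
on, the form of the encoding constrains which variants can be constructed.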
Now imagine a Connectionist system that captures the very same structural 
transformations that the above Classical system does. How does it do that? Well, 
it presumably employs an encoding scheme that represents map parts and map 
structural relations. And presumably its vector operations correspond to the 
"legal" permutations[20] that can be performed on the map structures of interest; 
perhaps its vector representations and operations are related to those permutations 
in just the way that the vector representations and operations of a Smolensky 
architecture are related to Classical trees and rules for extracting and combining 
tree constituents. 
But now it looks like we can run the unprincipled-explanation objection 
against the claim that such Connectionist systems could serve as the basis of a 
good explanation of the cognitive capacity (assuming there is one) to represent 
systematic map variants. It's important to emphasize that there are two distinct 
kinds of structural encodings. There are structural encodings having 
concatenative constituents and structural encodings with nonconcatenative 
constituents. The envisaged Classical system uses structural encodings of the former 
sort. With such representations (to echo a point Cummins makes regarding structural 
representations), "the theorist is constrained by the form of the representations: 
you can only write permutation rules when there are permutable constituents."[21] 
However, Connectionist models have no way of enforcing a comparably 
principled constraint. For such models, it's not the case that you can only have 
certain vector operations when there are permutable representational constituents, 
for there simply aren't any such constituents. Such models, then, need 
structural-role binding operations, which are arbitrary with respect to 
Connectionism. So it looks like it's the case that, for any structural map variants 
M and M*, a Connectionist system that is able to represent both maps will not be 
able to do so because they are structural variants. Rather, if a Connectionist 
system can represent certain map structural variants, it will be able to do so only 
because that capacity has been specifically built into the system. It could just as 
easily have been built out of the system.
[20] I don't have in mind any particular technical notion of legal permutation for 
maps. The intuitive idea is that a legal permutation is one which yields a 
well-formed map.
What misleads Cummins is his view, which we've rejected, that arguments 
from systematicity to the structure of mental representations ought to focus on 
the cognition of structured domains. From that perspective, a Classical 
explanation of the structural effects for, say, the linguistic domain would be 
principled because the structure of Classical representations is isomorphic with the 
structure of that domain. On the other hand, a Connectionist explanation of those 
effects would be unprincipled, because the structure of vectors does not mirror the 
structure of that domain. Likewise, a Classical explanation of the structural 
effects for a domain having map-like structure would be equally unprincipled, since 
map-like domains do not have syntactic structure. If the issue is couched in these 
terms, then the distinction between syntactically complex structural encodings 
and syntactically simple structural encodings is bound to seem irrelevant. But, as 
I've argued, it's not.
[21] Cummins 1996, p. 607.
5.4  Representations for Navigation
I conclude this chapter by stating its bottom line in terms of the explanation of 
navigational capacities and by briefly noting a minor point relevant to represen-
tational pluralism.
I argued in this and the previous chapter that Connectionist explanations of 
systematicity are inadequate. This chapter yields a further conclusion. I will 
argue that certain navigational capacities are best explained by appeal to mental 
representations for which the constituency relation is concatenative. If that is 
right, then, regardless of whether those representations are structural 
representations or structural encodings, and regardless of whether they are 
Classical representations, Connectionist explanations of the same capacities will be 
inadequate. Any such explanation will be subject to at least two strong objections: 
the explanation (if a good one) doesn't explain what it sets out to explain 
(Chapter 4), and it is unprincipled.
Finally, a relatively minor point relevant to representational pluralism. Even 
if adequately explaining certain navigational capacities requires positing 
map-like structural representations, that in itself is not logically incompatible 
with the possibility that the required explanation is a Classical one. For although 
ordinary maps lack a combinatorial syntax and semantics, this need not be true for 
formal maps (or for cognitive maps, if there are such things). That is, it is 
possible to devise limited, formal systems of representation that are both 
language-like and map-like. For illustrative purposes only, I've provided a simple, 
artificial example in Appendix A.[22] The expressive power of all such systems of 
representation very well might be quite limited. But that's not a problem if 
adequately representing facts in the relevant domain doesn't require all that much 
expressive power to begin with. A system of representation devoted to or useful for 
navigation might not need to be productive, say.
[22] Compare Casati and Varzi 1999, chapter 11. Their system is more complex and 
less artificial than mine. The only reason I provide an alternative is that it is 
much easier to state formally.
Chapter 6
Structure of the Honeybee's Navigational Domain
Although complex representations such as charts and maps are quite useful for 
purposes of navigation, in part because of the structural similarities they bear to 
the regions they represent, that does not imply that any animal capable of 
accomplishing even fairly sophisticated navigational tasks represents some features 
of its environment with mental representations having similarly complex 
contents or configurations. Through careful behavioral or neurophysiological 
experimentation, however, it is possible to discern the structural features, of a 
certain domain, to which an organism is sensitive. The nature of such features, 
together with details about the organism's sensitivities to them, can provide clues 
regarding the semantic simplicity or complexity of its mental representations. 
And, if it turns out to be probable that they are semantically complex, then 
further considerations can be marshaled to address the question of the 
configurational structure of such representations.
In this chapter, I review a number of recent behavioral studies which reveal 
various structures that honeybees, as navigators, are capable of learning. I also 
review those and other studies that show what bees are able to do with some of 
that information. Based on the nature of the navigational capacities exhibited, I 
argue, in the next (and final) chapter, that certain classes of information acquired 
by honeybees exhibit systematicity. I conclude that we thus have at least one 
good reason to prefer Classical theories of honeybee navigational capacities over 
Connectionist ones.
Note that while I speak of honeybees acquiring information about various 
distances and directions, relying on their solar compass, and so on, I remain neu-
tral on the issue of what the contents and extensions of the representations in 
question actually are (I discuss this matter further in § 7.1.1). Moreover, I want to 
avoid commitment to any particular theory of content or reference.
6.1  Simple Structures
Honeybees are capable of acquiring information about a number of relationships 
between various places of interest. My focus is on distance and direction rela-
tions. The following three sections are especially pertinent to the arguments for 
the systematicity of bee navigational capacities presented in Sections 7.1.2 and 
7.2.1. There I argue that the general capacities described below require certain 
more-specific capacities to acquire systematically related information. One of my 
main points will be that the capacities of bees to coherently track locations of in-
terest (including their own current location) require that the semantic relations 
among the items of information they acquire are nonarbitrary.
6.1.1  Distance and Direction Relations
A honeybee learns the distance and direction, from the hive, of a foraging site it 
discovers. During the bee's outbound flight, it continually updates the 
information it has about its location in relation to the hive by a process known as 
dead reckoning, or path integration: the bee continually integrates its most recent 
flight segment, or vector of travel, with the sum of its previous flight vectors.[1] 
The result is a single vector that informs it of its current direction and distance 
from the hive. When the bee discovers a foraging site, it stores some kind of 
representation of the site's location; and when it returns to the hive, it can go 
there directly, even if its outward path was circuitous, and even if no landmarks 
near the hive are visible from the site.[2]
The waggle run, or dance, is the means by which a honeybee informs other 
colony members of the approximate distance and direction to a foraging site or a 
potential nest site.[3] Some honeybee species, such as Apis mellifera and Apis 
cerana, orient their waggle runs with respect to gravity. They perform their runs in 
darkness on the vertical surface of a comb within the hive. The duration (and other 
properties) of the run corresponds to the distance to the site, and the angle of the 
run with respect to gravity indicates the site's current solar bearing (a vertical 
run indicates that the direction to the site is toward the sun). 
[1] Two terminological matters. First, I use the terms "integrate" and "sum" here 
somewhat figuratively, for convenience. At this point I'm not making any specific 
claims about the nature of honeybee mental representations or processes. Second, 
also for convenience, I will often rely on context to disambiguate terms such as 
"vector", that is, whether they refer to some feature of the environment (or the 
bee's own behavior) or to the bee's information about such a feature.
[2] For recent discussions of path integration in insects, see Collett (M.) and 
Collett 2000; Collett (M.) et al. 1998; Collett (T. S.) and Collett 2000, 2002; 
Giurfa and Capaldi 1999; Schmidt et al. 1992; Wehner et al. 1996, 2002; and 
Wohlgemuth et al. 2001.
[3] von Frisch 1967, Riley et al. 2005. See Dyer 2002 and Michelsen 1999 for recent 
reviews.
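The dead reckoning and dance conventions just described can be sketched in a few 
lines of Python. This is a toy geometric illustration only; it makes no claim 
about how bees actually compute or represent anything. The function names, the 
use of compass bearings in degrees clockwise from north, the arbitrary distance 
units, and the convention that the run angle is measured clockwise from vertical 
are assumptions introduced for the example.

```python
import math


def integrate_path(segments):
    """Sum flight segments, each given as (compass_bearing_deg, distance).

    Bearings are degrees clockwise from north. Returns the net (east, north)
    displacement of the bee from the hive."""
    east = north = 0.0
    for bearing_deg, distance in segments:
        theta = math.radians(bearing_deg)
        east += distance * math.sin(theta)
        north += distance * math.cos(theta)
    return east, north


def bearing_and_distance(east, north):
    """Convert a net displacement into a single (bearing_deg, distance) vector."""
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    return bearing, math.hypot(east, north)


def run_angle(site_bearing_deg, sun_azimuth_deg):
    """Angle of the waggle run from vertical (assumed clockwise).

    Zero means a vertical run, i.e. the site lies in the sun's direction."""
    return (site_bearing_deg - sun_azimuth_deg) % 360.0


# A circuitous outbound flight: north, then east, then back south.
outbound = [(0.0, 100.0), (90.0, 100.0), (180.0, 100.0)]
site_bearing, site_distance = bearing_and_distance(*integrate_path(outbound))
home_bearing = (site_bearing + 180.0) % 360.0  # direct course back to the hive

print(round(site_bearing, 6), round(site_distance, 6))
print(round(home_bearing, 6))
print(run_angle(120.0, 120.0))  # site in the sun's direction: vertical run
```

However circuitous the outbound path, the running sum reduces to a single vector, 
and reversing its bearing gives the direct course home.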
Honeybees can use remembered hive-to-site vectors to return directly to 
previously visited locations. They can also use remembered site-to-hive vectors 
to return directly to the hive after having been displaced by an experimenter to a 
familiar or unfamiliar site.[4] But they can learn vectors other than those relating 
the hive to other important locations. They can also learn "local" vectors: those 
connecting a landmark (or other visual cue) to another landmark, or connecting a 
landmark to a goal site.[5]
Honeybees estimate distance flown by monitoring optic flow, or image 
movement across the retina.[6] They estimate direction principally by means of 
their solar compass.[7] The next section presents features of the solar compass 
mechanism that are important for issues of cognitive architecture. 
[4] Menzel et al. 1998, 2000a, 2005; Riley et al. 2003; Schöne et al. 1998.
[5] Chittka et al. 1995a,b; Collett (M.) et al. 2002; Collett (T. S.) and Baron 
1994; Collett (T. S.) et al. 1993, 1996; Srinivasan et al. 1997; von Frisch 1967. 
Desert ants also learn local vectors, as shown by Collett (M.) et al. 1998.
[6] Esch and Burns 1996, Si et al. 2003, Srinivasan et al. 2000. For 
optic-flow-based distance estimation by ants, see Ronacher and Wehner 1995.
[7] Dickinson and Dyer 1996; Dyer 1987, 2002; Dyer and Dickinson 1994, 1996; Wehner 
1983, 1984; Wehner et al. 1996.
6.1.2  Solar Compass and Solar Ephemeris
Honeybees are able to use the sun (as well as the pattern of polarized sunlight) in 
order to set and hold a compass course. Because the sun moves in relation to the 
landscape, a bee's returning to a familiar site at different times of day requires its 
flying at different angles in relation to the sun's compass direction, or azimuth. In 
order to do so, it must be able to estimate how much the solar azimuth changes 
during the relevant time spans. This in turn requires the organism to be 
informed about the time of day (information provided by its circadian clock) and to 
have a record of the solar azimuth as a function of time of day. Such a record is 
called a solar ephemeris.
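The compensation just described can be pictured with a small Python sketch. It 
is offered only as an illustration of the geometry, not as a model of the bee's 
mechanism. The table of anchor points is invented rather than astronomical data, 
and the names toy_ephemeris and flight_angle_to_sun, the linear interpolation, 
and the clockwise angle convention are all assumptions made for the example.

```python
# Invented anchor points (local hour, solar azimuth in degrees). The azimuth
# changes slowly early and late in the day and quickly around midday.
ANCHORS = [(6.0, 90.0), (10.0, 120.0), (12.0, 180.0), (14.0, 240.0), (18.0, 270.0)]


def toy_ephemeris(hour):
    """Return the tabulated azimuth for a local hour by linear interpolation."""
    if not ANCHORS[0][0] <= hour <= ANCHORS[-1][0]:
        raise ValueError("hour lies outside the toy table")
    for (h0, a0), (h1, a1) in zip(ANCHORS, ANCHORS[1:]):
        if h0 <= hour <= h1:
            return a0 + (a1 - a0) * (hour - h0) / (h1 - h0)


def flight_angle_to_sun(site_bearing_deg, hour):
    """Angle (assumed clockwise) between the sun's azimuth and the course to the site."""
    return (site_bearing_deg - toy_ephemeris(hour)) % 360.0


site_bearing = 180.0  # hypothetical site due south of the hive
for hour in (10.0, 12.0, 14.0):
    print(hour, flight_angle_to_sun(site_bearing, hour))
```

The compass bearing of the site never changes, but the angle to be held relative 
to the sun does, which is why both a clock reading and an ephemeris are needed.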
The solar ephemeris varies with time of year and latitude. Hence, the current 
ephemeris for a particular locale must be learned. Complicating matters is the 
fact that the rate of change of the sun's azimuth varies with the time of day, being 
slowest in the morning and evening and fastest at midday. Nonetheless, it has 
been shown that bees raised in an incubator and exposed to the sun only during 
a limited part of the day (for example, for a couple of hours in the afternoon) learn 
the current solar ephemeris for their locale.[8]
Honeybees learn how the solar azimuth varies in relation to the position or 
orientation of certain landscape features over the course of the day. Thus, on 
heavily overcast days, when neither the sun nor the pattern of polarized sunlight 
is visible, honeybees can estimate the direction of the sun by means of familiar 
landscape features in conjunction with their internal solar ephemeris.[9] 
Remarkably, this ability allows bees to estimate the solar azimuth on moonlit 
nights, provided that the necessary landmarks are visible.[10]
[8] Dyer and Dickinson 1994.
The solar ephemeris learning mechanism, then, produces a record that al-
lows bees to estimate the position of the sun at times when they do not see it 
(due to heavy overcast), have not ever seen it (due to controlled, limited expo-
sure), or never will see it (due to the time of night).
This strongly suggests that bees are capable of freely generalizing their solar 
ephemeris function to novel inputs. That is, for times of day for which a bee has
120
9
 Dyer 1987. Schöne et al. (1998) report that, on sunny days, displaced bees tend to be initially but 
briefly disoriented upon release, if their vision of the surrounding terrain is blocked until the time 
of release. Their perception of the sun and its associated skylight patterns alone, under such con-
ditions, appears to be insufficient for them to set a course. In light of this, Schöne et al. suggest 
that solar compass course setting involves integration of terrestrial and celestial cues. If "terres-
trial" is construed broadly, the truth of their suggestion is actually knowable a priori. For if what 
your compass "needle" points at is in continual motion, it must be related to a stable reference 
direction if it is to be reliable. Consider how useful a standard compass would be in a possible 
world where the position of magnetic north varied predictably but rapidly over, say, 180°. Under 
such circumstances, a standard compass would be useless without a "magnetic-pole ephemeris" 
chart. In the case of the bee, stabilizing cues are provided by its circadian clock and internal solar 
ephemeris. If "terrestrial cue" is construed to mean "landscape visual cue," then Schöne et al.'s 
suggestion is clearly wrong. Route-trained bees do not need such cues to set a course and typi-
cally ignore them when released at an unexpected location (Riley et al. 2003). On the other hand, 
terrestrial cues are essential for setting a correct course. For example, knowing which way is east 
or west doesn't help much if you don't know whether you are east or west of your destination. 
Schöne et al.'s observation, then, is probably related to the bees' attempting to locate themselves 
in relation to familiar landscape features.
10
 Dyer 1985b.
never experienced the corresponding solar azimuth, it nonetheless is able to fairly 
accurately map those times to azimuthal positions. As we'll see (§ 7.5), the ability 
to freely generalize certain universally quantified functions appears to require 
the exercise of rules that operate on instances of variables. 
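The contrast between a record that merely lists experienced pairings and a rule that operates on a variable can be made concrete with a toy sketch. Everything in it is hypothetical: the sampled values, the linear form of the rule, and the names are illustrative assumptions, and nothing is implied about how bees actually compute the ephemeris (which, as noted above, is not linear in time).

```python
# Hypothetical afternoon observations: (local solar hour, solar azimuth in degrees).
observations = [(14.0, 225.0), (15.0, 240.0), (16.0, 255.0)]

# Strategy 1: a lookup table stores only the experienced pairings.
lookup = dict(observations)

def azimuth_from_table(hour):
    """Return the stored azimuth, or None for a time never experienced."""
    return lookup.get(hour)

# Strategy 2: a rule defined over a variable (hour). The rule here is a
# straight line fitted by least squares, chosen purely for illustration.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(observations)

def azimuth_from_rule(hour):
    """Apply the fitted rule to any value of the variable, seen or unseen."""
    return (slope * hour + intercept) % 360.0

if __name__ == "__main__":
    for hour in (9.0, 15.0):
        print(hour, azimuth_from_table(hour), azimuth_from_rule(hour))
```

The table is silent about 9:00 because that time was never paired with an azimuth, whereas the rule returns a value for any instance of its variable. Whether that value is accurate depends on the form of the rule, which is a separate question from whether the system generalizes at all.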
Honeybees are capable of relating the solar ephemeris to different groups of 
landscape features.
11
 This is necessary, for example, if the colony moves to a dis-
tant new nest site. And there are further complications. The landscape features 
visible at (or along the way to) one foraging site will often be different from those 
associated with another site. Terrain features visible from two different sites will 
most likely be seen from those places from different perspectives. And the land-
scape features visible from the hive will not be the same as those visible from 
relatively distant foraging sites. This fact is especially pertinent in the case of the 
Asian honeybee Apis florea. On overcast days, members of this species orient their 
horizontal waggle runs in relation to the panorama of vegetation near the nest; 
but that panorama is not available to foragers when away from the nest.
12
 
It is worth noting that members of A. mellifera can be trained to orient their 
waggle runs in relation to landscape features, even though they normally use 
only gravity or celestial cues as a reference. In fact, they are just as good at this as 
11
 Dyer 1987.
12
 Dyer 1985a.
members of A. florea.
13
 As I point out later (§ 7.2.1), the presence of such unexer-
cised capacities is just what one should expect if related capacities exhibit a cer-
tain form of systematicity.
On the basis of the data reviewed above and in the previous section, it is 
clear that honeybees can acquire information about a large number of relations 
important for navigation. These include (among others):
• The distance between the nest and a particular goal site.
• The distance from the bee's current location to the hive.
• For each "time of day" and location of the bee, the solar bearing of a particular goal site (nest, foraging site, etc.) at that time and place.
• For each time of day, the location of the solar azimuth in relation to the surrounding landscape. Also, there are different sets of these relations for (sufficiently) different places.
• For each time of day, the bearing of a particular goal site with respect to the landscape feature(s) associated with the solar azimuth at that time. This information is necessary in order to learn the location of a foraging site on the basis of gravity-referenced waggle runs on heavily overcast days.
14
So what a honeybee can learn about, say, a particular foraging site, consists of 
information about quite a number of relations involving that site. This is evident, 
even though we have yet to examine other navigational capacities; for instance, 
13
 Capaldi and Dyer 1995.
14
 That the waggle dance communicates the distance and direction to a goal site, with a fair de-
gree of precision, has been recently confirmed by Riley et al. 2005.
how honeybees can employ landmarks to locate a goal. Nor have we considered 
certain capacities which are not strictly navigational, such as the capacity to learn 
the type of a certain foraging site (resin, pollen, nectar, water) and that site's cur-
rent value to the colony.
6.1.3  Updating Previously Learned Relationships
Naturally, bees are sensitive to changes in their environment. What isn't obvious 
is what they learn when they acquire information about such changes. How is 
the new information related to the old information? What information is updated 
and what information remains the same?
Some light is shed on these issues by the results of station shift experiments. 
In such a study, the hive and feeder are placed along an extended, straight land-
mark, such as a tree line, an edge of a field, or a row of artificial markers. The 
bees are trained to the feeder under sunny conditions. If a natural landmark is 
used, a test site that is just like the training site, except with respect to the com-
pass orientation of the extended landmark, will have been chosen. After training, 
the hive and feeder arrangement is displaced to the test site. Since the bees will 
be unaware of their change in venue, they will be faced with conflicting informa-
tion. Their memory of the landmark's compass orientation, acquired at the 
training site, will differ from their experience of the landmark's compass orienta-
tion at the testing site. The experimenter can gain insight into how the bees re-
spond to that conflict by observing their waggle dances when they've returned to 
the hive from the feeder under sunny or overcast conditions.
Gould
15
 performed a station shift experiment in an open field, in which a 
row of artificial markers led from the hive to the feeder. Under sunny conditions, 
the arrangement was suddenly rotated by about 30°. The bees located the feeder 
by flying along the row of markers. Over a period of about 40 min, the direction 
indicated by their waggle dances gradually shifted from the solar bearing of the 
feeder as learned during training to the new solar bearing. That the shift was 
gradual suggests that the permutation of the training setup resulted in a corre-
sponding permutation in how the bees represented that setup. In other words, it 
suggests that the bees altered their previously acquired information about the 
orientation of the extended landmark and the direction of the feeder. They did 
not simply acquire new, additional information.
Dyer
16
 performed a series of station shift experiments in which the orienta-
tion, at the test site, of a field's edge, differed by 90° from the orientation, at the 
training site, of the corresponding field edge. Tests began with the hive being 
opened under heavy overcast. The bees found the feeder by flying along the 
field's edge, as they would have done during training. As long as the sky re-
mained overcast, their waggle dances indicated where the solar azimuth would 
have been if the hive had not been displaced, and thus they were off by about 90°.
15
 Gould 1984.
16
 Dyer 1987.
Once the sky opened enough for the bees to use their solar compass, Dyer 
observed five distinct types of responses. Most bees immediately adjusted their 
waggle dances to indicate the correct solar bearing of the feeder and continued to 
indicate the correct bearing when the sky again became heavily overcast. These 
bees disregarded the orientation of the training site field edge and learned the 
orientation of the test site field edge. Their doing so is compatible with their ac-
quiring new, additional information as opposed to their altering previously ac-
quired information.
Some bees ignored the compass bearing of the visible sun for at least one 
trip. Sometimes many foraging excursions under bright sunlight were required 
before the bees' dances indicated the correct solar bearing of the feeding station.
Some bees, like those in Gould's study, exhibited gradual reorientation. The 
solar bearing indicated by their dances progressively shifted from the one correct 
for the training site to the one correct for their current location. Again, this sort of 
response suggests that the permutation of the training setup resulted in a corre-
sponding permutation in how the bees represented that setup.
Some bees indicated the correct solar bearing in the presence of sunlight but 
indicated the incorrect, previously acquired solar bearing when the sky again 
became overcast. Also, some bees exhibited bimodal dances once the sun became 
visible. That is, they indicated both the old and new solar bearings on alternate 
waggle runs. The bees that exhibited one of these two responses must have 
stored their information about the original and new solar bearings separately. 
Further, the new and the old information could independently direct behavior.
Bimodal waggle dances are particularly intriguing. One possibility is that 
the mechanism responsible for producing the angle of the waggle run with re-
spect to gravity has simultaneous access to conflicting compass information but 
does not resolve the conflict. Another possibility is that the mechanism alter-
nately accesses the conflicting information but does not detect the conflict. In ei-
ther case, the ability of bees to perform bimodal dances would seem to require 
the ability to acquire respective items of information that "predicate" two con-
flicting attributes to one and the same thing.
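The difference between altering a stored bearing and storing a second bearing alongside it can be sketched as two update policies over a small memory. The record type, the labels, and the numerical bearings are hypothetical; only the 90° difference between the two bearings echoes the station shift design described above. The alternating read-out corresponds to the second of the two possibilities just mentioned and is an assumption, not an established mechanism.

```python
from dataclasses import dataclass
from itertools import cycle, islice

@dataclass
class BearingRecord:
    label: str            # hypothetical tag, e.g. "training" or "test"
    solar_bearing: float  # feeder direction relative to the solar azimuth (deg)

def overwrite(memory, new):
    """Policy 1: alter the previously acquired record, leaving one bearing."""
    return [new]

def store_separately(memory, new):
    """Policy 2: keep the old record and add the new one beside it."""
    return memory + [new]

def waggle_runs(memory, n_runs):
    """Bearings of successive waggle runs if the dance mechanism reads each
    stored record in turn without detecting or resolving any conflict."""
    return [rec.solar_bearing for rec in islice(cycle(memory), n_runs)]

if __name__ == "__main__":
    old = BearingRecord("training", 40.0)
    new = BearingRecord("test", 130.0)
    print(waggle_runs(overwrite([old], new), 4))
    print(waggle_runs(store_separately([old], new), 4))
```

Only the second policy leaves two records that attribute different bearings to the same feeder, which is the condition a bimodal dance appears to require.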
One way to investigate how bees deal with environmental changes on a 
smaller scale is to study how they modulate their learning flights under various 
conditions. To learn the precise locations of newly discovered foraging sites, bees 
will perform specialized learning flights on their departure. While facing the site, 
the bee slowly backs away, flying side-to-side in increasingly larger arcs roughly 
centered on the place of interest. Such a flight pattern is ideal for learning, via 
motion cues, the position of a site relative to nearby landmarks. The duration of a 
bee's learning flight declines over subsequent visits to the same location, until 
the flight pattern is no longer performed.
Wei et al.
17
 (hereafter, Wei) examined the factors that influence changes in 
learning flight duration. He introduced bees to an inconspicuous feeder situated 
near a tetrad of black cylinders, which served as proximal landmarks. The feeder 
and cylinders were contained within an oblong arena having 0.5-m-high walls, 
which blocked most external landmarks from the bees' view while they were in-
side the apparatus. 
In one experiment, Wei measured the durations of the learning flights of in-
dividual bees on repeat visits, beginning with their initial departure from the 
feeder. The duration of learning flights gradually decreased until the amount of 
time from when a bee left the feeder to when it left the arena stabilized. Once de-
parture flight duration for an individual bee had stabilized, Wei imposed delays 
of various lengths between the time the bee arrived in the arena and the time it 
found the feeder. This was done by removing the feeder before the bee arrived 
and replacing it once the intended delay had been effected. Under natural condi-
tions, an increase in search time might be the result of changes in the appearance 
or location of the local landmarks used to pinpoint the goal.
17
 Wei et al. 2002.
Wei found that the bees increased the duration of their learning flights after 
an enforced delay. The longer the delay between arrival at the arena and location 
of the feeder, the greater the increase in learning flight duration. Post-delay 
learning flights were briefer and exhibited a more rapid duration decay rate than 
post-initial departure learning flights. Some possible influences on learning flight 
duration other than prior learning were ruled out.
In another experiment, the entire formation of the cylinders and feeder was 
moved to a different place within the arena for each visit by an individual bee, 
beginning after their first departure flight. Consequently, the relationship be-
tween arena-external and -internal cues was altered. The longest learning flights 
of the bees tested under these conditions tended to occur on the second or third 
departure, whereas the longest learning flights of the bees tested under stable 
landmark conditions occurred on the first departure. This suggests that the bees 
that encountered a change in scene upon their return performed a learning flight 
longer in duration than the one they would have performed if the scene had re-
mained stable.
Wei also showed that learning flight durations increased when bees were 
introduced to a new, qualitatively similar feeding site having a higher sucrose 
concentration. The bees modulated their learning flight durations in accordance 
with the difference between the new concentration and the old one, rather than 
in accordance with the absolute value of the concentration. 
Barring the influence of unknown factors, Wei's results indicate that bees 
update or acquire their information about feeding sites in light of their past expe-
rience. Their behavior in this case does not follow a rigid, inflexible pattern. They 
do not mindlessly repeat the original learning process in response to perceived 
differences, as if learning about the site for the very first time. The delay and 
moving configuration experiments indicate that bees integrate remembered and 
current information in response to certain changes. Further, comparison of the 
learning flight durations in those experiments suggests that bees are capable of 
modifying their information about a particular place in a way that corresponds to 
the changes that occur at that very place.
I should also note that Wei's experiments provide examples of behavior that 
are difficult to explain without appealing to a notion like "expectation" or "pre-
diction". The behavior thus reveals the operation of learning mechanisms that go 
beyond those of simple association. For it would seem that the bees modified 
their learning flights when they encountered a delay that was longer than ex-
pected or a location of the landmark array that was different than expected. This 
is just one example of why it is becoming increasingly difficult to explain honey-
bees' behavior in nonrepresentational terms (§ 1.5).
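The role that an expectation plays in this interpretation can be shown with a toy comparison of two rules for setting learning flight duration. Both rules, their constants, and their units are hypothetical and are not taken from Wei et al. The point is only that a rule keyed to the difference between a remembered value and a current value behaves differently from a rule keyed to the current value alone.

```python
def duration_from_surprise(expected, observed, baseline=1.0, gain=0.5):
    """Toy rule: duration grows with the amount by which the observed value
    (search delay, sucrose concentration, ...) exceeds the remembered one."""
    surprise = max(0.0, observed - expected)
    return baseline + gain * surprise

def duration_from_absolute(observed, baseline=1.0, gain=0.5):
    """Contrast rule: duration depends only on the current value."""
    return baseline + gain * observed

if __name__ == "__main__":
    # Same current concentration (arbitrary units), different past experience.
    print(duration_from_surprise(expected=1.0, observed=2.0))
    print(duration_from_surprise(expected=2.0, observed=2.0))
    # The absolute rule cannot distinguish the two histories.
    print(duration_from_absolute(2.0), duration_from_absolute(2.0))
```

A system following the first rule must retain something that functions as the expected value, since the same current input yields different outputs depending on what was stored beforehand.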
Chittka et al.
18
 have provided a somewhat different sort of example with 
similar implications. They trained bees to a feeder in a flat, open area devoid of 
prominent landmarks, except for a car parked near the feeder. They found that 
when the compass direction, from the hive, of the landmark-feeder configuration 
was slightly changed from its direction during training, but the relative positions 
of the landmark and feeder remained unchanged, bees did not correct their 
return-to-hive flight vectors. They flew along a vector that would have taken 
them directly to the hive during training. When the landmark remained in its 
training position, but the feeder was moved, the bees did correct their return-to-
hive flight vectors. As in Wei's experiments, learning appeared to be influenced 
by the occurrence of something unanticipated, either an increase in search time, a 
change in the location of the feeder in relation to the landmark, or both.
19
6.2  Complex Structures: Sequences, Rules, and Maps
Honeybees are capable of learning about structures more complex than distance 
and direction relations between locations. They can learn vector sequences: or-
dered lists of flight segments, each of which specifies a certain direction and dis-
tance of travel. They can learn the correct path through a maze, their learning of 
which at least sometimes involves learning part of the maze's structure. They can 
18
 Chittka et al. 1995b.
19
 Some experiments with desert ants have similar implications. See Wehner et al. 2002.
learn rules for negotiating mazes. There is even recent evidence that they can 
learn (derive) novel routes on the basis of stored place information and local 
cues, suggesting that some of the information they acquire can at least serve as a 
kind of map.
6.2.1  Vector Sequences
Honeybees and other insects are able to learn a specific, landmark-based route 
from the nest to a familiar foraging site and to learn a specific route from the for-
aging site back to the nest (the two routes might differ).
20
 The same insect typi-
cally takes the same routes each trip, while different insects may take different 
routes. Such learned routes are often complex, consisting of multiple segments of 
various distances and directions. Thus, a typical route might consist of segments 
such as: (1) a flight from the nest to a prominent landmark; (2) a flight from the 
vicinity of that landmark to another near the feeding site; and (3) a flight from 
the vicinity of the latter landmark to others very close to the site, with respect to 
which its location can be pinpointed.
21
 Also, it is not unusual for a honeybee to 
visit more than one foraging site on a single excursion. Bees that do so can learn a 
20
 Collett (T. S.) and Zeil 1998.
21
 Cartwright and Collett 1983; Chittka et al. 1992; Collett (T. S.) 1992, 1996.
specific route connecting them. When they return to the sites, they will visit them 
in the same order along the same "trapline" route.
22
 
What do honeybees learn that enables them to follow complex routes? They 
do learn route segments; but how are the memories about different route seg-
ments related to each other? In particular, are they stored completely independ-
ently of one another, or are they somehow combined into a memory of the entire 
route? If the latter, then bees do not simply acquire sets of memories that happen 
to get triggered in the appropriate sequence; rather, they acquire memories of 
sequences. That would suggest that the content of such memories is complex, 
having as constituents memories of individual route segments. The experiments 
I'm about to discuss suggest that bees do in fact acquire memories of sequences.
23
 
I'll argue (§§ 7.1.2 and 7.2.1) that bees' capacities to acquire information about the 
different sorts of sequences discussed in this section and below (§ 6.2.2.1) exhibit 
systematicity. This section and others (§§ 6.2.3.2 and 6.2.4) also serve as the basis 
for an argument that some bee representational constituents play a semantic role 
akin to that of indexicals (§ 7.3.3).
Collett et al.
24
 (hereafter, Collett) performed a series of experiments designed 
to provide insight into what honeybees learn when they learn complex routes. 
22
 Heinrich 1976, Janzen 1971, Kratzsch et al. 1998, Manning 1956.
23
 See also Chittka et al. 1995b.
24
 Collett (T. S.) et al. 1993.
One set of experiments specifically addressed the issue of whether bees learn an 
ordered list of flight vectors; that is, whether they learn something roughly like 
the following: first, fly n distance units in direction d; second, fly m distance units 
in direction d*; and so on. Collett trained bees to fly an obstacle course contained 
within a box (Fig. 6.1). The course required the bees to fly in a zig-zag pattern, 
through holes in transparent plexiglas partitions, in order to reach a sucrose re-
ward. The holes were very difficult for the bees to see, so they were sometimes 
marked with either small, black disks (just above them) or black rings (around 
them). The markers, however, were periodically removed during training to pre-
vent the bees from becoming too reliant upon them. The walls and floor of the 
Figure 6.1. Plan view of the principal train and test course configurations employed in Collett's 
vector sequence experiments. The entrance and partition holes were 15 cm in diameter. The 
feeder entrance hole was 2 cm in diameter. Coordinates of the holes for training: entrance hole, 
(130, 30); first partition hole, (160, 60); second partition hole, (90, 120); feeder hole, (160, 180). Co-
ordinates of the entrance hole for displacement tests: (0, 90). Axes units are centimeters. e, en-
trance hole. f, feeder hole. Redrawn from Journal of Comparative Physiology A, vol. 172, 1993, pp. 
693–706, "Sequence learning by honeybees," Collett, T. S., Fry, S. N., and Wehner, R., Figure 1,
© Springer-Verlag 1993, with kind permission of Springer Science and Business Media.
box were white with random dark marks. The marks provided stabilizing visual 
cues (input for optic flow and perhaps some distance-to-wall information).
Tests were conducted with the plexiglas partitions removed. In some tests 
(standard tests), the entrance hole remained at its training location. In other tests 
(displacement tests), the position of the entrance hole was shifted, enough so that 
the bees might be able to detect some of the resulting differences in their location 
as they flew through the box (Fig. 6.1). The displacement tests were performed to 
see whether the bees would fly to specific places in the box, rather than princi-
pally rely on remembered vectors.
The standard-test trajectories turned out to be significantly different from the 
displacement-test trajectories. In standard tests, the locations of the bees' first and 
second turns inside the box were approximately the same as they had been dur-
ing training. In displacement tests, the resultant shift in each turn's location, 
along each axis, was approximately the same as the amount of displacement of 
the entrance hole. Furthermore, in each sort of test, when the position of the first 
turn in an individual bee's flight path differed from the correct location, there 
was a slight tendency for the position of the second turn to differ from the correct 
location by the same amount. The second flight segment, then, did not appear to 
correct for any inaccuracies in the first.
Thus, the results were consistent with the hypothesis that the bees learned 
an ordered list of vectors, and they were significantly different from what they 
would have been had the bees learned only to fly to specific locations in the box. 
In other words, the only apparent cause of the bees' adopting the second flight 
segment in the training sequence was the playing out of the first flight vector. So 
it would seem that the vector memories for the two flight segments must have 
been directly linked in some way. Similar results were obtained with a different 
zig-zag route and with a route consisting of two turns in the same direction.
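The two hypotheses make different predictions that can be worked out directly from the training coordinates given in the caption of Figure 6.1. The sketch below derives the flight vectors from those coordinates and compares a vector-sequence strategy with a place strategy. The 20 cm entrance shift and the 5 cm first-leg error are hypothetical values chosen to keep the illustration compact; they are not the displacement or the errors measured in the experiments, and the function names are likewise illustrative.

```python
# Training coordinates (cm) from the Figure 6.1 caption:
# entrance hole, first partition hole, second partition hole, feeder hole.
TRAINING_PATH = [(130, 30), (160, 60), (90, 120), (160, 180)]

def to_vectors(path):
    """Convert a list of places into the ordered list of vectors joining them."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]

def play_vectors(start, vectors, first_leg_error=(0, 0)):
    """Vector-sequence strategy: play each stored vector from wherever the
    previous one ended, so an error on the first leg is carried forward."""
    x, y = start
    points = []
    for i, (dx, dy) in enumerate(vectors):
        ex, ey = first_leg_error if i == 0 else (0, 0)
        x, y = x + dx + ex, y + dy + ey
        points.append((x, y))
    return points

def fly_to_places(start, path):
    """Place strategy: head for the remembered places, whatever the start."""
    return list(path[1:])

if __name__ == "__main__":
    vectors = to_vectors(TRAINING_PATH)
    shifted_entrance = (110, 30)  # hypothetical 20 cm shift along x
    print("vector strategy:", play_vectors(shifted_entrance, vectors))
    print("place strategy: ", fly_to_places(shifted_entrance, TRAINING_PATH))
    print("first-leg error:", play_vectors((130, 30), vectors, (5, 0)))
```

With the entrance shifted by 20 cm, the vector strategy puts both turns 20 cm from their training locations, while the place strategy leaves them where they were. With a 5 cm error on the first leg, the second turn is off by the same 5 cm, which is the uncorrected pattern described above.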
Although the above experiments suggest that the bees relied principally on 
their memories of the appropriate flight vector sequence, Collett acknowledges 
that there probably were some relatively weak effects of place information on the 
bees' trajectories. Those effects could be explained in terms of slight differences 
between the visual scenes at the two partition hole locations. For although the 
panorama within the box was fairly uniform, a bee relatively close to one wall 
would quite likely experience a visual scene that was, in detail, a bit different 
from the one it would experience if it were close to the opposite wall. The black 
marks on the close wall would likely appear to be larger and more distinct than 
those on the opposite wall.
Collett
25
 recently investigated the effects of panoramic context on honeybees' 
performance of route flight segments. His results confirmed the hypothesis that 
bees are capable of learning vector sequences. He trained bees to a food reward 
situated within a channel that contained two landmarks between its entrance and 
the feeder's entrance. The experiments made use of two types of landmarks, 
boundary landmarks and isolated landmarks. Boundary landmarks are sharp 
transitions between two perceptibly different panoramic contexts. An example of 
such a landmark would be anywhere along the border between an open meadow 
and a forest. Isolated landmarks are prominent, localized landscape features for 
which the panoramic context encountered before the feature (relative to a line of 
travel) appears similar to the context encountered after the feature. An example 
of this type of landmark would be a solitary tree in an area of grassland. 
Different groups of bees were trained with, respectively, two different types 
of channels (Fig. 6.2A). One type of channel contained two boundary landmarks, 
each of which was an abrupt change in wall pattern from randomly distributed 
black and white squares to alternating black and white vertical stripes. The first 
boundary occurred 1 m beyond the channel entrance, and the second occurred 
1 m beyond the first. The feeder entrance (a 10-mm round hole in one of the 
walls) was positioned 1 m beyond the second boundary. The other type of chan-
nel contained one boundary landmark and one isolated landmark, a baffle 
through which the bees flew. The boundary occurred 1 m beyond the channel 
entrance, and the baffle was situated 1 m beyond the boundary. The feeder en-
trance was another meter past the baffle. Throughout training, the channels ex-
tended at least 4 m beyond the feeder entrance (the entrance–landmark–feeder 
configuration was regularly moved along the channel walls in order to control 
for various cues).
25
 Collett (M.) et al. 2002.
Figure 6.2. Train and test configurations in Collett's channel experiments. Wall patterns are 
shown together with the locations of boundary landmarks (open triangles), baffles (filled trian-
gles), and the feeder entrance (filled arrows; open arrows indicate the training position of the 
feeder in relation to the last landmark). (A) Training configurations for bees trained with two 
boundary landmarks (top) and with one boundary landmark and one baffle (bottom). (B) Test 
configurations for boundary-only-trained bees with the distance from the channel entrance to the 
first boundary increased by 1 m from the training distance (top) and for baffle-trained bees with 
the distance from the entrance to the boundary increased by 2 m from the training distance (bot-
tom). (C) Test configurations for boundary-only-trained bees (top) and baffle-trained bees (bot-
tom) with the distance between the landmarks increased by 1 m from the training distance. (D) 
Test configurations for boundary-only-trained bees (top) and baffle-trained bees (bottom) with 
the final landmark removed. Adapted from various figures in Collett et al. 2002.
Collett performed three series of tests. For each test, the wall segment that, 
during training, contained the feeder entrance was replaced with an identical 
segment that did not contain a hole. In one series of tests, the relative positions of 
the landmarks remained as they were during training (Fig. 6.2B). The distance 
from the channel entrance to the first landmark either was the same as in training 
or was increased. For all tests in this series, and regardless of the types of land-
marks employed, bees searched at the training distance from the final landmark. 
That they did so, regardless of the distance from the channel entrance to the first 
landmark, confirmed earlier findings
26
 that bees' searches are sometimes con-
trolled by a local vector extending from a particular landmark to the place, rela-
tive to that landmark, where the goal had been.
In the second series of tests, the second landmark was placed 2 m (rather 
than 1 m) beyond the first landmark, where the feeder entrance had been relative 
to the first landmark during training (Fig. 6.2C). Bees trained with two boundary 
landmarks searched at the training distance from the final landmark, as they had 
done in the previous series of tests. Bees trained with the baffle, however, exhib-
ited a search pattern centered at the baffle. The baffle, then, did not activate a 
baffle-to-goal vector. Still, like the bees trained with two boundary landmarks, 
26
 Srinivasan et al. 1997.
they did search at the appropriate location in relation to the final (and only) 
boundary landmark. 
In the final series of tests, the second landmark was removed (Fig. 6.2D). 
Bees trained with the baffle searched about 2 m after the boundary landmark (at 
the appropriate location in relation to that landmark), as they had done in the 
second series of tests. Their search, however, was a bit broader, and its focus was 
a bit less well defined, than the searches of baffle-trained bees in the first series of 
tests. Adding 2 m to the distance between the channel entrance and the boundary 
shifted the focus of the search farther into the channel, well beyond where it 
should have been had the bees been guided by an estimate of the distance of the 
feeder from the hive or from the channel entrance (rather than an estimate or es-
timates in some way related to the boundary). Bees trained with only boundary 
landmarks exhibited a search which was much less constrained than that of the 
baffle-trained bees. It also lacked a well-defined focal point. Clearly, though, the 
focus of their searches was well past the correct location. Most of the bees flew 
until they were close to the end of the channel before turning back.
So how are the above results pertinent to the issue of whether the bees 
learned a sequence of vectors? The results for boundary-only-trained bees, by 
themselves, are consistent with their having learned independent flight seg-
ments, each associated with one of the boundary landmarks.
27
 They might have 
learned, in effect, merely to fly to the second boundary upon encountering the 
first, and to fly to the feeder upon encountering the second. Perhaps their flight 
segment memories were triggered by only their having seen the landmarks asso-
ciated with them, with the role of active local vectors having been only to sup-
press the vector recall system until they were played out. In short, their perform-
ance, taken alone, could be explained in terms of their having relied upon memo-
ries recalled in sequence as opposed to a recalled memory of a sequence.
The performance of the baffle-trained bees, however, cannot be explained in 
quite the same fashion.
28
 For, unlike the boundary-only-trained bees, they did not 
use a landmark-to-feeder vector upon encountering the second landmark (for 
them, the baffle) in tests in which it was moved (Fig. 6.2C). Nor did they prema-
turely search near the training location of the baffle (relative to the first land-
mark) in tests in which it was moved or removed, even though (i) the available 
visual stimuli at that location were more consistent with the feeder's training 
location than the baffle's training location (Fig. 6.2C,D), (ii) the learned land-
mark-to-feeder vector couldn't have been triggered by the baffle (which wasn't 
there), and (iii) there consequently wasn't a baffle-triggered active vector that 
27
 Collett does not explicitly make this point, though I assume he would accept it. He says nothing 
to the contrary.
28
 This claim is not explicit in Collett (M.) et al. 2002. However, its truth, or the truth of another 
claim to the same effect, appears to be necessary in order for their argument to go through.
could have suppressed a search for the goal. Moreover, it is unlikely that the 
baffle-trained bees employed either only a baffle-to-feeder vector or only a 
boundary-to-baffle vector. Otherwise, in one or more tests, they presumably 
would have searched at a much shorter distance from the boundary than they in 
fact did.
What remains to be determined, then, is whether the baffle-trained bees re-
lied upon a memory of a single boundary-to-feeder vector or upon a memory of 
a sequence of vectors (from the landmark to the location of the baffle, then from 
the location of the baffle to the location of the feeder entrance). The former alternative
is compatible with the bees' having learned the difference between the
global coordinates of the boundary and those of the feeder entrance.
Collett argues that the baffle-trained bees relied upon a remembered se-
quence of vectors. First, when the test configuration of landmarks was the same 
as in training (the first series of tests), the searches for the two groups of bees 
were quite similarly focused. This suggests that each group relied upon a vector 
from the second landmark to the feeder location. For spread of search is positively
correlated with local-vector length.[29] That the baffle-trained bees relied
upon a baffle (location)-to-feeder vector is further supported by the fact that their 
[29] Srinivasan et al. 1997.
searches were much more focused when the baffle was at its training location (in 
relation to the boundary) than when it was removed.
Second, the searches of bees trained with only boundary landmarks were 
not controlled by a single, remembered local vector from the first boundary 
landmark to the feeder entrance. Otherwise, in tests in which the second bound-
ary landmark was shifted to the location (relative to the first landmark) of the 
feeder entrance during training, they would have searched at the second bound-
ary landmark. Instead, they searched at the trained distance from that landmark. 
The absence, in boundary-only-trained bees, of the operation of a single vector 
from the first landmark to the feeder strongly suggests the absence of the opera-
tion of such a vector in baffle-trained bees. The only remaining alternative is that 
their searches were produced by a recalled sequence of local vectors.
Collett considers his major finding to be that, in every test, the two groups of 
bees searched at the trained distance along the panoramic context that contained 
the feeder. Furthermore, the only case in which bees did not search at the trained 
distance from the last-encountered boundary landmark was when boundary-
only-trained bees were tested with the final training landmark removed. In that 
test, the panoramic context of the feeder occurred nowhere along the channel. As 
Collett points out, this suggests that the correct panoramic context is necessary 
for activation of the appropriate local vector.
Panoramic contexts, though, cannot be relied upon to precisely specify loca-
tions. They are, by definition, much the same over a wide area. And for baffle-
trained bees, in every test, the panoramic context was the same from just past the 
boundary landmark to near the end of the channel. (Recall also that, in Collett's
vector sequence experiments, the panorama was fairly uniform throughout the 
box.) Furthermore, there is no reason to think that any other sensory information 
relevant to where the baffle should have been in tests was acquired by the bees at 
or near that point. It is highly likely, then, that the principal cause of the activation
of the baffle-trained bees' baffle (location)-to-feeder vector was the playing
out of their boundary-to-baffle vector. Again, it appears that those vectors must 
have been connected in memory.
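The difference between memories recalled in sequence and a recalled memory of a sequence can be put in a small sketch. The Python below is only illustrative: the names `LocalVector`, `cue_triggered_recall`, and `linked_sequence_recall`, the cue labels, and the distances are hypothetical and are not drawn from Collett's data. It shows why, when the baffle is removed, only a linked sequence yields the baffle (location)-to-feeder vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalVector:
    name: str
    distance_cm: float  # hypothetical segment length, for illustration only


# Hypothetical training memories of a baffle-trained bee.
BOUNDARY_TO_BAFFLE = LocalVector("boundary-to-baffle", 100.0)
BAFFLE_TO_FEEDER = LocalVector("baffle-to-feeder", 50.0)


def cue_triggered_recall(landmarks_seen):
    """Memories recalled in sequence: each vector needs its own landmark cue."""
    by_cue = {"boundary": BOUNDARY_TO_BAFFLE, "baffle": BAFFLE_TO_FEEDER}
    return [by_cue[cue] for cue in landmarks_seen if cue in by_cue]


def linked_sequence_recall(landmarks_seen):
    """A memory of a sequence: playing out one vector activates the next."""
    if "boundary" not in landmarks_seen:
        return []
    return [BOUNDARY_TO_BAFFLE, BAFFLE_TO_FEEDER]


def search_distance_cm(vectors):
    """Distance from the boundary at which the search would be centered."""
    return sum(v.distance_cm for v in vectors)


# Test condition with the baffle removed: only the boundary is encountered.
baffle_removed = ["boundary"]
unlinked = search_distance_cm(cue_triggered_recall(baffle_removed))
linked = search_distance_cm(linked_sequence_recall(baffle_removed))
```

On these assumptions, the unlinked model centers the search at the baffle's training location, which is the premature search that the baffle-trained bees did not show, whereas the linked model centers it at the trained feeder distance.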
6.2.2  Maze Learning
The capacity of honeybees to learn to correctly negotiate various sorts of mazes 
has implications regarding what they are able to represent. Several studies sug-
gest that bees are able to represent maze configurations, or sequences of sensory 
stimuli or motor commands. Other studies suggest that bees can represent rules 
for navigating mazes.
6.2.2.1  Configurations and Sequences
Honeybees can learn to fly the correct path through a maze containing several 
decision points, without the help of markers to guide them. Zhang et al.[30] (hereafter,
Zhang) successfully trained bees to follow the correct path through one of
two such mazes (Fig. 6.3).
Zhang took his results to suggest that the bees learned either the spatial lay-
out of the maze or the sequence of the correct turns through it. Unfortunately, for 
[30] Zhang et al. 1996.
Figure 6.3. Two mazes used by Zhang in his maze learning experiments. The width and height of 
each box was 30 cm. Every box had four 4-cm diameter holes, one in the center of each wall, with 
one or two of the exit holes blocked. The interior walls were effectively textureless. Boxes with 
two exit holes (decision boxes) are numbered. A solid line indicates the correct path through the 
maze; a broken line indicates an incorrect path. e, maze entrance; f, feeder. Reprinted from
Neurobiology of Learning and Memory, vol. 66, Zhang, S. W., Bartsch, K., and Srinivasan, M. V., "Maze
learning by honeybees," 267–282, © Copyright 1996, with permission from Elsevier.
my purposes, the results in question can be explained without appealing to the 
bees' having learned either sort of structure.
First, consider maze 1[31] (Fig. 6.3). Decision boxes 1 and 3 have the same
compass orientation and the same exit hole locations. Also, the correct turn is to 
the left in decision box 1 and to the right in decision box 3. So it is true that bees 
repeatedly navigating the maze without making errors (at above-chance levels) 
would require their having information which enables them to treat the two 
boxes differently. As we are about to see, however, the required information 
could be in the form of a sequence of memories, rather than a memory of either a 
sequence or a spatial layout.
Notice that the entrance to the maze, which is also the entrance to decision 
box 1, could be taken, not implausibly, to be a boundary landmark, marking the 
transition from the panorama of the lab to the panorama of the box's interior. On
the other hand, the entrance to decision box 3 does not mark a transition between
distinct panoramas. Furthermore, decision box 2 is in detail visually distinct from
the other boxes, in that one of its exit holes is to the left and the other is straight
ahead. So the bees' performance could be explained by their having learned to do
the following:
To turn left just after encountering the lab–maze boundary landmark.
Upon entering a box having a single exit hole, to fly through it.
Upon entering a box having an exit hole to the left and one straight ahead,
to fly through the one straight ahead.
Upon entering (without encountering a boundary landmark) a box having
an exit to the left and one to the right, to turn right.
[31] What I call maze 1 is called path 2 in Zhang et al. 1996.
Moreover, it would seem that the above acquired information need not be linked 
in memory in order for the bees to correctly navigate the maze.
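The four learned responses listed above can be sketched as a stateless decision function. This is a hypothetical illustration only: the function name `choose_exit`, its parameters, and the direction labels are assumptions, not anything reported by Zhang. The point it makes is that the function consults nothing but the present stimuli, so no response is linked in memory to any other.

```python
def choose_exit(open_exits, just_crossed_boundary):
    """Pick an exit from the current box using only what is presently perceived.

    open_exits: set of directions ("left", "right", "straight") with open holes.
    just_crossed_boundary: True only right after the lab-maze boundary landmark.
    """
    if just_crossed_boundary:
        return "left"
    if len(open_exits) == 1:
        return next(iter(open_exits))
    if open_exits == {"left", "straight"}:
        return "straight"
    if open_exits == {"left", "right"}:
        return "right"
    return None  # no learned response applies


# Hypothetical encounters, one per kind of box described in the text.
decision_box_1 = choose_exit({"left", "right"}, just_crossed_boundary=True)
single_exit_box = choose_exit({"straight"}, just_crossed_boundary=False)
decision_box_2 = choose_exit({"left", "straight"}, just_crossed_boundary=False)
decision_box_3 = choose_exit({"left", "right"}, just_crossed_boundary=False)
```

Decision boxes 1 and 3 present the same exits, yet the sketch treats them differently solely because of the boundary cue, without any stored sequence or layout.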
Second, consider maze 2[32] (Fig. 6.3). Decision box 3 is in detail visually distinct
from the other boxes in that one of its exit holes is to the left and the other is
straight ahead. Also, each of the remaining decision boxes differs in compass orientation.
Since it is known that honeybees learn the compass orientation (relative
to their line of flight) of landmarks along a route,[33] it is not implausible that the
bees in Zhang's experiment learned the respective compass orientations of the
relevant boxes in the maze. So the bees' performance could be explained by their
simply having associated the appropriate behaviors with the relevant visual 
stimuli and compass information. They need not have acquired a memory of a 
sequence or of a spatial layout.
For one of the experiments with maze 1, Zhang did control for compass information
by frequently rotating the maze during training. However, as we've seen,
[32] What I call maze 2 is called path 9 in Zhang et al. 1996.
[33] Cartwright and Collett 1983, Collett (T. S.) and Baron 1994, Dickinson 1994, Dyer 1987, Gould 1984.
compass information is not required to reliably negotiate maze 1. Compass in-
formation could be crucial for learning maze 2. But Zhang did not control for 
compass information with maze 2; he controlled only for odors by exchanging 
some of the boxes for one test.
I've assumed that, for the bees in Zhang's experiments, open holes were
visually distinct from blocked holes. This assumption could be questioned, since 
it appears that blocked holes were covered on only the exterior surfaces of the 
boxes. Zhang, though, does not address this issue. Nor does he relate how often, 
if ever, bees attempted to fly through blocked holes. If, however, my assumption 
is incorrect, then his claim that his results suggest that the bees learned either the 
spatial layout of the maze or the sequence of the correct turns through it would 
become more plausible. I would welcome that outcome, for we would then have 
a plausible case of bees' having learned another kind of complex structure, in
addition to vector sequences.
In fact, Collett[34] has provided such a case, one in which honeybees appear to
have acquired information about a maze's configuration.[35] His experiments not
only support the claim that bees can acquire semantically complex information
(and that the relevant capacities exhibit systematicity) but also raise the possibility
that bees are capable of transitive reasoning (§ 7.4).
[34] Collett (T. S.) et al. 1993.
[35] Pastergue-Ruiz and Beugnon (1994) obtained similar results with ants.
Collett trained honeybees to negotiate a relatively simple, maze-like appa-
ratus comprised of three boxes, placed end to end (Fig. 6.4). For most of the ex-
periments, the training configuration was as follows. Two distinct patterns were 
fixed to the back wall of each box, one on the left and one on the right. A hole 
2 cm in diameter occurred in the center of each pattern. The hole in one of the
two patterns (the positive stimulus) led either to the next box (if any) or to a sucrose
reward. The hole in the other pattern (the negative stimulus) led to a small,
blocked-off compartment. The left–right positions of the patterns were frequently
switched, whereas the same two patterns always occurred in the same boxes 
(and no pattern occurred in more than one box). Thus, a bee could learn to fly 
Figure 6.4. Plan views of sample train (top) and test (bottom) configurations of the apparatus in 
Collett's visual-sequence learning experiments. Bees flew from left to right. Each box was 40 cm
high, 60 cm wide, and 50 cm long. The entrance to the first box was 5 cm in diameter. All patterns
were 25 by 25 cm. The walls and floor of each box were white, with random dark marks. W,
white; Blk, black; Y, yellow; Blu, blue; H, black–white horizontal stripes; V, black–white vertical
stripes; +, positive stimulus; −, negative stimulus. Training configuration redrawn from Journal of
Comparative Physiology A, vol. 172, 1993, pp. 693–706, "Sequence learning by honeybees," Collett,
T. S., Fry, S. N., and Wehner, R., Figure 10, © Springer-Verlag 1993, with kind permission of
Springer Science and Business Media.
directly through the apparatus to the reward only if it learned, for each box, 
which of the two patterns was the positive stimulus.
In tests, the negative pattern in one of the boxes was replaced with the posi-
tive pattern from one of the other boxes, resulting in a box having two positive 
stimuli. The remaining patterns were left unchanged (Fig. 6.4). As in training, the
left–right positions of the patterns were frequently switched, so that each pattern
was on a particular side of the back wall for half of the trials.
In one experiment, the pairs of training patterns in the front, middle, and 
back boxes were, respectively, white paper (positive) and black paper (negative), 
blue paper (positive) and yellow paper (negative), and black–white vertical
stripes (positive) and black–white horizontal stripes (negative). After the bees
had learned which pattern of each pair identified the way to the reward, they 
were tested with both the white pattern and the vertical pattern (positive for the 
front box and back box, respectively) in either the front box or the back box. The 
bees preferred the white pattern in the front box, and they preferred the vertical 
pattern in the back box. Similar results were obtained in four other tests that 
paired the positive stimuli from the front and back boxes. 
Results were different when bees were tested in the middle box, with the 
positive stimulus of that box set beside the positive stimulus from one of the 
other boxes. For all such tests, the bees either preferred the positive stimulus 
from the front box or the back box or showed no preference. Nonetheless, the 
way they treated a pair of test stimuli in the middle box was different from the 
way they treated the same pair of test stimuli in one of the other boxes. In three 
of six experiments, the preference for the positive stimulus of either the front box 
or the back box, when bees were tested in one of those boxes, was significantly 
stronger than it was when bees were tested in the middle box.
The results of Collett's experiments, then, suggest that the bees learned the
order in which they encountered the relevant positive stimuli. They certainly did 
not learn merely to fly through the opening in any positive pattern. 
Furthermore, Collett attempted to gain insight into what cues told the bees 
where they were in the sequence. A "box swapping" experiment ruled out the
possibility that the bees discovered or created differences among the boxes themselves,
independent of their position in the series. And the following experiment
told against the possibility that the bees simply associated the positive stimulus 
in (or the appearance of) one box with the positive stimulus in the next.
Bees were trained with yellow paper marking the entrance to the boxes 
(which was always on the left), white (positive) and black (negative) in the first 
box, blue (marking the only exit and always on the right) in the second, and vertical
(positive) and horizontal (negative) in the third (Fig. 6.5). As in the other experiments,
the left–right positions of the positive and negative stimuli were frequently
switched. In two respective tests, bees chose between white and vertical
in the front box and the back box. As expected, they preferred white in the front 
box and vertical in the back box. In a further test, bees chose between white and 
vertical in the middle box (Fig. 6.5). The back box remained the same as in train-
ing, whereas the front box was made to look as similar as possible to the middle 
box in training, with blue on the right marking the only exit. Nonetheless, the 
bees preferred white in the middle box and vertical in the back box. They did not, 
then, simply associate the perceived characteristics of the middle box in training 
with the succeeding, vertical positive stimulus.
Collett did not discuss the possibility that the bees (in this experiment) 
learned to correctly navigate the apparatus by their having associated only the 
Figure 6.5. Plan views of the train (top) and one of the test (bottom) configurations of the apparatus
for Collett's "blue–single exit" sequence learning experiment. For further details, see the
caption to Figure 6.4. W, white; Blk, black; Y, yellow; Blu, blue; H, black–white horizontal stripes;
V, black–white vertical stripes; +, positive stimulus; −, negative stimulus. Adapted from Journal of
Comparative Physiology A, vol. 172, 1993, pp. 693–706, "Sequence learning by honeybees," Collett,
T. S., Fry, S. N., and Wehner, R., Figure 10, © Springer-Verlag 1993, with kind permission of
Springer Science and Business Media.
global or local positions of the boxes with the appropriate positive stimuli. That 
is, one possible explanation of the results is that the bees associated certain 
ranges of distance (their estimates of their distance from, say, the entrance of the
apparatus) with the respective correct choices. In other words, they might have
associated the location of the middle box in training with the succeeding, vertical 
positive stimulus.
The results, however, count against this sort of explanatory hypothesis as 
well. It has four possible versions.
(1) The bees associated their (local or global path integration) coordinates
    with the pattern positive for the currently occupied box, and,
    (a) upon entering the front box (testing), they reset their coordinates to
        those appropriate to the middle box (training), or,
    (b) they did not reset their coordinates.
(2) The bees associated their coordinates with the pattern positive for the
    box (if any) which came just after the currently occupied box, and,
    (a) upon entering the front box (testing), they reset their coordinates to
        those appropriate to the middle box (training), or,
    (b) they did not reset their coordinates.
Every version is consistent with the bees' having flown through the blue-marked
opening in the front box, it having been the only available alternative. Every version
is consistent as well with the bees' having chosen vertical over white in the
back box. But it is difficult to reconcile any version with the bees' having preferred
white over vertical in the middle box.
Hypotheses (1a) and (2a) maintain that the bees reset their coordinates upon 
entering the front box. Thus, on (1a), when the bees were in the middle box (test-
ing), their coordinates would have been appropriate to the back box (training). 
Since (1a) requires the bees to have chosen the positive pattern for the box they 
took themselves currently to be in, they should have chosen, in the middle box, 
the positive pattern for the back box. That is, they should have preferred vertical 
over white, contrary to their actually having preferred white. On hypothesis (2a), 
when the bees were in the front box (testing), their coordinates would have been 
appropriate to the middle box (training). Bees in the front box, then, would have 
taken the subsequent box (in reality, the middle box) to be the back box. Since 
(2a) requires the bees to have chosen, in what they took to be the subsequent box, 
the positive pattern for that box, they should have chosen, in the middle box, the 
positive pattern for the back box. Again, they should have preferred vertical over 
white, contrary to their actually having preferred white.
Hypotheses (1b) and (2b) maintain that the bees did not reset their coordi-
nates upon entering the first box. On either hypothesis, then, the bees were 
highly likely to have been correct about which box they were in. Thus, on (1b), 
bees in the middle box would have taken themselves to be in a location intermediate
with respect to that appropriate for choosing white (the front box) and that
appropriate for choosing vertical (the back box). In that case, they should not 
have shown any significant middle-box pattern preference. Hypothesis (2b) 
would have predicted the same result. The bees would have associated their first-
box location with exiting the second box through the blue pattern. In tests, when 
they arrived in the middle box, no blue pattern was present. So, considering just 
the hypothesis in question, it should not have made any difference to them 
which middle-box pattern to choose.
I hope to have established that it is plausible that the bees in
Collett's experiments acquired a memory of a sequence (the box-to-box sequence
of positive stimuli) rather than behaved in accordance with sequentially recalled
memories. For example, in the experiment just examined, they may have stored a
representation having a content somewhat analogous to [white, then blue, then
vertical] or [white before blue and blue before vertical].[36] We thus have another
plausible case of bees' having learned a kind of complex structure.
[36] If in fact this is correct, then there is a possibility that the bees' having preferred white when
tested in the middle box was a result of a kind of reasoning process. From "white before blue and
blue before vertical," say, the bees might have derived "white before vertical." Of course, it is also
possible that the bees independently learned "white before vertical." More on the possibility of
reasoning in honeybees will be presented below (§ 7.4).
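The derivation of "white before vertical" from the two stored orderings can be sketched as a transitive closure over "before" pairs. The sketch is hypothetical throughout: the function name `derive_order` and the tuple encoding of a "before" relation are assumptions made for illustration, and nothing here is meant as a claim about how bees actually compute.

```python
def derive_order(learned_pairs):
    """Close a set of (earlier, later) pairs under transitivity."""
    order = set(learned_pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(order):
            for c, d in list(order):
                # From "a before b" and "b before d", add "a before d".
                if b == c and (a, d) not in order:
                    order.add((a, d))
                    changed = True
    return order


# Hypothetical stored content: [white before blue and blue before vertical].
learned = {("white", "blue"), ("blue", "vertical")}
derived = derive_order(learned)
```

Here `("white", "vertical")` belongs to `derived` although it was never stored, which is the sense in which the ordering would be inferred rather than independently learned.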
6.2.2.2  Rules
Zhang[37] performed maze experiments in addition to those described in the preceding
section. For many experiments, he trained honeybees to correctly negotiate
mazes (such as those shown in Figure 6.3) by following marks of a particular
color. He found, for example, that bees trained on one maze with one color are 
able to accurately navigate a differently configured maze by following either 
marks of the same color or marks of a different color. He also found that bees so 
trained are able to negotiate an identically configured maze without marks. Their 
performance is less accurate than it is in the case of marked mazes, but it is still 
significantly more accurate than the performance of controls.
My intent is not to evaluate or examine the implications of the experiments 
just mentioned. Instead, I focus on another of Zhang's maze experiments. Using
maze 3 (Fig. 6.6),[38] he trained bees to turn right when the wall opposite the entrance
to a compartment was blue and to turn left when that wall was green. The
only nonmarked compartments were those both having a single exit hole and not 
requiring a turn. Tests were carried out with mazes 3, 4, and 5 (Fig. 6.6). The test 
with maze 4 was performed immediately after the test with maze 3, and the test 
with maze 5 was performed immediately after the test with maze 4. 
[37] Zhang et al. 1996.
[38] What I call mazes 3–5 are called paths 6–8, respectively, in Zhang et al. 1996.
The bees performed very well in every test, and their levels of performance 
in the three tests did not significantly differ from one another. The percentages of 
error-free trials were 92.2% for maze 3, 97.7% for maze 4, and 93.2% for maze 5. 
Two explanations of these results readily come to mind. One is that the bees 
simply associated turning left with green and turning right with blue (and con-
tinuing straight ahead with the color of the bare walls). The other is that the bees 
learned a (nonassociative) rule that caused them to go through the hole right of 
the colored wall when it was blue and to go through the hole left of the colored 
wall when it was green. Zhang's results do not help us to decide between these
alternatives.
Figure 6.6. The maze configurations used by Zhang to test the ability of bees to turn left or right
in response to color cues. A solid line indicates the correct path through the maze; a broken line
indicates an incorrect path. For further details, see the caption to Figure 6.3. Reprinted from
Neurobiology of Learning and Memory, vol. 66, Zhang, S. W., Bartsch, K., and Srinivasan, M. V., "Maze
learning by honeybees," 267–282, © Copyright 1996, with permission from Elsevier.
A key feature of a rule, as I here employ the notion, is that it allows its possessor
to generalize over a broad range of different stimuli, where that range includes
stimuli that bear no apparent resemblance to those included in the training
set. Rule-based generalization, then, is different from association-based generalization,
in that the latter involves generalizing only over stimuli that are
similar to those used in the training set. As remarked in the first paragraph of 
this section, bees trained to negotiate mazes by following marks of a single color 
did at least appear to exhibit some ability to generalize beyond the training con-
ditions. But Zhang did not perform experiments designed to assess whether or 
not the bees had the ability to generalize beyond the training conditions of the 
experiment currently in question. 
On the other hand, Giurfa et al.[39] (hereafter, Giurfa) did perform simple-maze
experiments which showed that honeybees are indeed able to generalize a
learned task to novel, dissimilar stimuli. Thus, it is likely that they acquired a
rule, rather than an association. As we'll see in the next chapter, Giurfa's results
are relevant to the issue of whether different types of bee representations have
different semantic roles (§ 7.3.2). They also bear on whether bees implement rules
that operate on the values of variables (§ 7.5) and on whether some honeybee
cognitive processes are sensitive to the constituent-structure of the representations
on which they operate (§§ 7.4 and 7.5).
[39] Giurfa et al. 2001.
In the first stage of Giurfa's experiments, he successfully trained six respective
groups of bees to solve four delayed matching-to-sample tasks and two delayed
non-matching-to-sample tasks. A Y-maze served as the experimental apparatus
(the configuration of the maze for one experiment is shown in Figure 6.7).
In the delayed matching-to-sample experiments, the bees encountered one of a 
pair of stimuli (the sample stimulus) at the maze entrance. (Which of the two 
served as the sample was varied.) The entrance arm of the maze ended at a 
chamber in which the bees had to decide between the two remaining arms. One 
arm contained the sample, or matching, stimulus, while the other arm contained 
the nonmatching stimulus. The bees were rewarded only if they chose the arm 
which contained the matching stimulus. (Which arm served as the "matching"
arm also was varied.) The training procedure for the delayed non-matching-to-
Figure 6.7. Configuration of the Y-maze used by Giurfa in a delayed matching-to-sample experiment
in which bees were trained with odors and tested with colors. The odors were presented by
means of odorant-soaked tissues in perforated vials. Exhaust fans prevented odor mixing in the
decision chamber and removed feeder odors. Baffles prevented the bees from experiencing the
stimuli present in a chamber until they had entered it. In the transfer test, the scented vials were
replaced with visually identical, odorless vials. b, baffles; c, colors; d, dummy vials; e, entrance; o,
odor vials; f, feeder; x, exhaust fan. (Adapted by permission from Macmillan Publishers Ltd: Nature,
vol. 410, pp. 930–933, Giurfa, M., Zhang, S., Jenett, A., Menzel, R., and Srinivasan, M. V.,
"The concepts of 'sameness' and 'difference' in an insect," © Copyright 2001.)
sample experiments was the same, except the bees were rewarded only if they 
chose the arm which contained the nonmatching stimulus.
In each experiment, after the bees had learned the relevant discrimination, 
Giurfa performed a test to determine whether or not the bees would transfer 
what they had learned to a pair of novel stimuli. The pairs of train and transfer 
test stimuli used in the experiments are given in Table 6.1. The levels of perform-
ance of the bees in transfer tests were about the same as the respective levels of 
performance they had achieved in training.[40] Thus, the bees not only learned the
[40] The one exception was experiment 3, in which bees were trained on radial and circular gratings
and tested on oriented (45° and −45°) linear gratings. Nonetheless, the bees' preference for the
appropriate test grating was highly significant (P < .001).
Experiment      Stimulus Pairs
                Train                   Test
Experiment 1    Blue                    Vertical grating
                Yellow                  Horizontal grating
Experiment 2    Vertical grating        Blue
                Horizontal grating      Yellow
Experiment 3    Radial grating          Oriented (45°) linear grating
                Circular grating        Oriented (−45°) linear grating
Experiment 4    Lemon odor              Blue
                Mango odor              Yellow
Experiment 5    Blue                    Vertical grating
                Yellow                  Horizontal grating
Experiment 6    Vertical grating        Blue
                Horizontal grating      Yellow
Table 6.1. Stimulus pairs used in Giurfa et al.'s delayed matching-to-sample (1–4) and delayed
non-matching-to-sample experiments (5 and 6). All gratings were black–white.
matching and nonmatching tasks but also transferred what they learned to novel
stimuli. Furthermore, they exhibited transference not only between different 
sorts of visual stimuli but also from olfactory stimuli to visual stimuli.
Giurfa took his results to strongly suggest that honeybees have the capacity 
to acquire (or make use of) sameness and difference concepts. Depending on 
what one means by "concept," that may or may not be the case. For example, it's
not clear that the bees could have solved the tasks only if they had made use of
representations with the content [same] or [different]. What Giurfa's results more
clearly suggest is that the bees acquired a rule that operates on a variable. In particular,
in the "matching" transfer tests, they seem to have made use of a rule
something like, "Choose the x-marked arm if x was at the entrance," where x
ranges over (at least) colors, patterns, and odors. It's possible that neither the acquisition
nor the execution of such a rule requires explicit judgments about
whether what is now present is the same as what was present at the entrance.
The bees' exhibited capacity to generalize to novel stimuli is key. They certainly
did not simply associate which of two specific stimuli occurred at the maze
entrance with the reward arm of the maze. Otherwise, they would not have been 
able to transfer their learning across different sorts of visual stimuli, much less 
across different sensory modalities. One might be tempted to suggest that the 
bees associated "whatever stimulus" was (or was not) at the entrance with the
reward arm. But it should take at most only a bit of reflection to see that "whatever
stimulus" is a variable. The question then arises what it could be to "associate"
a variable with something, if not to acquire a rule that operates on a variable.
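The contrast between an association over specific stimuli and a rule over a variable can be made concrete with a short sketch. Everything in it is hypothetical: the function names, the dictionary encoding of the maze arms, and the representation of training as a lookup table are illustrative assumptions, not Giurfa's procedure or a model of bee memory. The stimuli are those of experiment 1 in Table 6.1.

```python
def choose_by_association(sample, arms, trained_pairs):
    """Fixed stimulus-to-stimulus association: works only for trained samples."""
    target = trained_pairs.get(sample)
    for arm, stimulus in arms.items():
        if target is not None and stimulus == target:
            return arm
    return None  # nothing was ever associated with this sample


def choose_by_matching_rule(sample, arms):
    """Rule over a variable x: choose the x-marked arm if x was at the entrance."""
    for arm, stimulus in arms.items():
        if stimulus == sample:
            return arm
    return None


# Hypothetical encoding of training on blue and yellow (experiment 1).
trained_pairs = {"blue": "blue", "yellow": "yellow"}

# Transfer test with novel stimuli: the sample is a vertical grating.
transfer_arms = {"left": "horizontal grating", "right": "vertical grating"}
by_association = choose_by_association("vertical grating", transfer_arms, trained_pairs)
by_rule = choose_by_matching_rule("vertical grating", transfer_arms)
```

The lookup table has no entry for a grating, so the association yields no choice, whereas the rule binds x to whatever was at the entrance and selects the matching arm.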
6.2.3  Novel Shortcuts and Vector Averaging
Displacement experiments are those in which bees are captured at the hive or a 
foraging site, transported to a familiar or unfamiliar location, and then released. 
Their subsequent course is then recorded. Such experiments have proven useful 
for revealing what bees are capable of learning about the layout of their foraging 
territory. They have also proven useful for illuminating some of the ways in 
which bees' current motivations interact with their current sensory information
and their recalled and stored locational information. 
6.2.3.1  Novel Shortcuts to the Hive
Menzel et al.[41] (hereafter, Menzel) performed a series of experiments that demonstrated
(among other things) the capacity of honeybees to take a novel route
when displaced to an unfamiliar location. Menzel accounts for his results by arguing
that the novel-shortcut bees averaged known site-to-hive vectors to obtain
a novel site-to-hive vector. I argue that Menzel's hypothesis is indeed the best
explanation of his results (§ 6.2.3.2). If that's the case, then it appears that bees are
[41] Menzel et al. 1998.
capable of performing operations defined over certain semantic constituents of
complex representations (§ 7.4). Menzel's results also have implications regarding
the circumstance- and motivation-independence of certain bee representational
constituents (§ 7.1.2). The Classicist can explain such independence by
positing context-independent syntactic constituents.
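As a rough illustration of what averaging site-to-hive vectors could amount to, the following sketch takes the unweighted mean of two displacement vectors and converts the result to a compass bearing. The coordinates, the function names, and the choice of an unweighted mean are all assumptions of the sketch; they are not Menzel's measurements and are not offered as his model.

```python
import math


def average_vector(vectors):
    """Unweighted mean of (east, north) displacement vectors, in metres."""
    n = len(vectors)
    east = sum(v[0] for v in vectors) / n
    north = sum(v[1] for v in vectors) / n
    return (east, north)


def compass_bearing_deg(vector):
    """Compass bearing of a vector: 0 = north, 90 = east."""
    east, north = vector
    return math.degrees(math.atan2(east, north)) % 360.0


# Hypothetical site-to-hive vectors (not Menzel's measurements).
morning_site_to_hive = (100.0, 0.0)    # hive lies due east of this site
afternoon_site_to_hive = (0.0, 100.0)  # hive lies due north of this site

novel_vector = average_vector([morning_site_to_hive, afternoon_site_to_hive])
novel_bearing = compass_bearing_deg(novel_vector)
```

The operation is defined over the components of the two stored vectors, which is why it bears on whether bees operate on constituents of complex representations.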
Menzel trained bees to forage at two feeding stations, one in the morning 
and the other in the afternoon (Fig. 6.8). The area chosen for the experiments was 
unfamiliar to the bees, and there were no natural food sources in the regions 
around or between the hive, the feeding sites, and the two release-only sites. 
Figure 6.8. A map of the area chosen by Menzel for his displacement experiments. The landscape 
was dominated by a large, cone-shaped hill, surrounded by flat farmland. Sm, morning site; Sa, 
afternoon site; S3, Site 3; S4, Site 4. Reprinted from Animal Behaviour, vol. 55, Menzel, R., Geiger,
K., Joerges, J., Müller, U., and Chittka, L., "Bees travel novel homeward routes by integrating
separately acquired vector memories," pp. 139–152, © Copyright 1998, with permission from Elsevier
and Randolf Menzel.
Consequently, the trained routes were the only routes established by the bees in 
the experimental area. 
The morning site was situated within an area of harvested agricultural 
fields, with no apparent local landmarks within a 150-m radius. The afternoon 
site was about 60 m from a low bush, which was visible to the bees. Site 3, a 
release-only location, was situated within an area of uniform grassland. A clump 
of trees and a few scattered trees should have been just visible to bees at the spot. 
Site 4, another release-only location, was in a pasture. A row of bushes along a 
creek and some scattered tall trees were nearby landmarks.
In experiment 1, bees were captured at the hive upon arrival from one of the 
feeding sites. The bees were expected to be motivated to get back to the hive to 
discharge their foraging load. Bees arriving from the morning site were displaced 
to either the afternoon site or Site 3. Bees arriving from the afternoon site were 
transferred to either the morning site, Site 3, or Site 4. Bees (controls) that had 
visited only the afternoon site were transported to Site 3. Sites 3 and 4 were very 
unlikely to have been visited by the bees. As in all of the experiments, all bees 
were released within 20 min of capture. The direction in which a bee departed 
from a release site was estimated by recording its vanishing bearing, or the com-
pass direction of its flight at the point at which it disappeared from view.
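Because vanishing bearings are compass directions, a group of them is conventionally summarized by a circular mean rather than an arithmetic one. The following is a minimal Python sketch of that calculation, offered only as an illustration. The function name mean_bearing and the sample bearings are my own assumptions; they are neither Menzel's data nor a description of his stated procedure.

```python
import math


def mean_bearing(bearings_deg):
    """Circular mean of compass bearings (degrees clockwise from north).

    Returns (mean_direction_deg, r), where r in [0, 1] is the mean resultant
    length: near 1 for tightly clustered bearings, near 0 for scattered ones.
    """
    n = len(bearings_deg)
    # Treat each bearing as a unit vector with (north, east) components.
    north = sum(math.cos(math.radians(b)) for b in bearings_deg) / n
    east = sum(math.sin(math.radians(b)) for b in bearings_deg) / n
    r = math.hypot(north, east)
    mean_deg = math.degrees(math.atan2(east, north)) % 360.0
    return mean_deg, r


# Hypothetical vanishing bearings clustered around north.
# These values are invented for illustration; they are NOT Menzel's data.
sample = [350.0, 10.0, 355.0, 5.0]
direction, concentration = mean_bearing(sample)
```

For the sample shown, the arithmetic mean of the four numbers would be 180, which points directly away from the cluster. That is why directional data call for the vector-based calculation.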
The bees which had learned the locations of both feeding sites and were dis-
placed to the morning site, the afternoon site, or Site 3 flew toward the hive upon 
release (Fig. 6.9). The bees displaced from the afternoon site to Site 4 flew in the 
direction that would have taken them from the afternoon site to the hive. The 
bees which had visited only the afternoon site, when released at Site 3, also flew 
in the direction that would have taken them from the afternoon site to the hive.
Figure 6.9. Distributions of vanishing bearings of bees captured at the hive upon arrival from the 
afternoon site (open circles) and from the morning site (filled circles), in Menzel's vanishing 
bearing, displacement study. Thick arrows indicate the means of the distributions. Thin arrows 
indicate specific headings that the bees might have adopted. H, hive; Sa, afternoon site; Sm, 
morning site; S3, Site 3; S4, Site 4. Reprinted from Animal Behaviour, vol. 55, Menzel, R., Geiger, K., 
Joerges, J., Müller, U., and Chittka, L., "Bees travel novel homeward routes by integrating 
separately acquired vector memories," pp. 139–152, © Copyright 1998, with permission from Elsevier 
and Randolf Menzel.
The results showed that bees familiar with the two feeding sites could recall 
the homeward vector, learned at a different time of day, appropriate to the 
feeding site to which they had been transported. Also, the group of such bees 
released at Site 3 took a novel shortcut from there to the hive. Landmarks near the 
hive were thought to be imperceptible to bees at Site 3 (based on what is known 
about their visual resolution). The fact that the bees which had visited only the 
afternoon site adopted the afternoon-site-to-hive direction when released at Site 3 
controlled for both the possibility that the novel-route bees steered toward a 
beacon near the hive and the possibility that they relied on learned-route-associated 
landscape features. It also suggests that having learned the two feeder–hive 
routes was necessary for having been able to take the novel shortcut. The 
novel-route bees, then, must have somehow combined the two respective route 
memories. In addition, the fact that the bees released at Site 4 adopted the 
afternoon-site-to-hive direction suggests that bees don't combine route or vector 
memories whenever they are released at an unfamiliar location.
In experiment 2, bees were captured at the hive when they were about to 
depart to one of the feeding stations. The bees were expected to be motivated to 
get to the feeding site appropriate to the period of time, morning or afternoon, 
during which they were captured. They were displaced to either the morning site 
(if captured in the afternoon), the afternoon site (if captured in the morning), Site 
3, or Site 4. Bees (controls) that had visited only the morning site were trans-
ported to Site 3.
The bees displaced to the afternoon site in the morning flew toward the hive; 
they did not adopt the course that would have taken them to the morning site 
(their original destination) had they not been displaced, nor did they set off in the 
actual direction of the morning site. As in experiment 1, they recalled the home-
ward vector appropriate to the site to which they had been transported.
The bees displaced to the morning site in the afternoon adopted one of two 
headings. About half of them flew toward the hive, whereas the other half 
headed in the direction that would have taken them from the hive to the after-
noon site, their original target. They did not take the actual course to the after-
noon site.
Menzel explains the difference in behavior between the bees displaced to the 
morning site and those displaced to the afternoon site in terms of differences in 
the local cues available at the two locations. The landmarks visible from the af-
ternoon site were more prominent than those visible from the morning site. The 
former was characterized by a nearby bush, and it was much closer to the large 
hill than the latter. Thus, the bees transferred to the afternoon site were more 
likely to recognize their location than those transferred to the morning site. By 
the same token, they were also more likely to change their original feeder-direct-
ed motivation (and corresponding flight vector) to a hive-directed one.
The vanishing bearings of the hive departing bees transported to Site 3 ex-
hibited a bimodal distribution. This was the case for both the morning-displaced 
bees and the afternoon-displaced bees. Most of the bees in each group departed 
Site 3 on a course that would have taken them from the hive to the time-appro-
priate feeder. A significant proportion of the bees in each group took the novel 
course toward the hive. However, none of the peaks in the distribution of van-
ishing bearings corresponded to the actual direction from Site 3 to the time-
appropriate feeder.
The bees transferred to Site 4 in the morning chose the hive-to-morning-site 
compass direction and hence behaved as if they had not been displaced. The bees 
which had visited only the morning site, when released at Site 3, also behaved as 
if they had not been displaced; they, too, opted for the hive-to-morning-site 
direction. These results reaffirm the implications of experiment 1. The possibility 
that the novel-route bees homed toward a beacon near the hive, as well as the 
possibility that they were attracted to familiar-route-associated landscape 
features, was excluded. Also, learning the two feeder–hive routes appears to have 
been necessary for taking the novel shortcut. So it again appears that the novel-
route bees somehow combined the two respective route memories. Furthermore, 
the fact that the bees released at Site 4 behaved as if they had not been displaced 
suggests that bees don't automatically combine route or vector memories 
whenever they are released at an unfamiliar location.
In experiment 3, bees were captured at the feeders, some upon arrival and 
some upon departure. Bees arriving at or departing from the morning site were 
displaced to either the afternoon site or Site 3. Bees arriving at or departing from 
the afternoon site were displaced to the morning site. Feeder arriving bees were 
expected to be motivated to feed and thus to return to the feeding site at which 
they were captured, if able to do so. Feeder departing bees were expected to be 
motivated to return to the hive.
All of the bees taken from the morning site set a course that, in the absence 
of displacement, would have taken them back to the hive from the morning site. 
This was the case, regardless of whether they were captured upon arrival or de-
parture, and regardless of whether they were displaced to the afternoon site or 
Site 3. Menzel found this result surprising, since hive arriving and hive departing 
bees were able to fly directly home from the feeding sites and Site 3, and since 
feeder departing bees should have been at least as motivated to return to the hive 
as bees from either of those two groups. Menzel suggests both that the home-
ward vector is loaded into working memory upon arrival at a feeding site and 
that it is strong enough to override local landmark information. (I'll suggest a 
different explanation shortly.)
Bees captured at the afternoon site and released at the morning site showed 
a bimodal distribution of vanishing bearings, presumably uncorrelated with 
whether the bees were captured upon arrival or departure. About half of them 
oriented their flights in the afternoon-site-to-hive direction, whereas the remain-
der oriented their flights toward the hive. The bees captured at the afternoon site, 
then, behaved in a different manner than those captured at the morning site.
Menzel's explanation of the behavior of the bees captured at the morning 
site doesn't explain the behavior of the bees captured at the afternoon site. If the 
homeward vector in working memory was strong enough to override local 
landmark information in the former case, then it should have been strong 
enough to override local landmark information in the latter case as well, since the 
landmarks at the morning site were less prominent than those at the afternoon 
site. The bees released at the morning site would have had less local information 
to override than the bees released at the afternoon site. 
An alternative explanation focuses on the prominence of local cues at the 
capture site, rather than the strength of the homeward vector and the prominence 
of local cues at the release site. Since the morning site was characterized by rela-
tively few local cues, the bees captured at that site were likely to be disposed to 
rely upon compass information, rather than local cues, to set their course upon 
departure. This explains their failure to notice that they had been displaced to 
either Site 3 or the afternoon site. The afternoon site, on the other hand, did have 
at least one prominent local feature, a nearby bush. So it is reasonable to suppose 
that bees captured there were more disposed than morning-captured bees to rely upon local 
cues to set their course upon departure. Consequently, they should have been 
more likely than morning-captured bees to recognize where they were once re-
leased and to set the correct homeward course. This explanatory hypothesis pre-
dicts that, if the experiment is repeated under the same conditions, then a signifi-
cant portion of bees captured at a location corresponding to the afternoon site 
will fly directly to the hive if displaced to a location corresponding to Site 3.
Note that the just-offered explanation comports well with the idea that bees 
learn sequences of route segments (§§ 6.2.1–6.2.2.1). For if that idea is correct, it 
would explain how bees could already be disposed to fly in the feeding-site-to-
hive direction upon arrival at that site. A route from the hive to a feeding site and 
back to the hive can be viewed as one journey with multiple route segments, just 
as well as a route from the hive to a foraging site, or one from the hive to a for-
aging site and then to another foraging site.
From his results (summarized in Table 6.2), Menzel infers that the course a 
displaced bee sets upon release depends upon both its motivation when captured 
and the information it acquires at the release site. It does appear likely that the 
bees linked their memories of the feeder-to-hive vectors to cues available at the 
feeding sites.[42] This explains why hive departing bees and hive arriving bees 
were able to set the correct course from either feeding site to the hive. It explains 
why hive departing bees transported to the afternoon site were more likely than 
those transported to the morning site to choose the correct course home. And, as 
we've just seen, the behavior of feeder arriving bees and feeder departing bees 
can be explained in terms of the relative prominence of local cues at the two 
feeding sites.
[42] This finding corroborates Wehner et al. 1990.
Release  Hive arriving     Hive departing    Feeder arriving   Feeder departing
site     Morn     Aft      Morn     Aft      Morn     Aft      Morn     Aft
Sm       -        Sm-to-H  -        Sm-to-H  -        Sa-to-H  -        Sa-to-H
                                    H-to-Sa           Sm-to-H           Sm-to-H
Sa       Sa-to-H  -        Sa-to-H  -        Sm-to-H  -        Sm-to-H  -
S3       S3-to-H  S3-to-H  S3-to-H  S3-to-H  Sm-to-H  -        Sm-to-H  -
                           H-to-Sm  H-to-Sa
S4       -        Sa-to-H  H-to-Sm  -        -        -        -        -
S3[a]    -        Sa-to-H  H-to-Sm  -        -        -        -        -
Table 6.2. Courses set by the bees in Menzel's vanishing bearing, displacement experiments. 
Morn, captured in the morning at the hive or the morning feeding site; Aft, captured in the 
afternoon at the hive or the afternoon feeding site; Sm, morning site; Sa, afternoon site; S3, Site 3; 
S4, Site 4; H, hive; -, no entry.
[a] Control experiments in which the bees had visited only the site at which they were captured.
6.2.3.2  Explanations of Novel-Shortcut Behavior
Menzel demonstrated that bees are capable of setting a novel course from an un-
familiar site to the hive, without homing to recognized visual cues near or along 
the way to the hive. There are several possible explanations of this finding. 
Image Matching  The novel route was the result of the bees' traveling so as 
to match their stored image of distant landscape features, as seen from the 
hive, with their current image.[43]
Noninferential Interpolation  The similarity between the distant visual cues 
at Site 3 and those at each of the two feeding sites directly caused the bees 
in question to compromise between their established homeward vectors; 
no inferential processes were involved.[44]
Sequential Memory Referral  The novel route was the result of the bees' 
alternately relying upon the two remembered feeder-to-hive vectors.[45]
General Landscape Memory  The novel-route bees did not rely upon their 
feeder-to-hive vectors; rather, they employed their "general landscape" 
memory, established during their exploration, or orientation, flights.[46]
Cognitive Map  The bees were able to set a novel course by locating 
themselves on their cognitive map, which encoded the coordinates of the two 
feeding sites, and other places in the bees' explored territory, in a common 
frame of reference centered on the hive.[47]
Vector Averaging  The similarity between the distant visual cues at Site 3 
and those at each of the two feeding sites caused the bees to recall both of 
the acquired feeder-to-hive vectors, which they then averaged to obtain a 
Site 3-to-hive vector.[48]
[43] Collett (T. S.) and Collett 2002; Wehner et al. 1996.
[44] Menzel et al. 1998, p. 149.
[45] Menzel et al. 2000b.
[46] I mention this as a possible explanation based on the results of Menzel et al. (2000a, 2005), 
which I present below.
[47] Giurfa and Capaldi 1999; Menzel and Giurfa 2001; Menzel et al. 1998, 2000b.
Menzel favors the vector averaging hypothesis. It does, in fact, currently provide 
the best explanation of his results, as I will now argue.
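The arithmetic behind the vector averaging hypothesis can be sketched as follows. This is an illustration under stated assumptions, not a model of the bees' actual computation: the coordinates are hypothetical, the release point is assumed to lie midway between the two feeders, and the names home_vector, average_vectors, and compass_bearing are mine.

```python
import math

# Hypothetical layout in metres (east, north), hive at the origin.
# These are NOT the coordinates of Menzel's sites.
hive = (0.0, 0.0)
morning_site = (-300.0, 100.0)
afternoon_site = (300.0, 100.0)
# Assumption: the release point lies midway between the two feeders.
release_site = ((morning_site[0] + afternoon_site[0]) / 2.0,
                (morning_site[1] + afternoon_site[1]) / 2.0)


def home_vector(site):
    """Displacement (east, north) that carries a bee from `site` to the hive."""
    return (hive[0] - site[0], hive[1] - site[1])


def average_vectors(v1, v2):
    """Component-wise mean of two displacement vectors."""
    return ((v1[0] + v2[0]) / 2.0, (v1[1] + v2[1]) / 2.0)


def compass_bearing(v):
    """Compass bearing (degrees clockwise from north) of an (east, north) vector."""
    return math.degrees(math.atan2(v[0], v[1])) % 360.0


# Average the two learned feeder-to-hive vectors and compare the result
# with the true homeward vector from the release point.
averaged = average_vectors(home_vector(morning_site), home_vector(afternoon_site))
true_home = home_vector(release_site)
```

Under these assumptions the averaged vector coincides with the release-site-to-hive vector. For a release point that is not midway between the feeders, the average would in general differ from the true homeward vector.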
There is a great deal of evidence confirmatory of the idea that honeybees 
pinpoint the location of their goal by matching their stored image(s) of 
landmarks near the goal with their current image.[49] One could argue, then, as do 
Wehner et al.,[50] that bees might employ such a landmark-based guidance 
strategy on larger scales, at least when the relevant visual cues do not have to 
compete with vector information in working memory.
It is not at all clear, however, that this large-scale image matching hypothesis 
could adequately explain Menzel's results. It is prima facie incompatible with 
several of them. First, bees captured upon arrival at the hive have just played out 
their feeder-to-hive vector. Thus, on the image matching hypothesis, they should 
be quite capable of employing landmark-based information in order to return to 
the hive via a novel shortcut. Now, in Menzel's experiment 1, the hive arriving 
bees which had visited both feeding sites did depart from Site 3 in the direction 
of the hive. But the hive arriving bees which had visited only the afternoon site 
departed from Site 3 in the afternoon-site-to-hive direction. This poses a 
difficulty for advocates of image matching, since the bees which had visited only 
the afternoon site should have been just as able to take the novel shortcut by 
means of image matching as the bees which had visited both sites. Moreover, any 
of the bees which had been visiting the afternoon site should have been quite 
familiar with the large hill, toward which they flew when traveling to that location 
(Fig. 6.8). 
[48] Collett (T. S.) and Collett 2002; Giurfa and Capaldi 1999; Menzel et al. 1996, 1998, 2000b.
[49] Cartwright and Collett 1983; Collett (T. S.) 1992; Wehner 1992.
[50] Wehner et al. 1996.
Similarly, if the image matching hypothesis were correct, the hive arriving 
bees displaced to Site 4 in the afternoon should have been able to set the correct 
homeward course upon departure. Instead, they picked the direction that would 
have taken them from the afternoon site to the hive in the absence of displacement. 
It is true that the hill's compass direction at Site 4 differed by more than 90° from 
its compass direction at the afternoon site and at the hive (Fig. 6.8). So the bees 
displaced to Site 4 might not have treated that prominent visual cue as the hill with 
which they were familiar. However, this possibility runs counter to at least the 
spirit of the image matching hypothesis under consideration, since it purports 
to explain the ability of insects to take novel shortcuts, even when the relevant 
landmarks are viewed from very different perspectives.[51]
[51] Wehner et al. 1996, pp. 133–134, 137–138.
Further, morning and afternoon hive departing bees were able to take the 
novel route back to the hive. But hive departing bees which had visited only the 
morning site did not orient toward the hive when released at Site 3. Again, this is 
a difficulty for the image matching hypothesis. The bees which had visited only 
the morning site should have been just as able to take the novel route by means 
of image matching as the bees which had visited both sites. One could claim that 
the morning-site only bees were somewhat less familiar with the position of the 
hill than bees which had visited the afternoon site, since the hill was not situated 
in line with their hive-to-morning-site route. This, then, might account for their 
failure to take the novel shortcut. But this response is unsupported, given that 
hive arriving, afternoon-site only bees also failed to take the novel route. That is, 
the bees' degree of familiarity with the hill doesn't account for any differences in 
navigational performance between the two groups.
Relatedly, an appeal to large-scale image matching would have to account 
for the fact that the hive departing bees released at Site 3 did not depart toward 
either of the feeders. Rather, they set off either on a heading toward the hive or 
on their original hive-to-site heading. The idea that image matching is not relied 
upon in the presence of a vector in working memory does not help here, since, to 
reiterate, a significant proportion of the hive departing bees released at Site 3 
headed toward the hive. Thus, hive departing bees were able to disregard their 
original flight vector (and this is in accord with the results of other studies[52]). 
They were also motivated to forage when captured. It would seem, then, that if 
the novel-route, hive departing bees used image matching in order to return to 
the hive, then they should have been able to use image matching in order to 
locate their original destination.
The noninferential-interpolation hypothesis avoids some of the problems 
faced by the image matching hypothesis, since it attributes the bees' 
novel-shortcut ability, in part, to their having visited both of the feeding sites. Menzel 
mentions the view merely as providing a possible explanation of his results. I'm 
unaware of anyone who actually defends it. Consequently, it's not clear what the 
claim is, exactly. The idea, I gather, is as follows. The bees associated the visual 
scenes at the two familiar sites with the respective homeward vectors. The scene 
at Site 3 resembled those familiar scenes closely enough that when some of the 
bees released at Site 3 attempted to match the available visual cues with one of 
the familiar scenes, both of the learned associations became active. The vector 
memories then somehow competed for control of the bees' behavior, the result 
having been a compromise flight direction.
The interpolation hypothesis, as I understand it, attempts to occupy a middle 
ground between the image matching account and the vector averaging account. 
It adds to the former an appeal to vector navigation. However, it stops 
short of an appeal to computational processes, relying only on associations and 
association strengths. Vectors come into play, but they are not rule manipulated.
[52] Dyer 1991, Menzel 1989, Riley et al. 2003, Schöne et al. 1998.
Nonetheless, it is not clear that the hypothesis occupies a stable position. 
First, what sort of process is supposed to yield a compromise between two 
vectors that is not a kind of inference? And if it is not a kind of inference, how is the 
process to be distinguished from large-scale image matching? In any event, even 
if the hypothesis is coherent, it's not likely to be adequate. Before I present my 
case for that conclusion, I turn to Menzel's.
Menzel finds the interpolation hypothesis unlikely because the bees released 
at Site 4 left on the heading they would have taken if they had not been 
displaced, even though the hill was a prominent visual cue there. That assessment 
suggests that he takes the hypothesis to be a variant of the image matching 
account. But his response is not clearly adequate, if emphasis is placed on the 
proposal's requirement that the bees recalled both learned vectors. The hill's 
compass direction at Site 4 differed from that at each feeding site by 100–110° (Fig. 
6.8). It could be argued, then, that the bees did not treat Site 4 as sufficiently 
similar to either of the familiar sites. Furthermore, Menzel's objection undercuts 
his own position. For if it is the case that the distant visual cues at Site 4 were 
similar to those available at the familiar sites, enough so that the interpolation 
account should have applied to bees at Site 4, then those cues should have been 
similar enough for Menzel's vector averaging thesis to have applied as well. 
(Note that the hill's orientation with respect to the various sites poses a problem 
for image matching, but not clearly for interpolation, since the former, but not 
necessarily the latter, purports to explain novel shortcuts, even when the relevant 
landmarks are viewed from very different perspectives.) 
Since interpolation and vector averaging each appeal to vector navigation, it 
might be thought that they should account for Menzel's data equally well. But I 
hope to convince you that this is not the case. Again, the interpolation account 
appeals to only associations and association strengths. It explains the behavior of 
the novel-route bees in terms of the strengths of their site–vector associations and 
the degree to which the visual cues at Site 3 stimulate each 
familiar-site-associated vector memory. But association strengths and degrees of stimulation 
can be highly variable factors. Moreover, it is more likely than not that two vector 
memories would differ in their associative influence on a bee's course (perhaps 
with one dominating). Consequently, on the interpolation view, it seems that the 
vanishing-bearing distribution at Site 3, for experiment 1 (hive arriving bees), 
should have been relatively broad, perhaps also with multiple peaks: one 
between the afternoon-site-to-hive and Site 3-to-hive directions, and one between 
the morning-site-to-hive and Site 3-to-hive directions, possibly with additional 
peaks at each of the feeder-to-hive directions.
However, no such peaks are discernible (Figs. 6.9 and 6.10). Furthermore, 
histograms of Menzel's vanishing bearing data appear to show that the 
distribution of vanishing bearings for hive arriving bees released at Site 3 is not 
cantly different from the distributions for hive arriving bees released at Site 4 and 
afternoon-site only bees released at Site 3 (controls). In fact, the similarity of the 
distributions is fairly close (the Site 3 distribution for hive arriving bees also 
bears some resemblance to the Site 4 distribution for hive departing bees) (Fig. 
6.10). This suggests that similar mechanisms were operative in the three groups 
of bees. We may infer, then, that since interpolation did not occur in the Site 4 or 
control group, it did not occur in the Site 3 group either. Also, since the Site 4 
bees and the control bees showed a tendency to rely upon a single vector after 
displacement, the same is probably also true of the Site 3 bees.
The general-landscape-memory, cognitive-map, and vector averaging ac-
counts each attribute novel-course setting to a single, flight controlling vector. 
The sequential-memory-referral hypothesis, to which I now turn, does not.
Figure 6.10. Histograms showing the distribution of vanishing bearings for four groups of bees in 
Menzel's displacement experiments. The distributions shown in panels B–D are normalized so 
that their maximum values are equal to the maximum value of the distribution shown in panel A. 
(A) Hive arriving bees, Site 3. The data for morning- and afternoon-captured bees are combined 
(see Figure 6.9; n = 210). The superimposed black line approximates the distribution curve. (B) 
Hive arriving bees, Site 4 (n = 37). The black line is identical to the one in panel A. (C) Hive 
departing bees, Site 4 (n = 32). The black line is the left-right mirror of the line in panel A. (D) Hive 
arriving bees that had visited only the afternoon site, Site 3 (n = 55). The black line is identical to 
the one in panels A and B.
As he does in the case of the interpolation hypothesis, Menzel mentions the 
sequential-memory-referral account merely as providing a possible explanation 
of his results. Again, I'm unaware of anyone who actually defends it. In any case, 
it is easy to dispense with. For if the novel-route bees alternately relied upon 
their two route vector memories, their flights should have exhibited a zig-zag 
pattern, with an alternation frequency high enough to allow a vanishing bearing 
distribution directed toward the hive. Otherwise, the distributions would have 
been bimodal, with one peak for each of the two vectors. A difference in flight 
behavior, then, between novel-shortcut bees and others should have been ob-
servable from the release site. However, Menzel reports that the departure flight 
characteristics for novel-route bees did not differ in any discernible way from 
those of any other group of bees. Thus, the hypothesis in question is unsup-
ported. (Notice also that the view leaves for further investigation the question of 
why the distributions of vanishing bearings are similar across the different ex-
perimental groups.)
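The geometry of the objection to sequential memory referral can be sketched as follows. This is a toy illustration only: the two headings, the leg lengths, and the function names zigzag_path and bearing_from_release are hypothetical, and the vanishing bearing is approximated here by the bearing of the bee's final position as seen from the release point.

```python
import math


def zigzag_path(bearing_a, bearing_b, step_m, n_steps):
    """Positions (east, north) of a bee that alternates between two compass bearings."""
    east, north = 0.0, 0.0
    path = [(east, north)]
    for i in range(n_steps):
        # Even-numbered legs follow the first vector, odd-numbered legs the second.
        bearing = bearing_a if i % 2 == 0 else bearing_b
        east += step_m * math.sin(math.radians(bearing))
        north += step_m * math.cos(math.radians(bearing))
        path.append((east, north))
    return path


def bearing_from_release(point):
    """Compass bearing of `point` as seen from the release site at the origin."""
    return math.degrees(math.atan2(point[0], point[1])) % 360.0


# Hypothetical homeward headings of 150 and 210 degrees (not Menzel's values).
# Frequent alternation: forty 5 m legs, so the bee visibly zig-zags.
fast = zigzag_path(150.0, 210.0, step_m=5.0, n_steps=40)
# No alternation before vanishing: a single 200 m leg along the first vector.
slow = zigzag_path(150.0, 210.0, step_m=200.0, n_steps=1)

fast_bearing = bearing_from_release(fast[-1])
slow_bearing = bearing_from_release(slow[-1])
```

With frequent alternation the bee ends up on the intermediate bearing of 180 degrees, but only by flying a visibly zig-zag course. Without alternation it vanishes along one of the two learned headings, which across a group of bees would produce a bimodal distribution.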
The hypothesis that appeals to the general landscape memory of bees, unlike 
the hypothesis just examined, is based on known bee navigational abilities. In 
order to provide a clear enough statement of the idea, I first contrast bees' 
general landscape memory with both their landmark-based route memory and their 
vector memory.
Prior to foraging for the first time, or for the first time in a new area, 
honeybees will make a series of exploration, or orientation, flights. Individual bees 
typically will explore multiple regions around the hive, though each flight is 
usually limited to a particular sector.[53] On these excursions, bees learn the local 
solar ephemeris. They also learn the distance and direction, from the hive, of 
various landscape features. The sum of such stored distance and direction 
information is referred to as the bees' general landscape memory.
Once bees begin to forage, they learn routes to and from foraging sites. Ex-
perienced foragers rely on landmark-based route memories and flight vector 
memories as their primary means of navigation. That is why, when such bees are 
released after displacement to a location they are unfamiliar with, they tend 
to depart either along the flight vector they would have adopted had they not 
been displaced or in the direction of their original destination by means of hav-
ing recognized landmarks that lie along the relevant established route.
Bees which have flown only orientation flights are able to return rapidly to 
the hive after displacement, about as rapidly as they would have returned if they 
had learned a direct route connecting the hive and the place of release.[54] They 
can recognize landscape features near the release point and recall the associated 
homeward vector acquired during exploration.[55] Because experienced foragers 
primarily rely on established-route memories, they might take significantly 
longer to return to the hive after displacement, depending on where they are 
released in relation to the hive and familiar routes. For example, a bee trained to a 
feeder 200 m north of the hive and displaced from the feeder to a location 200 m 
south of the hive will initially fly the learned southward feeder-to-hive vector, 
taking it farther away from the nest.
[53] Capaldi et al. 2000.
[54] Menzel et al. 2000a.
[55] Menzel et al. 2005.
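The displaced-forager example above can be worked through numerically. The following is a minimal sketch using the distances given in the example; the coordinate frame (hive at the origin, north as the positive second coordinate) and the function name fly_vector are my own assumptions.

```python
import math

# Positions in metres as (east, north), with the hive at the origin.
hive = (0.0, 0.0)
feeder = (0.0, 200.0)      # feeder 200 m north of the hive
release = (0.0, -200.0)    # release point 200 m south of the hive

# The learned feeder-to-hive vector points 200 m due south.
learned_home_vector = (hive[0] - feeder[0], hive[1] - feeder[1])


def fly_vector(start, vector):
    """Position reached by flying `vector` from `start`."""
    return (start[0] + vector[0], start[1] + vector[1])


# Playing out the learned vector from the release point.
end_point = fly_vector(release, learned_home_vector)
distance_before = math.hypot(release[0] - hive[0], release[1] - hive[1])
distance_after = math.hypot(end_point[0] - hive[0], end_point[1] - hive[1])
```

On these figures the bee starts 200 m from the hive and, after playing out its learned vector, ends up 400 m from it.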
But experienced foragers, too, can recall locational information acquired 
during their orientation flights. They are most likely to do so when familiar-route 
vector information is absent from working memory. That is the case for bees 
which have just played out a particular recalled vector. And that occurs when 
they have arrived at the hive or when they have flown the entire length of the 
vector without encountering their destination.[56]
Since experienced foragers have access to their general landscape memory, 
the possibility arises that the novel-route bees in Menzel's displacement 
experiments were able to take a shortcut to the hive because they recalled a 
Site 3-associated homeward vector, which they learned during their orientation flights.
Although an appeal to general landscape memory could account for some 
novel-shortcut behavior, it is doubtful that such an account could explain 
Menzel's results. The fact that the bees which had visited only the morning site or 
only the afternoon site failed to orient toward the hive from Site 3 is a problem 
for the idea. The hive arriving bees which had visited only the afternoon site 
failed to exhibit novel-shortcut ability. Also, whereas about half of the hive 
departing bees displaced to Site 3 adopted the novel homeward course, those which 
had visited only the morning site did not. Again, this suggests that experience 
with both trained routes was necessary in order to be able to take the novel route. 
That would not have been necessary on the general-landscape account, which 
would require only that the bees became acquainted with Site 3's vicinity on at 
least one of their orientation flights. Moreover, it is highly unlikely that the bees 
which had learned both routes could access their general landscape memory 
after displacement to Site 3, whereas neither the morning-site-only bees nor the 
afternoon-site-only bees could.
[56] Menzel et al. 2005.
As we have seen (§ 6.2.3.1), the behavior of bees that had learned only one of
the trained routes controlled for the possibility that the novel-route bees homed 
toward landscape features near the hive as well as for the possibility that they 
relied on route-associated landscape features. We see now that it also controlled 
for the possibility that those bees employed a single orientation-flight-acquired 
vector associated with Site 3 local cues.
What could explain why single-route bees did not activate general-landscape
vector memories when released at Site 3? The landscape features associated
with homeward vectors during orientation excursions are likely to be
relatively local features along the bee's line of flight. They certainly need to be
more localized than distant panorama features, since distant cues appear much 
the same over a broad area and hence are not useful for accurate assessment of 
position. Site 3 was situated within a uniform expanse of grassland. It's plausible,
then, that the site's local scene was not distinctive enough for exploring bees to
have associated the location with a homeward vector in the first place.
It remains to evaluate the vector averaging and cognitive-map hypotheses.
According to the former, the similarity between the distant visual cues at Site 
3 and those at each of the two feeding sites caused the bees to recall the two 
feeder-to-hive vectors, which they then averaged to obtain a Site 3-to-hive vector. 
The first thing to notice about the account is that it's no worse off than the interpolation
view with regard to explaining the difference in behavior between the
groups of bees that did take the novel shortcut and the groups that did not. For 
example, advocates of either claim may appeal to the fact that the hill's compass
direction at Site 4 differed from that at each feeding site by 100–110°, in order to
explain why the bees released at Site 4 failed to orient toward the hive.
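The averaging computation proposed by this account can be sketched as follows. This is only an illustration, not a model of any actual bee mechanism. It assumes a flat, hive-centered coordinate frame, two hypothetical feeder positions, and a release site lying midway between the feeders; the function names and all numbers are assumptions of mine, not values from Menzel's study.

```python
import math

def average_vectors(v1, v2):
    """Component-wise mean of two (east, north) vectors, in metres."""
    return ((v1[0] + v2[0]) / 2.0, (v1[1] + v2[1]) / 2.0)

def heading_and_distance(v):
    """Convert an (east, north) vector to (compass heading in degrees, distance)."""
    east, north = v
    heading = math.degrees(math.atan2(east, north)) % 360.0
    return heading, math.hypot(east, north)

# Hypothetical layout with the hive at the origin; positions are (east, north).
morning_feeder = (300.0, 400.0)
afternoon_feeder = (-300.0, 400.0)

# A feeder-to-hive vector is the negated feeder position in this frame.
morning_home = (-morning_feeder[0], -morning_feeder[1])
afternoon_home = (-afternoon_feeder[0], -afternoon_feeder[1])

# The averaged vector is the one that leads home from the midpoint
# between the two feeders (here, 400 m due north of the hive).
averaged_home = average_vectors(morning_home, afternoon_home)
heading, distance = heading_and_distance(averaged_home)
```

Under these assumptions the averaged vector points due south for 400 m, which is the course home from the midpoint between the feeders. Note that the computation uses only the two recalled feeder-to-hive vectors; nothing in it represents the bee's actual location.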
Second, the vector-averaging account provides an explanation of the simi-
larity among the vanishing bearing distributions for the different groups of bees 
displaced to unfamiliar sites. For, first, the values of the vectors which are pro-
posed to have been averaged are unlikely to have differed significantly among 
individual bees.
57
 Second, there is no reason to suppose that the bees would have 
weighted the vectors differently. Third, the result of an averaging process depends
on the values of the vectors averaged, not on their "strengths." On the
other hand, it's at least quite unclear whether or not noninferential interpolation
would result in such similarities. In fact, as I've argued, we should expect some
discernible differences.
Since each group of control bees had learned only one route, vector averaging 
explains why neither of them oriented toward the hive from Site 3. It should also 
be clear that vector averaging doesn't predict unusual departure flight patterns.
What about the lack of novel routes to feeders? The vector averaging account
does not imply that the bees in Menzel's experiments should have been
able to take a novel shortcut from Site 3 to the time-appropriate feeding location. 
The proposed computation operates on two hive-directed vectors, neither one of 
which has the bee's actual location as its point of origin. It requires only that the
bee recall and average those vectors. By contrast, a computation of the heading and
distance from Site 3 to a feeding site would require that the bee first compute a
Site 3-to-hive vector, maintain that vector in working memory while recalling a
hive-to-feeder vector, and then sum them. Clearly, then, the ability to average
57
 Menzel et al. 2005, Riley et al. 2003.
two vectors does not bring with it an ability to compute a course from an unfa-
miliar location to a familiar location other than the hive.
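The additional step that a course to a feeder would require can be sketched in the same spirit. The sketch assumes a hive-centered (east, north) frame and hypothetical vector values; it shows only that a Site 3-to-feeder course is the sum of a Site 3-to-hive vector and a hive-to-feeder vector, an operation distinct from averaging two hive-directed vectors.

```python
def add_vectors(a, b):
    """Sum two (east, north) vectors, in metres."""
    return (a[0] + b[0], a[1] + b[1])

# Hypothetical values: Site 3 lies 400 m due north of the hive,
# and the feeder lies 300 m east and 400 m north of the hive.
site3_to_hive = (0.0, -400.0)    # would first have to be computed or recalled
hive_to_feeder = (300.0, 400.0)  # recalled route vector

# The first vector must be held in working memory while the second is recalled.
site3_to_feeder = add_vectors(site3_to_hive, hive_to_feeder)
```

With these numbers the result is a course of 300 m due east. A system able only to average recalled hive-directed vectors has no way of producing it.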
It's important to recall that the bees in Menzel's experiments either did not
store a Site 3-to-hive vector in their general landscape memory or could not recall 
such a vector after displacement to that site. For if they had access to such a vec-
tor, then it appears that they could have summed it with the relevant hive-to-
feeder vector in order to set a course toward a feeder location. In fact, as we will 
see below (§ 6.2.4), there is evidence that suggests that bees do have this ability.
What about the fact that hive departing bees released at the morning site in 
the afternoon, or at the afternoon site in the morning, did not choose a shortcut to 
the relevant feeder? Couldn't they have summed the feeder-to-hive vector for
their release site with the hive-to-feeder vector for the other site? Perhaps. But
this isn't a serious worry, given the strong tendency of experienced foragers to
give primacy to acquired route information. Also, hive departing bees have lim-
ited energy resources.
58
 Consequently, when they find themselves at an unexpected
location, they are apt to return to the hive, along a familiar route, rather than set 
out on a riskier course that would take them over unfamiliar territory.
58
 Menzel et al. 2005.
In sum, none of Menzel's results pose any special problem for his vector averaging
hypothesis. Let's now turn to the cognitive-map account and see how
well it fares.
On the cognitive-map thesis, the bees in Menzel's displacement experiments
recorded the coordinates of the two feeder locations, and other places in their ex-
plored territory, in a common frame of reference centered on the hive. The bees 
acquired this information over the course of their exploration and foraging trips. 
The sum of this information functions as a map, since it enables bees to set a direct
course between any two recorded locations. Thus, the bees were able to set a
novel course from Site 3 to the hive by first (somehow) locating themselves on 
their mental map. Once they did so, they were able to compute a homeward 
flight vector with the help of information provided by their solar compass.
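What the cognitive-map thesis attributes to the bee can be pictured with a small sketch. It assumes that places are stored as (east, north) coordinates in a hive-centered frame; the place names, coordinates, and the function `course` are hypothetical and serve only to show why such a map would yield a direct course between any two recorded locations.

```python
import math

# Hypothetical hive-centered map: place name -> (east, north) in metres.
cognitive_map = {
    "hive": (0.0, 0.0),
    "morning_feeder": (300.0, 400.0),
    "afternoon_feeder": (-300.0, 400.0),
    "site_3": (0.0, 400.0),
}

def course(places, origin, goal):
    """Return (compass heading in degrees, distance in metres) from origin to goal."""
    ox, oy = places[origin]
    gx, gy = places[goal]
    east, north = gx - ox, gy - oy
    heading = math.degrees(math.atan2(east, north)) % 360.0
    return heading, math.hypot(east, north)

home_heading, home_distance = course(cognitive_map, "site_3", "hive")
feeder_heading, feeder_distance = course(cognitive_map, "site_3", "morning_feeder")
```

On this picture, once a bee has located itself on the map, a course to a feeder is no harder to obtain than a course to the hive. That is the feature of the account which the argument below puts under pressure.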
However, there is a difficulty for the cognitive-map approach. The trouble is 
that if the bees in Menzel's experiments did in fact construct a cognitive map of
their foraging territory (and their having done so was responsible for their novel 
shortcuts), then the hive departing bees, at least, should have been able to take a 
shortcut from their place of release to their original foraging site destination. 
Again, hive departing bees (at least when captured) are motivated to fly to a 
particular foraging place. Moreover, they have been shown to be able to set a
novel course, from the place of their release, to either the hive or, in certain cir-
cumstances, their original destination.
59
 Since a significant proportion of the hive 
departing bees in Menzel's experiments were able to take a shortcut from Site 3
back to the hive, the cognitive-map view requires that they must have been able 
to estimate the position of their place of displacement in relation to the hive. 
Also, they were informed about the location, with respect to the hive, of the time-
appropriate feeder. Nonetheless, they failed to demonstrate an ability to set a 
Site 3-to-feeder course.
It might be thought that if the novel-route bees navigated using a cognitive 
map, then hive departing bees released at the morning site in the afternoon, or at 
the afternoon site in the morning, also should have been able to choose a shortcut 
to the relevant feeder. For they possessed information about the hive-relative po-
sition of both the release site and the time-appropriate feeding site (this was not 
the case for the control group). But, as in the case of vector averaging, the strong 
tendency of experienced foragers to give primacy to acquired route information 
allays this worry.
Table 6.3 summarizes the conclusions of this section. Menzel's vector averaging
hypothesis is judged to provide the best explanation of his results. Each of
the alternatives fails to deal adequately with at least one of them.
59
 Dyer 1991, Gould 1986.
Explanandum
Hypothesis: IM SMR GLM NI VA CM
Shapes of vanishing bearing distributions ✓ ✓ ✓ ✓
No shortcuts to hive from Site 4 ✓ ✓ ✓ ✓ ✓
Controls (a): No shortcuts to hive from Site 3 ✓ ✓ ✓ ✓
No novel routes to feeders ✓ ✓ ✓ ✓
Typical departure flight patterns ✓ ✓ ✓ ✓ ✓
Table 6.3. Comparison of explanations of Menzel's displacement experiment results, based on my
evaluations. A checkmark indicates either that the hypothesis explains the result relatively well or
that the result poses no apparent difficulty. Abbreviations: NI, noninferential interpolation; CM,
cognitive map; GLM, global landscape memory; IM, (large-scale) image matching; SMR, sequential
memory referral; VA, vector averaging.
(a) Control experiments in which the bees had visited only the site at which they were captured.
6.2.4  A Kind of Cognitive Map
The failure to demonstrate a role for a cognitive map in the production of the
vanishing bearing distributions in Menzel's displacement experiments does not
show that the honeybee does not have a cognitive map. It shows only that the
bees in those experiments probably did not rely upon a cognitive map to set their
initial course from the release site. As Menzel points out, training bees to specific
routes might result in reliance on flight vector information for course setting. The
operation of a cognitive map might not become apparent until a route-trained
bee finds itself to be lost after a flight vector memory fails to lead it to its destination.
In fact, the general landscape memory of bees was not revealed until they
were tested in displacement experiments in which route learning was prevented.
60
The general landscape memory is a kind of map, but it is not as robust as the 
sort of cognitive map we've been considering. It consists of multiple, hive pointing
vectors associated with various respective landscape features. Bees could
have this sort of "vector" map without representing the spatial relations between
any places other than certain landscape features and the hive. Furthermore, bees 
could have this sort of map and yet not be able to integrate any of its vectors with 
any other. In that sense, a vector map could be fragmented and piecemeal.
Menzel and colleagues,
61
 however, claim to have demonstrated the existence 
of a kind of cognitive map in the honeybee, a map that allows bees to take novel 
shortcuts between known locations, neither of which is the hive. I later propose 
that the novel shortcuts flown were the results of novel combinations of flight 
vector memories and their semantic constituents (§ 7.4).
Using harmonic radar,
62
 Menzel tracked displaced bees over the entire 
course of their flights. Three groups of bees were tested in the study. One group 
was trained to a stationary feeder situated 200 m east of the hive. A second group
60
 Menzel et al. 2000a, Capaldi and Dyer 1999.
61
 Menzel et al. 2005, p. 3045: "The question now in bee navigation is not so much whether there is
a map-like spatial memory but rather what structure this map has and how it is used."
62
 Riley et al. 1996, 1998.
was trained to a feeder that slowly revolved around the hive at a distance of 
10 m. A third group consisted only of bees that had not visited the stationary
feeder but were recruited to it by a waggle dance. 
The two groups of feeder-trained bees were captured at the feeder after they 
had filled their crops. The dance-recruited bees were captured upon departure 
from the hive. Captured bees were placed in a dark container and transported to 
one of eight sites, where they were released within 15 min of capture.
The experiments were performed in an expanse of flat grassland which 
contained very few natural food sources. Ground patterns due to different 
mowing times and soil conditions provided the only natural landmarks. Two 
groups of tents of various colors served as artificial landmarks. The height of the 
skyline as seen from the hive area varied within a range of less than 1.5°.
Due to the resolution of the honeybee visual system, no features of the sky-
line were pronounced enough to guide the bees to the area of the hive. Neither 
the hive nor the feeder was visible to the bees beyond a range of 60 m. The tents 
could not be seen by the bees outside a range of 100 m. Of one group of tents, the 
tent closest to the hive was 110-m distant. Of the other group, the nearest tent 
was 190-m distant. Hence, the tents were not suitable for purposes of homing by 
image matching.
The bees were experienced foragers, but the study site was new to them. The 
bees tested during one study period were allowed to perform orientation flights 
for 3 days. The bees tested during the other study period were permitted to per-
form orientation flights for 6 days. Tests were carried out with the two groups of 
tents either in their original positions, rotated 120° about the hive, or removed.
Orientation flights and test flights occurred under conditions well suited for 
solar-compass navigation (with the exception of some of the test flights of dance-
recruited bees). 
Irrespective of release site and test conditions, the bees trained to the sta-
tionary feeder initially flew their feeder-to-hive flight vector (on the heading and 
for the distance they would have flown in the absence of displacement). They 
next performed a search flight, followed by a straight homing flight toward the 
hive or first toward the feeder and then toward the hive. Hive departing, dance-recruited
bees initially flew their hive-to-feeder vector with very good accuracy.
63
 
After a brief search for the feeder, they flew back toward the release site and initiated
a search for the nest. That search was followed by a straight homing
flight toward the hive. Bees trained to the moving feeder began to search for the hive
immediately upon release. They too eventually performed a straight homing 
flight toward the hive. Since all groups of bees performed equally well, they 
63
 Riley et al. 2005.
must have acquired information sufficient for homing during their orientation 
flights.
For all groups of bees, search flight speed (12.9 ± 3.5 km/h) was significantly
slower than both vector flight speed (19.1 ± 2.4 km/h) and homing speed (19.4 ±
1.8 km/h). Search flight paths were curved and highly variable. Searching bees
often returned to the release site multiple times.
64
 
With very few exceptions, homing flights were initiated at points well outside
the 60-m-radius "visibility zone" around the hive. Patterns of small patches
of slightly differing kinds of vegetation were much the same over the entire 
study area. Also, bees homed toward the hive and approached the point at which 
they initiated their homing flights from all directions. Furthermore, bees released 
at the same site multiple times were able to approach the hive from different 
directions.
65
 So it's very unlikely that any particular pattern of ground patches
visible beyond 60 m from the hive was used as a beacon.
64
 Bees' search patterns are somewhat reminiscent of the search patterns of desert ants (see Wehner
and Srinivasan 1981, and Müller and Wehner 1994). Similarities include looping trajectories
out from and back to the search's point of origin and a continual expansion of the area searched.
Bees' search patterns, however, are much more irregular than those of desert ants. Bees also appear
to be able to move the focus of their search. At the time of this writing, a variety of examples
of the entire flight paths of bees in Menzel et al.'s (2005) study are available online
(http://www.honeybee.neurobiologie.fu-berlin.de/Menzel-Greggers-Smith-PNAS-2005/supplement.html).
65
 Examples of this have been provided online
(http://www.honeybee.neurobiologie.fu-berlin.de/Menzel-Greggers-Smith-PNAS-2005/supplement.html).
Bees clearly often used the tents as landmarks when they remained in their 
orientation-period locations. The presence of the tents was not essential for accurate
homing, since bees homed just as effectively both when the tents were rotated
120° clockwise about the hive and when the tents were removed. Bees also
homed just as effectively under heavy overcast, when solar cues were not avail-
able. Thus, ground features were sufficient for accurate homing.
A group of 29 stationary-feeder-trained bees were released at one of two 
sites under sunny skies and with the tents in their original positions. Of those 
bees, ten performed homing flights toward the feeder prior to returning to the 
hive. (Some other bees also homed toward the feeder under different conditions.) 
The homing routes taken by the feeder homing bees were certainly novel, assuming
that the stationary-feeder-trained bees never flew outside the direct
pathway between the feeder and the hive, as Menzel reports. Although it's possible that
some of the hive homing bees, during exploration, had flown along the path of
their homing flight, Menzel maintains that it's very likely that at least a significant
proportion of their homing flights were novel shortcuts.
Menzel argues that his results show that the large-scale spatial memory of 
bees has a map-like organization. The bees took novel shortcuts to the feeder as 
well as the hive. The shortcuts were not a direct result of path integration, since 
the bees could not observe anything during transport to the release sites. Finally, 
the shortcuts were not produced by beacon homing or image matching. Thus, 
during their orientation flights, the bees must have associated homeward vectors 
(provided by path integration) with views of various landscape features they en-
countered. Furthermore, since the bees took novel shortcuts to the feeder as well 
as the hive, they must have been able to integrate hive–feeder route vectors into
that general landscape memory.
66
As I indicated in the introduction (§ 1.4), the capacity to take novel shortcuts
is one that seems to require the capacity to represent various places of interest 
and certain relations (topological, metric, etc.) among them, as well as the capacity
to make inferences involving those representations. Indeed, I'll argue that the
novel shortcuts flown to the feeder, in the above study, were the results of novel
combinations of flight vector memories and their semantic constituents (§ 7.4).
66
 There is another possible instance of the integration of vector information into general landscape
memory. Seemingly, new recruits do not respond to dancers when they indicate a source of
food as being situated where there in fact is only water (Gould and Gould 1988, Tautz et al. 2004).
However, it's not clear whether the dance observers don't respond at all, or whether they do respond
but simply can't find the feeder. Likely cases of bees' integration of reward value information
into their information about a small-scale layout have been provided by Greggers and
Mauelshagen (1997) and Fulop and Menzel (2000).
Chapter 7
The Systematicity of Honeybee Navigational Capacities
In this chapter I argue that various honeybee navigational capacities are systematically
related. Insofar as the systematicity hypotheses I propose involve attributions
of content, the meaning and explanatory role of such attributions need
to be addressed (§ 7.1.1). I spell out some of the semantic roles played by various
honeybee representations as constituents of complex representations (§ 7.3). One
of these roles is that of an indexical (§ 7.3.3). I argue that some honeybee cognitive
processes are sensitive to the constituent structure of the representations on
which they operate (§ 7.4). Relatedly, I argue that honeybees implement operations
defined over variables (§ 7.5). Finally, I conclude by tying together the conclusions
of Chapters 2–5 with those of the present chapter. I propose that honeybees
have a simple language of thought. I also argue that even if they don't, we
have good reason to prefer non-Connectionist explanations of honeybee navigational
capacities over Connectionist ones.
7.1  Systematicity of Information Acquired by Honeybees
Much of the previous chapter is pertinent to whether certain classes of informa-
tion acquired by bees exhibit systematicity. My concern in this section is to pro-
pose and defend the following general hypothesis:
For various classes of information, if a honeybee has the capacity to ac-
quire information I, then it also has the capacity to acquire systematic 
variants of I,
where two items of information are systematic variants just in case they have the 
same informational constituents, have the same informational structure, but are 
formal permutations of each other. I argue for this general hypothesis by arguing 
for specific instances of it. The point of restricting the hypothesis to some classes 
of information is to avoid its having as consequences claims like: if a bee can 
learn that the sun is directly above the crest of the hill, then it can learn that the 
crest of the hill is directly above the sun.
1
 As I mentioned in Chapter 1, there are a 
number of possible varieties of systematicity, and different kinds of cognitive capacities
might be systematically related in different ways. (See also § 7.2.)
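The definition of systematic variants given above can be made explicit with a small sketch. It is a notational aid concerning informational content only and carries no commitment about the format of bee representations. It assumes, hypothetically, that an item of information can be written as a tuple whose first element names the relation (the informational structure) and whose remaining elements are the informational constituents.

```python
from collections import Counter

def is_systematic_variant(item_a, item_b):
    """True if the items share structure and constituents but differ in arrangement.

    Each item is a tuple (relation, constituent_1, constituent_2, ...).
    """
    relation_a, *args_a = item_a
    relation_b, *args_b = item_b
    same_structure = relation_a == relation_b and len(args_a) == len(args_b)
    same_constituents = Counter(args_a) == Counter(args_b)
    permuted = args_a != args_b
    return same_structure and same_constituents and permuted

sun_above_crest = ("directly_above", "sun", "crest_of_hill")
crest_above_sun = ("directly_above", "crest_of_hill", "sun")
sun_above_tree = ("directly_above", "sun", "tree")

variant = is_systematic_variant(sun_above_crest, crest_above_sun)
non_variant = is_systematic_variant(sun_above_crest, sun_above_tree)
```

The first pair counts as systematic variants in the formal sense, which is exactly why the general hypothesis is restricted to certain classes of information: the formal relation holds even where one of the variants is ecologically anomalous.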
Note that the general sort of systematicity just referred to is the same as that 
discussed in Chapter 2. There I considered two structurally complex thoughts to 
be systematically related just in case they have the same logical and representa-
1
 Thus, Dennett's (1989) supposition that systematicity hypotheses, at least as applied to nonhumans,
would require that they have the capacity to think ecologically anomalous thoughts is
erroneous. Penn and Povinelli (submitted) make the same supposition.
tional constituents and are formal permutations of each other. Thus, whereas the 
thought that Fa ? Gb is a systematic variant of the thought that Ga ? Fb, this is 
true neither of the thought that Fa ? Hb nor the thought that ~ (Fa ? Gb). But dis-
cussions of systematicity are often about one or another somewhat weaker no-
tion, one that does not have a formal-permutation requirement. These weaker 
notions focus on the nonarbitrariness of the semantic relations among representations.
In Section 7.2, I argue that bee navigational capacities also exhibit a particular
type of "weak" systematicity.
The first specific hypothesis I propose concerns the capacity of bees to ac-
quire information about distance and solar bearing relations between various 
places, such as the hive, landmarks, and foraging sites (§§ 6.1.1, 6.1.2, 6.2.1, 6.2.4).
As I will argue, that capacity does not come in isolated pieces. That is, the capac-
ity of bees to acquire information about some particular distance and direction 
relations comes along with capacities to acquire intrinsically related information 
about other distance and direction relations. In particular,
Systematicity 1  If a honeybee has the capacity to estimate that the solar
bearing of a particular foraging site from the hive is, say, 45° west of the
sun, then it also has the capacity to estimate that the solar bearing of the
hive from that site is 45° west of the sun.
I emphasize that this is a claim about informational content and not a claim 
about the configurational structure of honeybee mental representations. Hy-
potheses about the configurational structure and semantics of mental represen-
tations contribute to explanations of the truth of hypotheses like Systematicity 1 
and are not part of such hypotheses themselves. But since such hypotheses do 
involve attributions of content to nonhuman organisms, the issue arises as to 
how such attributions should be understood. It will save us some trouble if I ad-
dress this issue prior to arguing for specific systematicity hypotheses.
7.1.1  Attributions of Content to Insects
Much work on animal cognition concerns how to best characterize the contents 
acquired by various organisms. For example, the debates over whether or not 
various animals, including insects, possess a kind of cognitive map make sense 
only as issues about content. They are debates over how the spatial information 
an organism acquires is semantically organized. Little discussion, if any, is de-
voted to the configurational structure of the bearers of the information.
2
 And 
claims like, 
(1) The bees learned that the feeder is 200 m to the east of the hive.
are common in the literature on insect navigation. But it's a safe bet that those
who make such claims would consider the idea that an insect could have a representation
with the content [meter] (or [sucrose reward], or [200],
3
 or [east], or
[hive]) to be absurd or baseless. So how are we to understand content attributions
such as those involved in claims like (1) and Systematicity 1?
2
 This is not to say that the distinction between the two issues is never ignored or overlooked.
I won't attempt here to provide a complete, comprehensive answer to that
question. For my purposes, it is enough to provide the basic details of a way of 
understanding such attributions that both conforms with scientific practice and 
allows them to be confirmable and disconfirmable within currently possible ex-
perimental paradigms.
The general issue may be framed in terms of the relationship between the 
content of the that-clauses in the attributions and the information thought to be 
actually acquired by the organism. That relationship certainly is not (or is at least extremely
unlikely to be) identity. I suspect that any expert on bee cognition would
admit that there is a sense in which (1) could be true, even though the content of
the bees' representations would not be [the sucrose reward is 200 m to the east of
the hive].
Perhaps the extensions of the semantic constituents of the that-clause need to 
be identical with the extensions of the respective constituents of the bees' information.
3
 The case of contents about quantities is somewhat puzzling. It does seem odd to attribute to
honeybees representations with, say, the content [200]. Nonetheless, the capacity of bees to perform
the computations required for path integration seems to require the capacity to manipulate
information about relatively specific quantities. One possible way to remove the tension here
would be to argue that the information about distance that is manipulated in path integration is a
kind of nonconceptual content.
This suggestion might seem more promising than the first, but it is still
off the mark. For it's not currently possible to determine the actual extensions of
insect representations. For example, we can't with any confidence claim that the
extension of a term like "sucrose reward" or "the hive" is the same as the extension
of some piece of information acquired by bees. The extension of "sucrose
reward" is very unlikely to be the same as the extension of any bee representation.
Similarly, it's possible that bees don't represent the hive per se. Rather, they
may represent only various parts of it, or features of it, while lacking a representation
of the entire structure. It's even possible that the extensions of many or all
bee mental-representational constituents do not include anything external to bees 
at all.
4
 They could turn out to be "lucky" (though not accidentally successful)
hallucinators. This could be the case if the correct theory of content for bee mental
representations is an internalist theory, rather than an externalist one. Thus,
bees' representations of the "hive" might refer to only the relevant aspects of the
snapshots they take on hive-departing learning flights.
Consider whether bees can be tricked or caused to be mistaken as a result of 
various experimental manipulations. As we've seen, bees acquire information
pertaining to distance by measuring optic flow. When trained to fly through a 
tunnel in order to obtain a reward, the close proximity of the tunnel walls may 
4
 Compare Trullier et al. (1997) on neural-network models of navigational capacities.
induce more optic flow than the bees would have experienced had they flown to 
the reward location under normal circumstances.
5
 If that is the case, bees that 
have returned to the hive will signal to recruits, via the waggle dance, a "distance"
that is farther than the actual distance. But would such dancers literally be
making a mistake? They would be, if the information they employ in producing 
the dance refers to distance, for then that information will refer to the wrong 
distance. But it's possible that the information bees employ in the waggle dance
actually refers to the quantity of optic flow that would be experienced during a 
normal, direct flight to the reward. If that were the case, the dancers would not 
be making a mistake.
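The tunnel case can be pictured with a toy odometer. The sketch assumes, as a simplification of my own, that accumulated optic flow is proportional to the distance flown divided by the distance to nearby surfaces, and that a signalled "distance" is that flow rescaled by a typical open-field surface distance. The function names and all numbers are hypothetical.

```python
def accumulated_optic_flow(distance_flown_m, surface_distance_m):
    """Toy odometer: image motion grows with distance flown and with surface proximity."""
    return distance_flown_m / surface_distance_m

def signalled_distance(flow, typical_surface_distance_m):
    """Distance that would produce the same flow during a normal outdoor flight."""
    return flow * typical_surface_distance_m

# Hypothetical values: a 6 m tunnel with walls 0.1 m from the bee,
# and surfaces typically 2 m away during an ordinary outdoor flight.
tunnel_flow = accumulated_optic_flow(6.0, 0.1)
dance_value = signalled_distance(tunnel_flow, 2.0)
```

On these numbers the dance would indicate roughly 120 m for a 6 m flight. The computation itself is neutral on the question at issue: the same output misrepresents the distance if it refers to distance, and is accurate if it refers to a quantity of optic flow.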
Or consider a case in which bees are stimulated to forage at night, and in 
which they rely on an artificial light source for orientation. Would such bees be 
mistaking the light for the sun? They would be, only if the referent of the relevant 
representations is in fact the sun. But perhaps those representations have an ex-
tension that includes any suitable light source. Or perhaps the extension includes 
only certain illumination intensities. In either of these latter cases, the bees would 
not be making a mistake.
Fortunately, for purposes of addressing the issues about systematicity with 
which we're concerned, we don't have to decide what are the actual contents and
5
 Esch et al. 2001, Srinivasan et al. 2000.
extensions of honeybee mental representations. For the explanatory purpose of 
attributions of content to bees (and other organisms) can be accomplished prior 
to settling those issues. For it is reasonable to interpret those attributions as hy-
potheses about what features of the environment bees are able to track; and such 
hypotheses can be confirmed or disconfirmed, independently of establishing the 
specific contents and extensions of the information that allows bees to track those 
features. 
Crucially, evidence about what features of the environment bees are capable 
of tracking constrains what the contents and extensions of their acquired infor-
mation could be. Whatever the contents and extensions of bee mental represen-
tations are, they must be such as to permit bees to track what they do. Part of the 
burden of the following arguments for the presence of systematicities in honey-
bee navigation is to support an additional claim: if bees can track certain struc-
tures composed of elements that they can also independently track, then the in-
formational contents by virtue of which they track those structures have seman-
tic constituents by virtue of which they track those elements. 
In what follows, then, I'll continue to employ nonliteral content attributions
like claim (1) above. My concern is the semantic relations among items of infor-
mation acquired by bees; and that issue can be addressed without making ten-
dentious assumptions about the actual contents so related.
7.1.2  Some Honeybee Systematicities
Various classes of information acquired by bees exhibit systematicity. The arguments
for the systematicity hypotheses I propose here each exhibit the same pattern.
Each type of systematicity is shown to be a consequence of bees' having a
particular general capacity.
As promised, I begin with Systematicity 1:
Systematicity 1  If a honeybee has the capacity to estimate
6
 that the solar 
bearing of a particular foraging site from the hive is, say, 45° west of the
sun, then it also has the capacity to estimate that the solar bearing of the
hive from that site is 45° west of the sun.
All the evidence at present suggests that bees store information about the 
hive and individual foraging sites. And the ability of bees to use the sun as a 
compass is firmly established. Further, it would be quite difficult to explain the 
navigational abilities of bees if (contrary to overwhelming evidence) they are not 
capable of estimating the solar bearing of a particular foraging site from the hive, 
or of the hive from a particular foraging site.
Crucially, the mechanisms which allow bees to estimate hive-to-site solar 
bearings are the very same mechanisms which allow them to estimate site-to-
hive solar bearings. As we?ve seen, bees employ their internal solar ephemeris to 
accommodate the pattern of movement of the sun?s azimuth. In addition, they 
205
6
 The estimates need not be accurate under all conditions. I?m speaking here of the capacity to 
estimate at all.
are able to estimate the position of the solar azimuth not only during the day but 
also at night. Moreover, bees are capable of relating their solar ephemeris to dif-
ferent groups of landscape features; in particular, those visible from the hive and 
those visible from various foraging sites. Thus, for any solar bearing , bees have 
the capacity to estimate that the solar bearing from a particular familiar site to 
the hive is  (and that the solar bearing from the hive to that site is ), regardless 
of the time of day at which that bearing is . And this gives us Systematicity 1. 
Systematicity 1, then, is a consequence of the capacity of bees to estimate the so-
lar bearing of any familiar place from any other familiar place. That capacity 
comprises a cluster of systematically related capacities.
In light of the discussion of the previous section, the truth of Systematicity 1
does not require that bees can think a thought with the content [the solar bearing
of the hive from the foraging site is 45° west of the sun]. Nor does it require that
for each representational constituent of the bee's information there is a unique
constituent of [the solar bearing of the hive from the foraging site is 45° west of
the sun] that has precisely the same extension. Insofar as the example involves
direction, its being an example of systematicity requires only that bees are
capable of acquiring two distinct items of information that would share a
representational constituent that allows them to track a particular solar bearing, whatever
the specific content of that constituent. Likewise, insofar as the example involves
the hive, its being an example of systematicity requires only that bees are
capable of acquiring two distinct items of information that would share a
representational constituent that allows them to track the hive, whatever the specific
content of that constituent.
A question that might arise at this point is, Why suppose that bees are
capable of acquiring two distinct items of information related in that way? Perhaps
bees represent places in different ways under different circumstances or different
motivational states. Thus, a bee might represent the hive one way when it is
using information about the hive's solar bearing from a certain site but represent it
in a different way when it is using information about the solar bearing of that site
(or another) from the hive. So a capacity to estimate that the solar bearing of
Place 1 from Place 2 is θ might bring with it only a capacity to estimate that the
solar bearing of Place 3 from Place 4 is θ, even when Place 1 is identical with
Place 4 and Place 3 is identical with Place 2. Why think otherwise?
Well, for one thing, there is no evidence that suggests that the envisioned 
possibility is actually the case. Second, as far as we know, the view would attrib-
ute to bees much more information than is necessary to explain their behavior. 
Systematicity 1 attributes to bees information about two places, whereas the
objection's alternative attributes to bees information about four places. Third, as I
am about to argue, it would be difficult to explain the actual navigational abilities
of bees if we could not assume that the way in which they represent particular
places normally doesn't vary with changes in the information they have about
their circumstances or with changes in their internal states, such as motivation.
Consider path integration. Suppose that while scouting for a new foraging 
site, a bee keeps track of its position in relation to the hive, which it
represents as Place 1. Suppose further that the bee finds a source of nectar, and fills its
crop. In that case, its motivational state (and presumably its information about 
various particulars of its circumstances) would change. It would become moti-
vated to return to the hive rather than search or forage. But suppose that because 
of its change in motivation and circumstances, the bee then represents the hive as 
Place 2. How could the bee's information about its position in relation to Place 1, 
provided by its path integration system, help it get to Place 2? Or to put it the 
other way around, How would going to Place 2 help the bee get back to Place 1? 
We would either have to reject the supposition that [Place 1] and [Place 2] are 
distinct ways of representing the hive or maintain that the bee would have to be 
sensitive to the fact that Place 1 is identical with Place 2. 
An advocate of the objection under consideration would have to opt for the
latter alternative.[7] However, it would be difficult to explain how the bee could be
sensitive to the fact that Place 1 is identical with Place 2 without presupposing
that it has a way of representing the hive which is (at least with regard to the sort
of case in question) circumstance and motivation independent. In fact, sensitivity
to that identity would seem to make both [Place 1] and [Place 2] such ways of
representing. For then it would seem that the bee would have the capacity to
estimate its position in relation to "either" place, regardless of its motivation or
circumstances. In other words, an appeal to sensitivity to identity places the
opponent of Systematicity 1 in the position of having to concede the very kinds of
capacities the existence of which he or she wants to question.
[7] He or she could, of course, argue that path integration is a special case. But nothing about my
response is essentially tied to path integration. In other words, my point is such as to force him or
her to treat a wide range of navigational capacities as special cases and thereby to concede that
they are in fact typical, not special, cases.
Consider also some of the results of Menzel's vanishing bearing,
displacement experiments (Table 6.2). Hive-departing and feeder-arriving bees which
were captured in the afternoon (without having filled their crops) and released at
the morning site were able to adopt the morning-site-to-hive compass heading
upon release. This suggests that those bees represented the morning site and the
hive the same way in which they represented them during previous, morning
foraging excursions and after they had filled their crops. Neither their having
flown to the morning site nor their having fed there was necessary in order for
the bees to call up the appropriate homeward vector. Likewise, the bees which
took the novel shortcut from Site 3 must have represented the hive and the two
sites in the same way in which they had on previous foraging excursions.
Otherwise, it would be hard to see how the bees could treat both (or either) of the
site-to-hive vectors as relevant to the task of returning to the hive from Site 3.
In short, without coherence in the way bees represent various places under 
various external and internal conditions, it's hard to see how they could exhibit 
the coherence in their navigational behavior that in fact they do.
There's another worry about Systematicity 1 that requires attention. Why
suppose that the related items of information are complex? Perhaps estimating
the solar bearing of a particular foraging site from the hive doesn't require
information about that site or the hive. Rather, couldn't the bee just call up the
relevant solar bearing? The bee might need to recall only information that we
might express as "Go along bearing θ."
First, remember that the present discussion is solely about content. So I'm
not assuming that the configurational structure of the vehicles of the
information in question is complex. Second, the crucial fact that needs to be
explained is a bee's capacity to call up an appropriate vector in a variety of
circumstances. For example, displaced bees have the capacity to call up a vector the
origin of which is tied to their location prior to their having been displaced and the
"tip" of which is tied to their original destination. Also, bees displaced from the
hive to any familiar location have the capacity to return directly to the hive from
that place. Moreover, bees can return directly to the hive from any type of
foraging site (nectar, pollen, etc.), and they can directly return to any type of familiar
foraging site from the hive, even if they last visited that site at least one day ago
and have not just been recruited to it. So when bees decide to fly out toward a
certain familiar destination (say, to a nectar source, if motivated to obtain nectar),
they don't just access any of their many vector memories; rather, they access the
one which will lead them from (what they take to be) their present location to
another at which a specific type of resource may presently be available. That can
be explained, it seems, only if the vector and the connected locations are linked in
memory. That's the sense in which the remembered information has to be
semantically complex.
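The point about linked memories can be illustrated with a minimal sketch. It is not a model of bee memory; the record fields, place names, and numerical values are all hypothetical, chosen only to show why a bare stored bearing would not suffice. A vector can be selected as the appropriate one only if it is stored together with the places it connects and the resource available at its destination.

```python
from typing import NamedTuple


class VectorMemory(NamedTuple):
    # All field names and values below are illustrative assumptions.
    origin: str         # place the vector starts from
    destination: str    # place the vector leads to
    resource: str       # what is available at the destination
    bearing_deg: float  # remembered solar bearing
    distance_m: float   # remembered distance


MEMORIES = [
    VectorMemory("hive", "site_A", "nectar", 45.0, 200.0),
    VectorMemory("hive", "site_B", "pollen", 120.0, 350.0),
    VectorMemory("site_A", "hive", "home", 225.0, 200.0),
]


def recall(current_place: str, wanted_resource: str) -> list:
    """Return the stored vectors that start at current_place and lead to wanted_resource."""
    return [
        m for m in MEMORIES
        if m.origin == current_place and m.resource == wanted_resource
    ]
```

A lookup such as `recall("hive", "nectar")` succeeds only because each bearing is tied to an origin and a destination; a list of bearings alone would give the selection step nothing to work with.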
Much of what I've said about the attributions involved in Systematicity 1
should be applicable, mutatis mutandis, to the additional cases of systematicity I
provide below. They can thus be presented more briefly.
The ability of bees to represent various sorts of complex structures provides 
us with further examples of clusters of systematically related capacities. Collett's
vector sequence experiments (§ 6.2.1) suggest the following hypothesis:
Systematicity 2  If a honeybee has the capacity to learn the flight vector
sequence "distance n in direction d, then distance m in direction d*," then it
has the capacity to learn the flight vector sequence "distance n in direction
d*, then distance m in direction d", as well as the capacity to learn "distance
m in direction d*, then distance n in direction d".
Bees presumably have the capacity to represent a great variety of two-segment
vector sequences. That bees have that capacity has Systematicity 2 as a
consequence, assuming of course that they can represent the distances n and m and the
directions d and d*. But that's guaranteed by Systematicity 2's antecedent.
The results of Collett's study on the effects of panoramic context on the
performance of route flight segments (§ 6.2.1) suggest yet another systematicity
hypothesis:
Systematicity 3  If a honeybee has the capacity to learn the route sequence
"distance n to landmark L, then distance m to landmark L*", then it has the
capacity to learn any of the route sequences (i) "distance m to L, then
distance n to L*", (ii) "distance n to L*, then distance m to L", and (iii) "distance
m to L*, then distance n to L".
The case for this hypothesis proceeds along the same lines as the justifications for 
Systematicities 1 and 2. As long as bees can represent the distances n and m and 
the landmarks L and L*, the consequent of Systematicity 3 follows from the ca-
pacity of bees to learn the lengths of a great variety of route segments. And that 
bees can represent those particular distances and landmarks is guaranteed by the 
antecedent of the hypothesis.
Here are two more systematicity hypotheses:
Systematicity 4  If a honeybee has the capacity to learn the sequence of
positive stimuli "white, then blue, then black–white vertical stripes", then it
has the capacity to learn any of the sequences "white, then black–white
vertical stripes, then blue", "blue, then white, then black–white vertical
stripes", and so on.
Systematicity 5  If a honeybee has the capacity to learn that the sucrose 
concentration of Feeder 1 is greater than that of Feeder 2, then it has the 
capacity to learn that the sucrose concentration of Feeder 2 is greater than 
that of Feeder 1.
Systematicity 4 is based on the results of Collett's visual-sequence learning
experiments (§ 6.2.2.1), which strongly suggest that bees can represent arbitrary
sequences of visual stimuli. Systematicity 5 is based on Wei's study of learning
flight modulation (§ 6.1.3), which suggests that bees can represent arbitrary
relative levels of sucrose concentration.[8] As with Systematicities 1–3, for each of these
two hypotheses, the existence of systematically related specific capacities is
inferred from the existence of a more general capacity.
[8] Bees are also capable of learning the relative flow rates of different feeders as well as the relative
amounts of reward available from different feeders. See Greggers and Menzel 1993 and Greggers
and Mauelshagen 1997.
7.2  Weak Systematicity and the Tracking Argument
So far I've restricted my discussion to a relatively strict form of systematicity,
requiring that systematic variants be formal permutations of each other. But
discussions of systematicity are often about one or another somewhat weaker
notion, one that does not have a formal-permutation requirement. These weaker
notions focus on the nonarbitrariness of the semantic relations among
representations. The central idea is that an organism's capacity to acquire information
about a certain domain exhibits systematicity if the following is the case:
If the organism has the capacity to acquire the information that a certain 
individual has a certain property (or stands in a certain relation), then it 
has both the capacity to acquire the information that that individual has 
any of a variety of different properties (or stands in any of a variety of dif-
ferent relations) and the capacity to acquire the information that any of a 
variety of individuals has that property (or stands in that relation).
More formally,
If the organism has the capacity to represent that $a$ has the property (or
stands in the relation) $F$, then there are other properties (or relations),
$G_1, G_2, \ldots, G_n$, and other individuals, $b_1, b_2, \ldots, b_m$, such that
it has the capacity to represent that $a$ is $G_1$, that $a$ is $G_2$, $\ldots$,
and that $a$ is $G_n$, and that $b_1$ is $F$, that $b_2$ is $F$, $\ldots$, and
that $b_m$ is $F$.
In short, an organism's capacity to acquire information about a certain domain
exhibits systematicity if it comprises specific capacities to acquire any of a
plurality of items of information having a common semantic constituent in the same
semantic structural role. Call this sort of systematicity "weak" systematicity. Note
that weak systematicity is not the claim that for any a, b, F, and G, if an individual
can represent that a is F and that b is G, then it can also represent that b is F and
that a is G. This stronger claim, applied to humans, has the questionable
consequence that if someone can think both that John plays guitar and that the number
two is an even number, then they can thereby think both that the number two
plays guitar and that John is an even number. Weak systematicity, on the other
hand, does not require that if a bee can learn that a nectar source is 200 m from
the hive, then it can learn that a nectar source is 200 m from the brood chamber. The
information that bees use to find their way around the hive might not be
accessible to their large-scale navigational systems; but that has no bearing on whether
their large-scale navigational capacities are systematically related.
Note that the explanations of systematicity presented in Chapter 2 apply, 
mutatis mutandis, to weak systematicity as well. For those explanations are fun-
damentally explanations of how it is that mental representations have various 
types of constituent structures and of how it is that the semantic relations among 
them are nonarbitrary.
7.2.1  The Tracking Argument
Horgan and Tienson's tracking argument for a "language" of thought[9] may be
viewed as appealing to weak systematicity. They argue that some organisms
have to have at least some representations which are semantically complex.
Furthermore, in terms of Cummins' distinction between pure encodings, structural
encodings, and structural representations (§ 5.3), they argue, in effect, that such
representations cannot be pure encodings but must be either structural encodings
or structural representations of what they represent.
[9] Horgan and Tienson 1996, pp. 81–83.
Note that, although Horgan and Tienson mean to show that there must be
mental representations having "language-like," or "syntactic," structure, under
the rubric "language-like," Horgan and Tienson include non-Classical
representations, such as tensor products. Roughly speaking, on their use of the term, a
system of representation is language-like if it can be used to encode syntactic
structure in a way that allows the encoded structures to be recoverable from the
representations (but does not require that they ever be recovered). Again, I reserve the
use of terms such as "syntactic" for the actual configuration of representations at
the representational level of description.
One of Horgan and Tienson's favorite ways to state the tracking argument is 
in terms of navigational capacities. Any organism that exhibits complex and 
flexible navigational behavior must acquire a great deal of information about 
many particular things and places in its locale, such as landmarks and foraging 
sites. It must have information about their locations in relation to itself and to 
certain other objects. It also needs information about many of their other proper-
ties, such as appearance and value as a resource. Furthermore, such an organism 
must be able to acquire new information as circumstances warrant. For resource 
values change; some landmarks move, become temporarily hidden, or disappear; 
and the organism itself might move to an altogether different area. So the
organism would have to have the capacity to attribute different properties and
relations to the same objects at different times. It would also need the capacity to
attribute to newly encountered objects the same properties and relations it has
attributed to other objects. In addition, every item of acquired information must
have a content-appropriate causal role. It does no good to learn that the position 
of a landmark has changed if that information, in relation to other information 
possessed by the organism, is not appropriately efficacious in guiding its behav-
ior. Finally, note that the organism must have such capacities not only for the en-
vironment it actually inhabits but also for any possible environment it might 
have found itself in.
Horgan and Tienson maintain that all this is possible only if the mental
representations that encode the information have some sort of "language-like,"
representational-constituent structure, whether it be concatenative or
nonconcatenative. The only way for the organism to acquire all the information it needs
on an ongoing basis, while reliably maintaining the content-appropriate causal
efficacy of its information-bearing states, is to have the corresponding
representations be "constructed," as needed, out of representational constituents.
From many of the findings examined in the preceding chapter, it should be
clear that the navigational abilities of the honeybee are sophisticated and flexible
enough for it to be among the organisms to which the tracking argument applies.
Those abilities do indeed depend on weak-systematically related capacities to
acquire information relevant to wayfinding. Thus, bees can track the location of a
place of interest, even though its solar bearing in relation to the hive continually
changes. Also, by means of path integration, bees in flight can keep track of their
continually changing location in relation to the hive, a landmark, or the place of
their release. They can learn to relate the solar ephemeris for their locale to the
different landscape features visible at different locations. Apis mellifera has the
capacity to reference its waggle runs to landscape features, though this capacity
is exercised, as far as we know, only under experimental conditions. Local,
isolated changes in the area of a goal (say, the appearance or location of nearby
landmarks) need not prevent bees from searching at the correct location.
Furthermore, as long as they have a means of individuating certain reward sites,
bees can track changes in the relative value of those rewards.
The capacity of A. mellifera to learn to reference its waggle runs to landscape
features illustrates the fact that current capacities need not match up with current
abilities. Without training, A. mellifera presumably is unable to orient its waggle
runs to landscape features. Nonetheless, its ability to learn the task shows that it
has the prior capacity to do so. Note that such unexercised capacities of an
organism are just what one should expect if related capacities of that organism
exhibit a certain form of systematicity.
I won't bother to spell out more formally all of the weakly systematic
wayfinding capacities of bees. Here are just two:
If a bee has the capacity to learn that a feeding site is at a certain direction 
and distance from the hive, it also has the capacity to learn that that very 
site is at a different direction and distance from the hive.
If a bee has the capacity to learn that the sun's azimuth is at one location 
(in relation to the landscape) at a given time, it also has the capacity to 
learn that it is at a different location at that time.
Clearly, there are many other plausible hypotheses of this sort.
The station shift experiments of Gould and Dyer provide particularly good
support for the weak systematicity of bee navigational capacities (§ 6.1.3). Recall
that when Gould changed the compass direction of the feeding station by about
30°, the bees adjusted their waggle dances gradually, until they correctly
indicated the new solar bearing. Some of the bees in Dyer's experiments (which
employed a 90° shift in the direction of the feeding station) also showed gradual
reorientation. This suggests that the bees updated their information about the
location of the site by updating their information about the location of what for
them was one and the same site.
The bimodal dances reported by Dyer have the same implication. The bees
that performed bimodal dances had returned from just the one site. So it's quite
likely that their dances communicated what for them was the location of that one
site. Yet the dances alternately indicated two very different solar bearings, one
presumably based on their memory of the solar bearing of that site in relation to
the landscape and the other based on their very recent experience of its actual
solar bearing. It's possible that the dances were a result of the bees' memory of
the location of the "old" site competing with their newly acquired information
about the location of the "new" site. That is, the bees might have been confused
about which of two sites (what from their point of view were two sites) they
had just visited, rather than about the location of the one site. But I find this
possibility to be highly unlikely. For not only the station but also the field edge
would have changed in orientation. The bees flew along the landmark that had
always led to the station, and they found the station at its usual place in relation
to that landmark. Further, there's no other evidence that bees which have just
returned from a successful foraging trip ever dance to indicate the location of a
site other than the one from which they have just returned.
Wei's learning flight modulation study also provides good support for the
weak systematicity of bee navigational capacities. That the learning flights of the
bees increased in duration after an imposed increase in search time, and that the
decay rate of their learning flights after such increases was significantly faster
than the decay rate of their initial learning flights, suggests that the bees updated
their information about the location of the feeder in light of their past experience
of it. After an increase in search time or a change in location of the
landmark–feeder array, they did not treat the feeder and associated landmarks as if
they were situated at a newly discovered site; rather, they behaved as if they
integrated remembered and newly acquired information about what for them was
one and the same place.
The important point here is that a bee's remembered information about a
particular place (or object) and any of its newly acquired information about
what we would say is the same place really are, for the bee, two pieces of
information about the same place. Which is to say that the semantic relations between
such remembered and newly acquired information are nonarbitrary.
To see the force of the tracking argument, just consider how difficult it 
would be to explain certain behaviors if the semantic relations between remem-
bered information and new information about what is, in reality, one and the 
same object or place were arbitrary. Suppose I remember that my coffee mug is 
on my desk. But when I go to get it, I see that it is no longer there. Believing that 
it was washed and put up, I go to the kitchen and find it in the cupboard. Now 
suppose that the content of my memory about the location of the mug was [my 
mug is on my desk], but that the content of my newly acquired information 
about (what is in reality) the mug, when I found that it was no longer on my 
desk, was [Paul's copy of The Last of the Mohicans is probably somewhere in
Australia]. If that's the case, then it would appear to be a bit difficult to explain why I
went to the kitchen to look for my mug, rather than to Australia to look for Paul's
book. Clearly, it would be quite difficult for someone to consistently find their 
way to important resources or places if the semantic relations among their items 
of information about important locations were arbitrary.
But how does the need for flexible navigational capacities to be weakly
systematic support the claim that the organism's mental representations need to be
structural encodings or structural representations of what they represent, rather 
than pure encodings? The trouble with pure encodings is that any correspon-
dence between their nonsemantic, physical properties and their contents is 
purely accidental. So even if the items of informational content acquired by an 
organism happen to be systematically related, if its mental representations are 
pure encodings, the presence of that systematicity would also be purely acci-
dental. It could not be explained in terms of the nonsemantic, physical properties 
of its pure encodings. 
To spell this out just a bit more, suppose that my mental representations are
pure encodings. Suppose further that the bearer of the content of my belief that
my mug is on my desk is α, and that the bearer of the content of my belief that
my mug is in the kitchen is β. How could my cognitive system know which
belief to act on, or even that they conflict? For, by hypothesis, those two
representations need not share any cognitively efficacious, nonsemantic, physical
properties. Thus, there need not be any way for my cognitive system to detect that those
two representations share a representational constituent. The semantic relations
between α and β might as well be arbitrary, even if they are not.
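The contrast can be made concrete with a toy sketch. It makes no claim about how any cognitive system is actually implemented; the token names and the two-slot tuple format are assumptions introduced purely for illustration. With opaque tokens, nothing in the form of two representations reveals that they concern the same object; with structured forms, a process sensitive only to form can detect the shared constituent and the conflict.

```python
# Pure encodings: opaque tokens whose form is unrelated to their content.
# The token names are arbitrary placeholders.
PURE = {
    "mug is on desk": "x7",
    "mug is in kitchen": "q2",
}

# Structured encodings: the form exposes constituents in fixed roles,
# here (object located, location). This format is an assumption.
STRUCTURED = {
    "mug is on desk": ("mug", "desk"),
    "mug is in kitchen": ("mug", "kitchen"),
}


def same_subject(rep_a, rep_b) -> bool:
    """True if both representations are tuples whose first slot matches."""
    return (
        isinstance(rep_a, tuple)
        and isinstance(rep_b, tuple)
        and rep_a[0] == rep_b[0]
    )


def conflict(rep_a, rep_b) -> bool:
    """True if the two forms locate the same object in different places."""
    return same_subject(rep_a, rep_b) and rep_a[1] != rep_b[1]
```

Applied to the structured pair, `conflict` returns `True`; applied to the pure pair, it returns `False`, since the tokens offer no formal property for the check to exploit.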
7.3  Systematicity and Semantic Structural Roles
The "strong" systematicity of honeybees' capacities to acquire various sorts of
navigation-related information is possible only if the mental representations that
encode the information have some sort of representational-constituent structure,
whether it be concatenative or nonconcatenative. The same is the case for weak
systematicity. As Horgan and Tienson maintain, the only way for the organism to
acquire all the information it needs on an ongoing basis, while reliably
maintaining the content-appropriate causal efficacy of its information-bearing states, is
to have the corresponding representations be "constructed," as needed, out of
representational constituents.
Complex semantic structure requires that representational constituents have 
certain semantic structural roles. This should be relatively noncontroversial, 
though it's worth emphasizing in order to see some of the sorts of structural roles 
bee representational constituents need to play. In Section 7.4, I argue that honey-
bee information processing is sensitive to those structural roles.
7.3.1  Distinguishing Systematic Variants
Consider Systematicities 1 and 5:
Systematicity 1 If a honeybee has the capacity to estimate that the solar
bearing of a particular foraging site from the hive is, say, 45° west of the
sun, then it also has the capacity to estimate that the solar bearing of the
hive from that site is 45° west of the sun.
Systematicity 5 If a honeybee has the capacity to learn that the sucrose con-
centration of Feeder 1 is greater than that of Feeder 2, then it has the ca-
pacity to learn that the sucrose concentration of Feeder 2 is greater than 
that of Feeder 1.
It should be clear that the semantic structure of representations that are system-
atically related in either of the above two ways must be something other than the 
structure of a non-ordered set, such as {hive, Site S, 45°, west, sun}. For such a
structure wouldn't allow the bee to distinguish [The solar bearing of Site S from
the hive is 45° west of the sun] from [The solar bearing of the hive from Site S is
45° west of the sun]. Since solar bearing is an asymmetrical relation, the
constituents [hive] and [Site S] must play different structural roles in those contents.
Since having greater sucrose concentration is also an asymmetrical relation, the 
constituents [Feeder 1] and [Feeder 2] must also play different structural roles in 
[Feeder 1 has a greater sucrose concentration than Feeder 2].
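The inadequacy of a non-ordered set can be shown in a few lines. This is only an illustrative sketch: the record type and its field names are hypothetical labels for the structural roles at issue, not a proposal about the format of bee representations.

```python
from typing import NamedTuple


class Bearing(NamedTuple):
    # Field names are hypothetical labels for structural roles.
    target: str     # place whose bearing is estimated
    origin: str     # place from which it is estimated
    degrees: int
    side: str
    reference: str


site_from_hive = Bearing("Site S", "hive", 45, "west", "sun")
hive_from_site = Bearing("hive", "Site S", 45, "west", "sun")

# Treated as non-ordered sets of constituents, the two contents collapse.
as_sets_equal = frozenset(site_from_hive) == frozenset(hive_from_site)

# Treated as role-structured records, they remain distinct.
as_roles_equal = site_from_hive == hive_from_site
```

Here `as_sets_equal` is `True` while `as_roles_equal` is `False`: only the role-structured form keeps the two systematic variants apart.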
The need to distinguish weakly systematic variants also requires that
representational constituents have certain structural roles. Suppose that a bee acquires
the information [The bearing of Site S from the hive at time t is 45° west of the
sun]. The bee must be sensitive to the fact that that information is distinct from
both [The bearing of the hive from Site S at time t is 225° west of the sun] and
[The bearing of Site S from the hive at time t is 225° west of the sun]. For
only the second of the three can guide the bee back to the hive from Site S.
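The pairing of 45° with 225° in this example reflects ordinary reciprocal-bearing arithmetic, which the text does not spell out. The following sketch assumes that bearings are measured as angles from 0° to 360° against a single reference direction; the function name is hypothetical.

```python
def reciprocal_bearing(bearing_deg: float) -> float:
    """Bearing of A from B, given the bearing of B from A in degrees.

    Assumes both bearings are measured in the same rotational sense
    against the same reference direction.
    """
    return (bearing_deg + 180.0) % 360.0
```

On that assumption, `reciprocal_bearing(45.0)` yields `225.0`, the value that figures in the homeward content.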
Consider also Systematicities 2–4:
Systematicity 2 If a honeybee has the capacity to learn the flight vector
sequence "distance n in direction d, then distance m in direction d*," then it
has the capacity to learn the flight vector sequence "distance n in direction
d*, then distance m in direction d", as well as the capacity to learn "distance
m in direction d*, then distance n in direction d".
Systematicity 3 If a honeybee has the capacity to learn the route sequence
"distance n to landmark L, then distance m to landmark L*", then it has the
capacity to learn any of the route sequences (i) "distance m to L, then
distance n to L*", (ii) "distance n to L*, then distance m to L", and (iii) "distance
m to L*, then distance n to L".
Systematicity 4 If a honeybee has the capacity to learn the sequence of
positive stimuli "white, then blue, then black–white vertical stripes", then it
has the capacity to learn any of the sequences "white, then black–white
vertical stripes, then blue", "blue, then white, then black–white vertical
stripes", and so on.
Each of these systematicities concerns a capacity to acquire information about a 
certain kind of sequence. For sequences, order is crucial. The bee needs to be sen-
sitive to which element of the sequence is first, second, or third, and so on. And 
that could be the case only if each constituent of the relevant information plays a 
certain place-in-the-sequence structural role.
7.3.2  “What” and “Where”
There are other sorts of structural roles for bee representational constituents. For 
example, the representation forming processes responsible for producing infor-
mation about the location of a particular place in relation to another must com-
bine two constituents about those two respective places with a constituent about 
a certain direction and a constituent about a certain distance. Those processes 
must combine, as it were, two “what” constituents with two “where” constituents,
rather than two what constituents with two more what constituents, or one 
what constituent with three where constituents, and so on.
Likewise, that there are certain bee psychological processes dedicated to 
manipulating information about direction (and not about distance, resource 
value, color, and so on) suggests that different bits of information about different 
directions share a special property to which those processes are sensitive. In or-
der to be reliable, such processes must be able to distinguish information about 
direction from other kinds of spatial information as well as from non-spatial information.
Consider further the ability of bees to solve matching- (and non-matching-) to-sample
tasks (§ 6.2.2.2). A rule such as “Choose the x-marked arm 
if x was at the entrance” plausibly could not operate on, say, information strictly 
about distance. For example, the variable in such a rule is quite unlikely to be 
replaceable by the content [200 m]. 
7.3.3  Indexicals
In the case of humans, the contents of mental representations that we express 
through the use of proper names or indexicals have a different sort of semantic 
structural role than the contents we express through the use of predicates. Might 
there be anything like this distinction in the case of bee mental representations? 
It’s plausible that there is. In this section I propose that some bee representations 
have an indexical-like element as a semantic constituent.
In the last chapter we saw that bees have the capacity to learn a variety of 
route segments. They can learn vector sequences as well as landmark-to-landmark
and landmark-to-foraging-site route segments (§ 6.2.1). They can learn the 
distance and direction from the hive of various local landmarks, which make up their
general landscape memory (§ 6.2.3.2). Also, when released at an unfamiliar location, they 
are able to track their location with respect to it by means of path integration,
allowing them to periodically return there during their search flight (§ 6.2.4).
Together, all this evidence clearly indicates that bees are capable of keeping track of 
their location with respect to an arbitrarily broad range of types of places.
It’s perhaps universally acknowledged that path integration requires an accumulator
that tracks a foraging or exploring bee’s distance and direction from 
the hive. However, in light of the sort of evidence just mentioned, there’s also a 
need for one or more local accumulators that work in tandem with the main, 
global accumulator.¹⁰ A local accumulator might work just like a global accumulator,
except that its origin can be set at a variety of locations, rather than just at 
the current hive location. Alternatively, local vector information could be the 
product of a system that monitors the global accumulator, comparing its values 
at different places along a route, and deriving the distances and directions be-
tween them. Whatever the case, local vector information needs to be tied to vari-
ous specific locations, such as a salient local landmark or the place of release after 
displacement. 
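The two ways of obtaining local vector information just described can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the class name `Accumulator`, the use of compass bearings measured clockwise from north, and the particular distances. It is not a model of the bee’s actual mechanism. It shows only that a local accumulator whose origin is reset at a landmark, and a monitor that compares global accumulator readings taken at two places, deliver the same local vector.

```python
import math


class Accumulator:
    """Toy path integrator: sums flight steps as (east, north) displacements."""

    def __init__(self):
        self.east = 0.0
        self.north = 0.0

    def step(self, distance_m, bearing_deg):
        # Bearing is measured clockwise from north (an assumed convention).
        rad = math.radians(bearing_deg)
        self.east += distance_m * math.sin(rad)
        self.north += distance_m * math.cos(rad)

    def reset(self):
        # Re-anchor the origin at the current place (for example, a landmark).
        self.east = 0.0
        self.north = 0.0

    def reading(self):
        return (self.east, self.north)


global_acc = Accumulator()   # origin fixed at the hive
local_acc = Accumulator()    # origin can be moved

# Leg 1: hive to tree.
global_acc.step(100, 90)
local_acc.step(100, 90)

# At the tree: option 1 resets the local origin, option 2 records the global reading.
local_acc.reset()
reading_at_tree = global_acc.reading()

# Leg 2: tree to boulder.
global_acc.step(50, 0)
local_acc.step(50, 0)

# Option 1: read the local accumulator directly.
local_vector = local_acc.reading()

# Option 2: derive the tree-to-boulder vector from two global readings.
reading_at_boulder = global_acc.reading()
derived_vector = (reading_at_boulder[0] - reading_at_tree[0],
                  reading_at_boulder[1] - reading_at_tree[1])
```

On either option the resulting vector is useful only if it is stored together with the place that served as its origin, here the tree.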
Now, consider a bee that is learning a multisegment route, one that takes it 
from the hive to a solitary tree in a clearing, then to a large boulder, and then to a 
landmark array that marks the foraging site. While learning this route, the bee 
also learns the tree-to-boulder flight vector and the boulder-to-site flight vector. 
That is, in addition to learning to fly to the tree, then to the boulder, and then to 
the site, it also learns the distance and direction of the tree from the hive, the 
boulder from the tree, and the site from the boulder. In each case, the origin of its 
local accumulator is tied to a different place. Since the bee is learning the flight 
vectors in question, it would appear that it needs to explicitly represent information
such as [100 m and 45° east of the sun from tree] while in flight. That is, local 
vector information must be tied to specific place information. The learning mechanism
in question, then, would appear to require representations that provide 
distance and direction information in relation to the value of a variable whose
instances are representations of places, representations of the (semantic) form: 
[distance n and direction d from place x]. 
¹⁰ Collett (T. S.) and Zeil 1998, Collett (M.) et al. 2002.
It is not unreasonable to suppose that the value of the place variable in such 
a representation sometimes has an indexical-like semantic role. For it’s possible 
for a bee’s local-vector learning mechanism to be active without its being tied to 
any specific place features. That might occur if the bee is released at a featureless, 
uniform, unfamiliar location. Or that might occur when a displaced bee, after 
playing out its (say) feeder-to-hive vector, arrives at a featureless, uniform location
that would have been the location of the hive in the absence of displacement.¹¹
(Another possible occurrence is presented in Section 7.4.) It seems to be a 
live hypothesis, then, that some bee representations are of the form [distance n 
and direction d from there].
Indeed, one might well wonder how vector navigation is possible without 
(semantic) indexicals. Information about the distances and directions between 
various places is not going to be useful to you unless you know where you are. 
Thus, a bee might have stored the information [Site S to hive: 200 m and 30° west 
of the sun]. But if the bee, upon departing from the hive for Site S, is displaced to 
Site S, that information won’t help it get back to the hive unless it can also acquire
the information [here is at Site S]. Moreover, its not acquiring that information,
but retaining the information [here is at hive], would explain its setting a 
course, upon release at Site S, that would have taken it from the hive to Site S in 
the absence of displacement.
¹¹ Compare the search behavior of desert ants in just such circumstances (Wehner and Srinivasan 
1981, Müller and Wehner 1994).
7.4  Operations on Semantic Constituents of Complex Representations
As I argued in the previous section, bee representational constituents have vari-
ous sorts of semantic structural roles. There is a corollary to this claim regarding 
information processing in the honeybee, namely, that some of those processes 
must be structure sensitive. They must be sensitive to the structural roles of rep-
resentational constituents. In this section, I provide what I take to be specific 
examples of such processes.
Recall that Menzel has shown that bees are capable of adopting novel routes 
to a feeder upon determining their location in relation to the hive (§ 6.2.4). A significant
fraction of the novel flight trajectories to the feeder were straight, 
whereas a majority consisted of two flight segments (Fig. 7.1). The initial segment 
of two-segment flights resembled the trained hive-to-feeder vector. The second 
segment resembled the vector that would have led the bee to the hive from the 
homing flight’s point of origin. 
Straight shortcuts to the feeder are explainable by the hypothesis that the 
bees summed their present-location-to-hive vector with their hive-to-feeder vector.
Two-segment novel routes are explainable by the hypothesis that the bees 
flew those two vectors rather than summed them. What’s particularly intriguing 
about the latter possibility is that the bees would have first flown their hive-to-feeder
vector from a place that was not the location of the hive to a place that was 
not the location of the feeder (Fig. 7.1). Furthermore, they would then have flown 
along a vector that was originally hive directed but was now feeder directed. So, 
as I am about to propose in more detail, not only was the route flown a novel 
shortcut, it was, on the present hypothesis, a result of a novel combination of 
flight vector memories and their semantic constituents.¹²
¹² Results of earlier experiments by Collett (T. S.) et al. (1993) hinted at the possibility that bees 
have the capacity to combine memories of route segments in novel ways.
Figure 7.1. Novel metric shortcuts contrasted with novel complex routes. (Left) A straight shortcut
(solid arrow) from a recognized landmark (L) to the feeder (F) is the sum of the landmark-to-hive
(H) vector (V₁) and the hive-to-feeder vector (V₂). (Right) The first leg of a two-segment 
novel route, from a recognized landmark to the feeder, is the original hive-to-feeder vector (V₂). 
Since the bee starts at the landmark rather than the hive, the first leg leads the bee to a place (x) 
that is neither the hive nor the feeder. The second leg is the vector that would have led from the 
landmark to the hive (V₁).
Suppose that a bee, while searching for the hive, encounters a landmark the 
perception of which causes the bee to recall, from its general landscape memory, 
the vector that leads from that landmark to the hive. Say that the content of that 
memory is [landmark L-to-hive: 100 m northeast]. However, the bee has become 
motivated to find the feeder (perhaps because its energy reserves are becoming 
depleted). So the bee’s new motivational state causes it also to recall its hive-to-feeder
flight vector, the content of which we may express as [hive-to-feeder: 
200 m east]. But the bee doesn’t merely fly the hive-to-feeder vector and search 
for the feeder upon its completion. It flies that vector and then the vector that 
would have led it to the hive from the recognized landmark. The hypothesis, 
then, is that from the stored information,
[landmark L-to-hive: 100 m northeast]
[hive-to-feeder: 200 m east]
the bee constructs the “flight plan,”
[landmark L-to-x: 200 m east, then x-to-feeder: 100 m northeast].
That is, the bee learns how to get to the feeder from its location at the landmark 
by recombining, in a novel way, some of the semantic constituents of information 
previously acquired. Correlatively, there must be information manipulating 
processes that operate on the remembered information in question. Note that if 
this hypothesis is correct, the bee’s flight plan has an indexical-like element as a 
semantic constituent, in accordance with the possibility, mentioned above
(§ 7.3.3), that a bee’s local-vector learning mechanism can be active without its 
being tied to any specific place features. The bee’s construction of the flight plan 
on the basis of its stored information would also seem to require that the bee rely 
on information such as [here is at landmark L].
Another possibility is that the bee arrives at the feeder by combining con-
stituents of the stored information,
[landmark L-to-hive: 100 m northeast]
[hive-to-feeder: 200 m east]
so as to construct the flight plan,
[200 m east, then 100 m northeast].
But, crucially, even on this weaker hypothesis, the derived vector is a combina-
tion of semantic constituents of the stored vectors. 
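The geometry behind both hypotheses can be checked with a short Python sketch. It uses the example contents [landmark L-to-hive: 100 m northeast] and [hive-to-feeder: 200 m east]. The helper names `vec` and `add`, the compass-bearing convention, and the placement of the landmark at the coordinate origin are assumptions introduced for illustration. The sketch shows that the summed vector and the two-segment flight plan both end at the feeder, while the intermediate place x is neither the hive nor the feeder.

```python
import math


def vec(distance_m, bearing_deg):
    """Convert a distance and compass bearing (clockwise from north) to (east, north)."""
    rad = math.radians(bearing_deg)
    return (distance_m * math.sin(rad), distance_m * math.cos(rad))


def add(a, b):
    """Component-wise vector addition."""
    return (a[0] + b[0], a[1] + b[1])


landmark = (0.0, 0.0)      # arbitrary coordinates for landmark L
v1 = vec(100, 45)          # [landmark L-to-hive: 100 m northeast]
v2 = vec(200, 90)          # [hive-to-feeder: 200 m east]

hive = add(landmark, v1)
feeder = add(hive, v2)

# Straight shortcut: fly the single summed vector V1 + V2 from the landmark.
shortcut_end = add(landmark, add(v1, v2))

# Two-segment route: fly V2 first (arriving at x), then fly V1.
x = add(landmark, v2)
two_segment_end = add(x, v1)
```

The two hypotheses differ in what is done with the stored constituents, not in where the bee ends up: on the first the distance and direction constituents are combined into one new vector, and on the second the two stored vectors are re-sequenced and re-anchored to new places.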
Vector averaging also involves manipulation of vector memory semantic 
constituents. First, vector information operations such as vector averaging and 
vector addition (as in, for example, path integration [§ 6.1.1]) require manipulation
of the distance and direction semantic constituents of the relevant vectors. 
For it’s only by manipulation of those constituents that the resultant vector can 
be derived. But vector averaging, as hypothesized to have been performed by the 
novel-shortcut bees in Menzel’s vanishing bearing, displacement study (§ 6.2.3), 
might also involve further alterations. The bees could have manipulated the two 
feeder-to-hive vectors so as to obtain a present-location-to-hive vector. Or they 
could have averaged the two hive-to-feeder vectors and then reversed the direc-
tion of the result to obtain a present-location-to-hive vector. 
Giurfa’s Y-maze experiments provide evidence in support of the claim that 
bees can acquire constituent-structure-sensitive rules (see also below [§ 7.5.2]). 
Recall that the bees appeared to acquire rules along the lines of “Choose the x-marked
arm if x is at the entrance.” If that’s correct, it’s reasonable to propose 
that the bees, in performing the delayed matching-to-sample task, relied on a 
rule and representations with the following contents:
Learned rule: [Choose the x-marked arm if x is at the entrance.]
Current information: [Odor O is at the entrance.]
Instantiated rule: [Choose the O-marked arm if O is at the entrance.]
Motor command: [Choose the O-marked arm.]
This would be a clear example of structure-sensitive reasoning, regardless of 
whether or not the representations having the last three contents are thought of 
as being processed strictly in sequence or, to some extent, in parallel. 
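The step from learned rule to motor command can be sketched in Python. The type names `Rule`, `AtEntrance`, and `ChooseArm`, and the function `apply_rule`, are hypothetical labels introduced only for this illustration. No claim is made that bees implement anything resembling this code. The point is that the operation is defined over the structural role of the stimulus constituent rather than over any particular stimulus.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical encoding of [Choose the x-marked arm if x is at the entrance]."""
    variable: str          # the rule's variable, here "x"


class AtEntrance(NamedTuple):
    """Content of the form [stimulus is at the entrance]."""
    stimulus: str


class ChooseArm(NamedTuple):
    """Motor command of the form [Choose the stimulus-marked arm]."""
    marked_by: str


def apply_rule(rule: Rule, info: AtEntrance) -> ChooseArm:
    # Bind the rule's variable to whatever fills the stimulus role,
    # then detach the consequent with the variable replaced by its value.
    binding = {rule.variable: info.stimulus}
    return ChooseArm(marked_by=binding[rule.variable])


learned_rule = Rule(variable="x")

# A stimulus used in training and a stimulus never encountered before.
command = apply_rule(learned_rule, AtEntrance("Odor O"))
novel_command = apply_rule(learned_rule, AtEntrance("vertical grating"))
```

Because `apply_rule` inspects only the role that the stimulus occupies, it yields an appropriate command for a novel stimulus without any further learning.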
One of Collett’s maze experiments, together with other available evidence, 
makes the possibility that honeybees are capable of transitive reasoning worthy 
of investigation. Recall that Collett trained bees to negotiate a three-compartment 
maze by choosing the correct stimulus for each compartment (§ 6.2.2.1). Collett’s 
results strongly suggest that the bees learned the compartment-to-compartment 
sequence of positive stimuli, rather than behaved in accordance with sequentially 
recalled memories. Now, recall that, for one set of experiments, bees were trained 
with yellow paper marking the entrance to the boxes (which was always on the 
left), white (positive) and black (negative) in the first box, blue (marking the only 
exit and always on the right) in the second, and vertical (positive) and horizontal 
(negative) in the third (Fig. 6.5). The test I draw your attention to is the one in 
which bees chose between white and vertical in the middle box. The back box 
remained the same as in training, whereas the front box was made to look as 
similar as possible to the middle box in training, with blue on the right marking 
the only exit. Nonetheless, the bees preferred white in the middle box and verti-
cal in the back box. They did not, then, simply associate the perceived character-
istics of the middle box in training with the succeeding, vertical positive stimu-
lus. Rather, they appear to have stored a representation having a content corre-
sponding to [white before blue and blue before vertical].
If in fact this is correct, then there is a possibility that the bees’ having preferred
white when tested in the middle box was a result of a kind of transitive 
reasoning process. From [white before blue and blue before vertical], the bees 
might have derived [white before vertical]. Of course, it is also possible that the 
bees independently learned, rather than derived, [white before vertical]. How-
ever, what makes the possibility of transitive reasoning here one to be taken seri-
ously is that, although bees learn route-segment sequences, they appear to learn, 
and certainly perform, individual route segments independently.
For example, in Collett’s channel experiments (§ 6.2.1), the bees learned the 
landmark-to-landmark route segment and the landmark-to-feeder route segment,
but didn’t appear to learn the first-boundary-to-feeder route segment. 
Note also that, for all tests in the first series, regardless of the types of landmarks 
employed, the bees searched at the training distance from the final landmark. 
That they did so, regardless of the distance from the channel entrance to the first 
landmark, confirmed earlier findings¹³ that bees’ searches are sometimes controlled
by a local vector extending from a particular landmark to the place, relative
to that landmark, where the goal had been. And, in Collett’s vector sequence 
experiments (§ 6.2.1), in both standard and displacement tests, when the position 
of the first turn in an individual bee’s flight path differed from the correct location,
there was a slight tendency for the position of the second turn to differ from 
the correct location by the same amount. The second flight segment, then, did not 
appear to correct for any inaccuracies in the first. 
¹³ Srinivasan et al. 1997.
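The transitive derivation considered above, from [white before blue and blue before vertical] to [white before vertical], can be sketched as a closure operation over ordered pairs. The function name `derive_before` and the encoding of contents as (earlier, later) tuples are assumptions made for illustration. The sketch takes no stand on whether bees derive the third content or learn it independently.

```python
def derive_before(pairs):
    """Close a set of (earlier, later) pairs under transitivity."""
    derived = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(derived):
            for c, d in list(derived):
                # From [a before b] and [b before d], add [a before d].
                if b == c and (a, d) not in derived:
                    derived.add((a, d))
                    changed = True
    return derived


# Stored content: [white before blue and blue before vertical].
stored = {("white", "blue"), ("blue", "vertical")}
closure = derive_before(stored)
```

The derivation depends on which constituent occupies the earlier role and which the later role in each stored pair. A process insensitive to those roles could not distinguish [white before vertical] from [vertical before white].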
The following appears to be a clear case of explicit goal information inter-
acting with additional, explicit locational information in order to yield an action. 
A recruitee reads a dance indicating [200 meters from the hive, at 30° west of the 
sun]. So it acquires, as an explicit goal, [200 meters from the hive, at 30° west of 
the sun]. Noncontroversially, this needs to be explicit. The bee then heads for the 
stated location, only to find the way blocked, perhaps by a high, steep bluff. It 
then detours around the obstacle. Its path-integration accumulator coordinates 
will give it its current position with respect to the hive (also explicit), which must 
be compared with the explicit goal coordinates, in order to give the bee the necessary
heading and direction to take once clear of the obstacle. We thus have current
information interacting with explicit goal information to yield an action. We 
also have another example of a process operating on representational constituents.¹⁴
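The comparison of accumulator coordinates with goal coordinates amounts to a vector subtraction, which can be sketched in Python as follows. The function names, the convention of measuring angles clockwise from the solar azimuth (so that “west of the sun” is negative), and the accumulator reading of 120 m at 20° are all assumptions introduced for illustration.

```python
import math


def to_xy(distance_m, angle_deg):
    """Polar to Cartesian, with angle_deg measured clockwise from the solar azimuth."""
    rad = math.radians(angle_deg)
    return (distance_m * math.sin(rad), distance_m * math.cos(rad))


def to_polar(x, y):
    """Cartesian back to (distance, angle clockwise from the solar azimuth)."""
    return (math.hypot(x, y), math.degrees(math.atan2(x, y)))


def course_to_goal(goal, current):
    """Subtract the current position (from the accumulator) from the goal position."""
    gx, gy = to_xy(*goal)
    cx, cy = to_xy(*current)
    return to_polar(gx - cx, gy - cy)


goal = (200.0, -30.0)      # dance content: 200 m from the hive, 30 deg west of the sun
current = (120.0, 20.0)    # hypothetical accumulator reading after the detour
distance, angle = course_to_goal(goal, current)
```

The subtraction requires access to the distance and direction constituents of both the goal content and the current-position content, which is why both must be explicitly represented.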
¹⁴ For evidence supporting the occurrence of this sort of vector subtraction in hamsters, see Etienne
et al. 1998. For such evidence in the case of ants, see Schmidt et al. 1992.
7.5  Algebraic Rules: An Introduction to Modelling Issues
In Chapters 3–5, I argued that Connectionist-style accounts do not provide an
explanation of systematicity per se, and that they are unprincipled
in the sense that they appeal to mechanisms that are arbitrary with respect 
to Connectionism. Smolensky architectures, for example, appeal to structural-role 
vectors and operations defined over them. Such architectures are, in that sense, 
nonstandard Connectionist architectures. I’ve argued that an appeal to nonstandard
Connectionist mechanisms is necessary in order to explain systematicity
(§§ 4.2 and 5.3). Connectionist theorists, though, will no doubt persist in attempting
to capture systematicity with more standard architectures. Whether or 
not they will succeed without implementing Classical representations or rules is 
an empirical issue. So far, they have not succeeded;¹⁵ and there may be principled 
explanations for the lack of their success.¹⁶
I leave a full discussion of modelling issues for a later occasion. But it’s 
worth taking a look at one important issue that needs to be addressed, namely, 
whether standard Connectionist architectures are capable of freely generalizing 
universally quantified one-to-one mappings. (We’ll see what this issue is about 
shortly.) For, first, the issue of systematicity is related to issues of generalization. 
In accordance with a point made by Hadley,¹⁷ systematically related capacities 
require (or perhaps are) capacities to generalize previously acquired informational
structures to novel informational constituents. For example, if you’ve acquired
the capacity to think that Andy loves Betty, and you later acquire an additional
concept with the content [Carol], then you also acquire the capacity to 
think that Andy loves Carol. Second, as I’ll make clear below, honeybees have the 
capacity to freely generalize certain universally quantified one-to-one mappings.
¹⁵ Hadley (2002, 2004) shows that the most successful models (including his own) employ Classical
representations or rules.
¹⁶ Phillips 1998, Phillips and Halford 1997.
¹⁷ Hadley 1994.
7.5.1  Algebraic Rules and Free Generalization
Marcus reminds us that there is much evidence that people can freely generalize 
universally quantified one-to-one mappings.¹⁸ Such a mapping is a function that 
yields a unique value for every item in its domain. The identity function, f(x) = x, 
is a clear example. To say that people can freely generalize such a function is to 
say that they can determine its value for any item in its domain, regardless of 
whether or not they have previously encountered that item. For example, English 
speakers can form the progressive of any English verb stem by suffixing “-ing” to 
it, even if the verb stem is entirely new to them.
Free generalization of a universal one-to-one function seems to require execution
of a rule that operates on instances of variables, what Marcus calls an algebraic
rule. Operations that rely on encoded one-to-one mappings between 
particulars (such as could be contained in a look-up table, for example) would 
not suffice. Such operations simply do not permit generalization to novel particulars.
For novel particulars, by definition, are just those for which there is no 
prior encoded mapping.
¹⁸ Marcus 2001, pp. 36–39.
On the other hand, free generalization comes naturally to a system that exe-
cutes algebraic rules. For such a rule is applicable to any input-variable instance, 
regardless of whether or not the instance is novel to the system. As long as the 
rule is a good one, it will yield appropriate outputs for novel inputs.
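The difference between a look-up table and an algebraic rule can be seen in a toy Python version of the progressive example. The function names and the nonce stem “blick” are illustrative assumptions, and English spelling adjustments (as in “make” and “making”) are deliberately ignored.

```python
# Look-up table: encoded mappings between particular stems and particular forms.
progressive_table = {"walk": "walking", "sing": "singing", "jump": "jumping"}


def progressive_by_table(stem):
    # Returns None for any stem that has no prior encoded mapping.
    return progressive_table.get(stem)


def progressive_by_rule(stem):
    # Algebraic rule over a variable: for any stem x, the output is x + "ing".
    return stem + "ing"


known = "walk"
novel = "blick"            # a stem the system has never encountered

table_result = progressive_by_table(novel)
rule_result = progressive_by_rule(novel)
```

The table and the rule agree on every stem the table contains. They come apart only on novel stems, which is exactly where free generalization is tested.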
Bees, it seems, are also able to freely generalize universally quantified one-to-one
mappings. We’ve seen that bees can freely generalize the solar ephemeris 
for their locale (§ 6.1.2). That is, on the basis of limited exposure to the sun, their 
solar ephemeris learning mechanism produces a record that allows them to estimate
the azimuthal position of the sun at times when they have not seen it or never 
can see it. Also, Giurfa’s Y-maze experiments showed that bees can solve delayed 
matching-to-sample tasks and delayed non-matching-to-sample tasks, where 
their solutions allow them to generalize to novel stimuli, even across sensory 
modalities (§ 6.2.2.2). Again, his results suggest that the bees can acquire rules 
that operate on instances of a variable. Furthermore, rules such as “Choose the x-marked
arm if x was at the entrance” and “Choose the non-x-marked arm if x 
was at the entrance” are universally quantified one-to-one functions.
Marcus provides a strong case for his thesis that standard connectionist net-
works (whether local or distributed), trained by standard connectionist learning 
algorithms, cannot freely generalize universal one-to-one functions unless they 
implement algebraic rules. He first provides theoretical considerations in support 
of his thesis. He then examines various models which attempt to account for ex-
perimental results with respect to a variety of human cognitive tasks (such as 
linguistic inflection), where successful performance appears to require the ability 
to freely generalize. He argues that the most successful models implement rules 
for computing universal one-to-one functions, whereas the unsuccessful models 
do not. Here I present only his theoretical argument. I then show that his argu-
ment applies fairly straightforwardly to a network model of solar ephemeris 
learning proposed by Dickinson and Dyer.¹⁹ I also briefly discuss the implications
of his argument for modeling Giurfa’s Y-maze results.
Marcus’ theoretical thesis is that the training independence exhibited by 
standard connectionist networks entails that a multiple-node-per-input-variable²⁰
connectionist model can learn to compute a certain universal one-to-one function 
only if every input node and output node is exposed, during training, to at least 
some items in that function’s domain. Roughly, training independence exists 
when: (1) adjustment of the connection weights (training) for some input nodes 
occurs independently of adjustment of the connection weights for other input 
nodes (input independence); and (2) adjustment of the connection weights for some 
output nodes occurs independently of adjustment of the connection weights for 
other output nodes (output independence).
¹⁹ Dickinson and Dyer 1996.
²⁰ Marcus treats single-node-per-input-variable models separately. He shows that such models are 
natural candidates as hypotheses about how algebraic rules could be implemented in networks. 
As such, they do not constitute an alternative to models having Classical architecture. Smolensky 
makes a similar claim about local connectionist models: “The theory of … local connectionist 
networks is so intimately associated with the classical theory of computation and automata that 
drawing any principled boundary between them may well be impossible” (1995c, p. 231).
Training independence, according to Marcus, is a logical consequence of the 
nature of the standard connectionist learning algorithms, such as backpropaga-
tion and Hebbian algorithms. Learning that occurs through the use of such algo-
rithms is local. During training, the weight of a given connection is altered as a 
function of information that is locally available to that connection. Connections 
are not given access to the activation values of nodes to which they do not con-
nect, nor are they given access to the weights of other connections. As a result, 
successful training adjustment of the connection weights for some subset of a 
network’s input (or output) nodes need not transfer to the connection weights for 
its other input (or output) nodes. As Marcus puts it, standard connectionist net-
works are unable to generalize universal one-to-one functions between nodes.
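Training independence can be illustrated with a deliberately small Python sketch. It is not one of Marcus’ simulations. It assumes a single layer of linear output nodes, the delta rule as the learning algorithm, weights initialized to zero, and the identity function over four-bit patterns as the target. During training the last bit is always 0, so the last input node is never active and the last output node is never required to be active.

```python
from itertools import product

N = 4  # number of input nodes and of output nodes


def predict(weights, x):
    """Linear output nodes: y_j is the weighted sum of the inputs to node j."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_identity(patterns, epochs=200, rate=0.1):
    """Delta-rule training on the identity mapping (target equals input)."""
    weights = [[0.0] * N for _ in range(N)]   # zero start is a simplifying assumption
    for _ in range(epochs):
        for x in patterns:
            y = predict(weights, x)
            for j in range(N):
                for i in range(N):
                    # Local update: uses only this connection's own input,
                    # its own output node's activation, and that node's target.
                    weights[j][i] += rate * (x[j] - y[j]) * x[i]
    return weights


# Training set: every four-bit pattern whose last bit is 0.
training = [list(bits) + [0] for bits in product([0, 1], repeat=N - 1)]
W = train_identity(training)

seen = predict(W, [1, 0, 1, 0])     # a pattern from the training set
novel = predict(W, [1, 0, 1, 1])    # the last input node is active for the first time
```

After training, the network reproduces the patterns it was trained on, but the weights into the last output node and out of the last input node are exactly where they started. What was learned about the first three nodes does not transfer to the fourth, so the last element of `novel` is 0 rather than the 1 that the identity function requires.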
7.5.2  Free Generalization in Bees
Dickinson and Dyer offer what they consider to be a nonimplementational
connectionist model of how bees learn the local solar ephemeris.²¹
The connectivity structure of the core of Dickinson and Dyer’s model is 
partially illustrated in Figure 7.2. The most active node in the inner ring represents
the time of day. The most active node in the outer ring represents the azimuth.
The outer ring receives its inputs from the visual system. The inner ring 
receives its inputs from the circadian clock. Each time node is connected with 
every azimuth node. There are also connections within each ring.
During the learning process, the connection between the most active time 
node and the most active azimuth node is strengthened relative to the other time-azimuth
connections (a Hebbian learning algorithm seems sufficient for this purpose).
Also, the connection within each ring between its most active node and its 
180° (12-h) opposite is strengthened relative to the other connections within that 
ring. The relative strengthening of intra-ring connections, according to Dickinson 
and Dyer, allows the network to learn the local ephemeris and to use it to estimate
the azimuth for any time of day or night.²²
²¹ Dickinson and Dyer provisionally devised a multilayer-perceptron model of solar-azimuth 
learning. Unlike bees, the model could not learn to estimate the position of the solar azimuth at 
night. That is, it could not generalize to times of day that did not occur within its training 
set. Dickinson and Dyer regarded this as “a fatal flaw of the model, and of any model that requires
exposure to examples of complete patterns to be able to recognize incomplete patterns” 
(1996, p. 200).
Figure 7.2. Connectivity structure of Dickinson and Dyer’s model of solar ephemeris 
learning (not all connections are shown).
Dickinson and Dyer claim that this sort of model can learn any solar ephemeris
function. They also claim that it is nonimplementational.²³ It may be conceded
that a model of the sort proposed by Dickinson and Dyer can learn any 
particular, local solar ephemeris function. However, it appears that such a model 
could learn such a function only if it builds in constraints that amount to an im-
plementation of a general function which, via learning (perhaps some sort of 
parameter setting), yields a particular solar ephemeris.
Consider such a model repeatedly exposed to the local solar azimuth only 
for the same couple of hours in the afternoon. How can it learn to estimate the 
complete local solar ephemeris for its locale? First, as Dickinson and Dyer realize, 
in order to learn the ephemeris for the corresponding time of night, the time-of-
day nodes need to be most strongly connected to the time-of-night nodes that 
correspond to one-half of a day later. But it should be clear that this is a con-
straint on weights that partially builds in a general solar ephemeris function. 
Clearly, this constraint must be built in by the modeller, since weights are simply 
not determined by connectivity alone. Second, the portion of the solar ephemeris 
that a network with such connectivity will learn, based on limited exposure, will 
be consistent with an infinite variety of complete solar ephemeris functions. For 
example, for all such a network might learn, the sun is never visible outside that 
part of the sky in which it has been observed. Thus, further constraints on the 
weights of its connections will be necessary. Again, such constraints must be built 
in by the modeller. In short, if such a model can learn any local solar ephemeris, 
that will be possible only if the modeller builds in what he or she already knows 
about the “shapes” (a graph of) a local solar ephemeris can actually take as well 
as how the entire shape of a particular ephemeris depends on the shapes of certain
of its parts.
²² Notice that Dickinson and Dyer get around the problem of training independence by designing 
out one of its preconditions: independent input and output nodes.
²³ Dickinson and Dyer 1996, p. 201.
Dickinson and Dyer’s network model, then, won’t be able to freely generalize
a local solar ephemeris unless it implements a generalized solar ephemeris 
function that operates on the value of a variable (time of day). Thus, it’s not a definitive
example of a nonimplementational connectionist model of solar ephemeris
learning. At best, their model shows that if a universally quantified one-to-one
mapping has a sufficiently limited domain, then it can be implemented with 
what amounts to a kind of look-up table. That’s something a Classical theorist 
should have no qualms about.
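The role played by built-in constraints can be illustrated with a toy Python sketch. It is emphatically not Dickinson and Dyer’s network. It is a look-up table of hypothetical afternoon observations, plus one hand-written constraint corresponding to the 180° (12-h) opposite connection described above. The function name `estimate_azimuth` and the azimuth values are assumptions introduced for illustration.

```python
def estimate_azimuth(observed, hour):
    """Estimate solar azimuth (degrees) at an hour of day from sparse observations.

    `observed` maps hours (0-23) to azimuths recorded during training.
    """
    if hour in observed:
        return observed[hour]                       # plain look-up
    opposite = (hour + 12) % 24
    if opposite in observed:
        # Built-in constraint: 12 h later corresponds to the 180-degree opposite.
        return (observed[opposite] + 180) % 360
    return None                                     # nothing learned bears on this hour


afternoon = {14: 230.0, 15: 245.0}   # hypothetical afternoon observations

at_2pm = estimate_azimuth(afternoon, 14)    # observed directly
at_2am = estimate_azimuth(afternoon, 2)     # obtained only via the built-in constraint
at_9am = estimate_azimuth(afternoon, 9)     # left undetermined
```

The night-time estimate is available only because the 12-h constraint was written into the function. For hours covered neither by observation nor by that constraint, the sketch has nothing to say unless further constraints are supplied from outside.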
I now turn to the question of whether network models of the learning of delayed
matching-to-sample tasks or delayed non-matching-to-sample tasks could 
be adequate without implementing an algebraic rule. I won’t attempt to provide 
a complete answer to this question. (Again, I leave a thorough examination of specific
modelling issues for future work.²⁴) Rather, I limit my discussion to a recent 
argument for a qualified affirmative answer. I’ll then say 
a few words about Giurfa’s Y-maze experiments.
Learning the tasks in question involves learning a first-order sameness or 
difference relation. Penn and Povinelli²⁵ argue that non-Classical architectures 
are capable of learning such relations. They point to a network model by Gasser 
and Colunga²⁶ as a clear example of such a network. Their model employs “micro-relational
units” to detect, roughly, the similarity or difference between two
246
24
 In my preliminary research on this issue, I?ve yet to find an example of a strictly Connectionist 
network model of sophisticated navigational capacities. All of the network models reviewed by 
Trullier et al. (1997) that are capable of anything approaching the flexibility of bee navigation 
(none have the capability of taking novel shortcuts) implement representations for which the con-
stituency relation is concatenative (typically, configurationally complex maps; they also employ 
traditional graph-search algorithms). The same is true of the more recent network model pro-
posed by Voicu and Schmajuk (2000). The network model developed by McNaughton and col-
leagues (McNaughton et al. 1991, 1996; Samsonovich and McNaughton 1997) performs path inte-
gration, but does so by implementing a look-up table, and thus can?t serve as a definitive exam-
ple of a non-Classical approach. Their model also implements a configurationally complex map. 
(A problem with the model is that it is incapable of returning to the coordinates of a stored loca-
tion, since it has no mechanism for storing such coordinates.) Mittelstaedt (2000) extends their 
model. Unlike McNaughton et al.?s version, Mittelstaedt?s model can return to a previously vis-
ited location. But, crucially, it leaves unspecified the mechanism by which locational information 
is tied to goal information. In effect, the model posits complex information without explaining 
how it is to be implemented. I should note that McNaughton and Mittelstaedt don?t appear to 
have a Connectionist axe to grind. Their goal is to provide network models of hippocampal func-
tion and mammalian navigation, and network models need not have an entirely non-Classical 
architecture.
25
 Penn and Povinelli (submitted).
26
 Gasser and Colunga 1999.
numeric inputs that encode respective features. However, it?s somewhat puzzling 
that Penn and Povinelli go on to admit that a micro-relational unit can plausibly 
be interpreted as implementing a rule that operates on the values of variables. 
Why, then, do they claim that Gasser and Colunga?s solution is non-Classical?
The principal answer is that Penn and Povinelli require of Classical rules that they
be implemented in the form of explicit information. Since micro-relational units do
their job without employing explicit information about either the sameness or
difference relation, Gasser and Colunga’s solution is, on their view, non-Classical.
Apart from the fact that Classical rules need not be implemented in the form of
explicit information (they can be hardwired, for example), there’s a distinction
between a solution that is not definitively Classical and one that does not implement
a rule that operates on the values of a variable. Some ways of implementing such rules
are compatible with both Classical architectures and Connectionist ones. Gasser and
Colunga’s use of micro-relational units appears to be one such way. Thus, insofar as
the model employs such units, it cannot serve as a definitive example of a
non-Classical implementation of an algebraic rule.
It is also true, by the same token, that insofar as the model employs such units, it
cannot serve as a definitive example of a Classical implementation. However, the
ability to learn a first-order sameness or difference relation, while perhaps necessary
for performing delayed matching-to-sample tasks or delayed non-matching-to-sample
tasks, is not sufficient.²⁷ The rules learned by the bees in Giurfa’s experiments
(“Choose the x-marked arm if x was at the entrance,” and “Choose the non-x-marked arm
if x was at the entrance”) make use of sameness or difference information and thus
require more than the implementation of a rule merely for detecting sameness and
difference. The bees learned to detect not only sameness or difference but also the
sameness or difference between two different kinds of features: the sample stimulus
and the matching or nonmatching stimulus. In terms of variables, the information about
the sample stimulus had to have been bound to a different variable than the information
about either of the later-encountered stimuli. This is another way of saying that the
values of the respective variables had to have different semantic roles, and the
learned rules had to have been sensitive to those roles. Thus, it appears that an
adequate model of Giurfa’s results would have to do more than simply implement
algebraic rules. It would have to implement rules that are sensitive to semantic
structure.²⁸

²⁷ Penn and Povinelli devote no discussion to the modelling of such tasks.
²⁸ As I noted above, Dickinson and Dyer avoid training independence in part by employing
connections between input nodes. But Giurfa’s bees generalized across sensory
modalities that have independent input channels. So, prima facie, it would appear that
connectionist models of the bees’ performance would have difficulty generalizing across
input modalities, due to training independence.

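
The difference between merely detecting sameness and applying a rule that is sensitive
to the roles of its variables can be sketched in Python. This is only an illustration
under stated assumptions: the stimulus labels and the function names `same` and
`choose_arm` are hypothetical, and the sketch models neither Giurfa’s protocol nor any
particular network.

```python
from typing import Literal

Rule = Literal["match", "non-match"]


def same(x: str, y: str) -> bool:
    """First-order sameness detector: indifferent to which stimulus played which role."""
    return x == y


def choose_arm(sample: str, left_arm: str, right_arm: str, rule: Rule) -> str:
    """Role-sensitive rule: `sample` is bound separately from the two arm stimuli."""
    left_matches = same(sample, left_arm)
    right_matches = same(sample, right_arm)
    if left_matches == right_matches:
        raise ValueError("exactly one arm should carry the sample stimulus")
    if rule == "match":
        return "left" if left_matches else "right"
    return "right" if left_matches else "left"


print(choose_arm("blue", "blue", "yellow", "match"))      # left
print(choose_arm("blue", "blue", "yellow", "non-match"))  # right
```

The detector `same` returns the same value whichever argument is the sample, so by
itself it cannot say which arm to enter. The choice depends on which value is bound to
the sample role, which is the additional requirement argued for above.
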
7.6  Summary and Conclusion
I’ve argued in this chapter (based on the evidence presented in Chapter 6) that
certain navigational capacities of honeybees exhibit what I’ve called strong
systematicity (§ 7.1) and that certain navigational capacities of honeybees exhibit
what I’ve called weak systematicity (§ 7.2). I’ve also argued that the
representational constituents of systematically related honeybee mental
representations have various structural roles (§ 7.3). Among these are subject- and
object-of-relation roles (§ 7.3.1), place-in-sequence roles (§ 7.3.1), and “what” and
“where” roles (§ 7.3.2). Furthermore, a case can be made for the hypothesis that among
the constituents of bee representations are indexical-like constituents (§§ 7.3.3,
7.4, and 7.5). Finally, I’ve argued that honeybee information processing must be
sensitive to the structural roles of representational constituents.
The question that connects Chapters 2–5 and Chapters 6 and 7 is, “What kind of theory
of honeybee mental representations and processes would best explain the systematicity
of the relevant honeybee navigational capacities?” Classical theorists would
hypothesize, in light of the evidence (Chapter 6), that honeybees have mental
representations that are complex, having representations as constituents. They would
also hypothesize that the constituency relation for the relevant bee mental
representations is concatenative and that the configurational structure of those
representations is governed by a combinatorial syntax and semantics. As in the case of
human thought, the specific kinds of constituents (the specific contents and
extensions of atomic and complex constituents) would be left open, for the present.²⁹
Classical theorists would further hypothesize that the relevant honeybee cognitive
processes have representational constituents in their domains and are causally
sensitive to syntactic structure.
As I argued in Chapters 3–5, such an explanation of systematicity would be a good one.
On the other hand, a Connectionist explanation would not be a good one, in that (1) it
would provide neither a causal explanation of systematicity (Chapter 3) nor an acausal
explanation of systematicity (Chapter 4) (and thus would not really explain
systematicity at all), and (2) it would be unprincipled if construed as an explanation
of systematicity (Chapter 5) (though it would not be unprincipled if construed as an
explanation of how a Connectionist system could mimic a Classical system that exhibits
systematicity). Therefore, we have good (though defeasible) reasons to prefer
Classical theories of certain honeybee navigational capacities over Connectionist
theories.
One objection to the Classical explanation of systematicity is that it’s not at all
clear whether systematicity requires that the configurational structure of mental
representations be syntactic.³⁰ Perhaps positing map-like structure (for example)
rather than syntactic structure would work as well. In regard specifically to the
systematicity of honeybee navigational capacities, it might seem that map-like
representational structure could account for the relevant systematicities. After all,
we’re talking about certain capacities of honeybees to acquire information about their
navigational domain. Furthermore (it might be thought), assuming that the structure of
the relevant honeybee mental representations is map-like provides the best explanation
of the fact that those representations preserve information about the layout of their
environment (which is also map-like).

²⁹ That the Classical explanation of systematicity doesn’t provide a syntax and
semantics for mentalese is not a good reason to regard the explanation as inadequate
(contra Matthews 1997). It only points out the fact that the Classical view is not yet
confirmed. Connectionism is no better off on this matter.

This objection, I acknowledge, does raise serious issues that would need to be
adequately addressed by anyone concerned to defend the Classical language of thought
hypothesis, especially by anyone concerned to defend the view that honeybees have a
language of thought. Fortunately, however, for my principal purpose, it’s not
necessary for me to attempt to refute the kind of view under consideration. For one
thing, as I pointed out earlier (§ 5.4; Appendix A), a system of mental representation
can be both map-like and language-like. Furthermore, and this is the key point, my
main conclusion is that we have good (though defeasible) reasons to favor explanations
of systematicity that posit a system of mental representation for which the
constituency relation is concatenative over explanations that do not; and the
constituency relation for maps and other sorts of structural representations is in
fact concatenative. So even if the objection in question could be worked out (and even
if it turns out that the vehicles of honeybee mental representations are map-like),
that would be of no solace to a Connectionist. For distributed representations are
configurationally simple. Their contents can be complex, but the Connectionist
constituency relation is nonconcatenative. And, as I argued in Chapters 3–5, it’s that
feature of Connectionism that makes its explanation of systematicity problematic.

³⁰ See, for example, Block 1995, Copeland 1993, pp. 200–204, and Penn and Povinelli
(submitted).

Appendix A
A Limited Representational System which is both Map- and 
Language-Like
Here I demonstrate, by means of a simple, artificial example, the possibility of a 
system of representation that is map-like, in that its representations have spatial 
structure, and language-like, in that it has a combinatorial syntax and semantics. 
I make no claims about the theoretical usefulness of the system.
A.1  Lexicon for Map Legend L
The map legend L consists of the following terms:
A finite set of individual constants, I: a set of 12 unique, uniform patterns.
A finite set of 1-place predicates, P: a set of 12 distinct colors.
A finite set of 3-place predicates, G: a set of 12 grids of two square,
nonoverlapping regions having the same area and contiguous along a vertical
side ([ | ]n).
On the intended interpretation, the members of G express something like the 
following: x is that minimal region of the world such that y is situated in the left 
half of x, and z is situated in the right half of x.
A.2  Syntax for L
All patterns, colors, and grids are to be understood as members of the relevant 
set of the terms of L.
1. For any uniform (colored or noncolored) pattern α, and for any grid [ | ]n,
[α| ]n and [ |α]n are wffs. (Here, “noncolored” means having a color that is
distinct from each member of P.)
2. For any uniform (colored or noncolored) patterns α and β, and for any grid
[ | ]n, [α|β]n is a wff.
3. If P and Q are wffs by clause 1 or 2, then the stack P/Q (P stacked on Q) 
is a wff.
4. There are no other wffs.
Regarding 1, when just one of the pattern variables is instanced, the other may be 
considered bound by an implicit existential quantifier whose domain is I. 
A.3  Semantics for L
A.3.1  L-Models
An L-model is an ordered 4-tuple, <R, γ, κ, λ>, where
1. R is a square region consisting of 16 contiguous, nonoverlapping, and 
numbered square regions, arranged in a 4-by-4 grid:
     1  2  3  4
     5  6  7  8
     9 10 11 12
    13 14 15 16
2. γ is a one-to-one mapping of G onto the set of the 12 smallest, horizontally
oriented, rectangular regions of R, S: {1-2, 2-3, 3-4, 5-6, ..., 15-16}.
3. κ assigns to each member of I one member of P.
4. λ is a one-to-one mapping of I into the set of the 16 numbered subregions
of R.
A.3.2  Truth Conditions for wffs of L
1. If α is a noncolored pattern, then [α| ]n is true iff γ([ | ]n) = k-(k+1)
and λ(α) = k.
2. If α is a noncolored pattern, then [ |α]n is true iff γ([ | ]n) = k-(k+1)
and λ(α) = k+1.
3. If α is a pattern of color c ∈ P, then [α| ]n is true iff γ([ | ]n) = k-(k+1),
λ(α) = k, and κ(α) = c.
4. If α is a pattern of color c ∈ P, then [ |α]n is true iff γ([ | ]n) = k-(k+1),
λ(α) = k+1, and κ(α) = c.
5. If α and β are (colored or noncolored) patterns, then [α|β]n is true iff
[α| ]n and [ |β]n are true.
6. A stack, P/Q, is true iff P is true, Q is true, and the grid constituents of P 
and Q are mapped by γ onto two members of S, a and b (respectively), 
such that the bottom side of a is contiguous with the top side of b. 
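
The truth conditions above can be checked mechanically. The following Python sketch is
offered only as an illustration under several assumptions: the cells are numbered row
by row from the top left, clause 6 is read narrowly as requiring the lower grid to lie
directly beneath the upper one, the toy model uses three individual constants rather
than twelve, and every identifier (`Pattern`, `GridWff`, `Model`, `grid_true`,
`stack_true`, `toy`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Region = Tuple[int, int]  # a horizontal two-cell rectangle, written k-(k+1) in the text

# The 12 horizontal rectangles of a 4-by-4 region, assuming row-major numbering 1..16.
S = [(k, k + 1) for k in range(1, 16) if k % 4 != 0]


@dataclass(frozen=True)
class Pattern:
    name: str                    # an individual constant (a member of I)
    color: Optional[str] = None  # a member of P, or None for a "noncolored" pattern


@dataclass(frozen=True)
class GridWff:
    grid: str                        # a grid term (a member of G)
    left: Optional[Pattern] = None   # pattern placed in the left cell, if any
    right: Optional[Pattern] = None  # pattern placed in the right cell, if any


@dataclass(frozen=True)
class Model:
    region_of: Dict[str, Region]  # grids -> members of S
    color_of: Dict[str, str]      # individual constants -> colors
    cell_of: Dict[str, int]       # individual constants -> numbered cells


def _slot_true(m: Model, grid: str, p: Pattern, offset: int) -> bool:
    """Clauses 1-4: the pattern is located in cell k + offset and has its stated color."""
    k = m.region_of[grid][0]
    in_place = m.cell_of[p.name] == k + offset
    color_ok = p.color is None or m.color_of[p.name] == p.color
    return in_place and color_ok


def grid_true(m: Model, w: GridWff) -> bool:
    """Clauses 1-5: every filled cell of the grid must check out."""
    if w.left is None and w.right is None:
        raise ValueError("a wff needs a pattern in at least one cell")
    left_ok = w.left is None or _slot_true(m, w.grid, w.left, 0)
    right_ok = w.right is None or _slot_true(m, w.grid, w.right, 1)
    return left_ok and right_ok


def stack_true(m: Model, top: GridWff, bottom: GridWff) -> bool:
    """Clause 6, read narrowly: `bottom` must lie directly below `top`."""
    a, b = m.region_of[top.grid], m.region_of[bottom.grid]
    return grid_true(m, top) and grid_true(m, bottom) and b[0] == a[0] + 4


# A toy model with three constants instead of twelve (all names are hypothetical).
toy = Model(
    region_of={f"g{i}": r for i, r in enumerate(S, start=1)},
    color_of={"a": "red", "b": "blue", "c": "red"},
    cell_of={"a": 1, "b": 2, "c": 5},
)
print(grid_true(toy, GridWff("g1", Pattern("a", "red"), Pattern("b"))))            # True
print(stack_true(toy, GridWff("g1", Pattern("a")), GridWff("g4", Pattern("c"))))  # True
```

The sketch makes the two properties of the system visible at once: truth depends on
where patterns sit within a spatial layout, and the truth of a complex wff is computed
from the truth of its constituents.
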
References
Aizawa, K. 1997. “Explaining systematicity.” Mind and Language 12: 115–136.
Anderson, J. A. 1995. An Introduction to Neural Networks. MIT Press.
Barsalou, L. W. 1992. “Frames, concepts, and conceptual fields.” In Frames, Fields,
and Contrasts: New Essays in Semantic and Lexical Organization, ed. E. Kittay
and A. Lehrer. Erlbaum.
Barsalou, L. W. 1993. “Flexibility, structure, and linguistic vagary in concepts:
Manifestations of a compositional system of perceptual symbols.” In Theories
of Memory, ed. A. C. Collins, S. E. Gathercole, and M. A. Conway. Erlbaum.
Beer, R. D. 2000. “Dynamical approaches to cognitive science.” Trends in Cognitive
Sciences 4: 91–99.
Berg, R. E., and Stork, D. G. 1995. The Physics of Sound, second edition. Prentice
Hall.
Blakemore, R. P., and Frankel, R. B. 1981. “Magnetic navigation in bacteria.”
Scientific American 245: 58–65.
Block, N. 1995. “The mind as the software of the brain.” In An Invitation to
Cognitive Science, 2nd ed., vol. 3, Thinking, ed. D. Osherson. MIT Press.
Browne, A., and Sun, R. 1999. “Connectionist variable binding.” Expert Systems
16: 189–207.
Butler, K. 1991. “Towards a connectionist cognitive architecture.” Mind and
Language 6: 252–272.
Capaldi, E. A., and Dyer, F. C. 1995. “Landmarks and dance orientation in the
honeybee Apis mellifera.” Naturwissenschaften 82: 245–247.
Capaldi, E. A., and Dyer, F. C. 1999. “The role of orientation flights on homing
performance in honeybees.” The Journal of Experimental Biology 202: 1655–1666.
Capaldi, E. A., Smith, A. D., Osborne, J. L., Fahrbach, S. E., Farris, S. M.,
Reynolds, D. R., Edwards, A. S., Martin, A., Robinson, G. E., Poppy, G. M., and
Riley, J. R. 2000. “Ontogeny of orientation flight in the honeybee revealed by
harmonic radar.” Nature 403: 537–540.
Carruthers, P. 2005. “On being simple-minded.” In Consciousness: Essays from a
Higher-Order Perspective. Oxford University Press.
Cartwright, B. A., and Collett, T. S. 1983. “Landmark learning in bees:
Experiments and models.” Journal of Comparative Physiology 151: 521–543.
Casati, R., and Varzi, A. C. 1999. Parts and Places: The Structures of Spatial
Representation. MIT Press.
Chittka, L., Bonn, A., Geiger, K., Hellstern, F., Klein, J., Koch, G., Meuser, S., and
Menzel, R. 1992. “Do bees navigate by means of snapshot memory pictures?”
In Proceedings of the 20th Göttingen Neurobiology Conference, ed. N. Elsner and
D. W. Richter. Georg Thieme Verlag.
Chittka, L., Geiger, K., and Kunze, J. 1995a. “The influence of landmarks on
distance estimation of honey bees.” Animal Behaviour 50: 23–31.
Chittka, L., Kunze, J., Shipman, C., and Buchmann, S. L. 1995b. “The significance
of landmarks for path integration in homing honeybee foragers.”
Naturwissenschaften 82: 341–343.
Churchland, P. S. 1986. Neurophilosophy: Toward a Unified Science of the Mind-Brain.
MIT Press.
Clark, A. 1988. “Thoughts, sentences and cognitive science.” Philosophical
Psychology 1: 263–278.
Collett, M., and Collett, T. S. 2000. “How do insects use path integration for their
navigation?” Biological Cybernetics 83: 245–259.
Collett, M., Collett, T. S., Bisch, S., and Wehner, R. 1998. “Local and global vectors
in desert ant navigation.” Nature 394: 269–272.
Collett, M., Harland, D., and Collett, T. S. 2002. “The use of landmarks and
panoramic context in the performance of local vectors by navigating honeybees.”
The Journal of Experimental Biology 205: 807–814.
Collett, T. S. 1992. “Landmark learning and guidance in insects.” Philosophical
Transactions of the Royal Society of London B 337: 295–303.
Collett, T. S. 1996. “Insect navigation en route to the goal: Multiple strategies for
the use of landmarks.” The Journal of Experimental Biology 199: 227–235.
Collett, T. S., and Baron, J. 1994. “Biological compasses and the coordinate frame
of landmark memories in honeybees.” Nature 368: 137–140.
Collett, T. S., and Zeil, J. 1998. “Places and landmarks: An arthropod
perspective.” In Spatial Representation in Animals, ed. S. Healy. Oxford University
Press.
Collett, T. S., Fry, S. N., and Wehner, R. 1993. “Sequence learning by honey bees.”
Journal of Comparative Physiology A 172: 693–706.
Collett, T. S., Baron, J., and Sellen, K. 1996. “On the encoding of movement
vectors by honeybees. Are distance and direction represented separately?”
Journal of Comparative Physiology A 179: 395–406.
Collett, T. S., and Collett, M. 2000. “Path integration in insects.” Current Opinion
in Neurobiology 10: 757–762.
Collett, T. S., and Collett, M. 2002. “Memory use in insect visual navigation.”
Nature Reviews Neuroscience 3: 542–552.
Copeland, J. 1993. Artificial Intelligence: A Philosophical Introduction. Blackwell.
Cummins, R. 1996. “Systematicity.” The Journal of Philosophy 93: 591–614.
Cummins, R., Blackmon, J., Byrd, D., Poirier, P., Roth, M., and Schwarz, G. 2001.
“Systematicity and the cognition of structured domains.” The Journal of
Philosophy 98: 167–185.
Darwin, C. 1985. The Origin of Species. Penguin Classics.
Dennett, D. C. 1989. “Mother nature versus the walking encyclopedia.” In
Philosophy and Connectionist Theory, ed. W. M. Ramsey, S. P. Stich, and D. E.
Rumelhart. L. Erlbaum Associates.
Dickinson, J. 1994. “Bees link local landmarks with celestial compass cues.”
Naturwissenschaften 81: 465–467.
Dickinson, J., and Dyer, F. C. 1996. “How insects learn about the sun’s course:
Alternative modeling approaches.” In From Animals to Animats 4, ed. P. Maes, M.
J. Mataric, J.-A. Meyer, J. Pollack, and S. W. Wilson. MIT Press.
Dyer, F. C. 1985a. “Mechanisms of dance orientation by the Asian honey bee Apis
florea.” Journal of Comparative Physiology A 157: 183–198.
Dyer, F. C. 1985b. “Nocturnal orientation by the Asian honey bee, Apis dorsata.”
Animal Behaviour 33: 769–774.
Dyer, F. C. 1987. “Memory and sun compensation by honey bees.” Journal of
Comparative Physiology A 160: 621–633.
Dyer, F. C. 1991. “Bees acquire route-based memories but not cognitive maps in a
familiar landscape.” Animal Behaviour 41: 239–246.
Dyer, F. C. 2002. “The biology of the dance language.” Annual Review of
Entomology 47: 917–949.
Dyer, F. C., and Dickinson, J. A. 1994. “Development of sun compensation by
honey bees: How partially experienced bees estimate the sun’s course.”
Proceedings of the National Academy of Sciences USA 91: 4471–4474.
Dyer, F. C., and Dickinson, J. A. 1996. “Sun-compass learning in insects:
Representation in a simple mind.” Current Directions in Psychological Science 5:
67–72.
Esch, H. E., and Burns, J. E. 1996. “Distance estimation by foraging honeybees.”
The Journal of Experimental Biology 199: 155–162.
Esch, H. E., Zhang, S. W., Srinivasan, M. V., and Tautz, J. 2001. “Honeybee dances
communicate distances measured by optic flow.” Nature 411: 581–583.
Etienne, A., Maurer, R., Berlie, J., Reverdin, B., Rowe, T., Georgakopoulos, J., and
Séguinot, V. 1998. “Navigation through vector addition.” Nature 396: 161–164.
Fodor, J. A. 1990. A Theory of Content and Other Essays. MIT Press.
Fodor, J. A. 1998. “Connectionism and the problem of systematicity (continued):
Why Smolensky’s solution still doesn’t work.” In J. A. Fodor, In Critical
Condition: Polemical Essays on Cognitive Science and the Philosophy of Mind. MIT Press.
Fodor, J. A. 2000. The Mind Doesn’t Work That Way: The Scope and Limits of
Computational Psychology. MIT Press.
Fodor, J. A., and McLaughlin, B. P. 1995. “Connectionism and the problem of
systematicity: Why Smolensky’s solution doesn’t work.” In Connectionism:
Debates on Psychological Explanation, ed. C. MacDonald and G. MacDonald.
Blackwell.
Fodor, J. A., and Pylyshyn, Z. W. 1995. “Connectionism and cognitive
architecture: A critical analysis.” In Connectionism: Debates on Psychological
Explanation, ed. C. MacDonald and G. MacDonald. Blackwell.
Fülöp, A., and Menzel, R. 2000. “Risk-indifferent foraging behaviour in
honeybees.” Animal Behaviour 60: 657–666.
Gallistel, C. R. 1998. “Symbolic processes in the brain: The case of insect
navigation.” In An Invitation to Cognitive Science, 2nd ed., vol. 4, Methods, Models,
and Conceptual Issues, ed. D. Osherson. MIT Press.
Garson, J. W. 1997. “Syntax in a dynamic brain.” Synthese 110: 343–355.
Giurfa, M., and Capaldi, E. A. 1999. “Vectors, routes and maps: New discoveries
about navigation in insects.” Trends in Neurosciences 22: 237–242.
Giurfa, M., Zhang, S., Jenett, A., Menzel, R., and Srinivasan, M. V. 2001. “The
concepts of ‘sameness’ and ‘difference’ in an insect.” Nature 410: 930–933.
Golledge, R. G., ed. 1999. Wayfinding Behavior: Cognitive Mapping and Other Spatial
Processes. The Johns Hopkins University Press.
Gould, J. L. 1984. “Processing of sun-azimuth information by honey bees.”
Animal Behaviour 32: 149–152.
Gould, J. L. 1986. “The locale map of honey bees: Do insects have cognitive
maps?” Science 232: 861–863.
Gould, J. L., and Gould, C. G. 1988. The Honey Bee. W. H. Freeman.
Greggers, U., and Mauelshagen, J. 1997. “Matching behavior of honeybees in a
multiple-choice situation: The differential effect of environmental stimuli on
the choice process.” Animal Learning and Behaviour 25: 458–472.
Greggers, U., and Menzel, R. 1993. “Memory dynamics and foraging strategies of
honeybees.” Behavioral Ecology and Sociobiology 32: 17–29.
Hadley, R. F. 1994. “Systematicity in connectionist language learning.” Mind and
Language 9: 247–272.
Hadley, R. F. 1997. “Cognition, systematicity, and nomic necessity.” Mind and
Language 12: 137–153.
Hadley, R. F. 2002. “Systematicity in connectionist generalization.” In The
Handbook of Brain Theory and Neural Networks, 2nd ed., ed. M. A. Arbib. MIT Press.
Hadley, R. F. 2004. “On the proper treatment of semantic systematicity.” Minds
and Machines 14: 145–172.
Haugeland, J., ed. 1997. Mind Design II: Philosophy, Psychology, Artificial
Intelligence. MIT Press.
Healy, S., ed. 1998. Spatial Representation in Animals. Oxford University Press.
Heinrich, B. 1976. “Foraging specializations of individual bumblebees.”
Ecological Monographs 46: 105–128.
Horgan, T., and Tienson, J. 1996. Connectionism and the Philosophy of Psychology.
MIT Press.
Hummel, J. E., and Holyoak, K. J. 2001. “A process model of human transitive
inference.” In Spatial Schemas and Abstract Thought, ed. M. Gattis. MIT Press.
Janzen, D. H. 1971. “Euglossine bees as long-distance pollinators of tropical
plants.” Science 171: 203–205.
Joerges, J., Küttner, A., Galizia, C. G., and Menzel, R. 1997. “Representation of
odours and odour mixtures visualized in the honeybee brain.” Nature 387:
285–288.
Kratzsch, D., Giurfa, M., and Menzel, R. 1998. “Sequence learning by
honeybees.” Abstract 296, Fifth International Congress of Neuroethology, University of
California, San Diego.
MacDonald, C., and MacDonald, G., ed. 1995. Connectionism: Debates on
Psychological Explanation. Blackwell.
Manning, A. 1956. “Some aspects of the foraging behaviour of bumblebees.”
Behaviour 9: 164–201.
Marcus, G. 2001. The Algebraic Mind: Integrating Connectionism and Cognitive
Science. MIT Press.
Matthews, R. J. 1997. “Can connectionists explain systematicity?” Mind and
Language 12: 154–157.
McLaughlin, B. P. 1993. “The connectionism/classicism battle to win souls.”
Philosophical Studies 71: 163–190.
McNaughton, B. L., Chen, L. L., and Markus, E. J. 1991. “‘Dead reckoning’,
landmark learning, and the sense of direction: A neurophysiological and
computational hypothesis.” Journal of Cognitive Neuroscience 3: 190–202.
McNaughton, B. L., Barnes, C. A., Gerrard, J. L., Gothard, K., Jung, M. W.,
Knierim, J. J., Kudrimoti, H., Qin, Y., Skaggs, W. E., Suster, M., and Weaver,
K. L. 1996. “Deciphering the hippocampal polyglot: The hippocampus as a
path integration system.” The Journal of Experimental Biology 199: 173–185.
Menzel, R. 1989. “Bee-havior and the neural systems and behavior course.” In
Perspectives in Neural Systems and Behavior, ed. T. J. Carew and D. Kelley. Alan
R. Liss.
Menzel, R. 1999. “Memory dynamics in the honeybee.” Journal of Comparative
Physiology A 185: 323–340.
Menzel, R., and Giurfa, M. 2001. “Cognitive architecture of a mini-brain: The
honeybee.” Trends in Cognitive Sciences 5: 62–71.
Menzel, R., and Müller, U. 1996. “Learning and memory in honeybees: From
behavior to neural substrates.” Annual Review of Neuroscience 19: 379–404.
Menzel, R., Geiger, K., Chittka, L., Joerges, J., Kunze, J., and Müller, U. 1996. “The
knowledge base of bee navigation.” The Journal of Experimental Biology 199:
141–146.
Menzel, R., Geiger, K., Joerges, J., Müller, U., and Chittka, L. 1998. “Bees travel
novel homeward routes by integrating separately acquired vector memories.”
Animal Behaviour 55: 139–152.
Menzel, R., Brandt, R., Gumbert, A., Komischke, B., and Kunze, J. 2000a. “Two
spatial memories for honeybee navigation.” Proceedings of the Royal Society of
London B 267: 961–968.
Menzel, R., Giurfa, M., Gerber, B., and Hellstern, F. 2000b. “Cognition in insects:
The honeybee as a study case.” In Brain Evolution and Cognition, ed. G. Roth
and M. F. Wullimann. Wiley.
Menzel, R., Greggers, U., Smith, A., Berger, S., Brandt, R., Brunke, S., Bundrock, G.,
Hülse, S., Plümpe, T., Schaupp, F., Schüttler, E., Stach, S., Stindt, J., Stollhoff, N.,
and Watzl, S. 2005. “Honey bees navigate according to a map-like spatial memory.”
Proceedings of the National Academy of Sciences USA 102: 3040–3045.
Michelsen, A. 1999. “The dance language of honey bees: Recent findings and
problems.” In The Design of Animal Communication, ed. M. Hauser and M.
Konishi. MIT Press.
Mittelstaedt, H. 2000. “Triple-loop model of path control by head direction and
place cells.” Biological Cybernetics 83: 261–270.
Müller, M., and Wehner, R. 1994. “The hidden spiral: Systematic search and path
integration in desert ants, Cataglyphis fortis.” Journal of Comparative Physiology
A 175: 525–530.
Niklasson, L. F., and van Gelder, T. 1994. “On being systematically
connectionist.” Mind and Language 9: 288–302.
Pastergue-Ruiz, I., and Beugnon, G. 1994. “Spatial sequential memory in the ant
Cataglyphis cursor.” In Les Insectes Sociaux. Proceedings of the 12th Congress of the
International Union for the Study of Social Insects, ed. A. Lenoir, G. Arnold, and M.
Lepage. University Paris Nord, Paris.
Penn, D., and Povinelli, D. J. (submitted). “Do animals really have a language of
thought?” Behavioral and Brain Sciences.
Phillips, S. 1998. “Are feedforward and recurrent networks systematic? Analysis
and implications for a connectionist cognitive architecture.” Connection
Science 10: 137–160.
Phillips, S., and Halford, G. S. 1997. “Systematicity: Psychological evidence with
connectionist implications.” In Proceedings of the Nineteenth Annual Conference
of the Cognitive Science Society, eds. M. G. Shafto and P. Langley. Stanford
University.
Pinker, S. 1997. How the Mind Works. Norton.
Povinelli, D. J., and Bering, J. M. 2002. “The mentality of apes revisited.” Current
Directions in Psychological Science 11: 115–119.
Povinelli, D. J., Bering, J. M., and Giambrone, S. 2000. “Toward a science of other
minds: Escaping the argument by analogy.” Cognitive Science 24: 509–541.
Povinelli, D. J., and Giambrone, S. 2001. “Reasoning about beliefs: A human
specialization?” Child Development 72: 691–695.
Povinelli, D. J., and Vonk, J. 2003. “Chimpanzee minds: Suspiciously human?”
Trends in Cognitive Sciences 7: 157–160.
Rey, G. 1997. Contemporary Philosophy of Mind. Blackwell.
Rey, G. 2003. “Chomsky, Intentionality, and a CRTT.” In Chomsky and His Critics,
ed. L. M. Antony and N. Hornstein. Blackwell.
Riley, J. R., Smith, A. D., Reynolds, D. R., Edwards, A. S., Osborne, J. L., Williams,
I. H., Carreck, N. L., and Poppy, G. M. 1996. “Tracking bees with harmonic
radar.” Nature 379: 29–30.
Riley, J. R., Valeur, P., Smith, A. D., Reynolds, D. R., Poppy, G. M., and Löfstedt,
C. 1998. “Harmonic radar as a means of tracking the pheromone-finding and
pheromone-following flight of male moths.” Journal of Insect Behavior 11:
287–296.
Riley, J. R., Greggers, U., Smith, A. D., Stach, S., Reynolds, D. R., Stollhoff, N.,
Brandt, R., Schaupp, F., and Menzel, R. 2003. “The automatic pilot of
honeybees.” Proceedings of the Royal Society of London B 270: 2421–2424.
Riley, J. R., Greggers, U., Smith, A. D., Reynolds, D. R., and Menzel, R. 2005. “The
flight paths of honeybees recruited by the waggle dance.” Nature 435:
205–207.
Robinson, W. S. 1995. “Direct representation.” Philosophical Studies 80: 305–322.
Ronacher, B., and Wehner, R. 1995. “Desert ants Cataglyphis fortis use
self-induced optic flow to measure distances travelled.” Journal of Comparative
Physiology A 177: 21–27.
Samsonovich, A., and McNaughton, B. L. 1997. “Path integration and cognitive
mapping in a continuous attractor neural network model.” The Journal of
Neuroscience 17: 5900–5920.
Schmidt, I., Collett, T. S., Dillier, F.-X., and Wehner, R. 1992. “How desert ants
cope with enforced detours on their way home.” Journal of Comparative
Physiology A 171: 285–288.
Schöne, H., Westermayr, P., Kühme, D., Kühme, L., Schöne, M., and Schöne, R.
1998. “Searching behaviour and direction finding of differently motivated
displaced honeybees – an ‘etho-psychological’ study of release behaviour.”
Ethology 104: 1039–1055.
Servan-Schreiber, D., Cleeremans, A., and McClelland, J. 1991. “Graded state
machines: The representation of temporal contingencies in simple recurrent
networks.” In Connectionist Approaches to Language Learning, ed. D. Touretzky.
Kluwer.
Si, A., Srinivasan, M. V., and Zhang, S. 2003. “Honeybee navigation: Properties of
the visually driven ‘odometer’.” The Journal of Experimental Biology 206:
1265–1273.
Smolensky, P. 1995a. ?Connectionism, constituency, and the language of 
thought.? In Connectionism: Debates on Psychological Explanation, ed. C. Mac-
Donald and G. MacDonald. Blackwell.
Smolensky, P. 1995b. “On the proper treatment of connectionism.” In Connection-
ism: Debates on Psychological Explanation, ed. C. MacDonald and G. MacDon-
ald. Blackwell.
Smolensky, P. 1995c. “Reply: Constituent structure and explanation in an inte-
grated connectionist/symbolic cognitive architecture.” In Connectionism: De-
bates on Psychological Explanation, ed. C. MacDonald and G. MacDonald. 
Blackwell.
Srinivasan, M. V., Zhang, S. W., and Bidwell, N. J. 1997. “Visually mediated 
odometry in honeybees.” The Journal of Experimental Biology 200: 2513–2522.
Srinivasan, M. V., Zhang, S., Altwein, M., and Tautz, J. 2000. “Honeybee naviga-
tion: Nature and calibration of the ‘odometer’.” Science 287: 851–853.
Sterelny, K. 1990. The Representational Theory of Mind: An Introduction. Blackwell.
Tautz, J., Zhang, S., Spaethe, J., Brockmann, A., Si, A., and Srinivasan, M. 2004. 
“Honeybee odometry: Performance in varying natural terrain.” PLoS Biology 
2: 915–922.
Touretzky, D. S. 1986. “BoltzCONS: Reconciling connectionism with the recursive 
nature of stacks and trees.” Proceedings of the Eighth Annual Conference of the 
Cognitive Science Society. Amherst, Mass.
Trullier, O., Wiener, S. I., Berthoz, A., and Meyer, J.-A. 1997. “Biologically based 
artificial navigation systems: Review and prospects.” Progress in Neurobiology 
51: 483–544.
van Gelder, T. 1990. “Compositionality: A connectionist variation on a classical 
theme.” Cognitive Science 14: 355–384.
van Gelder, T. 1991. “Classical questions, radical answers: Connectionism and the 
structure of mental representations.” In Connectionism and the Philosophy of 
Mind, ed. T. Horgan and J. Tienson. Kluwer.
van Gelder, T. 1995. “What might cognition be, if not computation?” Journal of 
Philosophy 91: 345–381.
van Gelder, T. 1998. “The dynamical hypothesis in cognitive science.” Behavioral 
and Brain Sciences 21: 615–665.
Voicu, H., and Schmajuk, N. 2000. “Exploration, navigation and cognitive map-
ping.” Adaptive Behavior 8: 207–224.
von Frisch, K. 1967. The Dance Language and Orientation of Bees. Belknap/Harvard.
Wehner, R. 1983. “Celestial and terrestrial navigation: Human strategies – insect 
strategies.” In Neuroethology and Behavioral Physiology, ed. F. Huber and H. 
Markl. Springer-Verlag.
Wehner, R. 1984. “Astronavigation in insects.” Annual Review of Entomology 29: 
277–298.
Wehner, R. 1992. “Arthropods.” In Animal Homing, ed. F. Papi. Chapman & Hall.
Wehner, R., and Srinivasan, M. V. 1981. “Searching behavior of desert ants, genus 
Cataglyphis (Formicidae, Hymenoptera).” Journal of Comparative Physiology 
142: 315–338.
Wehner, R., Bleuler, S., Nievergelt, C., and Shah, D. 1990. “Bees navigate by using 
vectors and routes rather than maps.” Naturwissenschaften 77: 479–482.
Wehner, R., Michel, B., and Antonsen, P. 1996. “Visual navigation in insects: Cou-
pling of egocentric and geocentric information.” The Journal of Experimental 
Biology 199: 129–140.
Wehner, R., Gallizzi, K., Frei, C., and Vesely, M. 2002. “Calibration processes in 
desert ant navigation: Vector courses and systematic search.” Journal of Com-
parative Physiology A 188: 683–693.
Wei, C. A., Rafalko, S. L., and Dyer, F. C. 2002. “Deciding to learn: Modulation of 
learning flights in honeybees, Apis mellifera.” Journal of Comparative Physiology 
A 188: 725–737.
Wohlgemuth, S., Ronacher, B., and Wehner, R. 2001. “Ant odometry in the third 
dimension.” Nature 411: 795–798.
Zhang, S. W., Bartsch, K., and Srinivasan, M. V. 1996. “Maze learning by honey-
bees.” Neurobiology of Learning and Memory 66: 267–282.