Goal Reasoning: Papers from the ACS Workshop (http://mcox.org/g-reason) Baltimore, MD 14 December 2013 Workshop Chair and Editors David W. Aha (Chair), Naval Research Laboratory (USA) Michael T. Cox, University of Maryland (USA) H?ctor Mu?oz-Avila, Lehigh University (USA) Aha, D.W., Cox, M.T., & Mu?oz-Avila, H. (Eds.) (2013). Goal reasoning: Papers from the ACS workshop (Technical Report CS-TR-5029). College Park, MD: University of Maryland, Department of Computer Science. i Preface This technical report contains the 11 accepted papers presented at the Workshop on Goal Reasoning, which was held as part of the 2013 Conference on Advances in Cognitive Systems (ACS-13) in Baltimore, Maryland on 14 December 2013. This is the third in a series of workshops related to this topic, the first of which was the AAAI-10 Workshop on Goal-Directed Autonomy while the second was the Self-Motivated Agents (SeMoA) Workshop, held at Lehigh University in November 2012. Our objective for holding this meeting was to encourage researchers to share information on the study, development, integration, evaluation, and application of techniques related to goal reasoning, which concerns the ability of an intelligent agent to reason about, formulate, select, and manage its goals/objectives. Goal reasoning differs from frameworks in which agents are told what goals to achieve, and possibly how goals can be decomposed into subgoals, but not how to dynamically and autonomously decide what goals they should pursue. This constraint can be limiting for agents that solve tasks in complex environments when it is not feasible to manually engineer/encode complete knowledge of what goal(s) should be pursued for every conceivable state. Yet, in such environments, states can be reached in which actions can fail, opportunities can arise, and events can otherwise take place that strongly motivate changing the goal(s) that the agent is currently trying to achieve. This topic is not new; researchers in several areas have studied goal reasoning (e.g., in the context of cognitive architectures, automated planning, game AI, and robotics). However, it has infrequently been the focus of intensive study, and (to our knowledge) no other series of meetings has focused specifically on goal reasoning. As shown in these papers, providing an agent with the ability to reason about its goals can increase performance measures for some tasks. Recent advances in hardware and software platforms (involving the availability of interesting/complex simulators or databases) have increasingly permitted the application of intelligent agents to tasks that involve partially observable and dynamically-updated states (e.g., due to unpredictable exogenous events), stochastic actions, multiple (cooperating, neutral, or adversarial) agents, and other complexities. Thus, this is an appropriate time to foster dialogue among researchers with interests in goal reasoning. Research on goal reasoning is still in its early stages; no mature application of it yet exists (e.g., for controlling autonomous unmanned vehicles or in a deployed decision aid). However, it appears to have a bright future. For example, leaders in the automated planning community have specifically acknowledged that goal reasoning has a prominent role among intelligent agents that act on their own plans, and it is gathering increasing attention from roboticists and cognitive systems researchers. 
In addition to a survey, the papers in this workshop relate to, among other topics, cognitive architectures and models, environment modeling, game AI, machine learning, meta-reasoning, planning, self- motivated systems, simulation, and vehicle control. The authors discuss a wide range of issues pertaining to goal reasoning, including representations and reasoning methods for dynamically revising goal priorities. We hope that readers will find that this theme for enhancing agent autonomy to be appealing and relevant to their own interests, and that these papers will spur further investigations on this important yet (mostly) understudied topic. Many thanks to the participants and ACS for making this event happen! David W. Aha Baltimore, Maryland (USA) 14 December 2013 ii Table of Contents Title Page i Preface ii Table of Contents iii Goal Substitution in Response to Surprises 1 Robert Bobrow, Marshall Brinn, Mark Burstein, & Robert Laddaga Question-Based Problem Recognition and Goal-Driven Autonomy 10 Michael T. Cox Inferring Actions and Observations from Interactions 26 Joseph P. Garnier, Olivier L. Georgeon, & Am?lie Cordier Beyond the Rational Player: Amortizing Type-Level Goal Hierarchies 34 Thomas R. Hinrichs & Kenneth D. Forbus Situation Awareness for Goal-Directed Autonomy by Validating Expectations 43 Michael Karg & Alexandra Kirsch HALTER: Hierarchical Abstraction Learning via Task and Event Regression 53 Ugur Kuter & H?ctor Mu?oz-Avila Learning Models of Unknown Events 64 Matthew Molineaux & David W. Aha Goal-Driven Autonomy in Dynamic Environments 79 Matt Paisner, Michael Maynord, Michael T. Cox, & Don Perlis Hierarchical Goal Networks and Goal-Driven Autonomy: Going where AI Planning Meets Goal Reasoning 95 Vikas Shivashankar, Ron Alford, Ugur Kuter, & Dana Nau Breadth of Approaches to Goal Reasoning: A Research Survey 111 Swaroop Vattam, Matthew Klenk, Matthew Molineaux, & David W. Aha Towards Applying Goal Autonomy for Vehicle Control 127 Mark Wilson, Bryan Auslander, Benjamin Johnson, Thomas Apker, James McMahon, & David W. Aha Author Index 143 iii 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning Goal Substitution in Response to Surprises Robert Bobrow RUSTY@BBN.COM Marshall Brinn MBRINN@BBN.COM Raytheon BBN Technologies, Cambridge, MA 02138 Mark Burstein BURSTEIN@SIFT.NET SIFT, LLC. Lexington, MA 02421 Robert Laddaga BOBLADDAGA@GMAIL.COM Vanderbilt University, Nashville, TN 37235 Abstract This paper looks at the problem of learning from and responding to surprise during task performance by agents in complex unfamiliar environments. In particular, we describe an architecture and a brief demonstration of a cognitive agent driving a car in a simulated city street grid, in which unexpected obstacles are placed. The agent is surprised by unexpected behavior of the car and its environment, and dynamically shifts, temporarily, to perform actions in terms of background safety goals even as it is learning how to behave appropriately to those unexpected conditions. 1. Introduction Software systems are built for particular environments and applications, and are notoriously brittle in facing circumstances for which they were not explicitly designed or trained. When such systems are embedded in physically autonomous or distributed systems, UXVs or systems that interlink multiple different organizations, and are faced with dynamic, open environments and situations they were never tested on, they demonstrate their brittleness in numerous ways. 
For instance, a UAV might be programmed to perform reconnaissance tasks and be tested by flying solo over flat terrain but then be deployed in a mountainous setting where there are other UAVs in the same airspace. Autonomous cars may be tested on empty roads and then used in traffic. How do we ensure that such systems (a) maintain invariants, e.g., they do not run into each other or other unexpected obstacles, and (b) respond robustly to surprise or novel circumstances such as changes in maneuverability or control due to external factors? Fundamentally, these systems must do more than simply adapt in order to make progress toward a current or primary goal (e.g., Georgeff et al., 1985; Remington et al., 2003). If evidence indicates that the agent's actions are not having the expected effect, then continuing to select actions that its model says will produce those effects leads only to perseveration. The architecture must change to include mechanisms that recognize when expectations fail and react by coping with these disconnects. In the short run this means considering safety goals, information-gathering goals to aid in adapting and improving the agent's own models, and learning goals specifically aimed at changing action preconditions, procedures, and object or state characterizations as ways of improving its behaviors over the longer term. Such applications would then act more reliably, continually supporting high-level goals in the face of unfolding execution, and adjusting gradually to new environments. Even systems that have learned reactive controllers suffer from these issues when the environments they are employed in differ from their training. For such systems to succeed, they need to recognize and adapt to the unanticipated, that is, to surprise. They must also be capable of suspending their primary objectives in the face of such surprises, in order to maintain stability and safety invariants. This paper briefly outlines an approach and abstract cognitive architecture for goal-directed task reasoning in environments where surprising or unexpected events will occur and must be handled. We illustrate our approach with an implementation of an agent using this architecture to handle a simulated driving task. The example shows how the agent reasons about suspending its primary goals to maintain safety using a behavior learned from previous surprises.

1.1 Goals of this research

Our overall objective has been to develop distributed autonomous systems that are more robust in response to surprising circumstances by providing each agent with a reflective software layer that maintains and updates models of the capabilities, goals and actions of the system itself, of others, and of the operating environment. By robust, we mean systems that, when encountering surprise,
• Don't fail the first time. Instead, they do something reasonable to allow them to keep on going while they reconsider their understanding of the world.
• Aren't surprised the second time. They have some memory of past events and can update their models so that previously surprising events are no longer outside the realm of possibility.
• Perform better each successive time. They learn continuously, improving their models of self and others, their ability to represent and predict changes in the environment, and to plan and select actions in pursuit of goals.
Our technical approach rests on the use of a class of architectures that:
• contains a learned world-model that provides continued prediction of the evolution of the world (both as a result of actions by the agent, and as the result of endogenous processes in the world);
• utilizes these predictions as expectations, which are compared against observations to detect surprises (expectation violations that may signify incompleteness or error of the world model); and
• reacts to surprise through rapid reflection that triggers reprioritization of goals, replanning, and control changes.
Reliable software systems must be able to pursue goals and maintain policies even as they adapt in response to surprises. Our approach rests on the use of architectures that utilize reflection, diagnostic processes in which one questions and tries to correct one's interpretation of the world and one's own behavior and capabilities, based on the detection of expectation violations: mismatches between predictions or assumptions and observations. In this paper we focus on how the system avoids major failure in the face of surprise. In particular, the system described in this paper makes use of a reflective control layer that manages self-diagnosis and adapts models and behavior under the time constraints imposed by the system's ongoing operations. Specifically, this layer manages learning (model update) goals and safety/survival goals alongside task performance goals.
• Surprise Modeling. Surprise stems from expectations that have been violated in any of several ways (e.g., Horvitz et al., 2005; Shapiro et al., 2004):
  o Situations may be sufficiently uncertain or variable that they cannot be planned for.
  o Events may be considered to be impossible in that they contradict some internally modeled assumptions about the environment.
  o Events may be considered possible in general, but not predicted in specific circumstances because key indicators were not known or were not detected.
• Assumption representation and re-evaluation. In responding to surprises, the underlying cause for the expectation failure must be identified and corrected. To that end, the agent must reconsider its representation of assumptions about its environment, the reliability of its own behavior, and its expected impact on the environment given its actions. These assumptions must be systematically questioned and expanded to explain anomalous observations.
• Memory-driven Expectation Monitoring. In addition to general operating assumptions, these systems must develop expectations of specific operational contexts in order to be surprised. Agents must develop (learn) predictive experience-based memory models that enable them to have expectations with variable certainty in different contexts. Such models must characterize the capabilities of the agent itself, as well as its interactions with the environment and other agents.
Figure 1 shows, abstractly, the main control loop and reflective control loops that we envision managing the process of acting in the world, and the process of reflecting on our expectations about that world by comparing effects to intended (ΔC, or change in control) and expected (ΔM, or change in model) states, while simultaneously reflecting on how accurately those expectations are met and what that implies for its model of the world and the likely success of its plans. The reflective, cognitive control loop reacts to differences between observations and predictions (ΔM) or expectations that it has generated based on its model of the actions it is performing. This loop is responsible for revising both the internal world model (predictors of the effects of its own actions and processes) and its plans for achieving goals. This model also includes what it knows about change processes operating in the world it is observing. When differences arise due to observations of outcomes, it must evaluate whether these differences are likely or unlikely. When the agent's actions are control changes, the differences it considers include changes in rates, not just positions. Statistically significant differences require a decision whether to simply replan or to also change its models of how the world reacts to its actions and, conversely, of the preconditions for actions necessary in order to get a particular outcome.
Figure 1. General COGENT Architecture
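To make the loop concrete, the sketch below shows one way such a reflective controller could be organized in Python; the class names, the discrepancy measure, and the tolerance threshold are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the reflective control loop described above; names and
# thresholds are illustrative assumptions, not the authors' system.
class WorldModel:
    """Learned predictor of how the world evolves under the agent's actions."""
    def predict(self, state, action):
        raise NotImplementedError          # expected successor state
    def update(self, state, action, observed):
        raise NotImplementedError          # revise the model after a surprise

class ReflectiveAgent:
    def __init__(self, model, task_goals, safety_goals, tolerance=0.1):
        self.model = model
        self.task_goals = list(task_goals)
        self.safety_goals = list(safety_goals)
        self.tolerance = tolerance

    def step(self, state, action, observed):
        """Compare expectation to observation (the Delta-M check) and react."""
        expected = self.model.predict(state, action)
        if self.discrepancy(expected, observed) > self.tolerance:
            # Reflection: question the model, then reprioritize goals and replan.
            self.model.update(state, action, observed)
            return self.safety_goals + self.task_goals   # safety goals come first
        return self.task_goals

    @staticmethod
    def discrepancy(expected, observed):
        # Domain-specific comparison of expected vs. observed effects (rates as
        # well as positions); here a simple sum of absolute differences.
        return sum(abs(expected[k] - observed.get(k, 0.0)) for k in expected)
```

The only commitments in this sketch are those stated above: predictions serve as expectations, surprises are detected by comparison, and detection feeds goal reprioritization rather than blind re-execution.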
Model changes include changes to the likelihood of discrete or qualitative effects of actions, including attentive actions, as well as to the state of the agent's own effectors. They can also include the statistical ranges on quantitative effects. Model changes are context dependent, as we see in the driving domain discussed below, where applying brakes or turning the wheel does not have the usually predicted effect when the car is on ice. Alternate model changes and the explanations for them must be considered, such as that the brakes do not work when the car is on ice, or that the brakes themselves are suddenly broken. Occam's razor must be used to resolve these ambiguities.

2. Sample Scenario

Consider an automated driver that knows the rudiments of driving a car in controlled test environments. It has enough knowledge to expect generally unobstructed, well-paved roads, to know how to plan routes and follow directions to move through the streets, to stay in lanes, and to follow other rules of the road, such as obeying traffic signals and signs. It also has rules associated with its own and others' safety, such as knowing to slow down and stop if its path is blocked, to go around minor obstacles within its path, and to keep a safe distance from cars ahead of it. Though it did not exist when we first considered this task, Google's driverless car now does reasonably well at these things. However, someone must still sit there ready to react to things that it cannot deal with, including such things as construction cones, human traffic directors, and pedestrians that come out from behind stopped vehicles ahead. If such systems are to gradually gain reliability in more realistic urban driving situations, they must learn from experience but also have the ability to reason about the expectations from prior knowledge and advice they were given, to explain how new situations violated those expectations. Recognizing expectation failures (surprises) and using reflective reasoning (about potential flaws in their own behavior, knowledge, and model of how the world works) to recover and learn from these surprises should help them improve by being less surprised over time, even though the events themselves are rare. Figure 2 shows our realization of the general COGENT (Cognitive Agent) architecture for our simplified driver agent operating in simulation. The simulated cars move forward an incremental amount based on their speed and direction at each tick, and the environmental controller makes observations and generates reactions at each step based on its observations and plans. The cognitive controller acts in response to differences between the predicted incremental effects of actions and the observed ones. When the cognitive controller is turned on, we refer to this as "reflective mode" for the simulation. When prediction failures implicate the actions that the agent would use to replan for its current goal, safety goals are made primary instead.
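A minimal sketch of that foreground/background goal swap follows; the goal names, the skid-recovery policy, and the percept fields are illustrative assumptions rather than the actual controller.

```python
# Illustrative sketch of the foreground/background goal swap used in reflective
# mode. Goal names, percept fields, and the recovery policy are assumptions.
class DriverAgent:
    def __init__(self):
        self.foreground = ["follow-route"]                    # primary task goal
        self.background = ["regain-control", "avoid-crash"]   # safety goals
        self.suspended = []

    def on_prediction_failure(self, implicated_actions):
        # Safety goals become primary when failures implicate the very actions
        # the agent would otherwise use to pursue its current goal.
        if implicated_actions & {"steer", "brake"}:
            self.suspended, self.foreground = self.foreground, list(self.background)

    def control(self, percept):
        if "regain-control" in self.foreground:
            # Crash-avoidance policy: steer into the direction of travel, brake gently.
            return {"steer": percept["heading_of_motion"], "brake": 0.3}
        return {"steer": percept["route_heading"], "brake": 0.0}

    def on_control_regained(self):
        # Restore the suspended task goal; push safety goals back to the background.
        self.foreground, self.suspended = self.suspended, []
```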
Figure 3 shows several screen shots of the simulation in action, with and without the results of reflective control. The simulation environment was developed based on a simulation model provided by Pat Langley and described in Choi et al. (2004). In the non-reflective case, when the driver begins a turn and detects loss of traction, it responds by turning the wheel more, the normal goal-driven course-adjustment loop. The result is that it crashes into the building on the far corner. When reflection is turned on and loss of traction is detected, the foreground goal of navigating the route path is suspended temporarily, and the background safety goals of regaining control and crash avoidance are switched to the foreground. When the safety goals are primary, a set of policies for avoiding crashes is used to control the vehicle. Though the agent had not categorized the ice as a threat to control, it now recognizes control loss and uses rules to steer in the direction of movement and apply braking until control is regained. When the reflective driver does this, it misses the turn, but passes beyond the ice and stops, avoiding a crash. Once traction is restored, the reflective agent resumes its normal goals and the safety goals are pushed into the background. Now the driver realizes it has passed the intersection, and must compute a new path to get to the higher goal of arriving at its destination. Figures 4a and 4b show these steps. The other result of surprise is model repair. In this case, a search is made for differences between the state of the environment when control is maintained and the current case of control loss, in order to explain the difference. Classifying the observed ice as the distinguishing new element of the driving situation enables the reflective agent to add a new case to its model of road hazards, so that a subsequent encounter with a patch of ice causes the agent to go around the ice, rather than over it (Figure 4c).
Figure 2. Simulated Driving Agent Architecture
Figure 3. Simulated reaction to hitting ice patch
Figure 4. Goal Recovery and Learning Goal Generation: (a) having avoided the crash, the agent is off course, must replan its route, and has learned that it should avoid ice in the future if possible; (b) the replanned route, continuing from a position beyond the intersection; (c) subsequent use of the learned policy to avoid ice.

3. Discussion

Though not previously published, the bulk of this work was done in 2004-2006. Our main research focus was on developing a framework for autonomous learning and adaptation in open environments where unexpected or surprising events, effects, or objects could impact the agent's goals, its model of itself, its environment, and its expected impact on that world. In this regard we are very much aligned with the work described in Molineaux et al. (2011) and Klenk et al. (2013), and our architecture looks very similar to theirs.
Due to the nature of the driving simulation we used as a motivating illustration, we were more concerned with particular kinds of expectation violation in continuous processes than with discrete, symbolic differences. For the driver, the presence of an unknown object (ice) plus the unexpected rate of change in speed and direction compared to change in controls was a primary issue. We also used a very simple model of foreground and background goals, and did not address the subtleties of reasoning about the quality of the various explanations of the observed explanation failure that could impact the choice of plans for safe recovery. Our learning was limited to reclassification of the environmental conditions impacting the operators used in the task, based on an identification of the novel features in that environment. While there are many things that can trigger the reflective reasoning that causes shifts in goal priorities, surprise is a particularly interesting source of these responses. Although surprise is often linked solely with the expectation violations from unusual or low-probability events, in open worlds, it may also come from an inability to classify or explain objects or events of unknown types, or in unexpected contexts. We must also consider the impact on goal shifting of such things as the discovery of contradictory evidence from different sensors, and or failures in prediction due to incomplete knowledge of the world. Our model is based heavily on the notion of reacting to and learning from expectation failures (Schank, 1982) and on the creation of internal learning goals (Hunter, 1989; Ram and Leake, 1995) when expectations are violated. Although our example illustrates the elevation of safety goals, in open environments surprises can be both negative and positive. Autonomous agents must have multiple goals, not all of which are active at all times. Surprises can include unexpected goal achievements and unexpected discovery of useful objects for goals that were temporarily suspended, or were associated with persistent interests or needs of the agent (food, fuel, relevant information). Reasoning to elevate and shift to serendipitous goals can occur in a fashion very similar to the activation of the safety goal in the example presented above. Surprise can also lead to learning goals for exploration and experimentation in service of knowledge gathering. Other kinds of goal failures (especially knowledge failures) can result in shifts to plans for knowledge gathering. When the agent assumed that she had enough information to pursue the original goal, these too look very much like responses to ?surprise?. Of course, not all goal shifts are due to surprise. In our highly multi-tasked electronically supported society, interruption is probably the number-one source of new goals and incomplete goals. If agents are working in a social setting with other agents, especially where cooperation is required, then reasoning about suspension of ones own goals to interact with and potentially address the needs of another agent is an important category of goal reasoning unrelated to surprise. Early work on agent teams such as (Tambe, 1997) illustrated this point, though without explicit goal shifts. 8 GOAL SUBSTITUTION IN RESPONSE TO SURPRISES Acknowledgements This work was performed in part with funding provided by DARPA under contract HR0011-04- C-0078. References Choi, D., Kaufman, M., Langley, P., Nejati, N., & Shapiro, D. (2004). An architecture for persistent reactive behavior. 
Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems (pp. 988-995). New York: ACM Press. Georgeff, M., Lansky, A. & Bessiere, P. (1985) A procedural logic. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 921-927). Los Angeles, CA: Morgan Kaufmann. Horvitz, E., Apacible, J., Sarin, R. & Liao, L. (2005). Prediction, expectation, and surprise: Methods, designs, and study of a deployed traffic forecasting service. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence. Edinburgh, Scotland, UK. Howe, A.E. (1995). Improving the reliability of AI planning systems by analyzing their failure recovery. IEEE Transactions on Knowledge and Data Engineering, 7, 14-25. Hunter, L.E. (1989). Knowledge acquisition planning: Gaining expertise through experience. Doctoral dissertation: Department of Computer Science, Yale University, New Haven, CT. Klenk, M., Molineaux, M., & Aha, D.W. (2013). Goal-driven autonomy for responding to unexpected events in strategy simulations. Computational Intelligence, 29(2), 187-206. Molineaux, M., Aha, D.W., & Kuter, U. (2011). Learning event models that explain anomalies. In T. Roth-Berghofer, N. Tintarev, & D.B. Leake (Eds.) Explanation-Aware Computing: Papers from the IJCAI Workshop. Barcelona, Spain. Ram, A., & Leake, D. B. (Eds.) (1995). Goal-driven learning. Cambridge, MA: MIT Press. Remington, R.W., Matessa, M.P, Freed, M. & Lee, S. (2003). Using Apex/CPM-GOMS to Develop human-like software agents. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multi Agent Systems. Melbourne, Australia. Schank, R. (1983). Dynamic memory: A Theory of reminding and learning in computers and people. New York: Cambridge University Press. Shapiro, D., Billman, D., Marker, M., & Langley, P. (2004). A human-centered approach to monitoring complex dynamic systems (Technical Report). Palo Alto, CA: Institute for the Study of Learning and Expertise. Tambe, M. (1997). Towards flexible teamwork. Journal of Artificial Intelligence Research, 7, 83?124. 9 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning Question-Based Problem Recognition and Goal-Driven Autonomy Michael T. Cox MCOX@CS.UMD.EDU Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742 USA Abstract Autonomy involves not only the capacity to achieve the goals one is given but also to recognize problems and to generate new goals of one?s own that are worth achieving. The creation of goals starts with the recognition of a novel problem, and this recognition begins with the detection of anomalies represented as the difference between expectations and observations (or inferences). The expectation failure triggers the posing of questions; questions lead to explanations; and explanations form the basis for goals. I illustrate these principles with examples from a computational architecture called MIDCA. 1. Introduction Have patience with everything that remains unsolved in your heart. Try to love the questions themselves, like locked rooms and like books written in a foreign language. Do not now look for the answers. They cannot now be given to you because you could not live them. It is a question of experiencing everything. At present you need to live the question. Perhaps you will gradually, without even noticing it, find yourself experiencing the answer, some distant day. ? 
Letter 4 (Rilke, 1986/1903) Certainly one of the key elements of intelligence that separates us from the rest of the primate order is that we routinely question our world and ourselves. Yet some questions can take us in false directions and waste valuable time and effort. Other questions are rhetorical and serve quite different functions (e.g., speech acts). However I claim here that questions frame an inquiry that lies at the heart of what it means to be intelligent. Although the performance of the individual may benefit, it is not the efficiency of action that primarily motivates questioning and answering. Instead it is the agent in search of truth that operates behind the curtain. In seeking to understand the world and ourselves, we discover problems in our understanding of and in our world. By addressing these problems, we can better comprehend and independently manage the worlds within which we exist. Represented appropriately, these problems give rise to goals that enable change in our environment as well as change within ourselves. Goals are key computational structures that drive problem solving, comprehension, and learning. In humans they constitute a special category of knowledge structure that represents desired and attainable states of affairs and that holds some measure of value for the individual (Kruglanski, 1996). For Kruglanski goals possess properties attributable to all knowledge, to goals as a class, and to the specific goal relation identified. Hierarchical systems of goal structures motivate and constrain reasoning and effective behavior (Kruglanski, K?petz, B?langer, Chun, Orehek, & Fishbach, 2013; Kruglanski, Shah, Fishbach, Friedman, Young, & Chun 2002). These 10 M. T. COX goal systems enable efficient self-regulation and self-control. Goal orientation has also been shown to be an important determinant for successful learning (Dweck, 1986). Indeed the claim has been made that virtually all human behavior involves the pursuit of goals, that goals are at the center of human learning, and thus that goals constitute an effective means by which to organize educational curricula (Schank, 1994). In artificial cognitive systems, goals serve functions similar to those they do in people. In both cases goals provide focus, direction, and coordination for the allocation of computational resources during problem-solving and inference making. They also form a basis for the organization of and retrieval from memory (Schank & Abelson, 1977; Schank, 1982). Computationally goals reduce the amount of processing involved in decision-making; psychologically they afford attention and provide purpose. Further they also provide the means for explaining behavior (Malle, 2004; Ram, 1990; Schank, 1982; 1986). That is, agents do particular actions because they intend to achieve the states that result from such actions. Finally, many researchers have shown a positive relation between goal-focused behavior and deliberate, planful learning (Cox, 2007; Cox & Ram, 1999; Ram & Leake, 1995). Most computational theories of cognition assume the existence of goal structures and concentrate on their application and use in problem-solving, planning and execution, learning and other activities. Except under two conditions, these theories generally do not account for how goals originate. One assumption is that goals are simply input by a human. A second possibility is that goals arise due to subgoaling. 
When action preconditions are unsatisfied in the environment, a sub- goal is spawned to achieve blocked preconditions. Here I will examine in detail an alternative cognitive process that creates novel goals for an autonomous agent. Autonomy involves not only the capacity to achieve given goals, but it also concerns the ability to recognize new problems and to propose goals that are worth achieving. An independent cognitive system is autonomous to the extent that it understands the world and its relation to it and can act accordingly. Both opportunities and threats in the world require an agent to anticipate events within the context of its interests. Given an understanding of these opportunities and threats, an autonomous agent manages the goals and plans it has by adapting either its plans or goals and by abandoning old goals or by generating new ones. This paper will examine these issues in the context of computational theories and will illustrate parts of our theory with various examples. The following section discusses what it means for a cognitive system to be autonomous, and it will challenge the current models. Here autonomy means being able to generate new goals rather than just following the goals of others. The subsequent section expands upon this model and proposes that new goals come from the recognition of problems in the environment and within one?s knowledge and experience. The key is to identify when observations violate expectations, to ask why such an anomaly occurred, to answer by explaining the causes of the anomaly, and to generate a goal to remove the primary cause. This process either points to a deficiency of knowledge or a deficiency in the world. Next we examine an architecture called MIDCA that implements many of these ideas and processes and that provides a direction for our research. I close with summary and concluding remarks. 2. Autonomy A common model of autonomy assumes an intelligent agent that can perform tasks given to it by a user and/or can automatically improve its performance in some environment over time (Maes, 1994; Wooldridge, 2002). The consensus asserts that agents exist within some environment (either 11 QUESTION-BASED PROBLEM RECOGNITION AND GOAL-DRIVEN AUTONOMY real or virtual) and can both sense and act within that environment (Franklin & Graesser, 1997; Weiss, 1999). Weiss (1999) distinguishes autonomous agents from mere programs, since agents decide whether or not to perform a request; whereas called functional programs have no say in the matter. Many researchers attempt to restrict autonomous agents to those that have their own agenda or otherwise do not need human intervention to perform some task (e.g., Franklin & Graesser, 1997). However this transfers the uncertainty from the question of ?what is autonomy?? to ?what is having an agenda?? In general, an autonomous agent is one that makes many of its own decisions but that has an explicit mechanism of control. Control is provided by human assigned tasks or goals or by programming and design specifications. Autonomy is provided procedurally by automated planning and learning mechanisms. 2.1 Goal-Following Autonomy Consistent with these characterizations, I define goal-following autonomy as either hardware or software agents that can accept a goal from a user (or another agent) and can automatically achieve the goal by performing a sequence of actions in their environment.1 The goal is simply some state configuration of the environment in terms of what needs to be satisfied. 
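To make this concrete, a goal in this sense can be represented as nothing more than a set of literals that must hold in the environment; the encoding below is an illustrative convention, not a prescribed format.

```python
# A goal as a state configuration: a set of ground literals that must be satisfied.
goal = {("on", "A", "B"), ("ontable", "B")}

def satisfied(goal, state):
    """True when every literal demanded by the goal holds in the current state."""
    return goal <= state            # subset test over sets of ground literals

state = {("ontable", "B"), ("on", "A", "B"), ("clear", "A")}
print(satisfied(goal, state))       # True: the state configuration meets the goal
```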
Functionally the agent implements the following characteristics. ? A means to accept a goal request for some environment; ? A decision whether to pursue the goal given its current situation and any prior requests; ? A capacity to plan for goal achievement; ? A capability to interact with the environment by executing a plan and perceiving results. If the world does not cooperate, a flexible agent may even change its plan at execution time to accomplish the goal as specified. But once the goal is achieved, does the agent wait to be told what to do next? Does it halt? In either case, this does not appear to constitute full autonomy in the more substantial sense. Practical problems also exist with the goal-following model. Most autonomous unmanned vehicles (AUVs) control their behaviors through preprogrammed mission plans that specify a set of waypoints, vehicle parameters, and tasks to perform at particular waypoints (Hagen, Midtgaard, & Hasvold, 2007). Complex tasks that cannot be fully specified in advance rely upon sporadic human intervention and communication. A large portion of the existing research effort into physical platforms revolves around motion planning and obstacle avoidance (e.g., see Berry, Howitt, Gu, & Postlethwaite, 2012; Minguez, Lamiraux, & Laumond, 2008). However an unrealistic burden on the user exists if a human must continuously monitor the behavior of an AUV. Furthermore in situations of low bandwidth or stealth, continuous human to machine communications cannot be assumed. Yet in many complex, dynamic environments, unusual situations arise on a regular basis as the world changes in unexpected ways. This quandary has been called the brittleness problem (Duda, & Shortliffe, 1983; Lenat & Guha, 1989). That is, agents and virtually all cognitive systems in complex environments are brittle except in narrow situations that have been foreseen and verified previously by the system designers. But problems will occur, and a truly autonomous system should be robust in the face of surprise. 1 The goal-following model of autonomy also includes those agents that can accept tasks to perform rather than states to achieve. In either case, the agent takes some high-level description of the desired behavior and automatically computes how to instantiate the directive in a particular environment. 12 M. T. COX One widespread solution to the brittleness problem is machine learning technology (Maes, 1994; Holland, 1986; Stone & Veloso, 2000). Instead of explicitly programming an agent to select the optimal choice among large numbers of candidates across all possible situations, machine learning attempts to create generalizations for those conditions that apply to the largest set of possible contingencies. For example a classifier maps from a state of the world to some choice or decision outcome. In the case of AUVs, this choice could be in terms of a particular action to perform given the current state. Learning from experience then would develop new responses for unexpected situations. Much success has of course been reported in the machine learning and agent literature and also in the cognitive systems community (e.g., see Li, Stracuzzi, & Langley, 2012 and Laird, Derbinsky, & Tinkerhess, 2012 for the latter). I do not dispute that autonomy is about choosing good actions given a goal and about flexibly learning to improve these choices. But the goal-following model is not complete, and it misses an important distinction when the environment is extremely uncooperative and complex. 
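For concreteness, the goal-following interface enumerated above (accept a goal, decide whether to pursue it, plan, execute and perceive) can be sketched as follows; the class and method names are illustrative and not drawn from any particular system.

```python
# Minimal sketch of a goal-following agent interface (illustrative names only).
class GoalFollowingAgent:
    def __init__(self, planner, environment):
        self.planner = planner          # e.g., a classical or HTN planner
        self.environment = environment
        self.goals = []                 # accepted but not yet achieved goals

    def accept(self, goal):
        """A means to accept a goal request for some environment."""
        if self.should_pursue(goal):
            self.goals.append(goal)

    def should_pursue(self, goal):
        """Decide whether to pursue the goal given the situation and prior requests."""
        return goal not in self.goals

    def achieve(self, goal):
        """Plan for the goal, then interact with the environment by executing the plan."""
        plan = self.planner.plan(self.environment.state(), goal)
        for action in plan:
            self.environment.execute(action)
            percept = self.environment.perceive()
            if not self.planner.still_valid(plan, percept, goal):
                plan = self.planner.plan(percept, goal)   # replan at execution time
        # Once the goal is achieved, the agent simply waits for the next request.
```

Nothing in this interface lets the agent decide that a different goal is now worth pursuing, which is exactly the gap the remainder of the paper addresses.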
Interestingly DARPA (2012, p. 8) has pointed to this missing component of autonomy in its capabilities description for its latest X-ship project called the ASW Continuous Trail Unmanned Vessel or ACTUV. Among the many autonomous capabilities, they identified that ACTUV needed to be ?capable of autonomous arbitration between competing mission and operating objectives based on strategic context, mission phase, internal state, and external conditions.? That is, autonomy implies the need to balance its own condition in relation to what is happening in the world with the strategic context of what it is trying to accomplish. 2.2 Goal-Driven Autonomy (GDA) Broadly construed, the topic of goal reasoning concerns cognitive systems that can self-manage their goals (Vattam, Klenk, Molineaux, & Aha, in press). The topic is about high-level management of goals, plans, knowledge, and activities, not simply goal pursuit. In particular we focus on managing goals in the midst of failure. Failure is important for any cognitive system if it is to improve its performance in complex environments. First no programming no matter how extensive will guarantee success in non-trivial domains. Second failure points to those aspects of the system that are no longer relevant or contain some gap that needs filling in the new context. In particular cognitive systems (especially humans) generate expectations about what will or should occur in the world. Expectation failures drive much of cognition and a number of researchers have recognized this relationship (Anderson & Perlis, 2005; Birnbaum, Collins, Freed, & Krulwich, 1990; Cox & Ram, 1999; Perlis, 2011; Schank, 82; Schank & Owens, 1987). The alternative model of autonomy I advocate here consists of self-motivated processes of generating and managing an agent?s own goals in addition to goal pursuit. This model - called goal- driven autonomy (GDA) (Cox, 2007; Klenk, Molineaux, & Aha, 2013; Munoz-Avila, Jaidee, Aha, & Carter, 2010) - casts agents as independent actors that can recognize problems on their own and act accordingly. Furthermore in this goal-reasoning model, goals are dynamic and malleable and as such arise in three cases: (1) goals can be subject to transformation and abandonment (Cox & Veloso, 1998; Talamadupula, et al., 2010); (2) they can arise from subgoaling on unsatisfied preconditions (e.g., see Veloso, 1994) or in response to impasses (Laird, 2012) during problem- solving and planning; and (3) they can be generated from scratch during interpretation (Cox, 2007; see also Norman, 1995). For our purposes here, the most important of the above three cases is the third one. The idea is that given a problem in the world, an autonomous cognitive system must distinguish between 13 QUESTION-BASED PROBLEM RECOGNITION AND GOAL-DRIVEN AUTONOMY perturbations that require a change in plans for the old goal and those that require a new goal altogether. What is missing in the planning and agent communities is a recognition that autonomy is not just planning, acting and perceiving. It also must incorporate a first-class reasoning mechanism that interprets and comprehends the world as plans are executed (Cox, 2011). It is this comprehension process that not only perceives actions and events in the world, but can recognize threats to current plans, goals, and intentions. I claim that a balanced integration between planning and comprehension leads to agents that are more sensitive to surprise in the environment and more flexible in their responses. 
In my approach, flexibility is realized through a technique I call goal insertion, where an agent inserts a goal into its planning process. In general goals are produced through goal formulation (cf. Hanheide et al., 2010; Wilson, Molineaux, & Aha, 2013; Weber, Mateas, & Jhala, 2010), the process that includes the creation and deployment of autonomous goals where the agent takes initiative to establish new concerns and to pursue new opportunities. First an agent detects discrepancies between its observations and its expectations. The agent then explains what causes such discrepancies and subsequently generates a new goal to remove or mitigate the cause (Cox, 2007). Consider a baby that is crying in public. Given that it normally is quite calm, the crying observation violates the mother's expectation and represents an anomaly. The baby's behavior could be explained in a couple of simple ways. If the baby is hungry then it cries, and if the baby's diapers are dirty then it cries. The mother may check the diapers to eliminate that explanation and conclude that it is hungry. The resulting goal for the mother is to eliminate the hunger. A reasonable plan would be to get a bottle from her bag and feed the baby. Table 1 shows the distinct stages of the GDA process in an overly simplified manner.

Table 1. Crying baby example
  Steps                  Representations
  Anomaly detection      Expect: calm(baby); Observe: cry(baby)
  Anomaly explanation    hungry(baby) → cry(baby)
  Goal formulation       ¬hungry(baby)
  Plan                   get(bottle), feed(baby, bottle)

In this example, the explanation is a basic rule, but in general it may be an arbitrary lattice or explanatory graph structure. Here the goal is just the negation of the rule antecedent (see Cox, 2007 for an accompanying algorithm), but for realistic situations, goal formulation must choose some salient prior state or set of states as the cause that requires attention for the problem to be solved. The next section will take a closer look at this issue.

3. Question-Based Problem Recognition

Years ago Getzels (Getzels & Csikszentmihalyi, 1975) described the early stages of problem solving as including a cognitive process they called problem finding (see also Runco & Chand, 1994). Problem finding involves the identification of the problem and precedes problem representation. Getzels classified problem finding into three distinct classes: problem presentation involves tasks given to the subject; problem discovery is the detection or recognition of the problem; and problem creation is a creative act that designs new problems.2 This paper focuses on the second class and uses the more common term problem recognition (Pretz, Naples, & Sternberg, 2003).3 In a brief Cognitive Science article, Getzels (1979) notes that, despite the journal's broad coverage of cognitive processes and the eminent role that problems play in such processes, little research examines how problems are found or formulated. Surprisingly this negative condition is still true today. Despite the acknowledgement that goal formulation constitutes an initial stage of computation for agents (e.g., Russell & Norvig, 1995, p. 56), authors and AI researchers still fail to investigate its operation. Instead the technical focus is on high-quality problem representation (by researchers) and optimal problem-solving algorithms (by machines). Computational problems and goal states are typically assumed as given (presumably by an intelligent human).
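By contrast, the GDA stages summarized in Table 1 construct a goal rather than assume one. A minimal sketch of those four stages follows; the rule set, predicate encoding, and the elimination of the diaper explanation are illustrative simplifications, not the MIDCA implementation.

```python
# Minimal sketch of the four GDA stages in Table 1 (illustrative predicates and
# rules; not the MIDCA implementation).
EXPLANATIONS = {                      # antecedent -> consequent rules
    ("hungry", "baby"): ("cry", "baby"),
    ("dirty-diaper", "baby"): ("cry", "baby"),
}

def detect_anomaly(expected, observed):
    return observed if observed != expected else None

def explain(anomaly, ruled_out=()):
    # Return the first antecedent whose rule predicts the anomalous observation.
    return next((ante for ante, conseq in EXPLANATIONS.items()
                 if conseq == anomaly and ante not in ruled_out), None)

def formulate_goal(cause):
    return ("not",) + cause           # negate the salient antecedent

def gda_cycle(expected, observed, ruled_out=()):
    anomaly = detect_anomaly(expected, observed)
    if anomaly is None:
        return None
    cause = explain(anomaly, ruled_out)
    return formulate_goal(cause) if cause else None

# gda_cycle(("calm", "baby"), ("cry", "baby"), ruled_out={("dirty-diaper", "baby")})
# yields ("not", "hungry", "baby"), which a planner would then achieve with
# get(bottle), feed(baby, bottle).
```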
But our claim here is that the recognition of the problem and the generation of a goal are the hard problems, rather than the generation of solutions. The GDA approach starts with an expectation failure as an initial condition for anomaly detection. However not all anomalies are problems, and not all problems are relevant to the agent. Furthermore the representation for problems is often overly simplified in the literature. For example the automated planning community represents problems as an initial state, a goal state, and a set of operators (i.e., action models) (see Ghallab, Nau, & Traverso, 2004). But the real issue with how problem instances are represented is that in general they are arbitrary. Consider the blocksworld domain. The initial states in this domain are random configurations of blocks, but so too are the goal states. For example in the first panel of Figure 1, the initial state is the arrangement of three blocks on the table and the goal state is to have block A on top of block B. The planner is given no reason for why this should be the case, unlike the second panel in the figure. In this situation the planner wishes to have the triangle D on the block A to keep water out. Here D represents the roof of the house composed of A, B, and C. Water getting into a person's living space is a problem; stacking random blocks is not.
Figure 1. Blocksworld state sequences that distinguish justified problems from arbitrary problems (top row, panels a-c: put A on B; bottom row, panels a-c: put D on A to keep the water out)
2 Although none of these researchers take a computational approach, Hawkins, Best, & Coney (1989) actually come closest to our conception of the processes of problem recognition, but they do so from the perspective of a business analysis of consumer behavior. Their process starts with a discrepancy between what the consumer expects and what they perceive. Such problem recognition then leads to a solution in terms of a product to purchase. As an aside, it is interesting to note that it is in the interest of businesses and marketing firms to manufacture a consumer need by giving the consumer specific expectations. In this sense product marketing may be viewed as problem creation in Getzels's classification.
3 See also Klein, Pliske, Crandall, & Woods (2005) for problem detection. Their perspective is similar to ours (although still not computational), but they insist that the detection is not solely about expectation violations.
In our earlier work on symbolic anomaly detection (Cox, Oates, Paisner, & Perlis, 2012), we simulated an anomaly by removing an operator from the set of action models for the logistics domain. The logistics problems were to transport packages from one location to another using planes and trucks. When the unload-airplane action was removed and the original problem set was presented to the planner, only a subset of problems could be solved. The new set of solutions then provided an anomalous series of observations relative to the normal set of solutions. Now consider how a GDA-based cognitive system might actually explain the planes not being unloaded at a particular airport. Figure 2 shows a possible explanation. The airplanes are not unloaded because no workers are at the airport. This is the case because the stevedores are on strike. The strike is caused by bad working conditions and low pay.
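This causal chain can be written down directly. The sketch below uses assumed predicate names to encode the explanation of Figure 2 and to enumerate the candidate goals that negating each antecedent would produce; the discussion that follows weighs those candidates.

```python
# Illustrative encoding of the explanation sketched in Figure 2 (predicate names
# are assumptions). Each effect maps to the antecedents offered as its causes.
CAUSES = {
    "not-unloaded(planes)":    ["not-at-airport(workers)"],
    "not-at-airport(workers)": ["on-strike(stevedores)"],
    "on-strike(stevedores)":   ["bad-working-conds", "low-pay"],
}

def candidate_goals(anomaly, graph=CAUSES):
    """Walk the why-chain; negating any antecedent yields one candidate goal."""
    goals, frontier = [], [anomaly]
    while frontier:
        effect = frontier.pop()
        for antecedent in graph.get(effect, []):
            goals.append("negate(" + antecedent + ")")
            frontier.append(antecedent)
    return goals

print(candidate_goals("not-unloaded(planes)"))
# ['negate(not-at-airport(workers))', 'negate(on-strike(stevedores))',
#  'negate(bad-working-conds)', 'negate(low-pay)']
```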
For the management of the logistics company, this represents a serious and relevant problem. If it continues, profits will plummet and investor confidence will wane. Both are thus threatened. The question remains as to the management's goal, however. Negating the pay predicate leads to higher wages, whereas negating the belief in poor conditions possibly leads to misinformation. An alternative is to negate ¬at-airport(workers), which is to imply that workers should be present. This could be achieved by bringing in scabs. Finally the management might address the actual working conditions. In this last case, the goal would be ¬bad-working-conds. The choice of which cause of the problem to attack is unclear. A cost-benefit analysis might be used to differentiate between choices, but one must be sure not to require a cost or benefit of the resulting plan or solution, only the goal. Otherwise one would need to perform planning before goal formulation, a clear case of the cart before the horse. Instead an intelligent agent should ask the right question in the first place. It observes no planes unloading when it expects normal activity at the airport and experiences an expectation failure and hence an anomaly. If it asks why the planes were not unloaded, it would get an explanation of how this came to be the case. But if it asks why the workers chose not to perform their duty, it would not get an explanation that focused on their wages. They have unloaded planes in the past at these same wages. The answer would focus on the working conditions. The agent would then recursively question whether it was the perception of the conditions that changed (i.e., the belief) or the conditions themselves that changed. This secondary question would get to the core cause of the problem. Given an answer to the question and an appropriate explanation, the goal would be simple. Removing the cause of the grievance would be the goal that the autonomous agent should prefer. Planning and hence action would follow directly and effectively. In the next section I will examine a cognitive architecture that integrates problem solving as planning and comprehension as goal-driven autonomy. The architecture includes a cognitive cycle for action and perception and an analogous metacognitive cycle for meta-level control and introspective monitoring. I will concentrate on the former cycle to illustrate the concepts presented above.
Figure 2. Explanation for why airplanes are not unloaded

4. MIDCA

The Metacognitive, Integrated, Dual-Cycle Architecture (MIDCA) (Cox, Maynord, Paisner, Perlis, & Oates, 2013; Cox, Oates, & Perlis, 2011) consists of "action-perception" cycles at both the cognitive (i.e., object) level and the metacognitive (i.e., meta-) level (see Figure 3). The output side of each cycle consists of intention, planning, and action execution, whereas the input side consists of perception, interpretation, and goal evaluation. A cycle selects a goal and commits to achieving it. The agent then creates a plan to achieve the goal and subsequently executes the planned actions to make the domain match the goal state. The agent perceives changes to the environment resulting from the actions, interprets the percepts with respect to the plan, and evaluates the interpretation with respect to the goal. At the object level, the cycle achieves goals that change the environment (i.e., the ground level). At the meta-level, the cycle achieves goals that change the object level. That is, the metacognitive "perception" components introspectively monitor the processes and mental state changes at the cognitive level. The "action" component consists of a meta-level controller that mediates reasoning over an abstract representation of object-level cognition.
Figure 3. The MIDCA architecture
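The shape of the two cycles can be suggested in a few lines of code; the stage names below follow the components named in the text, while the toy stage behaviors are placeholders rather than the released system.

```python
# Schematic sketch of the two MIDCA "action-perception" cycles. Stage names follow
# the components in the text; the stub behaviors are placeholders only.
OBJECT_STAGES = ["Perceive", "Interpret", "Evaluate", "Intend", "Plan", "Act"]
META_STAGES   = ["Monitor",  "Interpret", "Evaluate", "Intend", "Plan", "Control"]

def run_cycle(stages, handlers, state):
    """Run one pass of a cycle, threading a state dictionary through each stage."""
    for name in stages:
        state = handlers[name](state)
    return state

# Example handlers for the object level (toy behaviors only).
handlers = {
    "Perceive":  lambda s: dict(s, percepts=s["world"]),
    "Interpret": lambda s: dict(s, anomaly=(s["percepts"] != s["expected"])),
    "Evaluate":  lambda s: dict(s, model=s["percepts"]),
    "Intend":    lambda s: dict(s, goal=("respond" if s["anomaly"] else s["goal"])),
    "Plan":      lambda s: dict(s, plan=[("achieve", s["goal"])]),
    "Act":       lambda s: dict(s, world=s["plan"][-1]),
}

state = {"world": "on-fire(A)", "expected": "clear(A)", "goal": "build-house"}
state = run_cycle(OBJECT_STAGES, handlers, state)   # one object-level pass
```

The meta-level cycle has the same shape, with introspective monitoring in place of perception and a controller in place of action, as the next paragraphs detail.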
To appreciate the distinctions in the relationship between levels, examine the finer details of the object level as shown in Figure 3. Here the meta-level executive function manages the goal set. In this capacity, the meta-level can add initial goals, subgoals, or new goals to the set, can change goal priorities, or can change a particular goal. In problem solving, the Intend component commits to a current goal from those available by creating an intention to perform some Task that can achieve the goal (Cohen & Levesque, 1990). The Plan component then generates a sequence of Actions (e.g., a hierarchical-task-network plan; see Nau et al., 2003) that instantiates that Task given the current model of the world (MΨ) and its background knowledge (e.g., semantic memory and ontologies). The plan is executed by the Act component to change the actual world (Ψ) through the effects of the planned Actions. Problem solving stores the goal and plan in memory to provide the agent expectations about how the world will change in the future. Then given these expectations, the comprehension task is to understand the execution of the plan and its interaction with the world with respect to the goal so that success occurs. Comprehension starts with perception of the world in the attentional field via the Perceive component. The Interpret component takes as input the resulting Percepts and the expectations in memory to determine whether the agent is making sufficient progress. A GDA interpretation procedure implements the comprehension process. The procedure is to note whether an anomaly has occurred; assess potential causes of the anomaly by generating explanatory Hypotheses; and guide the system through a response. Responses can take various forms, such as (1) test a Hypothesis; (2) ignore and try again; (3) ask for help; or (4) insert another goal. Otherwise, given no anomaly, the Evaluate component incorporates the concepts inferred from the Percepts, thereby changing the world model (MΨ), and the cycle continues. This cycle of problem solving and action followed by perception and comprehension functions over discrete state and event representations of the environment.
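A schematic rendering of that interpretation procedure is given below; the anomaly test, the confidence threshold, and the hypothesis fields are illustrative assumptions, not the MIDCA code.

```python
# Sketch of the object-level GDA interpretation procedure: note an anomaly, assess
# it by hypothesizing causes, and guide a response. Thresholds and hypothesis
# fields ("confidence", "transient", "goal") are illustrative assumptions.
def interpret(percepts, expectations, hypothesize, memory):
    # percepts: list of (predicate, value) pairs; expectations: predicate -> pair.
    anomaly = [p for p in percepts if expectations.get(p[0]) not in (None, p)]
    if not anomaly:
        memory["world_model"].update(dict(percepts))   # Evaluate: fold in percepts
        return ("continue", None)

    hypotheses = hypothesize(anomaly)                  # assess potential causes
    if not hypotheses:
        return ("ask-for-help", anomaly)               # response (3)
    best = max(hypotheses, key=lambda h: h["confidence"])
    if best["confidence"] < 0.5:
        return ("test-hypothesis", best)               # response (1)
    if best.get("transient"):
        return ("ignore-and-retry", best)              # response (2)
    return ("insert-goal", best["goal"])               # response (4): a new goal
```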
Likewise, introspective monitoring starts with "perception" of the self (Ω) via the Monitor component. The Interpret component takes as input the resulting Trace and the expectations in memory to determine whether the reasoning is making sufficient progress. The Interpret procedure is to detect a reasoning failure; explain potential causes of the failure by generating explanatory Hypotheses; and generate a learning goal or attainment goal. Reasoning about the self (e.g., am I knowledgeable about the domain?) and about the reasoning task enables the agent to determine the difference (i.e., learning vs. attainment goal). If MIDCA produces a learning goal, the meta-level control will create and execute a learning plan to change its knowledge. Attainment goals are passed through to the object level. Given no anomaly, the Evaluate component incorporates the concepts inferred from the Trace, thereby changing the self-model (MΩ), and the cycle continues.

4.1 Implementation: MIDCA_1.1

MIDCA_1.1 (Paisner, Maynord, Cox, & Perlis, in press) is a simplified version of the complete MIDCA architecture shown in the schematic of Figure 3. It is currently composed of the cognitive (object level) cycle components shown in Figure 3. The implementation effort has concentrated on a simulator that generates successor states based on valid actions taken in the blocksworld domain; a state interpretation component; and a planner. For the planner, we used SHOP2 (Nau et al., 2003), a domain-independent task-decomposition planner. Whereas the full MIDCA architecture has a metacognitive component that manages goals, MIDCA_1.1 has no goal management, and simply passes any new goals from the interpreter directly to the planner. In MIDCA_1.1, the Interpret component consists of an integration of bottom-up and top-down processes as explained below. The Act component is incorporated into the blocksworld simulator, and the Perceive component is implicit in the transfer of the world state representation to the interpreter. The GDA interpretation procedure at the object level has two variations that represent a bottom-up, data-driven track and a top-down, knowledge-rich, goal-driven track (Cox, Maynord, Paisner, Perlis, & Oates, 2013). We call the data-driven track the D-track and the knowledge-rich track the K-track. The D-track is implemented by a bottom-up GDA process as follows. A statistical anomaly detector constitutes the first step, a neural network identifies low-level causal attributes of the anomaly, and a machine learning goal classifier provides the goal formulation. The K-track is implemented as a case-based explanation process. The representations for expectations differ significantly between the two tracks. K-track expectations come from explicit knowledge structures such as action models used for planning and ontological conceptual categories used for interpretation. Predicted effects form the expectations in the former, and attribute constraints constitute the expectations in the latter. D-track expectations are implicit by contrast. Here the implied expectation is that the probabilistic distribution of observations will remain the same. When statistical change occurs instead, an expectation violation is raised. The D-track interpretation procedure uses a novel approach for noting anomalies. We apply the statistical metric called the A-distance to streams of predicate counts in the perceptual input (Cox, Oates, Paisner, & Perlis, 2012; 2013).
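The details of the A-distance are in the cited papers; the toy detector below merely illustrates the kind of distribution monitoring over predicate-count streams that this implies. The divergence measure (total variation), window sizes, and threshold are stand-in choices, not the published method.

```python
# Toy illustration of D-track-style monitoring: compare the distribution of
# predicate counts in a recent window against a reference window and raise an
# implicit expectation violation when they diverge. This is a simple stand-in,
# not the A-distance computation from the cited papers.
from collections import Counter, deque

class PredicateShiftDetector:
    def __init__(self, window=50, threshold=0.3):
        self.reference = Counter()
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, predicates):
        """predicates: list of predicate names extracted from one perceived state."""
        if sum(self.reference.values()) < 200:        # still building the baseline
            self.reference.update(predicates)
            return False
        self.recent.append(Counter(predicates))
        current = sum(self.recent, Counter())
        return self._divergence(self.reference, current) > self.threshold

    @staticmethod
    def _divergence(ref, cur):
        keys = set(ref) | set(cur)
        r_total, c_total = sum(ref.values()) or 1, sum(cur.values()) or 1
        # Total variation distance between the two empirical distributions.
        return 0.5 * sum(abs(ref[k] / r_total - cur[k] / c_total) for k in keys)
```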
This enables MIDCA to detect regions whose statistical distributions of predicates differ from previously observed input. These regions are those where change occurs and potential problems exist. When a change is detected, its severity and type can be determined by reference to a neural network in which nodes represent categories of normal and anomalous states. This network is generated dynamically with the growing neural gas (GNG) algorithm (Paisner, Perlis, & Cox, 2013) as the D-track processes perceptual input. This process leverages the results of analysis with A-distance to generate anomaly archetypes, each of which represents the typical member of a set of similar anomalies the system has encountered. When a new state is tagged as anomalous by A-distance, the GNG net associates it with one of these groups and outputs the magnitude, predicate type, and valence of the anomaly.

Goal generation is done through a combination of two machine learning algorithms, both of which work over symbolic predicate representations of the world (Maynord, Cox, Paisner, & Perlis, 2013). Given a world state interpretation, the state is first classified using a decision tree into one of multiple state classes, where each class has an associated goal-generation rule produced during learning. Given an interpretation and a class, different groundings of the variables of the rule are permuted until either one is found that satisfies the rule (in which case a goal can be generated) or all permutations of groundings have been attempted (in which case no goal can be generated). This approach to goal insertion is naïve in the sense that it constitutes a mapping between world states and goals that is static with respect to any context.

The K-track GDA procedure is presently implemented with the Meta-AQUA system (Cox & Ram, 1999). In Meta-AQUA, frame-based concepts in the semantic ontology provide constraints on expected attributes of observed input and on expected results of planned actions. When the system encounters states or actions that diverge from these expectations, an anomaly occurs. Meta-AQUA then retrieves an explanation pattern that links the observed anomaly to the reasons and causal relationships associated with the anomaly. A goal is then generated from salient antecedents of the instantiated explanation pattern (see also Cox, 2007).

4.2 Firefighting Example: Autonomous goal formulation

To generate goals autonomously, one might statistically train a classifier to recognize a goal given an arbitrary state representation.4 Maynord, Cox, Paisner, & Perlis (2013) demonstrated the potential of this approach using a knowledge structure called a TF-Tree. As mentioned in the previous section, the TF-Tree combines the results of two machine learning algorithms to detect those conditions that warrant the generation of a new goal. Given multiple examples of state-goal pairs, the classifier learns to generate appropriate goals when presented with novel states. For the implementation of MIDCA_1.1 we used a modified blocksworld for the domain. This version of blocksworld includes both rectangular and triangular blocks that compose the materials for simplified housing construction (see Figure 1). The initial goals for problems in this domain are to build houses consisting of towers of blocks with a roof on each. In addition, the possibility exists that blocks may catch fire (set by a hidden arsonist).
Furthermore, two actions are added to the standard blocksworld operators: one action will put out fires, and another will find and capture the arsonist. In the amalgamated firefighting/house-construction/blocksworld domain, the TF-Trees will learn to generate a goal to have a fire extinguished when given a state containing a block on fire. The fires are problems because of their effect on housing construction and the supposed profits of the housing industry, and they pose threats to life and property. The approach to understanding the fire problems is to ask why the fires were started and not just how.5 An explanation of how the fire started would relate the presence of sufficient heat, fuel, and oxygen with the combustion of the blocks. Generating the negation of the presence of the oxygen, for example, would result in the goal ¬oxygen and would therefore put out the fire. But this does not get to the reason the fire started in the first place. To ask why the fire was started would result in possibly two hypotheses or explanations: poor safety conditions can lead to fire, or the actions of arsonists can result in fire. In this latter case, the arsonist causes the presence of the heat through some hidden lighting action. Given this explanation, the agent can anticipate the threat of more fires and generate a goal to remove the threat by finding the arsonist. Apprehending the arsonist then removes the potential of fires in the future rather than just reacting to fires that started in the past.

Empirical results show that a GDA approach to goal formulation significantly outperforms the statistical approach, requiring fewer actions for the same amount of housing construction. In brief, the housing domain goes through a cycle of three state classes in building new "houses."

4 One might also enumerate all possible goals and the conditions under which they are triggered. TacAir-Soar (Jones, Laird, Nielsen, Coulter, Kenny, & Koss, 1999) takes this approach. Operators exist for various goal types, and data-driven, context-sensitive rules spawn them given matching run-time observations. However, even if one could engineer all relevant goals for a domain and all the conditions under which they apply, an expectation failure or surprise may occur if the domain shifts (e.g., the introduction of novel technology). The recognition of new problems and the explanation of their causes would enable the formulation of a goal from first principles, even under conditions not envisioned by the agent designer.

5 See Leake (1991) for extensive discussion of explanation types and the conditions under which they are appropriate.

Figure 4 shows three instantiated states classified by the TF-Trees and the grounded goals that each tree recognizes.

Figure 4. Three classifiers exist that recognize goals to get to the next state.

Using the GDA method over 1000 time steps results in an average improvement of 245.4% relative to a baseline method that lacks goal generation altogether, whereas the statistical method results in a 54.2% improvement over the baseline (see Paisner, Maynord, Cox, & Perlis, in press, i.e., this volume, for technical details). The specific results are less important, however, than the conceptual framing of the problem. Many methods exist to implement a GDA approach to planning and action (e.g., see Aha, Klenk, Muñoz-Avila, Ram, & Shapiro, 2010; Aha, Cox, & Muñoz-Avila, in press). The crucial point for autonomous cognitive systems is to consider how they can be made to understand and represent the problem, not just optimize a solution.
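As a rough illustration of this classify-then-ground scheme (not the TF-Tree implementation itself; the classifier, rules, and domain objects below are hand-built stand-ins, and only single-variable rules are handled), a state can be mapped to a class, the class's rule tried under every grounding of its variable, and a goal emitted for the first grounding that satisfies the rule:

from itertools import permutations

# Hand-built stand-in for a learned TF-Tree: a classifier from states to
# classes plus one goal-generation rule (antecedent, goal) per class.
# A state is a set of ground predicates such as ("onfire", "block2").
RULES = {
    "fire":  (("onfire", "?x"), ("extinguished", "?x")),
    "build": (("clear", "?x"),  ("on", "?x", "roof")),
}

def classify(state):
    """Trivial stand-in for the learned decision tree."""
    return "fire" if any(p[0] == "onfire" for p in state) else "build"

def generate_goal(state):
    antecedent, goal = RULES[classify(state)]
    objects = {arg for pred in state for arg in pred[1:]}
    # Permute groundings of the rule variable until one satisfies the rule.
    for (binding,) in permutations(objects, 1):
        ground = tuple(binding if a == "?x" else a for a in antecedent)
        if ground in state:
            return tuple(binding if a == "?x" else a for a in goal)
    return None    # no grounding satisfies the rule, so no goal is generated

state = {("ontable", "block1"), ("on", "block2", "block1"),
         ("onfire", "block2")}
print(generate_goal(state))    # ('extinguished', 'block2')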
5. Conclusion

This paper is not a technical analysis of autonomy, nor is it an empirical investigation of specific algorithms and data structures; rather it makes a sometimes intuitive case for a different approach to autonomy and intelligent agency. The proposal is that autonomy is chiefly about recognizing problems independently and generating goals to solve them. The specific technical details as they exist now are to be found in the references cited throughout this article. In lieu of these details here, I have tried instead to explore an alternative that lies relatively unexamined and to shed some light on the issues. For the most part, my proposition is novel, because few have thought much about it computationally; the research that does exist is found mainly in the social psychology literature. Also, it is still unclear how many of the propositions and claims herein can be implemented fully, although further examples exist, especially in some of the earlier AI literature (e.g., Leake, 1991; Ram, 1991; Schank, 1986). A few new researchers have done work under the GDA topic, but much of the focus has been on goal formulation, explanation, and case learning rather than problem recognition and question posing. So as it stands now, much research remains, not the least of which is to clarify the role of question asking and experience in problem recognition.

No current theory explains precisely why humans question the world and themselves. What is the exact relationship between question posing, problem recognition, and motivation in an intelligent agent? Why does an agent actively look for problems to begin with? That is, why are we so motivated (sometimes compelled) to solve problems? Certainly a benefit accrues or some personal utility exists in the cost-benefit calculations that percolate in our mind, but this may not constitute the whole story. Perhaps at least part of the answer can be seen in the explanations provided by Higgins (2012). Higgins proposes that human intelligence is motivated by three things: value, truth, and fit. Value corresponds to the utility (i.e., value) we experience in things (e.g., objects, events, or states). Truth corresponds to the capability to accurately interpret the world with respect to one's own experience and memory. Fit refers to the appropriateness of strategy. The AI community focuses on the first factor, whereas this paper has focused on the second. The motivation of truth impels us to look clearly at the world. We seek to discern where our observations agree with and where they differ from our knowledge of the world with the aim of improving our understanding. As I see it, value and truth correspond to the problem solving and problem comprehension divisions within the MIDCA architecture. An autonomous agent values effective performance and finds "truth" in effective interpretation. One is a complement to the other.

Acknowledgements

This material is based upon work supported by ONR Grants #N00014-12-1-0430 and #N00014-12-1-0172 and by ARO Grant #W911NF-12-1-0471. I thank Michael Maynord, Tim Oates, Don Perlis, and the anonymous reviewers for comments on the content of this paper.

References

Aha, D. W., Cox, M. T., & Muñoz-Avila, H. (Eds.) (in press). Proceedings of the 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning. College Park, MD: University of Maryland.
Aha, D. W., Klenk, M., Muñoz-Avila, H., Ram, A., & Shapiro, D. (Eds.) (2010). Goal-Directed Autonomy: Papers from the AAAI Workshop. Menlo Park, CA: AAAI Press.

Anderson, M., & Perlis, D. (2005). Logic, self-awareness and self-improvement. Journal of Logic and Computation, 15, 21–40.

Berry, A. J., Howitt, J., Gu, D.-W., & Postlethwaite, I. (2012). A continuous local motion planning framework for unmanned vehicles in complex environments. Journal of Intelligent & Robotic Systems, 66(4), 477-494.

Birnbaum, L., Collins, G., Freed, M., & Krulwich, B. (1990). Model-based diagnosis of planning failures. In Proceedings of the Eighth National Conference on Artificial Intelligence (pp. 318-323). Menlo Park, CA: AAAI Press.

Cohen, P. R., & Levesque, H. J. (1990). Intention is choice with commitment. Artificial Intelligence, 42(2-3), 213–261.

Cox, M. T. (2007). Perpetual self-aware cognitive agents. AI Magazine, 28(1), 32-45.

Cox, M. T. (2011). Metareasoning, monitoring, and self-explanation. In M. T. Cox & A. Raja (Eds.), Metareasoning: Thinking about thinking (pp. 131-149). Cambridge, MA: MIT Press.

Cox, M. T., Maynord, M., Paisner, M., Perlis, D., & Oates, T. (2013). The integration of cognitive and metacognitive processes with data-driven and knowledge-rich structures. In Proceedings of the Annual Meeting of the International Association for Computing and Philosophy.

Cox, M. T., Oates, T., Paisner, M., & Perlis, D. (2012). Noting anomalies in streams of symbolic predicates using A-distance. Advances in Cognitive Systems, 2, 167-184.

Cox, M. T., Oates, T., Paisner, M., & Perlis, D. (2013). Detecting change in diverse symbolic worlds. In L. Correia, L. P. Reis, L. M. Gomes, H. Guerra, & P. Cardoso (Eds.), Advances in Artificial Intelligence, 16th Portuguese Conference on Artificial Intelligence (pp. 179-190). University of the Azores, Portugal: CMATI.

Cox, M. T., Oates, T., & Perlis, D. (2011). Toward an integrated metacognitive architecture. In P. Langley (Ed.), Advances in Cognitive Systems: Papers from the 2011 AAAI Fall Symposium (pp. 74-81). Technical Report FS-11-01. Menlo Park, CA: AAAI Press.

Cox, M. T., & Ram, A. (1999). Introspective multistrategy learning: On the construction of learning strategies. Artificial Intelligence, 112, 1-55.

Cox, M. T., & Veloso, M. M. (1998). Goal transformations in continuous planning. In M. desJardins (Ed.), Proceedings of the 1998 AAAI Fall Symposium on Distributed Continual Planning (pp. 23-30). Menlo Park, CA: AAAI Press.

Defense Advanced Research Projects Agency (2012). ASW Continuous Trail Unmanned Vessel (ACTUV) Phases 2 through 4. DARPA-BAA-12-19. Arlington, VA: DARPA. https://www.fbo.gov/utils/view?id=2935bca24073347c8fd1ae0820cc20f8

Duda, R. O., & Shortliffe, E. H. (1983). Expert systems research. Science, 220, 261-268.

Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41, 1040-1048.

Franklin, S., & Graesser, A. (1997). Is it an agent, or just a program? A taxonomy for autonomous agents. In Intelligent agents III (pp. 21-35). Berlin: Springer.

Ghallab, M., Nau, D., & Traverso, P. (2004). Automated planning: Theory and practice. San Francisco: Morgan Kaufmann.

Getzels, J. W., & Csikszentmihalyi, M. (1975). From problem solving to problem finding. In I. A. Taylor & J. W. Getzels (Eds.), Perspectives in creativity (pp. 90-116). Chicago: Aldine.

Getzels, J. W. (1979). Problem finding: A theoretical note. Cognitive Science, 3, 167-172.
Hagen, P. E., Midtgaard, O., & Hasvold, O. (2007). Making AUVs truly autonomous. In Proceedings of the MTS/IEEE Oceans Conference and Exhibition (pp. 1-4). Red Hook, NY: Curran Associates.

Hanheide, M., Hawes, N., Wyatt, J., Göbelbecker, M., Brenner, M., Sjöö, K., Aydemir, A., Jensfelt, P., Zender, H., & Kruijff, G. (2010). A framework for goal generation and management. In D. W. Aha, M. Klenk, H. Muñoz-Avila, A. Ram, & D. Shapiro (Eds.), Goal-Directed Autonomy: Papers from the AAAI Workshop. Menlo Park, CA: AAAI Press.

Hawkins, D. I., Best, R. J., & Coney, K. A. (1989). Consumer behavior: Implications for marketing strategy (4th ed.). Boston: Business Publications.

Higgins, E. T. (2012). Beyond pleasure and pain: How motivation works. New York: Oxford University Press.

Holland, J. H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R. Michalski, J. Carbonell, & T. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Vol. 2 (pp. 593-623). San Mateo, CA: Morgan Kaufmann Publishers.

Jones, R. M., Laird, J. E., Nielsen, P. E., Coulter, K. J., Kenny, P., & Koss, F. V. (1999). Automated intelligent pilots for combat flight simulation. AI Magazine, 20(1), 27-41.

Klein, G., Pliske, R., Crandall, B., & Woods, D. D. (2005). Problem detection. Cognition, Technology & Work, 7(1), 14-28.

Klenk, M., Molineaux, M., & Aha, D. (2013). Goal-driven autonomy for responding to unexpected events in strategy simulations. Computational Intelligence, 29(2), 187–206.

Kruglanski, A. W. (1996). Goals as knowledge structures. In P. M. Gollwitzer & J. A. Bargh (Eds.), The psychology of action: Linking cognition and motivation to behavior (pp. 599-618). New York: Guilford Press.

Kruglanski, A. W., Köpetz, C., Bélanger, J. J., Chun, W. Y., Orehek, E., & Fishbach, A. (2013). Features of multifinality. Personality and Social Psychology Review, 17(1), 22–39.

Kruglanski, A. W., Shah, J. Y., Fishbach, A., Friedman, R., Young, W., & Chun, W. Y. (2002). A theory of goal systems. In M. P. Zanna (Ed.), Advances in experimental social psychology (pp. 331-378). New York: Academic Press.

Laird, J. E. (2012). The Soar cognitive architecture. Cambridge, MA: MIT Press.

Laird, J. E., Derbinsky, N., & Tinkerhess, M. (2012). Online determination of value-function structure and action-value estimates for reinforcement learning in a cognitive architecture. Advances in Cognitive Systems, 2, 221-238.

Leake, D. (1991). Goal-based explanation evaluation. Cognitive Science, 15, 509-545.

Lenat, D., & Guha, R. (1989). Building large knowledge-based systems. Menlo Park, CA: Addison-Wesley.

Li, N., Stracuzzi, D. J., & Langley, P. (2012). Improving acquisition of teleoreactive logic programs through representation extension. Advances in Cognitive Systems, 1, 109–126.

Maes, P. (1994). Modeling adaptive autonomous agents. Artificial Life, 1(1-2), 135-162.

Malle, B. F. (2004). How the mind explains behavior: Folk explanations, meaning, and social interaction. Cambridge, MA: MIT Press/Bradford Books.

Maynord, M., Cox, M. T., Paisner, M., & Perlis, D. (2013). Data-driven goal generation for integrated cognitive systems. In C. Lebiere & P. S. Rosenbloom (Eds.), Integrated Cognition: Papers from the 2013 Fall Symposium (pp. 47-54). Menlo Park, CA: AAAI Press.

Minguez, J., Lamiraux, F., & Laumond, J.-P. (2008). Motion planning and obstacle avoidance. In B. Siciliano & O. Khatib (Eds.), Springer handbook of robotics (pp. 827-852). Berlin: Springer.
Muñoz-Avila, H., Jaidee, U., Aha, D. W., & Carter, E. (2010). Goal-driven autonomy with case-based reasoning. In Case-Based Reasoning. Research and Development, 18th International Conference on Case-Based Reasoning, ICCBR 2010 (pp. 228-241). Berlin: Springer.

Nau, D., Au, T., Ilghami, O., Kuter, U., Murdock, J., Wu, D., & Yaman, F. (2003). SHOP2: An HTN planning system. Journal of Artificial Intelligence Research, 20, 379–404.

Norman, T. (1995). Motivation-based direction of planning attention in agents with goal autonomy. PhD thesis, Department of Computer Science, University College London.

Paisner, M., Maynord, M., Cox, M. T., & Perlis, D. (in press). Goal-driven autonomy in dynamic environments. To appear in D. W. Aha, M. T. Cox, & H. Muñoz-Avila (Eds.), Proceedings of the 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning. College Park, MD: University of Maryland.

Paisner, M., Perlis, D., & Cox, M. T. (2013). Symbolic anomaly detection and assessment using growing neural gas. In Proceedings of the 25th IEEE International Conference on Tools with Artificial Intelligence (pp. 175-181). Los Alamitos, CA: IEEE Computer Society.

Perlis, D. (2011). There's no "me" in meta - or is there? In M. T. Cox & A. Raja (Eds.), Metareasoning: Thinking about thinking (pp. 15–26). Cambridge, MA: MIT Press.

Pretz, J. E., Naples, A. J., & Sternberg, R. J. (2003). Recognizing, defining, and representing problems. In J. E. Davidson & R. J. Sternberg (Eds.), The psychology of problem solving (pp. 3-30). Cambridge, UK: Cambridge University Press.

Ram, A. (1990). Decision models: A theory of volitional explanation. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society (pp. 198-205). Hillsdale, NJ: LEA.

Ram, A. (1991). A theory of questions and question asking. Journal of the Learning Sciences, 1(3&4), 273-318.

Ram, A., & Leake, D. (1995). Learning, goals, and learning goals. In A. Ram & D. Leake (Eds.), Goal-driven learning (pp. 1-37). Cambridge, MA: MIT Press/Bradford Books.

Rilke, R. M. (1986). Letters to a young poet. New York: Vintage Books. (Originally published 1903.)

Runco, M. A., & Chand, I. (1994). Problem finding, evaluative thinking, and creativity. In M. A. Runco (Ed.), Problem finding, problem solving, and creativity (pp. 40-76). Norwood, NJ: Ablex.

Russell, S., & Norvig, P. (1995). Artificial intelligence: A modern approach. Upper Saddle River, NJ: Prentice Hall.

Schank, R. C. (1982). Dynamic memory: A theory of reminding and learning in computers and people. Cambridge, MA: Cambridge University Press.

Schank, R. C. (1986). Explanation patterns: Understanding mechanically and creatively. Hillsdale, NJ: Lawrence Erlbaum Associates.

Schank, R. C. (1994). Goal-based scenarios: A radical look at education. The Journal of the Learning Sciences, 3(4), 429-453.

Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum Associates.

Schank, R. C., & Owens, C. C. (1987). Understanding by explaining expectation failures. In R. G. Reilly (Ed.), Communication failure in dialogue and discourse. New York: Elsevier Science.

Stone, P., & Veloso, M. M. (2000). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8, 345-383.
Talamadupula, K., Schermerhorn, P., Benton, J., Kambhampati, S., & Scheutz, M. (2011). Planning for agents with changing goals. In Proceedings of the 21st International Conference on Automated Planning and Scheduling (pp. 71-74). Menlo Park, CA: AAAI Press.

Vattam, S., Klenk, M., Molineaux, M., & Aha, D. (in press). Breadth of approaches to goal reasoning: A research survey. In D. W. Aha, M. T. Cox, & H. Muñoz-Avila (Eds.), Proceedings of the 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning. College Park, MD: University of Maryland.

Veloso, M. (1994). Planning and learning by analogical reasoning. Berlin: Springer-Verlag.

Weber, B. G., Mateas, M., & Jhala, A. (2010). Case-based goal formulation. In Proceedings of the AAAI Workshop on Goal-Driven Autonomy.

Weiss, G. (1999). Multiagent systems: A modern approach to distributed artificial intelligence. Cambridge, MA: MIT Press.

Wilson, M., Molineaux, M., & Aha, D. W. (2013). Domain-independent heuristics for goal formulation. In Proceedings of the Twenty-Sixth Florida Artificial Intelligence Research Society Conference (pp. 160-165). Menlo Park, CA: AAAI Press.

Wooldridge, M. (2002). An introduction to multiagent systems. Hoboken, NJ: Wiley.

2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning

Inferring Actions and Observations from Interactions

Joseph P. Garnier JOSEPH.GARNIER@LIRIS.CNRS.FR
Olivier L. Georgeon OLIVIER.GEORGEON@LIRIS.CNRS.FR
Amélie Cordier AMELIE.CORDIER@LIRIS.CNRS.FR
Université de Lyon, CNRS, Université Lyon 1, LIRIS, UMR5205, F-69622, France

Abstract

This study follows the Radical Interactionism (RI) cognitive modeling paradigm introduced previously by Georgeon and Aha (2013). An RI cognitive model uses sensorimotor interactions as primitives, instead of observations and actions, to represent Piagetian (1955) sensorimotor schemes. Constructivist epistemology suggests that sensorimotor schemes precede perception and knowledge of the external world. Accordingly, this paper presents a learning algorithm for an RI agent to construct observations, actions, and knowledge of rudimentary entities from spatio-sequential regularities observed in the stream of sensorimotor interactions. Results show that the agent learns to categorize entities on the basis of the interactions that they afford, and to appropriately enact sequences of interactions adapted to categories of entities. This model explains rudimentary goal construction by the fact that entities that afford desirable interactions become desirable destinations to reach.

1. Introduction

Georgeon and Aha (2013) introduced a novel approach to cognitive modeling called Radical Interactionism (RI), which invites designers of artificial agents to consider the notion of sensorimotor interaction as a primitive notion, instead of perception and action. A sensorimotor interaction represents an indivisible cognitive cycle, consisting of sensing, attending, and acting. Within constructivist epistemology, it corresponds to a Piagetian (1955) sensorimotor scheme from which the subject constructs knowledge of reality. RI suggests a conceptual inversion of the learning process as compared to traditional cognitive models: instead of learning sensorimotor interactions from patterns of observations and actions, RI recommends constructing observations and actions as secondary objects. This construction process rests upon regularities observed in sensorimotor experience, and happens concurrently with the construction of knowledge of the environment. Figure 1 illustrates the RI cognitive modeling paradigm.

Figure 1. The Radical Interactionism modeling paradigm (adapted from Georgeon & Aha, 2013).
At time t, the agent chooses an intended primitive interaction i_t from among the set of interactions I. The attempt to enact i_t may change the environment. The agent then receives the enacted primitive interaction e_t. If e_t = i_t, then the attempt to enact i_t is considered a success; otherwise, it is a failure. The agent's "perception of its environment" is an internal construct rather than the input e_t.

The algorithm begins with a predefined set of sensorimotor interactions I, called primitive interactions. At time t, the agent chooses a primitive interaction i_t that it intends to enact, from among I. The agent ignores this enaction's meaning; that is, the agent has no rules that would exploit knowledge of how the designer programmed the primitive interactions through actuator movements and sensory feedback (such as: "if a specific interaction was enacted then perform a specific computation"). As a response to the tentative enaction of i_t, the agent receives the enacted interaction e_t, which may differ from i_t. The enacted interaction is the only data available to the agent that carries some information about the external world, but the agent ignores the meaning of this information. An RI agent is programmed to learn to anticipate the enacted interactions that will result from its intentions, and to tend to select intended interactions that are expected to succeed (e_t = i_t). Such a behavior selection mechanism implements a type of self-motivation called autotelic motivation (the motivation of being "in control" of one's activity; Steels, 2004). Additionally, the designer associates a numerical valence with primitive interactions, which defines the agent's behavioral preferences (some primitive interactions that the agent innately likes or dislikes). Amongst sequences of interactions that are expected to succeed, an RI agent selects those that have the highest total valence, which implements an additional type of self-motivation called interactional motivation (Georgeon, Marshall, & Gay, 2012).

Our previous RI agents (Georgeon & Ritter, 2012; Georgeon, Marshall, & Manzotti, 2013) learned to organize their behaviors so as to exhibit rudimentary autotelic and interactional motivation without constructing explicit observations and actions. Here we introduce an extension to construct instances of objects (in the object-oriented programming sense of "object") that represent explicit observations and actions learned through experience. Our motivation is to design future RI agents that will use these to learn more sophisticated knowledge of their environment and develop smarter behaviors. In particular, we address the problem of autonomous goal construction by modeling how an observable entity in the environment that affords positive interactions can become a desirable destination to reach.

2. Agent

Our agent has a rudimentary visual system that generates visual interactions with entities present in the environment. A visual interaction is a sort of sensorimotor interaction generated by the relative displacement of an entity in the agent's visual field as the agent moves. The agent is made aware of the approximate relative direction of the enacted visual interaction e_t by being provided with the angular quadrant in which e_t was enacted. Additionally, the agent is made aware of its displacement in space through the angle of rotation (a real number) induced by the enaction of e_t.
The angle of rotation corresponds to the information of relative rotation given by the vestibular system in animals; in robots, it can be obtained through an accelerometer. Figure 2 illustrates these additions to the RI model.

Figure 2. The Directional Radical Interactionism (DRI) model. Compared to RI (Figure 1), the DRI model provides additional directional information when a primitive interaction e_t is enacted at time t: the directional quadrant in which the interaction e_t is enacted relative to the agent, and the angle of rotation of the environment relative to the agent generated by the enaction of e_t.

3. Experiment

We propose an implementation (see Figure 3) using the DRI model to study how an agent constructs observations and actions from spatio-sequential regularities observed in its stream of sensorimotor interactions. This experiment was implemented in Java using the Enactive Cognitive Architecture (ECA). ECA is a cognitive architecture based on sensorimotor modeling and inspired by the theory of enaction; it controls an agent that learns to fulfill its autotelic and interactional motivation, and it allows implementing self-motivation in the agent1 (Georgeon, Marshall, & Manzotti, 2013). The environment consists of a grid of empty cells (white squares) in which the agent (represented by the brown arrowhead) can try to move one cell forward or turn to the left or to the right. The experimenter can flip any cell from empty to wall or vice versa by clicking on it at any time. The environment also contains walls (gray squares) into which the agent bumps if it tries to move through them.

1. http://e-ernest.blogspot.fr/2013/09/ernest-12.html

The agent has a rudimentary distal sensory system inspired by the visual system of an archaic arthropod, the limulus: the limulus's eyes respond to movement, and the limulus has to move to "see" immobile things. The agent "likes" to eat blue fish (called targets). When the agent reaches a target, the target disappears as if the agent had eaten it. The experimenter can introduce other targets by clicking on the grid. The agent's visual system consists of one simple detector (the violet half-circle on the agent) for detecting targets. This detector covers a 180° span. This visual system is not sensitive to static elements of the visual field (such as the presence and the position of the target) but to changes in the visual field as the agent moves: closer, appears, unchanged, and disappeared. Moreover, the agent divides its visual field into three areas: A, B, and C. These areas inform the agent of the directional quadrant in which an entity is detected.

The designer can also specify the numerical valence associated with primitive interactions before running the simulation. The values chosen implement a behavioral proclivity to move towards targets because the agent receives positive satisfaction when the target appears or comes closer, and negative satisfaction when the target disappears.
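The selection mechanism described above (anticipate which interaction an intention will actually produce, then prefer intentions whose anticipated outcome has high valence) can be sketched as follows. This is an illustrative simplification, not the ECA implementation; the interaction names and valences only loosely mirror Figure 3a, and anticipation is reduced here to remembering the last outcome observed in the same context.

VALENCE = {"move_forward_closer": 10, "move_forward_eaten": 15,
           "move_forward_bump": -10, "turn_left_appears": 3,
           "turn_right_appears": 3, "turn_left_unchanged": -1,
           "turn_right_unchanged": -1}

anticipation = {}   # (previous enacted, intended) -> last observed outcome

def choose(previous_enacted, intended_options):
    """Pick the intended interaction whose anticipated enaction has the
    highest valence; unknown outcomes default to the intention itself."""
    def anticipated(i):
        return anticipation.get((previous_enacted, i), i)
    return max(intended_options, key=lambda i: VALENCE.get(anticipated(i), 0))

def record(previous_enacted, intended, enacted):
    """Update the anticipation table after each enaction."""
    anticipation[(previous_enacted, intended)] = enacted

# After experiencing a bump, the agent stops preferring to move forward.
record("turn_left_appears", "move_forward_closer", "move_forward_bump")
print(choose("turn_left_appears",
             ["move_forward_closer", "turn_right_appears"]))
# -> turn_right_appears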
Figure 3. a) The 14 primitive interactions available to the agent, with their numerical valences in parentheses, set by the experimenter; their meanings (turning or moving forward while the target comes closer, appears, is unchanged, disappears, or is eaten, or while the agent bumps into a wall) are ignored by the agent. This valence system implements the motivation to move towards targets because the valence is positive when the target appears or approaches, and negative when the target disappears. b) The agent in the environment with the agent's visual field overprinted. There are three directional quadrants in which visual interactions can be localized: A, B, and C. Non-visual interactions are localized in a fourth abstract quadrant labeled "O".

To understand how the agent, during interactions with the environment, constructs its actions and its observations, we propose a simplified UML model and an example in Figure 4, and finally the algorithm.

Figure 4. Simplified UML model (left): the modeler defines primitive interactions as instances of subclasses of the Interaction class and programs their effects in the TryToEnact() method. The agent constructs actions as instances of the Agent_Action class (top-right) and observations as instances of the Agent_Observation class (bottom-right) from sequential and spatial regularities observed while enacting interactions. Example constructed instances (right): the action Move Forward can be enacted through the interactions i9, i10, i11, i12, i13, i14. The observation Target affords interactions i1, i2, i3, i4, i11, i12, i14.

To interact with the environment, the agent utilizes a set of interactions defined by the designer. The designer programs an interaction as an action on the environment and an observation. But the agent initially ignores this distinction and must learn that some interactions inform it about the presence of an entity in its surrounding space, while simultaneously learning to categorize these entities. Each interaction can be afforded by a specific type of entity. Using the DRI model (see Section 2), at decision step t the agent tries to enact an intended interaction i_t and obtains the actually enacted interaction e_t at the end of step t. If the enacted interaction differs from the intended interaction (e_t ≠ i_t), then the agent considers that these interactions produce two different actions a_1 and a_2: a first action represented by interaction e_t and a second action represented by interaction i_t (a_1 = {e_t} and a_2 = {i_t}). If e_t = i_t, the agent considers that these interactions produce the same action, which can be represented by the set of these interactions (a_1 = a_2 = {e_t, i_t}). A type of entity present in the world affords a collection of interactions.
When a set of interactions consistently overlaps in space, the agent infers the existence of a kind of entity that affords these interactions. To be concrete, a physical object would be an entity that is solid and persistent. The agent uses spatial information from the DRI model to learn to categorize the entities with which it can interact, according to the collection of interactions that each entity affords. At decision step t, the agent tries to enact an intended interaction i_t and obtains the effectively enacted interaction e_t at the end of step t. Each enacted interaction carries the directional quadrant (A, B, C, or O) in which it was enacted. If the enacted interaction e_t is in the same area as the enacted interaction e_{t-1}, then the agent considers that these interactions are afforded by the same entity (entity_1 = entity_2 = {e_t, e_{t-1}}). If these interactions are enacted in two different areas, the agent infers that there exist two kinds of entity (entity_1 = {e_t} and entity_2 = {e_{t-1}}).
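The two aggregation rules just described can be sketched in a few lines of Python. The sketch is illustrative only; the data structures and interaction names are invented for the example and are not taken from the ECA implementation. Identical intended and enacted interactions are merged into one action, interactions that substitute for one another at enaction time are kept as separate actions, and interactions enacted in the same quadrant on consecutive steps are grouped into the same entity category.

actions = []     # each action is a set of interchangeable interactions
entities = []    # each entity kind is a set of interactions it affords

def group_of(groups, interaction):
    return next((g for g in groups if interaction in g), None)

def update_actions(intended, enacted):
    """e_t != i_t: two distinct actions; e_t == i_t: one shared action."""
    if intended == enacted:
        g = group_of(actions, intended) or set()
        if g not in actions:
            actions.append(g)
        g.add(intended)
    else:
        for i in (intended, enacted):
            if group_of(actions, i) is None:
                actions.append({i})

def update_entities(enacted, quadrant, prev_enacted, prev_quadrant):
    """Same quadrant on consecutive steps: same entity; else a new one."""
    prev_group = group_of(entities, prev_enacted)
    if prev_group is not None and quadrant == prev_quadrant:
        prev_group.add(enacted)
    else:
        entities.append({enacted})

update_actions("move_forward_closer", "move_forward_closer")
update_actions("move_forward_closer", "move_forward_bump")
print(actions)   # [{'move_forward_closer'}, {'move_forward_bump'}]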
4. Result

During the learning phase, the agent learns a behavior that it then uses to reach subsequent targets introduced by the experimenter. Different instances of agents may learn different behaviors as a result of having different learning experiences. Figures 5 and 6 show traces of two behaviors learned by two different agents. Once a behavior has been learned, the agent keeps using it indefinitely to reach subsequent targets.

Figure 5. First 97 steps in Example 1. Tape 1 represents the primitive interactions enacted, in directional quadrant A (top), B (center), and C (bottom), with the same symbols as in Figure 3. Tape 2 represents the valence of the enacted primitive interactions as a bar graph (green when positive, red when negative). Tape 3 represents the progressive aggregation of interactions to form actions. The shape represents the action, and the color is the color of the enacted interaction aggregated to this action at a particular time step. The triangles correspond to the move forward action, the inferior half-circles to the turn right action, and the superior half-circles to the turn left action. Tape 4 represents the progressive aggregation of interactions to form observations. The shape represents the category of observation, and the color is the color of the enacted interaction aggregated to this category of observation at a particular time step. The circles represent the observation of a target, and the squares the observation of void. The agent also constructs a third category of observation: the observation of walls. However, since walls are only observable through a single interaction (i10, red rectangles), there is no aggregation of other interactions to the wall observation. In this example, the agent ate the first target on step 20 (blue rectangle in Tape 1). The experimenter introduced the second target on step 30, and the agent ate it on step 70. The third target was introduced on step 74 and eaten on step 97. The agent learned to reach the target through a "stair step" behavior consisting of repeating the sequence turn left - move forward - turn right - move forward until it aligns itself with the target, and then keeps moving forward until it reaches the target (steps 78 to 97).

A different choice of valence, or modification of the environment by the experimenter at different times, shows that behavior depends on the motivation that drives the agent and on the environment configuration. For example, if the experimenter adds a target earlier than in Example 1, the agent acts differently. This behavior has been observed in Experiment 2, illustrated by the example trace in Figure 6.

Figure 6. First 99 steps in Example 2. The behavior is the same as in Example 1 up to step 25. The experimenter introduced the second target on step 26 rather than step 30 as in Example 1. This difference caused the agent to learn a different behavior to reach the target, consisting of moving in a straight line until the target disappears from the visual field, then getting aligned with the target by enacting the sequence turn right - turn right - move forward - turn right, then keeping moving forward until it reaches the target (episodes 26 to 44, 50 to 67, 71 to 86, and 89 to 99).

This experiment also demonstrates the interesting property of individuation: different instances of agents with the same algorithm may learn different behaviors due to the specific learning experiences that they had. From step 26, the behaviors are different. Such individuation occurs through "en habitus deposition" as conceptualized by the theory of enaction.

5. Conclusion

This work addresses the problem of implementing agents that learn to master the sensorimotor contingencies afforded by their coupling with their environment (O'Regan & Noë, 2001). In our approach, the modeler specifies the low-level sensorimotor contingencies through a set of sensorimotor interactions, which corresponds to what Buhrmann, Di Paolo, and Barandiaran (2013) have called the sensorimotor environment. For the agent, the learning consists of simultaneously learning actions and categories of observable entities as second-order constructs. Here, we use the concept of action in its cognitive sense of "intentional action" (Engel et al., 2013). Our algorithm offers a solution to implement Engel et al.'s (2013, p. 203) view that "agents first exercise sensorimotor contingencies, that is, they learn to associate movements with their outcomes, such as ensuing sensory changes. Subsequently, the learned patterns can be used for action selection and eventually enable the deployment of intentional action". Our agent has no pre-implemented strategy to fulfill its inborn motivation (approaching the target). We show two examples in which the agent learns two different deployments of actions to fulfill this motivation (Figures 5 and 6). These deployments of actions can be considered intentional because the agent anticipates the consequences of actions and uses anticipation to select actions. In future studies, we plan on designing agents capable of reasoning upon their intentionality to learn to explicitly consider observable entities as possible goals to reach. We expect that emergent intentionality associated with explicit goal construction will make the agents capable of exhibiting more sophisticated behaviors in more complex environments, and contribute more broadly to the research effort on goal reasoning.

Acknowledgements

This work was supported by the French Agence Nationale de la Recherche (ANR) contract ANR-10-PDOC-007-01.

References

Buhrmann, T., Di Paolo, E. A., & Barandiaran, X. (2013). A dynamical systems account of sensorimotor contingencies. Frontiers in Psychology, 4.
Engel, A. K., Maye, A., Kurthen, M., & König, P. (2013). Where's the action? The pragmatic turn in cognitive science. Trends in Cognitive Sciences, 17, 202–209.

Georgeon, O. L., & Aha, D. (2013). The radical interactionism conceptual commitment. Journal of Artificial General Intelligence, in press.

Georgeon, O. L., Marshall, J., & Gay, S. (2012). Interactional motivation in artificial systems: Between extrinsic and intrinsic motivation. International Conference on Development and Learning (ICDL-EpiRob).

Georgeon, O. L., Marshall, J., & Manzotti, R. (2013). ECA: An enactivist cognitive architecture based on sensorimotor modeling. Biologically Inspired Cognitive Architectures, 6, 46–57.

Georgeon, O. L., & Ritter, F. (2012). An intrinsically-motivated schema mechanism to model and simulate emergent cognition. Cognitive Systems Research, 15-16, 73–92.

O'Regan, J. K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24, 939–972.

Piaget, J. (1955). The construction of reality in the child. London: Routledge and Kegan Paul.

Steels, L. (2004). The autotelic principle. In F. Iida, R. Pfeifer, L. Steels, & Y. Kuniyoshi (Eds.), Embodied artificial intelligence (Lecture Notes in Computer Science, Vol. 3139, pp. 231–242). Berlin: Springer.

2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning

Beyond the Rational Player: Amortizing Type-Level Goal Hierarchies

Thomas R. Hinrichs T-HINRICHS@NORTHWESTERN.EDU
Kenneth D. Forbus FORBUS@NORTHWESTERN.EDU
EECS, Northwestern University, Evanston, IL 60208 USA

Abstract

To what degree should an agent reason with pre-computed, static goals? On the one hand, efficiency and scaling concerns suggest the need to avoid continually re-generating subgoals, while on the other hand, flexible behavior demands the ability to acquire, refine, and prioritize goals dynamically. This paper describes a compromise that enables a learning agent to build up a type-level goal hierarchy from learned knowledge, but amortizes the cost of its construction and application over time. We present this approach in the context of an agent that learns to play the strategy game Freeciv.

1. Introduction

One of the premises of goal reasoning is that explicit, reified goals are important for flexibly driving behavior. This idea is almost as old as AI itself and was perhaps first analyzed and defended by Alan Newell (1962), yet it is worth revisiting some of the rationale. Explicit goals allow an agent to reason about abstract future intended states or activities without making them fully concrete. A goal may be a partial state description, or it may designate a state in a way that is not amenable to syntactic matching (e.g., to optimize a quantity). A challenge arises when a domain is dynamic and involves creating new entities on the fly, and therefore potentially new goals, or simply involves too many entities to permit reifying propositional goals. For example, in playing Freeciv1, new cities and units are continually being created, and it is not possible to refer to them by name before they exist. Moreover, each terrain tile in a civilization may be relevant to some goal, but it is infeasible to explicitly represent them all. In short, static goal trees can be inflexible, and reified propositional goal trees can be prohibitively large. Goal representations may not be amenable to matching, and dynamically inferring subgoals can be expensive. These challenges have led us to a compromise solution with six parts:

1. Elaborate a goal lattice from a given performance goal and a learned qualitative model.
2. Represent goals at the type level.
3. Indexicalize goals for reuse across game instances.
4. Reify denotational terms (names) for goals to reduce the computational cost of subgoaling.
5. Identify and reify goal tradeoffs.
6. Index goals and actor types by capability roles.

1 http://freeciv.wikia.com

Given this information, a very simple agent can efficiently make informed decisions with respect to goals. We call such an agent a rational player. A rational player has the property that its actions can be explained or justified with respect to a hierarchy of domain goals. This is not a planner per se, because it does not necessarily project future states of the world, but a rational player prefers actions that it believes will positively influence higher-level (and therefore more important) goals. As we progress beyond the rational player, this weak method for deciding what to do can be incrementally augmented and overridden by learned knowledge that looks more like task-decomposition plans and policies for choosing among tradeoffs. This paper presents our type-level goal representation, describes how it is produced and used by a rational player in the context of learning to play Freeciv, and previews our extension to a more reflective player.

2. The Freeciv Domain

Freeciv is an open-source implementation of the popular Civilization series of games. The main objective is to build a large civilization over several simulated millennia and conquer all other civilizations. This involves many types of decisions, long-term goals, and tradeoffs. Examples of some of these types of decisions can be seen in Figure 1. There are clear challenges in learning a game like this. The sheer diversity of activities, complexity of the world model, uncertainty of adversarial planning and stochastic outcomes, incomplete information due to the "fog of war", and dynamic nature of the game make it fundamentally different from games like chess or checkers. Some salient features of this game are that many actions are durative, actions have delayed effects, and the game simulates a number of quantitative systems. This makes it relatively easy to model in terms of continuous processes (Forbus, 1984). In fact, one of our hypotheses has been that by learning a qualitative model of the game, a simple game-playing agent should be able to exploit the model to achieve flexible behavior in far fewer trials than it would need using reinforcement learning. An additional benefit of an explicit model is that it can be extended through language-based instruction or by reading the manual, and its behavior is explainable with respect to comprehensible goals.

Figure 1: Freeciv map, typical production decisions, and partial technology tree

3. Representing a Reified Goal Lattice

We assume that games are defined in terms of one or more top-level goals for winning along with a system of rules. We also assume that there is a set of discrete primitive actions with declarative preconditions, plus an enumerable set of quantity types whose values can be sampled. Beyond that, we assume almost nothing about the domain. Initially, therefore, there is no goal lattice, no HTN plans, and no goal-driven behavior. The system must learn to decompose goals to subgoals by observing an instructor's demonstration and inducing a qualitative model of influences between quantities and the effects of actions on quantities.
This directed graph of influences is translated into a lattice of goals. So for example, given a positive influence between food production on a tile and food production in a city, a goal of maximizing city food production is decomposed into a subgoal of maximizing food production on its tiles. This lattice-building process starts with a high-level goal and works down, bottoming out in leaves of the influence model. Notice that the construction of this goal lattice does not suggest that the system knows how to pursue any particular goal or goal type. It merely provides a general way to search for subgoals that might be operational: there might be a learned plan for achieving a goal, or a primitive action that has been empirically discovered to influence a goal quantity.

As we talk about goals in this paper, we will occasionally refer to performance goals, as opposed to learning goals. Performance goals are domain-level goals that contribute to winning the game. Learning goals, on the other hand, are knowledge acquisition goals that the agent posts to guide experimentation and active learning. In addition, the performance goals in the lattice can be further partitioned into propositional goals (achieve, maintain, or prevent a state) and quantitative goals (maximize, balance, or minimize a quantity). Identifying subgoal relations between propositional and quantitative goals involves a combination of empirical and analytical inferences. In this section, we describe how this works and how these goals are decomposed and represented.

3.1 The learned qualitative model

A key idea is that subgoal decompositions can be derived directly from a learned qualitative model of the domain (Hinrichs & Forbus, 2012a). The system induces qualitative relations in the game by observing an instructor's demonstration and tracking how quantities change in response to actions and influences over time, in a manner similar to BACON (Langley et al., 1987). Influences are induced at the type level using higher-order predicates (Hinrichs & Forbus, 2012b). For example:

(qprop+TypeType
  (MeasurableQuantityFn cityFoodProduction)
  (MeasurableQuantityFn tileFoodProduction)
  FreeCiv-City FreecivLocation cityWorkingTileAt)

This represents the positive indirect influence between the food produced at a tile worked by a city and the total food produced by the city. This second-order relation implicitly captures a universal quantifier over all cities and locations related by the cityWorkingTileAt predicate. We refer to this as a type-level relation because it does not reference any instance-level entities such as particular cities or locations in the game. In addition to influences, we also induce qualitative process limits by looking for actions and events that bookend monotonic trends.

3.2 Building the Goal Lattice from the Qualitative Model

Given a model of type-level qualitative influences and a performance goal, such as maximizing food production in cities, it is straightforward to infer that one way to achieve this is to maximize the food produced on tiles worked in cities. This inference is more complex when the parent goal is quantitative and the subgoal is propositional, or vice versa. In the former case, the influence model may contain a dependsOn relation, which is a very general way of saying that a quantity is influenced by a proposition; for example, the food production on a tile depends on the presence of irrigation. Our system currently learns this through language-based instruction.
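A minimal sketch of the simple quantitative case just described (maximize a quantity by maximizing its positive influencers) might look like the following Python fragment. The influence table and quantity names are toy stand-ins for the learned model, and a propositional dependsOn link is treated here as if it were a quantity, which glosses over the harder cases discussed next.

# Each pair (dependent, independent) reads "independent positively
# influences dependent", as in the cityFoodProduction example above.
INFLUENCES = [
    ("cityFoodSurplus", "cityFoodProduction"),
    ("cityFoodProduction", "tileFoodProduction"),
    ("tileFoodProduction", "tileHasIrrigation"),   # dependsOn-style link
]

def subgoals(goal):
    """One level of decomposition: maximize each positive influencer."""
    quantity = goal[len("maximize "):]
    return ["maximize " + indep
            for dep, indep in INFLUENCES if dep == quantity]

def build_lattice(top_goal):
    """Work down from the top goal, bottoming out at model leaves."""
    lattice, frontier = {}, [top_goal]
    while frontier:
        g = frontier.pop()
        if g not in lattice:
            lattice[g] = subgoals(g)
            frontier.extend(lattice[g])
    return lattice

print(build_lattice("maximize cityFoodSurplus"))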
The opposite case is more complex. What does it mean for a propositional goal to have a quantitative subgoal? We've identified two ways this happens: maximizing the likelihood of achieving or preventing the proposition, and maximizing the rate of achieving it. Our current implementation focuses on subgoaling via the rate. For example, given a propositional goal such as (playerKnowsTech ?player TheRepublic), it identifies this as the outcome of a durative action whose process is directly influenced by the global science rate (learned empirically). Maximizing this will achieve the proposition sooner, and is therefore a subgoal.

We found that such inferences become unacceptably slow when they are re-computed for each decision and transitively propagated through the influence model. To avoid repeatedly instantiating intermediate results, we translate the influence model into an explicit goal lattice offline, store it with the learned knowledge of the game, and elaborate it incrementally as the learned model is revised. A partial goal lattice for Freeciv is shown in Figure 2.

3.3 Representing goals at the type level

Because type-level goals are not tied to particular entities that change across instances of the game, they remain valid without adaptation. For example, a goal to maximize surplus food in all cities could be represented as:

(MaximizeFn
  ((MeasurableQuantityFn cityFoodSurplus)
   (GenericInstanceFn FreeCiv-City)))

Here, GenericInstanceFn is used to designate a skolem to stand in for instances of a city, as if there were a universally quantified variable.

3.4 Indexical Goal Representations

One problem with the goal above is that it does not restrict the cities to those owned by the current player whose goal this is. The goal representation must relate everything back to the current player, except that the player's name changes in each game instance. The player may be called Teddy Roosevelt in one game and Attila the Hun the next. Consequently, we introduce an indexical representation for the current player as:

(IndexicalFn currentPlayer)

where currentPlayer is a predicate that can be queried to bind the player's name. Now, instead of our skolem being simply every city, we must construct a specification that relates the subset of cities to the indexical:

(CollectionSubsetFn FreeCiv-City
  (TheSetOf ?var1
    (and (isa ?var1 FreeCiv-City)
         (ownsCity (IndexicalFn currentPlayer) ?var1))))

This subset description is constructed by recursively walking down from the top-level indexicalized game goal through the type-level qualitative influences. These influences relate the entities of the dependent and independent quantities. These relations are incrementally added to the set specification to form an audit trail back to the player. Thus the translation from model to goal involves no additional domain knowledge.

3.5 Denotational Terms for Goals

The fully-expanded representation for a simple quantity goal starts to look rather complex. We found that simply retrieving subgoal relationships at the type level suffered from the cost of unifying such large structures. As a consequence, we construct explicit denotational terms for type-level goals and use these more concise names to capture static subgoal relationships.
The goalName predicate relates the denotational term to the expanded representation:

(goalName (GoalFn 14)
  (MaximizeFn
    ((MeasurableQuantityFn cityFoodSurplus)
     (GenericInstanceFn
       (CollectionSubsetFn FreeCiv-City
         (TheSetOf ?var1
           (and (isa ?var1 FreeCiv-City)
                (ownsCity (IndexicalFn currentPlayer) ?var1))))))))

Representing the subgoal relationship is now more concise:

(subgoal (GoalFn 14) (GoalFn 15))

This leads to more efficient retrieval, unification, and reasoning. The savings in run time to retrieve a fact of this form versus compute it ranges from a factor of about 36 to a factor of 250.

3.6 Reifying Goal Tradeoffs

Goals are often in conflict. Resource allocation decisions in particular almost always involve tradeoffs. Most commonly, there's an opportunity cost to pursuing a goal. The appropriate balance usually changes over time and often varies across different entities based on extrinsic properties such as spatial location. In the process of building the goal lattice, we also record whether sibling goals involve a tradeoff. In Figure 2, this is indicated by an oval node shape. The tradeoffs are partitioned along two dimensions: partial/total indicates whether every instance of the goal is affected or only some, while progressive/abrupt indicates whether the tradeoff happens gradually or instantaneously. So a total-abrupt tradeoff describes mutually exclusive goals, while total-progressive describes a global balance that could shift over time (such as changing tax allocations). A partial-abrupt tradeoff instantaneously switches the goal preference for some subset of possible entities and not others (such as suddenly putting all coastal cities on a defensive war footing), whereas a partial-progressive tradeoff gradually changes the goal preference over time and for different entities. The existence of a tradeoff is inferred based on whether two quantities are percentages of a common total, inversely influence each other, or constitute a build-grow pattern that trades off the number of entities of a quantity type versus the magnitude of that quantity for each entity.

3.7 Indexing Capability Roles

While the goal network amortizes subgoal relations, assignment decisions can also benefit from reifying type-level information. We treat the Freeciv game player as a kind of multi-agent system to the extent that there are different units and cities that can behave independently or in concert. The different unit types have different capabilities such that, for example, only Settlers can found cities, and only Caravans or Freight can establish trade routes. Static analysis run before the first game inspects the preconditions of primitive actions and clusters unit types into non-mutually exclusive equivalence categories with respect to groups of actions.

4. Reasoning with the Goal Lattice

To illustrate how the goal lattice can be used flexibly and efficiently, we present a sequence of game-playing interpreters. We do not refer to these as planners because they do not project future states.

Figure 2: Reified Freeciv Goal Lattice. Oval shapes indicate goals with tradeoffs. Full goal representations would be too large to display, but see Section 3.5 for the expansion of (GoalFn 14). (Nodes in the figure include Irrigating, Building Libraries, Player knows Democracy, Maximize Science Rate, Trade, Production, and Food; some are annotated as acquired from language.)
They make individual decisions or expand learned HTN tasks to execute primitives, after which they re-plan to adapt to the dynamically changing environment. 4.1 The Legal Player The legal player is essentially a strawman game interpreter. It works by enumerating all legal actions on each turn for each agent and picking randomly. This provides a point of comparison to show the effect of domain learning, and also serves to illustrate the size of the problem space. On the initial turn, with five units and no cities, there are 72 legal actions. This problem space quickly grows as new units and cities are constructed, so that in a typical game, after 150 turns, there are over 300 legal actions (388 in a recent game, though this depends on how many units and cities a civilization has). The gameplay is abysmal, as expected. In particular, it tends to continually revise decisions about what to research, what to build, and what government to pursue. 4.2 The Rational Player The rational player is the simplest player that makes use of goals when deciding what to do. It uses the goal lattice in two ways: it makes event-triggered decisions and it polls for things for agents to do. To clarify this distinction, consider that some decisions can legally be made at any time, such as choosing what to build, or selecting a government or a technology to research. To avoid constantly churning through these decisions on every turn, we learn via demonstration what domain events trigger these decision types. So, for example, we choose what a city should produce when it is first built or whenever it finishes building something. By factoring out the sequencing and control issues, the rational player can focus on making these individual decisions with respect to its goal lattice. It does this by enumerating the alternative choices and searching breadth-first down the goal lattice. Recall that the lattice reflects the causal influence structure of the domain model. In other words, traversing the goal lattice is a kind of regression planning from desired effects to causal influences. At each level in the lattice, it collects the sibling goals and uses simple action regression planning to gather plans that connect the decision choices to the goals. If no such connection is found, it proceeds to the next set of subgoals and tries again. Alternatively, if more than one action leads to a goal at that level, the competing plans are used to populate a dynamic programming trellis in order to select a preferred choice. Due to the dynamic nature of the domain, we use a heuristic that prefers choices that maximize future options. So, for example, to achieve TheRepublic, a good place to start is by researching TheAlphabet because it enables more future technologies sooner than do the alternatives. The other mode of operation is resource-driven. The rational planner enumerates the available actors in the game and assigns them to goals by again searching breadth-first through the goal lattice and finding the highest-level operational goal with which the actor can be tasked. Here, the notion of an operational goal is different from what it is for decision tasks. Instead of applying action regression and dynamic programming, it looks for primitive actions and learned HTN plans that have been empirically determined to achieve the goal type. The most common learned plans are those that achieve the preconditions of primitives.
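The resource-driven mode just described can be pictured with the following minimal Python sketch: a breadth-first descent of the goal lattice that tasks an actor with the highest-level goal for which it has an applicable learned plan or primitive. All names (LearnedPlan, assign_actor, the capability strings) are hypothetical illustrations, not the authors' code.

    # Breadth-first descent of a reified goal lattice to find the highest-level
    # operational goal an actor can be tasked with. Illustrative sketch only.

    from collections import deque
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class LearnedPlan:
        name: str
        required_capability: str          # e.g. 'canIrrigate'

        def applicable_to(self, actor_capabilities):
            return self.required_capability in actor_capabilities

    def assign_actor(actor_capabilities, top_goal, subgoals, operational_plans):
        """subgoals: dict goal -> list of subgoals (the lattice);
        operational_plans: dict goal -> plans/primitives known to achieve it."""
        frontier, seen = deque([top_goal]), {top_goal}
        while frontier:
            goal = frontier.popleft()
            for plan in operational_plans.get(goal, []):
                if plan.applicable_to(actor_capabilities):
                    return goal, plan     # highest-level operational goal found
            for sub in subgoals.get(goal, []):
                if sub not in seen:
                    seen.add(sub)
                    frontier.append(sub)
        return None, None                  # nothing this actor can do yet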
For example, the goal of having irrigation in a location is operational because a plan exists for achieving the preconditions of the doIrrigate primitive. The precondition plan is learned from demonstration and by reconciling the execution trace against the declarative preconditions of the action. Consequently, when the rational planner searches the goal lattice, it finds the goal to achieve irrigation, which it can directly perform. In these ways, the rational player uses learned knowledge both in the decomposition of goals and in the construction of plans to achieve them. It does not, however, use known goal tradeoffs to search for appropriate balances between goals. 4.3 The Reflective Player The reflective player is a superset of the rational player. It is designed to allow learned knowledge to override the blind search methods of the rational player. Although still under development, the reflective player will exploit higher-level strategies acquired by operationalizing domain-independent abstract strategies. Operationalization works by reconciling demonstration and instruction against the known goals and goal tradeoffs. Our belief is that effective, coordinated strategies are not something that people invent from scratch when they learn a game; instead, they are adapted from prior background knowledge. This is markedly different from a reinforcement learning approach that re-learns each game anew. 5. Related Work To the extent that we are concerned with rational behavior in a resource-bounded agent, our approach can be viewed as a kind of Belief-Desires-Intentions (BDI) architecture. As such, it is novel in what it chooses to represent persistently in terms of goals (the type-level lattice), in how it breaks behavior down into action plans and decision tasks, and in what, or how much, to re-plan on each turn. Goel and Jones (2011) investigated meta-reasoning for self-adaptation in an agent that played Freeciv. We are pursuing similar kinds of structural adaptation, while trying to learn higher-level strategies from demonstration and instruction. Many researchers have explored using reinforcement learning for learning to play games. In the Freeciv domain, Branavan et al. (2012) combine Monte-Carlo simulation with learning by reading the manual. While information from the manual accelerates learning, it still requires many trials to learn simple behaviors, and the learned knowledge is not in a comprehensible form that can serve to explain its behavior. Hierarchical Goal Network (HGN) planning is a formalism that incorporates goals into task decomposition planning (Shivashankar et al., 2012). It combines the task decomposition of HTN planning with a more classical planning approach that searches back from goal states. This is similar to the way we use HTN decomposition to set up regression planning from propositional goals; however, we do not produce and execute a complete plan this way. Instead, we use the planner to identify the best initial step, execute that, and then re-plan. This works well when the actions are durative, such as researching a technology. In other cases, where actual procedures make sense, we bottom out at learned HTN plans rather than classical planning. 6. Summary and Future Work We have described a compromise between pre-computing or providing static goals and inferring them entirely on the fly.
Since the structure of the game doesn't change, the lattice of goal types can be saved for efficient reuse without over-specifying particular entities. This goal lattice can be used in a simple reasoning loop to guide planning and decision making. This does not preclude learning more strategic and reactive behaviors. Our current focus is on learning more coordinated and long-term strategies through language, demonstration, and experimentation. We believe that a key benefit of this weak method for applying learned knowledge will be that as it learns to operationalize higher-level goals, it will be able to short-circuit more of the reasoning. This is a step towards our main research goal of long-term, high-performance learning. Acknowledgements This material is based upon work supported by the Air Force Office of Scientific Research under Award No. FA2386-10-1-4128. References Branavan, S. R. K., Silver, D., & Barzilay, R. (2012). Learning to Win by Reading Manuals in a Monte-Carlo Framework. Journal of Artificial Intelligence Research, 43, 661-704. Forbus, K. D. (1984). Qualitative process theory. Artificial Intelligence, 24, 85-168. Forbus, K., Klenk, M., & Hinrichs, T. (2009). Companion Cognitive Systems: Design Goals and Lessons Learned So Far. IEEE Intelligent Systems, 24(4), 36-46, July/August. Goel, A., & Jones, J. (2011). Metareasoning for Self-Adaptation in Intelligent Agents. In M. T. Cox & A. Raja (Eds.), Metareasoning: Thinking about Thinking. Cambridge, MA: MIT Press. Hinrichs, T., & Forbus, K. (2012). Learning Qualitative Models by Demonstration. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (pp. 207-213), Toronto, Canada. Hinrichs, T., & Forbus, K. (2012). Toward Higher-Order Qualitative Representations. In Proceedings of the 26th International Workshop on Qualitative Reasoning, Playa Vista, CA. Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific Discovery: Computational Explorations of the Creative Processes. Cambridge, MA: MIT Press. Newell, A. (1962). Some Problems of Basic Organization in Problem-Solving Programs. RAND Memo RM-3283. Rao, A. S., & Georgeff, M. P. (1995). BDI Agents: From Theory to Practice. In Proceedings of the First International Conference on Multi-Agent Systems (pp. 312-319). Shivashankar, V., Kuter, U., Nau, D., & Alford, R. (2012). A Hierarchical Goal-Based Formalism and Algorithm for Single-Agent Planning. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems. 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning Situation Awareness for Goal-Directed Autonomy by Validating Expectations Michael Karg KARGM@IN.TUM.DE Institute for Advanced Study, Technische Universität München, Lichtenbergstrasse 2a, D-85748 Garching, Germany, http://hcai.in.tum.de Alexandra Kirsch ALEXANDRA.KIRSCH@UNI-TUEBINGEN.DE Department of Computer Science, University of Tübingen, Sand 14, D-72076 Tübingen, Germany, http://www.hci.uni-tuebingen.de Abstract Robots that are supposed to work in everyday environments are confronted with a wide variety of situations. Not all such situations can be taken into account by a programmer when implementing the system. This is why robots often show strange behavior when they encounter situations that their programmer has not expected. We propose a knowledge-based approach to explicitly represent expectations in the robot program.
Comparing those expectations to the current situation allows the robot itself to detect unusual situations and react appropriately. Our general framework can incorporate expectations from different knowledge sources and offers a flexible combination of different expectations. We demonstrate the feasibility of the approach in the context of a household robot in simulation. Finally, we discuss the adequacy of our proposed solution and open questions for further research. 1. Introduction Goal-directed agents must be able to detect situations that require them to generate new goals or change existing goals. For example, one component in the framework of goal-driven autonomy (GDA) (Molineaux, Klenk, and Aha, 2010) is a discrepancy detector that compares expectations to the current situation in the world. The output of the discrepancy detector is used to generate an explanation, which then generates new goals. In this paper, we introduce a general framework for detecting anomalies by comparing expectations to the situation in the world. We represent expectations explicitly in the robot program, using different knowledge sources. Figure 1 illustrates some categories of expectations. (Figure 1. Left: An "abnormal" simulated scene in human-robot interaction. Right: A "normal" simulated scene in human-robot interaction.) In the left picture there are objects floating in the air. This clearly violates the laws of physics and would contradict a naive physics reasoner. The human lying on the floor obeys the laws of physics, but is unusual behavior. The image on the right shows a less surprising situation, even though a pile of boxes on a table might not be a very normal thing in a household, whereas it would be expected in a warehouse. In general, unexpected situations can have different causes, including failures in the robot program (caused by inaccurate sensing or unreliable action execution); unexpected human behavior, such as dropping objects or forgetting things (the latter is especially relevant in the care of people with dementia); and events in the environment, such as someone calling at the door or a storm coming up. We propose a framework that combines different kinds of knowledge to represent different classes of expectations that can differ according to the specific application. Promising candidates to provide such knowledge are naive physics reasoning (Akhtar and Kuestenmacher, 2011), simulation-based projection (Kunze et al., 2011), geometric reasoning (Mösenlechner and Beetz, 2013) or knowledge-based inference (Kunze, Tenorth, and Beetz, 2010). Expectations can also arise from the execution context: if the robot uses a planner, the effects of its actions are expectations about properties of the world after executing each action. Each of these methods can detect certain kinds of anomalies in the specific situation, but none of them covers all possible sources of surprise. An unusual situation is not necessarily an error or a failure. Some events may not require any reaction of the robot at all (when the doorbell rings and an inhabitant of the household opens the door, the robot has no need for action). Others may require clarification actions by the robot, such as asking the human lying on the floor what is wrong. In this paper, we do not deal with the explanation of the observed discrepancy or appropriate reactions.
However, the framework represents all expectations in a normality tree that can serve as a starting point to analyze the found discrepancies and to generate or change goals. We first present related work on automatic detection of failures or surprise in general. Then we present our framework, in particular options for representing expectation models and how to combine different expectations. After that we demonstrate the feasibility of the approach for a household robot in a simulated world. The paper ends with a discussion of the benefits and limits of our approach, pointing out further research questions. 2. Related Work General, formal approaches for failure detection in technical systems pursue a similar aim to our proposed expectations framework and include research from model-based programming (Struss, 2008) and discrete control theory (Kelly et al., 2009). In both approaches, the system is described by a formal model such as a state automaton with probabilistic transitions or a Petri net. By explicitly modeling failures, a system can avoid such states or diagnose the reasons for encountering a failure. In the field of robotics, such approaches are common for the diagnosis of internal faults of single components of a robot. Steinbauer and Wotawa (2005) introduce observers to perform model-based diagnosis on robot components without affecting the control system, while Kuhn et al. (2008) propose the paradigm of pervasive diagnosis to simultaneously enable active diagnosis and model-based control. Williams et al. (2003) use model-based autonomy to enable autonomous systems to be aware of their states and possible errors in uncertain environments. They use models of the nominal behavior of a system as well as models of common failure modes to perform extensive reasoning and to recognize and recover from failures. While the approaches mentioned so far consider only internal faults of technical systems, Akhtar and Kuestenmacher (2011) use qualitative reasoning on naive physics concepts for the prediction and diagnosis of external faults of autonomous systems. However, these approaches are not suitable for detecting failures in the high-level behavior of autonomous service robots. First, defining the interaction of a human and a robot in a finite state machine would require a huge modeling effort. Whereas in discrete control theory the formal model can often be extracted directly from the system specification (which is usually given in a work-flow programming language), such an automaton would have to be hand-coded in the case of a robotic system interacting with humans. And this would mean that a designer would have to take all possible failure situations into account when modeling the behavior. Even more importantly, it is often impossible for a robot to decide whether it is in a failure state. A typical failure state would be "human has abandoned task". But the fact that a person has left the room can mean anything from fetching a necessary object to abandoning the joint task. And modeling on a direct observation level would lead to extremely complex models, which would have to take into account the situational context. Therefore, we propose to detect failures without an explicit model of failure states. Similar to the models of nominal behavior of technical systems of Williams et al. (2003), we define a potential failure as all observations that are not compliant with the robot's experience and knowledge about how the world, and in particular humans, should behave.
Combining different pieces of evidence and considering the degree of divergence from the robot's expectations prevents the robot from being overcautious. Work on such an explicit use of expectations has been done by Minnen et al. (2003). They use extended stochastic grammars to introduce constraints into activity recognition to recognize a human playing "Towers of Hanoi" from video data by analyzing object interaction events. They find that humans have strong prior expectations about actions in activities, and that technical systems performing activity recognition can benefit from explicit models of expectations about high-level activities. Another example of the use of expectations in autonomous robots is the work of Maier and Steinbach (2010), who generate expectations to enable a mobile robot to detect unexpected scenes from video data. They use a dense map of images of the robot's environment and comparisons of luminance and chrominance values of the images at different times. This enables them to detect changes in the robot's environment and make assumptions about the expectations and uncertainties in the environment. Kurup et al. (2012) propose that cognitive systems can benefit from constantly generating expectations and matching them against observations, so as to be able to react when discrepancies are detected. They evaluate their system in the ACT-R cognitive architecture and create expectations about the tracks of pedestrians that cross an intersection. These approaches work well for diagnosing faults in specific domains. But to our knowledge there is no approach that incorporates a combination of different models, including models of humans, to enable robots to perform diagnosis on cooperative everyday tasks. 3. Expectations Framework We define an expectation as any piece of knowledge that describes normal mechanisms of the world, such as physical laws, robot behavior, or habitual human behavior. The normality of a situation is assessed by comparing each expectation with the current situation and combining the single values of expectation matching into one overall value. Our framework offers a common interface for all expectations, which implement a validation method that returns a normality value between 0 and 1. A normality value of 1 means that the expectation is met perfectly in the current situation, and a value of 0 means that the expectation is currently not fulfilled at all. Validation of expectations can be triggered at different points in time depending on the type of expectation. Imagine, for example, expectations about physical models in contrast to expectations about the future locations of a person. While it makes sense to validate physical constraints continually, we can only validate the expectation that the human will go to the refrigerator next as soon as we know whether he or she actually went to the refrigerator. All expectations are stored in an expectations pool that can be modified dynamically. Depending on the context and the progression of the situation, expectations can be added, adapted, and removed. For example, when a user enters the room, the robot should add expectations about human behavior, which can be removed when the person has left again. Similar expectations can be grouped into categories, which are treated as composite objects with the same interface as single expectations. The normality value of a category is computed recursively from the normality values of all the expectations in that category.
One such category could be "Human Activity Expectations", grouping several expectations about human activities so as to get an impression of how well the human actions observed so far generally fulfill the expectations of the robot. The choice of expectation categories, as well as the different types of expectations themselves, strongly depends on the application scenario. 3.1 Expectation Models We already mentioned that the validation method of an expectation returns a value between 0 and 1 to express the degree to which the expectation is fulfilled. For our household robot application we differentiate between logical, temporal, and probabilistic expectation representations. Logical expectations consist of logical propositions and return 1 on validation if the proposition is evaluated to be true and 0 otherwise. "Cup on table", for example, is a logical expectation that is either true or false. Temporal expectations extend logical expectations with a duration for which the expectation is expected to hold. As long as the duration for which the described fact has held is smaller than the expected duration, the validation of such an expectation returns 1. When the duration exceeds the expected duration, the normality value starts decreasing. We define the decrease of the normality value to be linear by default, although other functions are conceivable depending on the intended application. One could, for example, model the time that a human stands in front of a drawer when picking up objects from it as a temporal expectation. Probabilistic expectations model situations in which we expect several possible outcomes, some possibly more probable than others. Probabilistic expectations have probability distributions over random variables, instead of binary propositions, assigned to them. The probability distribution assigns to each event a probability describing how likely we expect that event to occur. One might, for example, expect a human to clean the table after breakfast or to leave the room directly without cleaning (when in a hurry), with the former being more likely than the latter. Validation of a probabilistic expectation returns a value between 0 and 1 corresponding to the probability assigned to the outcome that was actually observed. The implications of such a validation are discussed in Section 5. 3.2 Expectation Validation We assume that the overall normality of a situation results from the combination of all expectations in the expectations pool. We construct a normality tree of expectation values by recursively validating all known expectations. The normality value of a composite expectation class is computed by recursively validating and combining all single expectations contained in it. The framework leaves the method for combining these values open. For our trials we have so far simply averaged the values from the child nodes, but other methods such as weighted sums, thresholds, or the maximum are also conceivable. The depth of the hierarchy is also not restricted, since classes of expectation classes can be defined. In our experiments we always used three layers, as shown in the exemplary normality tree in Figure 2. Here, a situation is observed that mostly fulfills the robot's expectations and thus leads to the high overall normality of 0.95.
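As a rough illustration of the interface just described, here is a minimal Python sketch of logical, temporal, and probabilistic expectations together with a composite category node that averages its children. All class names, the grace parameter of the linear decay, and the callable-based world interface are hypothetical assumptions for illustration, not the paper's implementation.

    # Expectations return a normality value in [0, 1]; categories combine
    # their children's values (plain average, as in the paper's trials).

    import time

    class Expectation:
        def validate(self, world):
            raise NotImplementedError

    class LogicalExpectation(Expectation):
        def __init__(self, proposition):
            self.proposition = proposition      # callable: world -> bool
        def validate(self, world):
            return 1.0 if self.proposition(world) else 0.0

    class TemporalExpectation(Expectation):
        """The described fact is expected to hold for at most expected_duration.
        While it has held for less than that, validation returns 1; once it
        holds longer, the value decays linearly (grace seconds from 1 to 0)."""
        def __init__(self, proposition, expected_duration, grace=5.0):
            self.proposition = proposition
            self.expected_duration = expected_duration
            self.grace = grace
            self.holding_since = None
        def validate(self, world):
            if not self.proposition(world):
                self.holding_since = None
                return 1.0
            if self.holding_since is None:
                self.holding_since = time.time()
            overrun = (time.time() - self.holding_since) - self.expected_duration
            return max(0.0, min(1.0, 1.0 - overrun / self.grace))

    class ProbabilisticExpectation(Expectation):
        def __init__(self, distribution, observe):
            self.distribution = distribution    # event -> probability
            self.observe = observe              # callable: world -> event
        def validate(self, world):
            return self.distribution.get(self.observe(world), 0.0)

    class ExpectationCategory(Expectation):
        """Composite node of the normality tree; children may be categories."""
        def __init__(self, name, children):
            self.name, self.children = name, children
        def validate(self, world):
            values = [c.validate(world) for c in self.children]
            return sum(values) / len(values) if values else 1.0

A normality tree is then simply an ExpectationCategory whose children are other categories or single expectations; validating the root yields the overall normality value.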
In the Figure 2 example, the robot's expectations about the objects of interest are fulfilled, since the table did not move and it has detected the cup on the table as expected, resulting in a normality value of 1 for the object expectations. The validation of the expectations about the human, consisting of two probabilistic expectations, returns the values 0.92 and 0.85, resulting in an average normality of 0.89 for the human normalities. The normality tree is updated as soon as one of the validated expectations changes its value, so the robot obtains an estimate of the normality of the current situation after every new event for which an expectation exists. The normality tree allows a limited but general way of finding the cause of surprising events. When the overall normality value drops below a certain threshold, the robot should consider taking some action. By traversing the normality tree it can diagnose which of its expectations has been violated and act accordingly, for example by further clarifying the situation or taking immediate action. 4. Application: Expectations for a Household Robot To evaluate the applicability of our expectations framework, we set it up in a simulated apartment scenario using different types of expectations. Figure 2. A normality tree generated by the validation of different types of expectations (overall normality 0.95; object normalities 1.0: table static 1.0, cup on table 1.0; human normalities 0.89: next location Table 0.92, location duration 4 s 0.85). The highest level of the tree describes the overall normality of the situation, whereas the lower levels allow for a more fine-grained description of the normalities of specific expectations. Blue nodes represent normalities generated from composite expectations, white nodes correspond to normalities from single expectations. The simulated apartment environment includes a person and a PR2 robot. We use the realistic physical simulator MORSE (1), which includes a human avatar that can be controlled as in modern 3D computer games (Lemaignan et al., 2012) and enables a user to perform pick-and-place actions, open doors, cupboards, and drawers, and operate switches in the simulated scenario. The robot is equipped with a simulated object detection sensor that returns the names of objects within its field of view as well as their positions. Furthermore, the robot is able to detect whether doors are open or closed, and the positions of humans, although it is unable to distinguish between different persons. The robot moves using the ROS-based 2D navigation "move_base" (2). A video of the simulated scenario is available online (3). Our household robot, having nothing else to do, is guarding the apartment while the human is sleeping. It constantly patrols several locations to check whether everything is as expected, staying at each location for two seconds. We use five expectations in this scenario: the TV is expected to be in the living room; it is expected not to move; humans are not expected to be anywhere other than the bedroom or outside the house; the entrance door of the apartment is locked; and the robot navigation works normally. For the robot navigation, we generate a temporal expectation each time the robot starts moving to another waypoint and queries the navigation path planner for a new path plan.
It then estimates the time to the location using the length of the path plan and its average speed, and generates a temporal expectation. The expectation is removed when a goal point is reached, meaning that during the two seconds in which the robot is standing still at each location, only the four expectations mentioned before are active. In this experiment, we use the human avatar to simulate a burglar who enters the apartment during the robot's patrol. He opens the entrance door, passes through the hallway to get into the living room, picks up the TV, and passes through the hallway again, carrying the TV out of the apartment. The simulated burglary is illustrated in Figure 3. The average normality of the robot during this burglary, plotted over time, is shown in Figure 4. Footnotes: (1) http://www.openrobots.org/wiki/morse (2) http://wiki.ros.org/move_base (3) http://vimeo.com/hcai/expectations Figure 3. The patrol robot scenario. Left: The robot detects the open door and the human in the hallway. Middle: The robot detects the TV moving and not being in the living room. Right: The burglar has left the apartment with the TV. Figure 4. Average normality of the patrol robot scenario plotted over time. The normality drops as soon as the robot detects the open entrance door and the human in the hallway. It drops even further when it detects that the TV is no longer in the living room and has moved since the last detection. However, the normality never drops to 0, since the navigation of the robot is still working correctly. When leaving the apartment with the TV, the burglar had to pass very close to the robot, coming within the sensor range of the simulated laser scanner, so the robot's navigation was blocked for safety reasons. This caused the navigation to take longer than expected, and the temporal expectation starts decreasing linearly towards the end of the plot. 5. Discussion The proposed framework of expectations, along with their validation, is designed to offer a general, modular, and knowledge-based way to monitor and react to unusual situations. We make no assumptions about the knowledge used for representing expectations except that it can be used to quantify the normality of a situation. With the modular definition of expectations, the framework can be used and extended for arbitrary applications. The combination of normality estimates in the normality tree provides some guidance towards suitable reactions. By exploiting knowledge that the robot may also need for its decision making or state estimation, the recognition of unusual situations can be done by the robot itself, thus freeing engineers from the burden of considering every single situation the robot might encounter. One open question is how to obtain the necessary expectations. The number and types of expectations a robot needs to identify abnormal situations largely depend on the type of tasks it is designed to perform. While, for example, a vacuuming robot can benefit from expectations about humans' future locations in order to avoid vacuuming in close proximity to a person, expectations about the location of a cup are of no interest to such a robot. In contrast, for a more elaborate household robot designed for activities like setting the table, expectations about the location of a cup can prove useful. In our current application scenarios, we partly define by hand expectations that seem useful in specific situations.
We also use learned models of the expected behavior and future locations of humans performing different activities, and give an example of how learned models of human activities can be combined with activity recognition to generate and validate expectations about human task performance. Another way of generating expectations is by inference using common sense knowledge. Tenorth et al. (2010), for example, use Knowledge-Linked Semantic Object Maps in the knowledge processing system KnowRob to infer likely storage locations of objects in a kitchen, which can be modeled as expectations. No matter how expectations are defined and represented, a common problem is the comparison of real-world sensor data with the expectations. For intelligent behavior, the most useful expectations are defined on abstract levels, giving the robot some global insight into what is happening in the world. However, such abstract information is hard to extract reliably from sensor data. Comparing the current activity of a user to some expected activity therefore requires a stable recognition of human actions and also a representation of acceptable variations. Expectations that are represented on the level of sensor data are easier to evaluate, but it is more difficult to define a baseline expectation. Even though humans can easily recognize activities of other humans, they would hardly be able to predict every movement of a person. Thus, expectations defined on sensor values will almost always be inaccurate, and such expectations can hardly ever be met. The uncertainty of expectations also leads to an implicit weighting factor. When using probabilistic expectations, the normality factor can at most be the probability of the most likely event. So when the robot considers several next actions of the user to be equally likely, the normality should intuitively be 1 if one of those actions is observed. However, because of the uncertainty in the expectation, the normality value will be small. The choice of the combination function for normality values is in general an open question, and it is not clear whether our proposed representation in the normality tree is enough to cover all necessary interactions of normality values. We chose the average as the combination function, which can be justified by the tallying heuristic of humans, where weights are ignored for decision making (Gigerenzer and Gaissmaier, 2011). However, in some cases weights may be necessary and some kind of non-linear combination may be appropriate. For example, when expectations are violated that imply a state of danger, just averaging may mean that the overall normality value is not affected strongly enough. So when the robot detects smoke it should not wait until its overall averaged normality factor drops below a threshold, but act immediately. One way of achieving such a shortcut is by not only monitoring the overall normality value, but also single critical values (a small illustrative sketch of such a shortcut appears below). Our expectations framework considers only one aspect of goal-directed autonomy. It leaves open the question of what to do when expectations are not met. The normality tree offers a straightforward way of diagnosis by identifying those branches of the tree that have a low normality value. One way of generating reactions would be to associate with each node in the normality tree a module in the robot program that is not only responsible for evaluating normality, but also offers actions for re-establishing it.
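Returning to the combination question raised above, the following is a toy sketch of one possible non-linear variant in which critical expectations can bypass the average; the function name and the critical_threshold parameter are hypothetical, and this is only one of many conceivable combination schemes.

    # Combine child normality values: plain average by default, but a
    # strongly violated critical expectation (e.g. smoke detected) pulls the
    # combined value down immediately rather than being averaged away.

    def combine_normality(children, critical_threshold=0.2):
        """children: list of (normality_value, is_critical) pairs."""
        if not children:
            return 1.0
        for value, is_critical in children:
            if is_critical and value < critical_threshold:
                return value          # bypass the average for critical violations
        values = [v for v, _ in children]
        return sum(values) / len(values)

    print(combine_normality([(0.95, False), (0.9, False), (0.05, True)]))  # 0.05
    print(combine_normality([(0.95, False), (0.9, False), (0.6, True)]))   # ~0.82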
One such normality-restoring component may, for example, be a task planner. It can monitor whether the effects of its actions are indeed the expected ones. If this is not the case, it can generate a new plan that considers the changed circumstances. Such a local reconsideration of goals is not always possible. Sometimes the normality value drops because several modules detect small deviations from expectations, which would not be alarming in isolation, but which sum up to an overall low normality. For example, if the user performs certain activities more slowly than normal and drops objects more often, the single deviations from normality would not require any action. But the combination of these deviations may indicate some deterioration in the user's health and should be reacted to. To deal with such complex cases, additional reasoning mechanisms and knowledge would be required. Even without being able to deal with all situations autonomously, the pure recognition of unexpected situations can make robots safer and more reliable for realistic tasks. When it does not understand the situation, a robot could prompt the user for help or contact some external help desk, where operators could either provide the robot with additional knowledge to understand the situation or manually set robot goals so that the situation gets back to normal. In all, our framework is a first step towards a general treatment of failures and unusual situations by autonomous robots. It can easily be integrated with traditional methods of failure recognition and handling, which remain the most robust way to handle frequent failures or dangerous situations. But our knowledge-based approach can cover the large variety of situations that an engineer would not think of or for which specific coding would be too costly. 6. Conclusion This paper presented a promising first step towards a general framework that enables robot assistants to measure the normality of situations by validating a combination of different expectations. We showed how different types of expectations can be created, grouped, and validated, giving a robot an impression of the overall normality of the situation as well as the normality of different categories or single expectations. In a simulated apartment, we showed how a robot can use a variety of expectations to detect situational anomalies, and we illustrated how it can get a continuous impression of the normality of human behavior by observing motion tracking data in a sensor-equipped kitchen. Future work includes other combinations of normality values as well as improved validation techniques for probabilistic expectations. Acknowledgements With the support of the Technische Universität München - Institute for Advanced Study, funded by the German Excellence Initiative, and the Bavarian Academy of Sciences and Humanities. References Akhtar, N., and Kuestenmacher, A. 2011. Using naive physics for unknown external faults in robotics. In 22nd International Workshop on Principles of Diagnosis (DX-2011), volume 1, 23. Gigerenzer, G., and Gaissmaier, W. 2011. Heuristic decision making. Annual Review of Psychology 62:451-482. Kelly, T.; Wang, Y.; Lafortune, S.; and Welsh, M. 2009. A formal foundation for failure avoidance and diagnosis. (HPL-2009-203). Kuhn, L.; Price, B.; de Kleer, J.; Do, M.; and Zhou, R. 2008. Pervasive diagnosis: Integration of active diagnosis into production plans. In Proceedings of AAAI. Kunze, L.; Dolha, M. E.; Guzman, E.; and Beetz, M. 2011.
Simulation-based temporal projection of everyday robot object manipulation. In Yolum, Tumer, Stone, and Sonenberg, eds., Proc. of the 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2011). Taipei, Taiwan: IFAAMAS. Kunze, L.; Tenorth, M.; and Beetz, M. 2010. Putting People's Common Sense into Knowledge Bases of Household Robots. In 33rd Annual German Conference on Artificial Intelligence (KI 2010), 151-159. Karlsruhe, Germany: Springer. Kurup, U.; Lebiere, C.; Stentz, A.; and Hebert, M. 2012. Using expectations to drive cognitive behavior. In AAAI. Lemaignan, S.; G., E.; Karg, M.; Mainprice, M.; Kirsch, A.; and Alami, R. 2012. Human-robot interaction in the MORSE simulator. In Proceedings of the 2012 Human-Robot Interaction Conference (late breaking report). Maier, W., and Steinbach, E. 2010. A probabilistic appearance representation and its application to surprise detection in cognitive robots. IEEE Transactions on Autonomous Mental Development 2(4):267-281. Minnen, D.; Essa, I.; and Starner, T. 2003. Expectation grammars: Leveraging high-level expectations for activity recognition. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, volume 2, II-626. IEEE. Molineaux, M.; Klenk, M.; and Aha, D. 2010. Goal-driven autonomy in a navy strategy simulation. In AAAI Conference on Artificial Intelligence. Mösenlechner, L., and Beetz, M. 2013. Fast temporal projection using accurate physics-based geometric reasoning. In IEEE International Conference on Robotics and Automation (ICRA). Steinbauer, G., and Wotawa, F. 2005. Detecting and locating faults in the control software of autonomous mobile robots. In 19th International Joint Conference on Artificial Intelligence (IJCAI-05), 1742-1743. Struss, P. 2008. Model-based problem solving. 395-465. Tenorth, M.; Kunze, L.; Jain, D.; and Beetz, M. 2010. KNOWROB-MAP: Knowledge-Linked Semantic Object Maps. In Proceedings of 2010 IEEE-RAS International Conference on Humanoid Robots. Williams, B.; Ingham, M.; Chung, S.; Elliott, P.; Hofbaur, M.; and Sullivan, G. 2003. Model-based programming of fault-aware systems. AI Magazine 24(4):61. 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning HALTER: Hierarchical Abstraction Learning via Task and Event Regression Ugur Kuter UKUTER@SIFT.NET SIFT, LLC, 211 North 1st Street, Suite 300, Minneapolis, MN 55401-2078 Hector Munoz-Avila MUNOZ@LEHIGH.EDU Lehigh University, Dept. of Computer Science & Engineering, 19 Memorial Drive West, Bethlehem, PA 18015-3084 Abstract In this paper we present HALTER, an algorithm capable of learning task-subtask decompositions from input action traces. HALTER iteratively learns how task subtraces can be folded to achieve tasks of increasing levels of abstraction. HALTER also learns applicability conditions to determine when a task can be decomposed into a task subtrace. We examine the expressiveness of HALTER and observe that it is the first HTN learning algorithm capable of learning nonregular languages. 1. Introduction One of the basic questions agents face when interacting in an environment is the goal formulation question: what goals should the agent try to achieve next in pursuit of some over-arching goals?
The answer depends on multiple factors, such as the underlying planning paradigm (e.g., if the planner is totally-ordered, it needs to consider how achieving goals might interact with the partial plan generated so far) and the overarching goals or objectives (i.e., whether achieving a goal will benefit the overall goals or objectives). Planning paradigms offer a variety of answers to the question. For example, heuristic planning techniques may estimate the effort to reach a goal state from a state achieving an intermediate goal (i.e., selecting the state that has the least estimated effort). Landmark planning will identify intermediate facts that must be fulfilled regardless of the solution (e.g., going through a choke point when navigating out of a maze). This paper presents a procedure for learning hierarchical task networks (HTNs). Hierarchical task network (HTN) planning side-steps reasoning about concrete goals and instead reasons about how to achieve tasks. Plans are generated in a top-down manner by decomposing complex tasks into simpler ones, which are themselves recursively decomposed until so-called primitive tasks corresponding to actions are generated. In our work, we view tasks as goals of increasing levels of abstraction. This view is not unusual: ICARUS (Langley & Choi, 2006a) and works combining HTN and STRIPS planning (Shivashankar et al., 2013; Kambhampati & Srivastava, 1995) all use HTN planning formalisms where the tasks are STRIPS goals. Unlike STRIPS planners, where the planner automatically finds the goal-subgoal relations as part of the planning process, in HTN planning users must define these goal-subgoal relations manually. Hence, we view our work on learning HTNs in the same vein as existing work on learning goal formulation knowledge (e.g., Könik & Laird, 2006; Jaidee, Muñoz-Avila, & Aha, 2011). HTN learning has been a recurrent research topic. There are two camps in HTN learning research: those that use existing HTNs and generalize them to be reused in different situations, and those that learn the HTN structures from scratch (i.e., from input plans). The former is motivated by domains such as project planning, where the project plans, i.e., the hierarchies of steps that were followed to organize an event, are readily available. The latter is motivated by domains where only the actions are available but not the hierarchy that could have generated them. We will expand on this discussion in the related work section. We present HALTER (Hierarchical Abstraction Learning via Task and Event Regression). HALTER bridges these two camps. It can exploit existing HTNs, when available, to generalize and reuse in new situations. But it can also learn new task decompositions, not present in any previously seen HTN, acquiring new knowledge that can be used to generate plans which cannot be generated with the current HTN knowledge. 2. Basics We use the usual definitions for HTN planning as in Chapter 11 of (Ghallab, Nau, & Traverso, 2004). A state s is a collection of ground atoms. A planning operator is a 4-tuple o = (h, pre, del, add), where h (the head of the operator) is a logical expression of the form (n arg1 ... argk) such that n is a symbol denoting the name of the operator and each argument argi is either a logical variable or a constant symbol. The preconditions, delete list, and add list of the planning operator, pre, del, and add respectively, are logical formulas over literals. An action a is a ground instance of a planning operator.
An action is applicable to a state s if its preconditions hold in that state. The result of applying a to s is a new state γ(s, a) = (s \ del) ∪ add. A plan is a sequence of actions. A task is a symbolic representation of an activity in the world, formalized as an expression of the form (t arg1 ... argk) where t is a symbol denoting the name of the activity and each argi is either a variable or a constant symbol. A task can be either primitive or nonprimitive. A primitive task corresponds to the head of a planning operator and denotes an instantaneous action that can be directly executed in the world. A nonprimitive task cannot be directly executed; instead, it needs to be decomposed into simpler tasks until primitive ones are reached. Each nonprimitive task t is associated with two task indicators, start_t and end_t, indicating the starting and ending points of the nonprimitive task t in a plan trace. A primitive task trace is a sequence ⟨t1, t2, ..., tk⟩ where each ti is either a primitive task or a task indicator. A nonprimitive task trace may contain primitive and nonprimitive tasks. If the starting indicator for a task t occurs in a task trace, its ending task indicator must appear later in the task trace. These task indicators are primitive tasks but do not reflect any action in the domain. We assume a failure in a task trace is designated by identifying a primitive task (i.e., an action) as failed in that trace. There are multiple reasons why an action in a trace may fail. Potential reasons for an action to fail include: (1) the preconditions of the action do not hold in the state resulting after the previous action in the trace was applied, and (2) applying the action sends the remaining actions in the trace towards a state that will not fulfill the goals. An example of the latter is a navigation domain where the action turns a vehicle in the wrong direction and the vehicle ends up in a location different from the destination. Had the action been to turn in a different direction, the vehicle would have reached the destination by following the remaining actions in the trace. More formally, given a set of goals g and a primitive task trace π, we say that π is a failed trace if executing π from the state s0 yields a state that does not satisfy one or more of the goals in g. In this paper, we restrict ourselves to the Ordered Task Decomposition formalism of HTN planning (Nau et al., 1999). In this formalism, tasks are executed in the order they are achieved. An HTN method is a procedure that describes how to decompose nonprimitive tasks into simpler ones. Formally, a method is a triple m = (h, pre, subtasks), where h is a nonprimitive task (the head of the method), pre is a logical formula denoting the preconditions of the method, and subtasks is a totally-ordered sequence of subtasks. A method m is applicable to a state s and task t if the head h of the method matches t and the preconditions of the method are satisfied in s. The result of applying a method m to a state s and task t is the state s and the method's sequence of subtasks. An HTN planning problem is a tuple P^h = (𝒯, s0, T, O, M), where 𝒯 is the finite set of all possible tasks, s0 is the initial state, T is the initial sequence of tasks, and O and M are sets of planning operators and methods, respectively. A solution for the HTN planning problem P^h is a plan (i.e., a sequence of actions) that, when executed in the initial state, performs the desired initial tasks T.
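To make the formalism concrete, here is a minimal Python sketch of states, operators, action application, and ordered-task-decomposition methods as defined above; the data layout and names are illustrative assumptions rather than the authors' implementation.

    # States are sets of ground atoms; operators carry preconditions, delete,
    # and add lists; methods decompose a nonprimitive task into ordered subtasks.

    from dataclasses import dataclass

    Atom = tuple            # e.g. ('at', 'truck1', 'locA')

    @dataclass(frozen=True)
    class Operator:
        head: Atom
        pre: frozenset
        delete: frozenset
        add: frozenset

    def applicable(state, op):
        """An action is applicable if its preconditions hold in the state."""
        return op.pre <= state

    def apply_action(state, op):
        """gamma(s, a) = (s \\ del) U add."""
        assert applicable(state, op)
        return (state - op.delete) | op.add

    @dataclass(frozen=True)
    class Method:
        head: Atom              # the nonprimitive task this method decomposes
        pre: frozenset
        subtasks: tuple         # totally ordered sequence of subtasks

    def method_applicable(state, task, method):
        return method.head == task and method.pre <= state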
If there is a solution for the HTN planning problem P^h, then P^h is solvable. A learning example is a tuple of the form (s0, π, g), where s0 is a state, π is a primitive task trace, and g is a set of goals, such that γ(s0, π) ⊨ g. An HTN learning problem is a 4-tuple P = (𝒯, O, M, L), where 𝒯 is the finite set of tasks and task indicators, O is a set of planning operators, M is a (possibly empty) initial set of methods, and L is the set of learning examples. A solution to an HTN learning problem P = (𝒯, O, M, L) is a set M′ of methods such that M ⊆ M′ and, for every state s and goal g, any plan π generated by a provably correct HTN planner for (s, T, O, M′) satisfies γ(s, π) ⊨ g. 3. Algorithms HALTER is an incremental learning algorithm that produces a knowledge base of HTN methods from an input solution plan to a classical planning problem, and successively updates its knowledge base when presented with solutions to new classical planning problems in the same planning domain. The basis for the high-level HALTER algorithm is a variant of the well-known chart parsing technique (Charniak, Goldwater, & Johnson, 1998). A chart parser is a type of parser suitable for ambiguous grammars (including grammars of natural languages). It uses a dynamic programming approach: partial hypothesized results are stored in a structure called a chart and can be re-used. This eliminates backtracking and prevents a combinatorial explosion in the search space, since the parser does not generate the same grammar rules multiple times in different parts of the search space. Algorithm 1: A high-level description of the HALTER learning procedure. 1 Procedure HALTER(𝒯, O, M, L) 2 begin 3 foreach learning example (s, π, g) ∈ L do 4 plan_tree ← Learn_Structure(𝒯, s, π, g) 5 M′ ← Learn_HTN_Methods(plan_tree, s, g, M) 6 M ← Inductively_Generalize(M′) 7 return M 8 end Algorithm 2: A high-level description of structure learning in HALTER. 1 Procedure LEARN_STRUCTURE(𝒯, s, π, g) 2 begin 3 plan_tree ← ∅ 4 task_trace ← π 5 loop until no new task can be learned 6 X ← ∅ 7 foreach task in 𝒯 do 8 children ← find children of task in task_trace 9 insert (task, children) into X 10 task_trace ← X 11 insert task_trace into plan_tree 12 return plan_tree 13 end Algorithm 1 shows a high-level description of the HALTER algorithm. For each learning example given to the algorithm, HALTER first learns a task hierarchy from that learning example. The task hierarchy for the example includes all of the primitive tasks that are specified by the task trace of the example, the nonprimitive tasks from 𝒯 that can be inferred from the example, and the parent-child relationships among those nonprimitive and primitive tasks. Algorithm 2 shows a high-level description of structure learning in HALTER, which produces a task hierarchy from a learning example. In the initial iteration, the procedure starts from the primitive task trace and infers all of the nonprimitive tasks such that all of the children of each such nonprimitive task are in the task trace. Then, the algorithm replaces those children with their nonprimitive parents. This creates the next task trace, from which the algorithm repeats the above process. The learning process terminates when there is no new nonprimitive task for which the algorithm can learn children. Once the task hierarchy that corresponds to the input learning example is learned, HALTER uses this task hierarchy in order to learn new HTN methods.
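The layer-by-layer folding in Algorithm 2 can be pictured with the following rough Python sketch: each pass replaces a subtrace delimited by a task's start/end indicators with the (nonprimitive) task itself, until no further folding is possible. The representation here (strings 'start:t' / 'end:t', tuples for folded tasks) is an assumption for illustration, not HALTER's actual data structures.

    def fold_once(trace):
        """Replace one innermost start:t ... end:t subtrace with (t, children)."""
        for i, tok in enumerate(trace):
            if isinstance(tok, str) and tok.startswith('start:'):
                name = tok[len('start:'):]
                try:
                    j = trace.index('end:' + name, i + 1)
                except ValueError:
                    continue
                # Only fold if the enclosed span contains no unfolded indicators.
                if not any(str(t).startswith(('start:', 'end:')) for t in trace[i + 1:j]):
                    folded = (name, tuple(trace[i + 1:j]))   # (task, children)
                    return trace[:i] + [folded] + trace[j + 1:], True
        return trace, False

    def learn_structure(trace):
        """Iterate folding and record each intermediate layer of the hierarchy."""
        layers = [list(trace)]
        changed = True
        while changed:
            trace, changed = fold_once(trace)
            if changed:
                layers.append(list(trace))
        return layers

    # Toy example: a task t whose children are the primitive tasks t2 and t3.
    print(learn_structure(['t1', 'start:t', 't2', 't3', 'end:t', 't4'])[-1])
    # -> ['t1', ('t', ('t2', 't3')), 't4']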
The procedure for learning HTN methods depends on whether the learning example was positive or negative; thus we discuss each case separately in the subsequent sections. HALTER uses a particular type of chart parsing for learning HTNs. It infers task hierarchies from the trace in a layer-by-layer fashion. Given a task trace ⟨t1, t2, ..., tk⟩ (see Algorithm 2), it performs the following steps. Line 7 of Algorithm 2 replaces subtraces in the task layer ⟨t1, t2, ..., tk⟩ with their parent tasks. This creates a new task trace ⟨t′1, ..., t′n⟩ (with n ≤ k) where each t′i is either one of the tasks ti or the parent task of a subtrace; that is, a subtrace ⟨start_t′, ..., end_t′⟩ is replaced with t′. Hence, ⟨t′1, ..., t′n⟩ may contain both primitive and nonprimitive tasks. HALTER recursively repeats the task replacement process with ⟨t′1, ..., t′n⟩. The structural learning process terminates when pre-specified top-level tasks are reached, or when no new nonprimitive tasks can be inferred. The hierarchical task structure inferred is a tree. Each interior node in the tree is a nonprimitive task t′ and its children are the tasks in the subtrace that t′ folds. The leaves are the primitive tasks in the input trace ⟨t1, t2, ..., tk⟩. To learn the set of preconditions P of a method, HALTER uses goal regression over a task trace ⟨t′1, ..., t′n⟩ by recursively constructing P, which is initially empty. Starting from the last task (i.e., k = n) in the trace and ending at the first task (i.e., k = 1), for each task t′k it removes from P any effect added by t′k and adds to P the preconditions of t′k. Once HALTER learns preconditions and effects for each nonprimitive task in the hierarchy, it generates HTN methods from them. At this point, however, the HTN methods generated for each nonprimitive task are still ground; i.e., they contain only planning domain objects as arguments for the task, its subtasks, and the precondition predicates. HALTER uses inductive generalization techniques similar to those in HTN-Maker (Hogg, Muñoz-Avila, & Kuter, 2008) in order to generalize the objects into variable symbols, while ensuring the soundness of the method. Finally, the algorithm returns the set of generalized methods as the HTN library learned from the input task trace. As mentioned above, HALTER can also learn HTN methods from failed task traces. Given a learning example (s0, π, g), π is a failed trace if executing π from the state s0 yields a state that does not satisfy one or more of the goals in g. The learning process in HALTER assumes an action in the trace has been marked as a failed action. This is used to learn HTN methods that ensure that such a failure will not happen in a planning episode. We assume that if an action fails, then any action that contributes to the application of that action in the particular state in which it fails is a potential cause for the failure. In other words, if a fails, and there is another action a′ in the trace that appears before a such that eff(a′) ⊨ pre(a), then a′ is a potential cause for the failure of a. Starting from the failed action, HALTER performs a variant of goal regression in order to find the earliest possible action a′ that is found to be a cause for the failure in the trace. Then, the algorithm uses the task hierarchy learned from successful traces in order to identify the ancestor tasks of the action a′ in the method library.
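The backward search for failure causes can be pictured with the following small sketch; the function name, the dict-based action model, and the chaining of preconditions during regression are assumptions made for illustration, not HALTER's exact procedure.

    # Walk backwards from the failed action and collect earlier actions whose
    # effects support the failed action's (regressed) preconditions; return
    # the earliest such potential cause.

    def earliest_potential_cause(trace, failed_index, effects, preconditions):
        """trace: list of action names; effects/preconditions: dicts from
        action name to a set of atoms."""
        needed = set(preconditions[trace[failed_index]])
        causes = []
        for i in range(failed_index - 1, -1, -1):
            a_prime = trace[i]
            if effects[a_prime] & needed:
                causes.append((i, a_prime))
                needed |= set(preconditions[a_prime])   # regress further back
        return causes[-1] if causes else None            # earliest cause found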
These ancestor tasks include the immediate parent of a′ in the task hierarchy, the parents of that immediate parent, and so on. For each such ancestor, HALTER learns additional preconditions that capture the reasons why that task failed. For example, suppose the failed action has a precondition p. Then the algorithm infers a new precondition of the form (not p) for the parent task. This additional precondition ensures that the methods for that parent task will not be applied in states where p is true, preventing the failure from happening. Once the methods from failed traces are learned, they are added to the method library and the algorithm returns the library. 4. Preliminary Results and Discussion In this section, we discuss some of the properties of the HALTER learning algorithm. We define a classical planning problem as a 3-tuple P^c = (s0, g, O), where s0 is the initial state, g is the set of goals represented as a conjunction of logical atoms, and O is a set of planning operators as defined in Section 2. A solution for the classical planning problem P^c is a plan π = ⟨a1, ..., ak⟩ such that the state γ(...γ(γ(s0, a1), a2)..., ak) satisfies the goals g. Suppose a set of goal atoms g is specified for some task t. Then, the HTN-equivalent planning problem to a classical planning problem P^c = (s0, g, O) is an HTN planning problem P^h = (𝒯, s0, {t}, O, M), where M is a set of HTN methods. Proposition 1. Let O be a set of planning operators for a planning domain. Suppose HALTER is given a set of learning examples, and it produces a set M of HTN methods. Then, for any classical planning problem P^c in the domain, a solution produced by a sound HTN planning algorithm on the HTN-equivalent problem P^h using the methods in M will be a solution to the classical planning problem. A comparison of HALTER with its predecessor HTN-Maker is in order. Unlike HTN-Maker (Hogg, Muñoz-Avila, & Kuter, 2008), HALTER can learn both from successful and from failed traces (HTN-Maker only learns from positive examples). Furthermore, HALTER does not require the semantically annotated task definitions required by HTN-Maker. This enables HALTER to learn more expressive HTNs than those learned by HTN-Maker, whose HTNs can express regular languages only. The same holds for ICARUS: since its tasks are goals, its task decomposition structure is equivalent to STRIPS sub-goaling. In HTN-Maker, an annotated task t describes the preconditions that must hold in the state prior to pursuing t and the effects on the state after achieving t. Hence, the HTNs learned by HTN-Maker are strictly equivalent to STRIPS planning. Indeed, HTN-Maker can only learn task structures equivalent to regular languages (i.e., right-recursive methods). In contrast, HALTER can learn structures that are equivalent to context-free grammars. To see this, assume the following two traces are given as input to HALTER (t is a nonprimitive task and t1, t2, t3, and t4 are primitive tasks): ⟨start_t, t2, t3, end_t⟩ and ⟨start_t, t1, t2, t3, t4, end_t⟩. From the first trace HALTER learns a method decomposing t into ⟨t2, t3⟩. From the second trace it will learn a second method decomposing t into ⟨t1, t, t4⟩. This means that the second method's structure captures the context-free grammar rule t → ⟨t1, t, t4⟩. Hence, repeated application of the method will decompose t into ⟨t1^n, t, t4^n⟩ for an arbitrary n, which is a non-regular expression.
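To see the effect concretely, a toy expansion of these two methods (the method table and expansion strategy are assumed for illustration only) produces exactly the t1^n t2 t3 t4^n pattern:

    # Expanding t with the recursive method n times and then the base method
    # yields t1^n t2 t3 t4^n, which no regular (right-recursive) structure
    # can generate.

    METHODS = {
        't': [
            ['t2', 't3'],          # base method learned from the first trace
            ['t1', 't', 't4'],     # recursive method learned from the second trace
        ],
    }

    def expand(task, depth):
        """Expand nonprimitive tasks, using the recursive method `depth` times."""
        if task not in METHODS:
            return [task]                                  # primitive task
        body = METHODS[task][1] if depth > 0 else METHODS[task][0]
        result = []
        for sub in body:
            result += expand(sub, depth - 1 if sub == task else depth)
        return result

    print(expand('t', 3))
    # -> ['t1', 't1', 't1', 't2', 't3', 't4', 't4', 't4']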
It is impossible to generate such non-regular expressions with STRIPS planning. The following proposition states this result.

Proposition 2. HALTER learns HTN methods that can be used to generate non-regular expressions.

The ability to learn plans exhibiting such non-regular expressions is important for representing many realistic situations. A simple situation is planning for an NPC in a video game to repeatedly patrol a circuit of locations (e.g., starting and ending in the same location) and to break this patrol process when a particular situation occurs (e.g., the NPC sees an intruder, in which case it sounds an alarm). Another example is planning to control a robotic arm that must distribute some chemical across containers of equal but not pre-determined size. A third example is one where we need to plan a gate in a network that distributes messages uniformly across a number of predefined channels until a certain message (e.g., a terminate message) is received that closes the gate.

5. Related Work

Goal-Driven Autonomy (GDA) is a reflective model of goal reasoning that controls the focus of an agent's planning activities by dynamically resolving unexpected discrepancies in the world state (Muñoz-Avila et al., 2010; Molineaux, Klenk, & Aha, 2010). Current research on goals in GDA is mostly restricted to STRIPS goal representations in which the goal is either a desired state or an atom in the state. However, ARTUE and its variants use HTN representations (Molineaux, Klenk, & Aha, 2010; Powell, Molineaux, & Aha, 2011). One difficulty in GDA research is the need for the developer to manually specify the GDA elements, such as actions' expectations, explanations for discrepancies between the actions' expectations and their actual outcomes, and goal formulation knowledge. Research has been conducted to learn these elements. Notably, Jaidee et al. (2011) report on a system that learns some GDA elements. It uses STRIPS representations for the GDA elements; for example, the expectations are the collection of atoms resulting from applying an action (Jaidee, Muñoz-Avila, & Aha, 2011). Another notable system, reported by Weber and Mateas (2012), learns all the main GDA elements, although it is restricted to a vector of state-value representations and a representational bias towards increasing these values (Weber, Mateas, & Jhala, 2012). For example, values represent the number of the GDA agent's own units in a combat-based game; actions are expected to increase the number of these units or at least not to decrease them. Our work will enable learning of HTNs that could be used by GDA researchers.

Hierarchical decomposition representations, such as HTNs, in which complex tasks are decomposed into simpler ones, are a common representation paradigm in many cognitive agent architectures (Langley, Cummings, & Shapiro, 2004). SOAR uses a representation of operator hierarchies to capture complex interrelations between goals. Hierarchies in SOAR are used to fill gaps in SOAR's production rule knowledge. Abstract operators are seen as subgoals that enable the system to move forward towards achieving its goals (Laird, 2012). ICARUS follows a converse idea: it uses HTNs to generate plans and, whenever it finds gaps in its HTN knowledge, it falls back to standard (STRIPS) planning to fill the gap (Langley & Choi, 2006b). LIGHT uses a similar mechanism to ICARUS but enables indexing goals (Nejati, Langley, & Konik, 2006).
ACT-R uses production composition to represent hierarchical knowledge (Anderson, 1993); at the bottom level of the hierarchy, when production rules are triggered they generate new goals to achieve. Applicability conditions are evaluated against the current goals the system is trying to achieve. Hence, at the bottom level, triggering production rules is reminiscent of backward goal chaining. At higher levels in the hierarchy, triggering production rules generates new conditions, which in turn can be used to trigger other production rules. Hence, high-level production rules capture knowledge about complex relations between the goals. The Companion cognitive architecture aims at developing agents that assist humans in their problem-solving efforts (Forbus, Klenk, & Hinrichs, 2008). As such, its HTN planning process is built on top of a truth-maintenance system that enables Companion to automatically build explanations about the chain of inferences that led to a plan. This enables Companion to justify to the user the reasons for the decisions made. It also maintains and/or branches that enable exploration of alternative decompositions when needed. The Disciple cognitive architecture uses generalization hierarchies to represent concepts at increasingly high levels of generalization (G. & Boicu, 2008). Problem solving is done in a divide-and-conquer manner by decomposing problems into simpler ones, although it does not use HTN or other similar hierarchical planning techniques.

HTN planning originated with seminal work on the systems NOAH (Sacerdoti, 1977) and NONLIN (Tate, 1977). Part of the recurrent interest in HTN planning is due to the fact that its semantics are well understood (Erol, Hendler, & Nau, 1994), the simplicity of its representation (Nau et al., 1999), and recurrent reports on its applications (Wilkins & Myers, 1998; Currie & Tate, 1985; Nau et al., 2005). Learning HTNs has been a recurrent research topic over the years. Some research focuses on generalizing given hierarchies. Examples of situations where such hierarchies are available include work-breakdown structures, which represent a hierarchy of steps to produce a one-of-a-kind project and can be edited with commercial off-the-shelf software such as Microsoft Project. A system in this camp is CAMEL (Ilghami et al., 2005). CAMEL receives as input plans annotated with the intermediate states between the actions and the HTNs that were used to generate these plans. It propagates the intermediate states upwards in the hierarchy to infer the applicability conditions of task decompositions in the given HTNs. Since the same task decomposition might occur multiple times and the applicability conditions might not be the same, it uses version spaces to obtain the most general conditions that are consistent with the examples generated so far. To ensure convergence to a single set of conditions, it assumes that negative examples are also given, that is, incorrect applicability conditions for the task decompositions. Another algorithm of this kind is DINCAT (Xu & Muñoz-Avila, 2005). Unlike CAMEL, it does not assume that negative examples are given. DINCAT uses inductive generalization techniques to infer generalized conditions but, unlike CAMEL, it cannot guarantee that the learned conditions are the most specific ones that are consistent with the given examples. Unlike these systems, HALTER can also learn new hierarchical decompositions.

Other HTN learning systems learn the hierarchical decomposition itself.
One of the early systems in this category is X-LEARN (Reddy & Tadepalli, 1997). It uses bootstrap learning to learn increasingly complex hierarchies from carefully ordered training examples in the form of plans. ICARUS also learns task decomposition knowledge (Langley & Choi, 2006b). ICARUS identifies gaps in its HTN knowledge and uses given concepts, represented as Horn clauses, to learn hierarchies of skills that fill these gaps. The way this works is by attempting to generate a plan using the current HTN knowledge. When the plan cannot be generated, ICARUS uses STRIPS planning techniques to generate the plan. It then carefully analyzes the resulting plan to learn new task decompositions. In ICARUS the tasks correspond directly to goals and, hence, problems solved by using the HTNs can also be solved by using STRIPS planning. However, using HTN planning provides two advantages: (1) it leads to a speed-up in problem solving and (2) the HTNs provide a rationale explaining why the plans are generated. Another system in this camp is HTN-MAKER (Hogg, Muñoz-Avila, & Kuter, 2008). Like LIGHT, HTN-MAKER follows a bottom-up process to learn HTNs from given plans. But unlike LIGHT, HTN-MAKER can learn HTN knowledge that can solve problems not expressible in STRIPS planning. Unlike these systems, HALTER can also exploit given hierarchical decompositions, when available, to learn more general applicability conditions.

6. Final Remarks and Future Work

In this paper, we presented HALTER, an HTN learning algorithm that learns goal formulation knowledge by indicating the subtasks that must be achieved in order to achieve tasks, which are goals at a higher level of abstraction. HALTER carefully analyzes how tasks were achieved in input traces and pinpoints parent-child task relations when a subtrace of the former subsumes a subtrace of the latter. HALTER can extract such relations over multiple levels and in any order within the traces. As a result, it can learn hierarchical structures that are strictly more expressive than regular languages. This is the first HTN learner that is capable of learning such expressive HTNs.

For near-future work, we would like to conduct experiments comparing the performance of the HTN planner SHOP when using the HTN methods learned by HALTER versus the HTN methods learned by a state-of-the-art HTN learner such as HTN-Maker (Hogg, Muñoz-Avila, & Kuter, 2008). The experiments will also include comparisons of HTNs learned by HALTER with those used by SHOP2 in past International Planning Competitions. The latter HTN libraries are written by human SHOP2 users and thus provide a benchmark to evaluate how well HALTER learns HTNs compared to those written by human experts. In these experiments, we are interested in comparing (1) the speed of convergence towards learning a domain (e.g., the percentage of problems that each can solve), (2) the running-time speed of SHOP when solving these problems, and (3) the quality of the solutions obtained (measured in terms of plan length). Another research direction relates to work integrating HTN and STRIPS planning (Shivashankar et al., 2013; Kambhampati & Srivastava, 1995). These works do not specify hierarchies in terms of nonprimitive or primitive tasks, but in terms of goals. We are interested in generalizing HALTER to learn goal hierarchies directly. Learning goal hierarchies has been studied before.
For example, Könik and Laird (2006) use inductive logic programming techniques, as opposed to HALTER's explanation-based parsing techniques such as chart parsing and goal regression. It would be interesting to generalize HALTER to goal hierarchies and compare both approaches theoretically and experimentally.

Acknowledgements

This research is funded in part by Contract N00014-12-C-0239 with the Office of Naval Research (ONR). The views expressed are those of the authors and do not reflect the official policy or position of the U.S. Government.

References

Anderson, J. R. (1993). Rules of the mind. New York.
Charniak, E., Goldwater, S., & Johnson, M. (1998). Edge-based best-first chart parsing (pp. 127-133). Association for Computational Linguistics.
Currie, K., & Tate, A. (1985). O-Plan: Control in the open planning architecture. In Expert systems (pp. 225-240).
Erol, K., Hendler, J., & Nau, D. S. (1994). HTN planning: Complexity and expressivity. National Conference on Artificial Intelligence (AAAI).
Forbus, K., Klenk, M., & Hinrichs, T. (2008). Companion cognitive systems: Design goals and some lessons learned. AAAI Fall Symposium on Naturally Inspired Artificial Intelligence. Washington, D.C.
G., G. T., & Boicu, M. (2008). A guide for ontology development with Disciple (Technical Report). Learning Agents Center, George Mason University.
Ghallab, M., Nau, D. S., & Traverso, P. (2004). Automated planning: Theory and practice. Morgan Kaufmann.
Hogg, C., Muñoz-Avila, H., & Kuter, U. (2008). HTN-MAKER: Learning HTNs with minimal additional knowledge engineering required. Conference on Artificial Intelligence (AAAI) (pp. 950-956). AAAI Press.
Ilghami, O., Muñoz-Avila, H., Nau, D. S., & Aha, D. W. (2005). Learning approximate preconditions for methods in hierarchical plans. International Conference on Machine Learning (ICML) (pp. 337-344). Bonn, Germany.
Jaidee, U., Muñoz-Avila, H., & Aha, D. W. (2011). Integrated learning for goal-driven autonomy. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Three (pp. 2450-2455).
Kambhampati, S., & Srivastava, B. (1995). Universal classical planner: An algorithm for unifying state-space and plan-space planning. European Workshop on Planning (EWSP).
Könik, T., & Laird, J. E. (2006). Learning goal hierarchies from structured observations and expert annotations. Machine Learning, 64, 263-287.
Laird, J. (2012). The Soar cognitive architecture. MIT Press.
Langley, P., & Choi, D. (2006a). Learning recursive control programs from problem solving. Journal of Machine Learning Research, 7, 493-518.
Langley, P., & Choi, D. (2006b). Learning recursive control programs from problem solving. Journal of Machine Learning Research, 7, 493-518.
Langley, P., Cummings, K., & Shapiro, D. (2004). Hierarchical skills and cognitive architectures. Proceedings of the Twenty-Sixth Annual Conference of the Cognitive Science Society.
Molineaux, M., Klenk, M., & Aha, D. W. (2010). Goal-driven autonomy in a Navy strategy simulation. AAAI.
Muñoz-Avila, H., Jaidee, U., Aha, D., & Carter, E. (2010). Goal-driven autonomy with case-based reasoning. In Case-based reasoning research and development (pp. 228-241). Springer.
Nau, D. S., Au, T.-C., Ilghami, O., Kuter, U., Muñoz-Avila, H., Murdock, J. W., Wu, D., & Yaman, F. (2005). Applications of SHOP and SHOP2. IEEE Intelligent Systems, 20, 34-41.
Nau, D. S., Cao, Y., Lotem, A., & Muñoz-Avila, H. (1999).
SHOP: Simple hierarchical ordered planner. International Joint Conference on Artificial Intelligence (IJCAI) (pp. 968-973). Morgan Kaufmann.
Nejati, N., Langley, P., & Konik, T. (2006). Learning hierarchical task networks by observation. International Conference on Machine Learning (ICML) (pp. 665-672).
Powell, J., Molineaux, M., & Aha, D. W. (2011). Active and interactive discovery of goal selection knowledge. FLAIRS Conference.
Reddy, C., & Tadepalli, P. (1997). Learning goal-decomposition rules using exercises. International Conference on Machine Learning (ICML).
Sacerdoti, E. (1977). A structure for plans and behavior. American Elsevier.
Shivashankar, V., Alford, R., Kuter, U., & Nau, D. (2013). The GoDeL planning system: A more perfect union of domain-independent and hierarchical planning. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (pp. 2380-2386).
Tate, A. (1977). Generating project networks. International Joint Conference on Artificial Intelligence (IJCAI) (pp. 888-893).
Weber, B. G., Mateas, M., & Jhala, A. (2012). Learning from demonstration for goal-driven autonomy. AAAI.
Wilkins, D. E., & Myers, K. L. (1998). A multiagent planning architecture. International Conference on AI Planning Systems (AIPS) (pp. 154-162).
Xu, K., & Muñoz-Avila, H. (2005). A domain-independent system for case-based task decomposition without domain theories. National Conference on Artificial Intelligence (AAAI).

2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning

Learning Models of Unknown Events

Matthew Molineaux MATTHEW.MOLINEAUX@KNEXUSRESEARCH.COM
Knexus Research Corporation, 9120 Beachway Lane, Springfield, VA 22153

David W. Aha DAVID.AHA@NRL.NAVY.MIL
Naval Research Laboratory (Code 5514), 4555 Overlook Avenue SW, Washington, DC 20375 USA

Abstract

Agents with incomplete models of their environment are likely to be surprised by it. For agents in immense environments that defy complete modeling, this represents an opportunity to learn. We investigate approaches for situated agents to detect surprise, discriminate among different forms of surprise, and ultimately hypothesize new models for the unknown events that surprised them. We instantiate these approaches in a new goal reasoning agent, FOOLMETWICE, and investigate how that agent performs in a simulated environment. In this case study, we found that FOOLMETWICE learns models that substantially improve its performance.

1. Introduction

In most work on planning and reasoning, the world is assumed to be reasonably well-behaved, changing according to a known model (e.g., a policy, a fixed set of rules). In contrast to most other work, we relax the assumption that the agent has a complete and correct environment model. Thus, the environment can surprise our agent, meaning that an event can occur which the agent lacks the knowledge to predict or immediately recognize. For example, surprises can occur due to incomplete knowledge of events and their locations. In the fictional Princess Bride (Goldman, 1973), the main characters entered a fire swamp with three types of threats (i.e., flame spurts, lightning sand, and rodents of unusual size) for which they had no prior model. They learned models of each type of threat after encountering it, allowing them to predict and prevent each threat type.
Surprising realistic events can likewise occur while an agent monitors an environment's changing dynamics: consider an autonomous underwater vehicle that detects an unexpected underwater oil plume for which it has no model. A typical default response when surprised by a novel occurrence might be to immediately surface (requiring hours) and report to an operator. However, if the vehicle first learned a model of the spreading plume, it could react to the projected effects, perhaps by identifying the plume's source. Surprises are interesting because they occur frequently in real-world environments and cause failures in real-world robots. The ability to respond autonomously to failures would allow such robots to act for longer periods without oversight. While some surprises can be avoided through increased knowledge engineering, this is often impractical due to high environment variance or events unknown to the knowledge engineers. Therefore, we instead focus on the task of learning from surprises. In this paper, we use FOIL (Quinlan, 1990) to learn environment models and demonstrate its utility for controlling a simulated mobile robot.

In this paper, we introduce an approach for learning models of unknown events in response to surprises as part of a goal reasoning agent. Surprise detection and response are critical to goal reasoning (Klenk, Molineaux, & Aha, 2013), which includes the study of agents that dynamically modify what goals they pursue in response to unexpected situations. Our approach to unknown event learning is developed as an agent named FOOLMETWICE, an extension of the Autonomous Response to Unexpected Events (ARTUE) agent. These agents implement the Goal-Driven Autonomy (GDA) model of goal reasoning (Molineaux, Klenk, & Aha, 2010a), which entails monitoring the environment for surprises, explaining the cause of surprises, and resolving them through dynamic modification of goals. We begin by discussing related work in Section 2. We review the GDA model in Section 3 and present a formal description of explanations and surprises. In Section 4, we review GDA's implementation in ARTUE and its extensions in the novel agent FOOLMETWICE, which extends ARTUE with the capability to learn event models using FOIL. Section 5 then describes its empirical evaluation. Our results support our research hypothesis, which states that by learning event models, FOOLMETWICE can outperform ablations that cannot learn these models, as measured by the time required to perform navigation tasks. Finally, we conclude in Section 6.

2. Related Work

This paper extends our work on deriving explanations for surprises as detected by a GDA agent. Molineaux, Aha, and Kuter (2011) introduced DISCOVERHISTORY, an algorithm for discovering an explanation given a series of observations; it outputs an event history and a set of assumptions about the initial state. Rather than employing a set of enumerated assumptions, DISCOVERHISTORY enumerates which predicates are observable (when true). Assumptions can be made about the initial state value of any literal that is not observable. We showed that, given knowledge of event models, an agent that uses DISCOVERHISTORY could improve its prediction of future states. Molineaux, Kuter, and Klenk (2012) later described an extension of DISCOVERHISTORY that can increase an agent's accuracy for generating state expectations in the context of replanning tasks.
They also reported that, for one task, it significantly increased the agent's goal achievement rate versus an ablation that does not perform explanation. Our current work further addresses the question of explanation in the absence of a complete model.

Recent related work on abductive diagnosis includes that by Sohrabi, Baier, and McIlraith (2010) concerning the diagnosis of discrete-event systems using planning algorithms. That work does not consider the challenges of further reasoning and autonomous execution based on diagnoses, as ours does. Gspandl et al. (2011) also conduct history-based diagnosis in an execution environment. Our work differs in that we focus on diagnosis of exogenous events rather than failed actions, and we compute diagnoses iteratively when new problems are found. Several other investigations have addressed the task of explaining surprises in the current state. Early work on SWALE (Leake, 1991) used surprises to drive a story understanding process that conducted goal-based explanation to achieve understanding goals. Weber, Mateas, and Jhala's (2012) GDA agent learns explanations from expert demonstrations when a surprise is detected, where an explanation is a prediction of a future state obtained from executing an adversary's actions. Hiatt, Khemlani, and Trafton (2012) introduce and instantiate a framework for Explanatory Reasoning to identify and explain surprises, where explanations are generated using a cognitively-plausible simulation process. Ranasinghe and Shen (2008) describe the Surprise-Based Learning process, in which an agent learns and refines its action models. These models are represented by qualitative rules that can be used to predict state changes and identify when surprises occur (i.e., when the rules' predictions fail). Nguyen and Leong (2009) introduce the Surprise Triggered Adaptive and Reactive (STAR) framework to dynamically learn and revise an agent's models of its opponents' strategies in non-stationary game environments (i.e., where opponent strategies can change over time). After an accumulated surprise threshold is exceeded, a STAR agent generates hypotheses to predict an opponent's strategy, and will adopt a strategy if its prediction accuracy exceeds a different threshold. While these studies, like our own, concern agents that can recognize and (in most cases) respond to surprises, our contribution here is unique: we describe an algorithm for learning (and applying) environment models of unknown exogenous events (i.e., rather than action models for the agent or other agents).

A substantial amount of research has focused on learning environment models such as action policies, opponent models, or task decomposition methods for planning (e.g., Zhuo et al., 2009). However, a variety of techniques have also been used to learn other types of models, and under different assumptions. For example, Bridewell et al. (2008) describe how Inductive Process Modeling techniques can be used to learn process models from time series data and predict the trajectories of observable variables. Pang and Coghill (2010) instead survey methods for Qualitative Differential Equation (QDE) Model Learning (QML), which have been used to study real-world non-interactive dynamic systems. Reverse Plan Monitoring (Chernova, Crawford, & Veloso, 2005) can be used to automatically perform sensor calibration tasks by learning observation models during plan execution.
In contrast to these prior investigations, we consider the problem of obtaining models for use by a deliberative agent in subsequent prediction and planning in an execution environment. In model-free reinforcement learning (RL) (Sutton & Barto, 1998), agents are responsible for acquiring environment models for their immediate use. Our work diverges significantly from the RL framework in that it is goal-oriented rather than reward-driven, which allows frequent goal changes without requiring significant re-learning of a policy.

3. Models

Goal reasoning is a model for online planning and execution in autonomous agents (Klenk et al., 2013). As in our prior work, we focus on the Goal-Driven Autonomy (GDA) model of goal reasoning, which separates the planning process from procedures for goal formulation and management. Section 3.1 summarizes a minor extension of this model, Section 3.2 describes our formalism for plausible explanations, and Section 3.3 describes how to use these to explain anomalies. We describe GDA agent implementations in Section 4.

3.1 Modeling Goal-Driven Autonomy

Figure 1 illustrates how GDA extends Nau's (2007) model of online planning. The GDA model expands and details the Controller, which interacts with a Planner and a State Transition System Σ (an execution environment). System Σ is a tuple (S, A, E, O, γ, ω) with states S, actions A, exogenous events E, observations O, state transition function γ: S×(A∪E)→S, and observation function ω: S→O. The transition function γ describes how an action's execution (or an event's occurrence) transforms the environment from one state to another. The observation function ω describes what observation an agent will receive in a given state. We will use the term "event" to refer to an exogenous event.

The Planner receives as input a planning problem (MΣ, sc, gc), where MΣ is a model of Σ, sc is the current state, and gc is the active goal, drawn from the set of all possible goals G, that can be satisfied by some set of states Sg ⊆ S. The Planner outputs (1) a plan pc, which is a sequence of actions Ac = [ac+1, ..., ac+n], and (2) a corresponding sequence of expectations Xc = [xc+1, ..., xc+n], where each xi ∈ Xc is the state expected to result after executing ai in Ac, and xc+n satisfies gc.

The Controller takes as input an initial state s0, an initial goal g0, and MΣ, and sends them to the Planner to generate plan p0 and expectations X0. The Controller forwards p0's actions to Σ for execution and processes the resulting observations, where Σ also processes exogenous events. During plan execution, the Controller performs the following knowledge-intensive GDA tasks:

Discrepancy detection: GDA detects unexpected events by comparing the observation obsc ∈ O (received after action ac is executed) with the expectation xc ∈ Xc. If one or more discrepancies d ∈ D (i.e., the set of possible discrepancies) are found, then explanation generation is performed.

Explanation generation: Given the history of past actions [a1, ..., an] and observations [obs0, ..., obsc] and a discrepancy d ∈ D, this task hypothesizes one or more explanations of d's cause, drawn from the set of possible explanations.

Goal formulation: Resolving a discrepancy may warrant a change in the current goal(s). If so, this task formulates a goal g ∈ G in response to d, given also the explanation and obsc.

Goal management: The formulation of a new goal may warrant its immediate focus and/or edits to the set of pending goals GP ⊆ G. Given GP and a new goal g ∈ G, this task may update GP and then select the next goal g' ∈ GP to be given to the Planner. (It is possible that g = g'.)
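As a rough illustration of how these four tasks fit together, the following sketch shows a GDA controller loop in which explanation, goal formulation, and goal management are invoked only when a discrepancy is detected. It is our illustration, not the authors' implementation; the Planner, environment, and the four GDA tasks are passed in as functions, since their content is agent-specific, and all names are assumptions.

    def gda_controller(planner, execute, model, s0, g0,
                       detect, explain, formulate, manage):
        obs, goal, pending = s0, g0, []
        plan, expectations = planner(model, obs, goal)
        while plan:
            action, expected = plan.pop(0), expectations.pop(0)
            obs = execute(action)                      # Sigma applies the action plus any events
            ds = detect(obs, expected)                 # discrepancy detection
            if ds:
                expl = explain(model, ds, obs)         # explanation generation
                new_goal = formulate(ds, expl, obs)    # goal formulation
                if new_goal is not None:
                    pending.append(new_goal)
                goal = manage(pending, goal)           # goal management (may keep the current goal)
                plan, expectations = planner(model, obs, goal)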
GDA makes no commitments to specific types of algorithms for the highlighted tasks (e.g., goal management may involve comprehensive goal transformations (Cox & Veloso, 1998)), and treats the Planner as a black box.

Figure 1: Conceptual model for Goal-Driven Autonomy (GDA).

3.2 Modeling Explanations

In this subsection, we present a detailed model of explanations useful for describing the explanation generation task. While this is not the only model of explanation compatible with explanation generation in GDA, it facilitates understanding of the DISCOVERHISTORY algorithm (Molineaux et al., 2012), which we use here, and, possibly, future algorithms as well. Under this model, explanations express statements about the occurrence and temporal ordering of a past sequence of observations, actions, and exogenous events. This model represents exogenous environmental effects as deterministic exogenous events that must occur whenever their preconditions are met. In contrast to other representations for exogenous effects, such as contingent action effects (Peot & Smith, 1992; Pryor & Collins, 1996) or external actions (Sohrabi et al., 2010), this has three advantages. First, prediction or diagnosis of the exact time of an event's occurrence is possible, which reduces the set of potential explanations for a given sequence of observations. Second, exogenous events are a factored representation that allows effects to combine without an explosion in representation size. Finally, the multiplication of possible states is caused only by hidden information and never by a nondeterministic choice, which simplifies diagnosis.

3.2.1 Events

We assume several standard definitions from classical planning (Ghallab, Nau, & Traverso, 2004) for our model. Let P be the finite set of all propositions describing a planning environment, where a state assigns a value to each p ∈ P. A planning environment is partially observable if an agent α has access to the environment only through observations that do not cover the complete state. Let P_obs ⊆ P be the set of all propositions that α will observe, where an observation associates a truth value with each p ∈ P_obs. Let P_hidden ⊆ P be the set of hidden propositions that α cannot observe (e.g., the exact location of a robot that does not have a GPS contact). An event model is syntactically identical to a classical planning operator, comprising a tuple (name, preconds, effects), where name is the name of the event, and preconds and effects, the preconditions and effects of the event, are sets of literals. We use effects− and effects+ to denote the negative and positive literals in effects, respectively. An event is a ground instance of an event model. We assume that an event always occurs immediately when all of its preconditions are met in the state. After each action, any events it triggers occur, followed by the events they trigger, and so on. When no more events occur, the agent receives a new observation.

3.2.2 Explanations

We formalize the agent's knowledge about the changes in its environment as an explanation of the environment's history. We define a finite set of occurrence points T = {τ0, τ1, τ2, ..., τn} and an ordering relation between two such points, denoted τ1 ≺ τ2, where τ1, τ2 ∈ T. Three types of occurrences exist.
An observation occurrence is a pair (obs, τ), where obs is an observation and τ is an occurrence point. An action occurrence is a pair (a, τ), where a is an action. Finally, an event occurrence is a pair (e, τ), where e is an event. Given an occurrence o, we define occ as the function such that occ(o) = τ; that is, occ refers to the occurrence point τ of any observation, action, or event. An execution history is a finite sequence of observations and actions obs0, a1, obs1, a2, ..., an, obsn+1. An agent's explanation of a state given an execution history is a tuple ε = (C, ≺) such that C is a finite set of occurrences that includes each obsi for i = 0, ..., n+1 and each action ai for i = 1, ..., n, for some number n. C also includes zero or more event occurrences that happened according to that explanation. ≺ is a partial ordering over a subset of C, described by ordering relations occ(oi) ≺ occ(oj) such that oi, oj ∈ C. As a shorthand, we will sometimes write oi ≺ oj if and only if occ(oi) ≺ occ(oj).

We use the relations knownbefore(p, o) and knownafter(p, o) to refer to the value of a proposition p before or after an occurrence o ∈ C occurs. Let o be an action or event occurrence. Then, knownbefore(p, o) is true iff p ∈ preconds(o). Similarly, knownafter(p, o) is true iff p ∈ effects(o). If o is an observation occurrence and p ∈ obs, then both knownbefore(p, o) and knownafter(p, o) are true, and otherwise they are false. An occurrence o is relevant to a proposition p if the following holds:

relevant(p, o) ≡ knownafter(p, o) ∨ knownafter(¬p, o) ∨ knownbefore(p, o) ∨ knownbefore(¬p, o).

We also use the predicates prior(p, o) and next(p, o) to refer to the prior and next occurrence relevant to a proposition p, where:

prior(p, o) = {o' | relevant(p, o') ∧ ¬∃o'' s.t. relevant(p, o'') ∧ o' ≺ o'' ≺ o}.
next(p, o) = {o' | relevant(p, o') ∧ ¬∃o'' s.t. relevant(p, o'') ∧ o ≺ o'' ≺ o'}.

3.2.3 Plausible Explanations

The proximate cause of an event occurrence (e, τ) is an occurrence o that satisfies the following three conditions with respect to some proposition p:
1. p ∈ preconds(e)
2. knownafter(p, o)
3. There is no other occurrence o' such that o ≺ o' ≺ (e, τ).

Every event occurrence (e, τ) must have at least one proximate cause, so by condition 3, every event occurrence must occur immediately after its preconditions are satisfied. An inconsistency is a tuple (p, o, o') where o and o' are two occurrences in C such that knownafter(¬p, o), knownbefore(p, o'), and there is no other occurrence o'' ∈ C such that o ≺ o'' ≺ o' and p is relevant to o''. An explanation ε = (C, ≺) is plausible if and only if the following holds:
1. There are no inconsistencies in ε.
2. Every event occurrence (e, τ) ∈ C has a proximate cause in C.
3. For every pair of simultaneous occurrences o, o' ∈ C such that occ(o) = occ(o'), there may be no conflicts before or after. That is, for all p, knownafter(p, o) → ¬knownafter(¬p, o'), and knownbefore(p, o) → ¬knownbefore(¬p, o').
4. If the preconditions preconds(e) of an event e are all satisfied at an occurrence point τ, then e is in C at τ.
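The inconsistency test can be illustrated with the following sketch. It is ours, not the authors' code; it simplifies the partial order to a totally ordered list of occurrences, and represents each occurrence by the proposition values known just before and just after it.

    def find_inconsistencies(occurrences):
        # Each occurrence maps propositions to values under 'before' and 'after'.
        # Return triples (p, i, j): occurrence i leaves p false, occurrence j needs p true,
        # and no occurrence in between is relevant to p.
        found = []
        for i, o1 in enumerate(occurrences):
            for p, val in o1["after"].items():
                if val:                                       # we need knownafter(not p, o1)
                    continue
                for j in range(i + 1, len(occurrences)):
                    o2 = occurrences[j]
                    if p in o2["before"] or p in o2["after"]:  # first occurrence relevant to p
                        if o2["before"].get(p) is True:        # ...and it requires p beforehand
                            found.append((p, i, j))
                        break
        return found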
3.3 Modeling Surprise

We now give a precise definition of surprise as it affects various agents, in order to circumscribe our task. Informally, we will say that surprise occurs when an observation contradicts an agent's expectations. In some cases, the observations also contradict an agent's model of the environment. It follows from this that an agent which neither generates expectations nor models the environment, such as a random or a greedy agent, cannot be surprised. However, disparate agents, such as those that instantiate a cognitive architecture or a reinforcement learning agent, can be surprised. Recognizing when a surprise is caused by an environment model contradiction is necessary to correctly detect and model unknown events.

3.3.1 Surprise as Contradiction of Expectations

Formally, we denote the a priori expectations of a logical agent α about the state of its environment at time t, before making an observation, as expectations(α, t), and its observations at time t as observations(α, t). Furthermore, if it has a model (or background theory) of its environment that relates expectations to observations, then we shall denote that theory as Θ. In simple cases, expectations and observations may be contradictory assertions about the state, and Θ may be empty. Given this, we define the condition of an agent being surprised by a contradiction to its expectations as follows:

surprisex(α, t) ≡ [expectations(α, t) ∧ observations(α, t) ∧ Θ ⊢ ⊥]. (1)

In terms of a GDA agent, we can describe expectations(α, t) as xt and observations(α, t) as obst. While the GDA framework describes no semantics of logical entailment, the comparison process that takes place in discrepancy detection is sometimes equivalent to an entailment test. In particular, ARTUE's discrepancy detection process (Molineaux et al., 2010) detects discrepancies precisely when it is surprised under this definition; every time ARTUE detects a discrepancy, it is surprised by a contradiction to its expectations.

3.3.2 Surprise as Contradiction of an Environment Model

In many agents, all surprises result from a contradiction of the environment model. This is because the agent's expectations are a function of only prior observations and the environment model itself; since prior observations (by definition) cannot change, only the model can be wrong. This holds in particular for many agents that reason with uncertainty, such as those based on Partially Observable Markov Decision Processes. Because these agents' beliefs are so all-encompassing, they are rarely contradicted and cannot be surprised unless the model itself is wrong. On the opposite extreme, many agents assume that uncertainty is entirely absent. Their environment models do not accommodate external change, and therefore every (frequent) surprise contradicts their environment models. However, in some cases expectations are a function of assumptions about the state. We define assumptions as properties of the environment that the agent reasons about despite not being able to observe them. For example, after inferring a model of fire spouts, Westley might assume, in the absence of information, that there is no fire spout behind a tree in front of him, even though he cannot yet observe the location. In algorithms that infer expectations based on assumptions, surprises often result from faulty assumptions rather than a faulty model, and contradiction of the model therefore has a special status. We define a set of possible assumptions Ψ, and a function derivedexpectations(α, t, ψ) that yields the expectations derived from the set of assumptions ψ ⊆ Ψ taken by the agent as true. We define the condition of an agent being surprised by a contradiction to its model as follows:

surprisem(α, t) ≡ ∀ψ ⊆ Ψ: [derivedexpectations(α, t, ψ) ∧ observations(α, t) ∧ Θ ⊢ ⊥]. (2)
From this, we can derive the fact that

[∃ψ ⊆ Ψ: expectations(α, t) = derivedexpectations(α, t, ψ)] → [surprisem(α, t) → surprisex(α, t)]. (3)

This second type of surprise (Equation 2) requires that the expectations derived from all possible sets of assumptions be inconsistent with the observations. In this case, we say that Θ, the model itself, contradicts the observations. As derived in Equation 3, a model contradiction surprise will always result in an expectations surprise if the agent's expectations at time t are derived from a set of possible assumptions. In our model of explanations, the definition of a consistent explanation is based on logical entailment of observations from some set of assumptions and the model. Therefore, if a plausible explanation exists, an agent's surprise is not due to a contradiction with its environment model. Therefore, the occasions when a GDA agent using this explanation formalism is surprised due to a model contradiction (i.e., the set of all t such that surprisem(α, t)) must be a subset of those occasions when a discrepancy is detected and no consistent explanation is found. Below, we describe an agent that uses this method as a means of identifying when model contradictions occur. By using this procedure to identify model contradictions, we avoid the computational complexity of testing every possible set of assumptions, instead incurring only the complexity of the search for consistent explanations.

3.3.3 Surprise Example

If Westley has a model of a fire spout, he may still be surprised by one; if a fire spout exists at a location X, and Westley has not observed it and assumes there is no fire spout at location X, then his expectations do not predict that a flame will spurt at location X. Once this spurt occurs, he is surprised; this surprise contradicts his expectations, but not his model. If Westley then adopts the assumption that a fire spout exists at location X, his expectations change and the contradiction disappears. In contrast, without a model of fire spouts, Westley will be surprised by the flame spurt even with the assumption that a fire spout exists at location X, because his model fails to predict that the flame spurt occurs. In this case, Westley's expectations are contradicted as well as his model.

4. Learning Event Models

We perform our investigation of learning from surprise by creating FOOLMETWICE, an agent that extends ARTUE (Section 4.1) (Molineaux et al., 2010a) with the ability to learn models of unknown events whose observations caused model contradictions. Our process for learning these models has three steps: (1) recognizing unknown events (Section 4.2), (2) generalizing event preconditions (Section 4.3), and (3) hypothesizing an event model (Section 4.4).

4.1 ARTUE

ARTUE performs the four GDA tasks as follows: (1) discrepancy detection is performed by checking for element-wise contradictions between its observations and expectations, (2) explanation generation is performed by searching for consistent explanations using DISCOVERHISTORY (Molineaux et al., 2012), (3) goal formulation uses a rule-based system to generate new goals with associated priorities, and (4) goal management enacts the goal with the highest current priority. ARTUE uses a version of the hierarchical task network (HTN) planner SHOP2 (Nau et al., 2003) to generate plans.
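Combining ARTUE's element-wise discrepancy check with the surprise taxonomy of Section 3.3 gives roughly the following procedure, which anticipates the recognition step described in Section 4.2. This is our illustration only; states and expectations are modeled as simple proposition-to-value mappings, and all function names are assumptions.

    def detect_discrepancies(expectation, observation):
        # Element-wise check: literals whose observed value contradicts the expected one.
        return {p: (v, observation[p]) for p, v in expectation.items()
                if p in observation and observation[p] != v}

    def classify_surprise(expectation, observation, find_consistent_explanation):
        ds = detect_discrepancies(expectation, observation)
        if not ds:
            return "as expected", ds
        if find_consistent_explanation(ds) is not None:
            return "expectation surprise", ds     # faulty assumptions; the model still fits
        return "model contradiction", ds          # an unknown event is suspected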
To predict future events, Molineaux, Klenk, and Aha (2010b) extended SHOP2 to reason about planning models that include events in the PDDL+ representation. To work with an HTN planner, ARTUE uses a pre-defined mapping from each possible goal to an HTN task that accomplishes it.

4.2 Recognizing Unknown Events

As described in Section 3.3.2, in FOOLMETWICE a surprise due to a model contradiction can occur only when a discrepancy is detected and no consistent explanation can be found. In our current work, we assume that (1) a model contradiction has occurred each time no consistent explanation can be found and (2) the surprise that triggered discrepancy detection was caused by some unknown event e. An explanation that contains all correct events other than unknown events must be inconsistent with regard to the effects of each unknown event e. However, this event need not be the proximate cause; e may have instead triggered another event or event sequence that was directly responsible for the contradictory observation. For this reason, the unknown event may have occurred in advance of the surprise. To find an explanation that is correct with respect to all known events, FOOLMETWICE searches for a minimally inconsistent explanation that is more plausible than any other inconsistent explanation that can be described based on the current model and observations. This inconsistent explanation does not fix the model contradiction, but it does help to pinpoint the unknown events that caused it.

DISCOVERHISTORY searches through the space of possible explanations by iteratively refining an existing inconsistent explanation (Molineaux et al., 2012). These refinements can include event removal, event addition, and hypothesis of different initial conditions. At each successive iteration, a refinement can cause additional inconsistencies. Search ends when the entire explanation is consistent or a search depth bound is reached. To search for minimally inconsistent explanations, we extend DISCOVERHISTORY with an additional refinement that ignores a single inconsistency by creating an inconsistency patch. Given an inconsistency (p, o, o'), it refines the explanation by adding a patch occurrence oh = (eh, τh). Here, eh is a patch event that satisfies effects+(eh) = {p} and preconds(eh) = {¬p}, and τh is an occurrence point such that occ(o) ≺ τh ≺ occ(o'). This operation will not change any other literal, and thus will never cause an inconsistency. An explanation containing a patch event is not consistent, and all patched inconsistencies are considered for purposes of determining whether the explanation is minimally inconsistent. The extended DISCOVERHISTORY used by FOOLMETWICE conducts a breadth-first search, stopping only when all inconsistencies are resolved or patched. We define the minimally inconsistent explanation as the inconsistent explanation with the lowest cost, where cost is a measure of the explanation's plausibility. In particular, we define the cost for patching an inconsistency to be much greater (10) than that of other refinements (1). Since we define lower-cost explanations as more plausible than higher-cost explanations, this cost differential reflects that a known and modeled event is a much more likely cause than an unknown event. As a result, the search process heavily favors explanations with fewer patches.
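The patch refinement and its cost can be sketched as follows. This is our illustration with assumed names, not the actual DISCOVERHISTORY code: an inconsistency (p, o, o') is "patched" by a synthetic event that makes p true somewhere between o and o', and patches cost 10 versus 1 for ordinary refinements, so the lowest-cost ("minimally inconsistent") explanation uses as few patches as possible.

    PATCH_COST, REFINEMENT_COST = 10, 1

    def make_patch(p, occ_after, occ_before):
        # Patch event for inconsistency (p, o, o'): effects+ = {p}, preconds = {not p},
        # placed at an occurrence point strictly between occ(o) and occ(o').
        return {"name": "patch-" + str(p),
                "preconds": [("not", p)],
                "effects": [p],
                "between": (occ_after, occ_before)}

    def explanation_cost(num_ordinary_refinements, num_patches):
        return num_ordinary_refinements * REFINEMENT_COST + num_patches * PATCH_COST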
If all correct events are described by the explanation, the unknown events correspond directly to the inconsistency patches; the unknown effects are the same as those of the patch events. The predominant computational cost of DISCOVERHISTORY, and of the extended version presented here, is the breadth-first search for explanations. We bound this depth to a constant factor to ensure manageable execution times; a depth bound of 20 was used in this paper, resulting in a worst-case complexity of O(n^20), where n is the branching factor (i.e., the number of possible refinements available at each node). Typical values range between 2 and 10. Each explanation search conducted in these experiments took less than 60 seconds to perform.

4.3 Generalizing Event Preconditions

Once we have determined when unknown events occur using a minimally inconsistent explanation, we must generalize over the states that trigger those events to create a model of their preconditions. We chose FOIL (Quinlan, 1990) for our preliminary investigation on learning event models because it is well known, operates on relational data, and generates logical hypotheses (called concept definitions). FOIL takes as input a set of positive and negative examples of a target relation (i.e., ground literals that are true and false in the domain), as well as an extensional definition of other relations (i.e., a set of all true ground literals for each relation). To find a target definition, FOIL recursively adds non-ground literals to a Horn clause until some positive examples but no negative examples are covered. FOIL greedily chooses the literal that produces the most information gain to add to the Horn clause at each step of this search. When a rule is discovered, the positive examples covered by that rule are removed and the process repeats until all positive examples are covered. The resulting clauses form a set of rules from which the concept can be inferred. To prefer shorter concept definitions, we employ an iterative-deepening search through the FOIL target definition space.

For the purpose of event learning, the target concept we wish to learn is the state that triggers an unknown event. We create a relation event-occurs to represent this concept; the arguments of this literal include the name and arguments of the inconsistent literal p we believe to be caused by the event, as well as the occurrence point at which a patch was created and a unique symbol referring to the scenario during which it occurred. For example, a minimally inconsistent explanation for the Princess Bride Fire Swamp environment might include a patch occurrence (eh, τh) where effects(eh) = {(sinking-rapidly Buttercup)}. Here, the target concept is the event that causes the effect literal. The positive example literal here would be (event-occurs sinking-rapidly Buttercup τh PrincessBride). Along with these examples, we must provide an extensional definition of the environment that includes all ground literals explained by the current minimally inconsistent explanation. Like the event-occurs predicate, we extend each predicate in the domain to include an occurrence point at which it is known to be true and a unique id for the scenario in which it occurred, so that literals describing the same state can be grouped.
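The assembly of FOIL input for one patched literal can be sketched roughly as follows. The code is ours, with assumed names; the Fire Swamp listing that follows in the text shows the corresponding extensional literals. The positive example is an event-occurs(...) literal, and every domain literal is extended with an occurrence point and a scenario id.

    def make_foil_input(patched_literal, patch_point, scenario, literals_by_point):
        positive = [("event-occurs", *patched_literal, patch_point, scenario)]
        background = [(*lit, point, scenario)
                      for point, lits in literals_by_point.items()
                      for lit in lits]
        return positive, background

    # Example loosely following the Fire Swamp illustration:
    pos, bg = make_foil_input(("sinking-rapidly", "Buttercup"), "th", "PrincessBride",
                              {"t0": [("location", "Buttercup", "house")],
                               "th": [("location", "Buttercup", "under-a-tree"),
                                      ("sandy-location", "under-a-tree")]})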
For example, the extensional definition for the Fire Swamp could include the following literals:

[(friend-of Westley Buttercup τ0 PrincessBride)
(friend-of Buttercup Westley τ0 PrincessBride)
(location Buttercup house τ0 PrincessBride)
(location Westley stable τ0 PrincessBride)
(friend-of Westley Buttercup τh PrincessBride)
(friend-of Buttercup Westley τh PrincessBride)
(location Buttercup under-a-tree τh PrincessBride)
(location Westley on-the-path τh PrincessBride)
(sandy-location under-a-tree τh PrincessBride)].

4.4 Hypothesizing an Event Model

After FOOLMETWICE completes a scenario, it searches for a minimally inconsistent explanation that uses none of the previously learned event models. These models are kept out of this explanation, which is input to the learning process, to avoid compounding errors between multiple learning iterations. All literals consistent with this explanation are added to a persistent extensional definition for their respective relations, along with event-occurs literals corresponding to all inconsistency patches. Then, FOIL is called once for each ground literal covered by an inconsistency patch during the most recent scenario. Each Horn clause output by FOIL is used to construct a new learned event model that has as its conditions the body of the Horn clause output and, as its effect, the single ground literal believed to be inconsistent. While all original relation terms are ground in the target concept to be learned, the occurrence point and scenario id are free variables. For example, the target concept for the Princess Bride Fire Swamp example would be (event-occurs sinking-rapidly Buttercup ?t ?scn). If FOIL then output as its result the Horn clause

(event-occurs sinking-rapidly Buttercup ?t ?scn) ← (location Buttercup ?x ?t ?scn) (sandy-location ?x ?t ?scn),

FOOLMETWICE would construct the event:

(:event new-event51
  :conditions ((location Buttercup ?x) (sandy-location ?x))
  :effect (sinking-rapidly Buttercup))

FOOLMETWICE adds formulated events to its environment model, which can be used for planning and explanation during future scenarios. With each scenario, the extensional definition knowledge is increased, which allows FOOLMETWICE to induce more accurate models, if necessary, after experiencing additional scenarios. Thus, models improve over time.

5. Experiment

While the learning task is to construct accurate event models, multiple models may accurately predict the same phenomena. Therefore, we evaluate FOOLMETWICE based on its capability to construct better plans with the learned models than without them.

5.1 Environment and Hypothesis

For this study, we use a simple deterministic environment called MudWorld, which consists of a discrete navigational grid on which a simulated robot can move in the four cardinal directions. The robot is aware of its location and destination, and its only obstacle is mud. Each location can be muddy or not muddy, and the robot can see the mud when it enters an adjacent grid location. If the robot enters the mud, its movement speed is halved until it leaves. The robot's plan cost function attempts to minimize traversal execution time, so spending time in mud will decrease its performance. However, the initial model given to our robot does not describe this decrease in speed. It can observe its current speed, and it will therefore be surprised when its speed changes due to the mud.
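By analogy with the Fire Swamp event above, the event learned in MudWorld might look roughly like the following. This is our illustration only; the predicate names (location, muddy, speed) are assumptions rather than the actual MudWorld vocabulary.

(:event mud-slowdown
  :conditions ((location robot1 ?x) (muddy ?x))
  :effect (speed robot1 half))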
We hypothesize that, by learning, the robot can improve its performance, spending less time achieving its navigation goals than it would otherwise.

5.2 Experiment Description

We randomly generated 50 training and 25 test scenarios in MudWorld, where each scenario consists of a 6x6 grid with random start and destination locations and a 40% chance of each grid location being muddy. Start and destination locations were constrained so that all routes between them contain at least 4 steps, irrespective of mud. We conducted 10 replications. In each replication, we measured FOOLMETWICE's performance on each of the 25 test scenarios before and after learning on each of 5 successive training scenarios (i.e., each of the 50 training scenarios was used once in our experiment).

5.3 Results

Figure 2 shows the results of our experimental evaluation. We depict the results of testing FOOLMETWICE in blue, and for comparison we show results achieved on the same test scenarios by a non-learning version with a complete hand-engineered model (in green) and without one (in red). The vertical axis depicts the simulated time required to complete the test scenarios, where lower numbers are better (faster completion times). The horizontal axis depicts the number of training scenarios provided. Each point on the red (square markers) and green (circle markers) curves is an average of performance on the 25 testing scenarios. Each point on the blue (triangle markers) curve is an average of performance on the 25 testing scenarios across all 10 replications of the experiment.

Figure 2: Time required by FOOLMETWICE to complete a scenario.

In our tests, FOOLMETWICE was always able to achieve the same maximal performance as an agent with a complete model after only two training scenarios. After even one scenario, we can state with high statistical confidence (p < .001) that average performance is improved over the prior model. We expect that similar results could be obtained for similar domains in which unknown events are deterministic and based only on predicate literals. Our results do not currently generalize to nondeterministic events, willed actions, or events dependent on the values of function literals. We discuss some future research topics in Section 6.

6. Conclusions

We described an initial investigation into the problem of learning from surprises in the context of Goal-Driven Autonomy. We provided a novel definition of surprise that distinguishes types of surprise (i.e., contradiction of expectations versus contradiction of the environment model), a distinction that has not been previously recognized. We described a novel agent, FOOLMETWICE, which uses a new technique for identifying contradictions present in a model based on surprise and explanation generation, and a method for using relational learning to update an environment model in such a context. Finally, we conducted an initial evaluation of FOOLMETWICE in an execution context. This evaluation showed that it is possible to learn a better environment model rapidly under some conditions. FOOLMETWICE's mechanism for detecting unknown events is not infallible; while we have so far assumed that an expectation surprise which cannot be explained is the result of a model contradiction, it is possible that an existing explanation simply was not found, perhaps due to computational constraints. In such cases, FOOLMETWICE will incorrectly attempt to learn a new model to explain the contradiction.
In other cases, unknown events do not cause a model contradiction because an incorrect explanation can be found for a surprise. These false positives and false negatives are an important area for future investigation.

In addition to improving the detection of unknown events, future work will focus on demonstrating the performance of FOOLMETWICE in domains with greater complexity. In particular, research into opportunistic domains, where surprises provide affordances rather than represent obstacles, is an important next step. In addition, after our agent reaches an acceptable level of performance at learning unknown event models for these more complex domains, we will investigate the problem of learning process models that represent continuous change, as well as models of the actions of other agents and their motivations. We also intend to apply the algorithms described here to the problem of active transfer learning, in which an agent acting in a domain similar to one it understands quickly acquires environment models in the new domain with minimal expert intervention. FOOLMETWICE can theoretically perform such transfers by treating the environment model of a source domain as an incomplete model of its new domain. However, additional research into integrating expert feedback and removing incorrect prior models is necessary to fulfill this promise.

Acknowledgements

Thanks to OSD ASD (R&E) for sponsoring this research, and to the anonymous reviewers for their recommendations. The views and opinions contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of NRL or OSD.

References

Bridewell, W., Langley, P., Todorovski, L., & Džeroski, S. (2008). Inductive process modeling. Machine Learning, 71(1), 1-32.
Chernova, S., Crawford, E., & Veloso, M. (2005). Acquiring observation models through reverse plan monitoring. Proceedings of the Twelfth Portuguese Conference on Artificial Intelligence (pp. 410-421). Covilhã, Portugal: Springer.
Cox, M. T., & Veloso, M. M. (1998). Goal transformations in continuous planning. In M. desJardins (Ed.), Proceedings of the AAAI Fall Symposium on Distributed Continual Planning (pp. 23-30). Menlo Park, CA: AAAI/MIT Press.
Ghallab, M., Nau, D., & Traverso, P. (2004). Automated planning: Theory & practice. San Mateo, CA: Morgan Kaufmann.
Goldman, W. (1973). The princess bride. San Diego, CA: Harcourt Brace.
Gspandl, S., Pill, I., Reip, M., Steinbauer, G., & Ferrein, A. (2011). Belief management for high-level robot programs. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence. Barcelona, Spain: AAAI Press.
Hiatt, L. M., Khemlani, S. S., & Trafton, J. G. (2012). An explanatory reasoning framework for embodied agents. Biologically Inspired Cognitive Architectures, 1, 23-31.
Klenk, M., Molineaux, M., & Aha, D. W. (2013). Goal-driven autonomy for responding to unexpected events in strategy simulations. Computational Intelligence, 29(2), 187-206.
Leake, D. B. (1991). Goal-based explanation evaluation. Cognitive Science, 15, 509-545.
Molineaux, M., Aha, D. W., & Kuter, U. (2011). Learning event models that explain anomalies. In T. Roth-Berghofer, N. Tintarev, & D. B. Leake (Eds.), Explanation-Aware Computing: Papers from the IJCAI Workshop. Barcelona, Spain.
Molineaux, M., Klenk, M., & Aha, D. W. (2010a). Goal-driven autonomy in a Navy strategy simulation.
In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. Atlanta, GA: AAAI Press. Molineaux, M., Klenk, M., & Aha, D.W. (2010b). Planning in dynamic environments: Extending HTNs with nonlinear continuous effects. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. Atlanta, GA: AAAI Press. Molineaux, M., Kuter, U., & Klenk, M. (2012). DiscoverHistory: Understanding the past in planning and execution. Proceedings of the Eleventh International Conference on Autonomous Agents and Multiagent Systems (Volume 2) (pp. 989-996). Valencia, Spain: International Foundation for Autonomous Agents and Multiagent Systems. Nau, D.S. (2007). Current trends in automated planning. AI Magazine, 28(4), 43?58. Nau, D., Au, T.-C., Ilghami, O, Kuter, U, Murdock, J.W., Wu, D., & Yaman, F. (2003). SHOP2: An HTN planning system. Journal of Artificial Intelligence Research, 20, 379-404. Nguyen, T.H.D., & Leong, T.Y. (2009). A surprise triggered adaptive and reactive (STAR) framework for online adaptation in non-stationary environments. In Proceedings of the Fifth Artificial Intelligence and Interactive Digital Entertainment Conference. Stanford, CA: AAAI Press. 77 LEARNING MODELS FOR PREDICTING SURPRISING EVENTS Pang, W., & Coghill, G.M. (2010). Learning qualitative differential equation models: A survey of algorithms and applications. Knowledge Engineering Review, 25(1),: 69-107. Peot, M., & Smith, D.E. (1992). Conditional nonlinear planning. Proceedings of the First International Conference on Artificial Intelligence Planning Systems (pp. 189-197). College Park, MD. Pryor, L., & Collins, G. (1996). Planning for contingencies: A decision-based approach. Journal of Artificial Intelligence Research, 4, 287-339. Quinlan, J. R. (1990). Learning logical definitions from relations. Machine learning, 5(3), 239- 266. Ramisinghe, N., & Shen, W.-M. (2008). Surprised-based learning for developmental robotics. Proceedings of the ECSIS Symposium on Learning and Adaptive Behaviors for Robotic Systems (pp. 65-70). Edinburgh, Scotland: IEEE Press. Sohrabi, S., Baier, J. A., & McIlraith, S. A. (2010). Diagnosis as planning revisited. In Proceedings of the Twelfth International Conference on Principles of Knowledge Representation and Reasoning. Toronto, Ontario, CA: AAAI Press. Sutton, R.S., & Barto, A.G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press. Weber, B., Mateas, M., & Jhala, A. (2012). Learning from demonstration for goal-driven autonomy. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. Toronto, Canada: AAAI Press. Zhuo, H.H., Hu, D.H., Hogg, C., Yang, Q., & Mu?oz-Avila, H. (2009). Learning HTN method preconditions and action models from partial observations. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (pp. 1804-1810). Pasadena, CA: AAAI Press. 78 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning Goal-Driven Autonomy in Dynamic Environments Matt Paisner MATTHEW.PAISNER@GMAIL.COM Michael Maynord MAYNORD@UMD.EDU Michael T. Cox* MCOX@CS.UMD.EDU Don Perlis PERLIS@CS.UMD.EDU Department of Computer Science, University of Maryland, College Park, MD 20742 USA *Institute for Advanced Computer Studies, University of Maryland, College Park, MD USA Abstract Dynamic environments are complex and change in often unexpected ways. Given such an environment, many autonomous agents have difficulty when the world does not cooperate with design assumptions. 
We present an approach to autonomy that seeks to maximize robustness rather than optimality on a specific task. Goal-driven autonomy involves recognizing possibly new problems, explaining what causes the problems, and autonomously generating goals to solve the problems. We present such a model within the MIDCA cognitive architecture and show that under certain conditions this model outperforms a less flexible approach to handling unexpected events. 1. Introduction Humans are astonishingly versatile, dealing with a wide range of unanticipated circumstances on the fly while still making headway on high-level goals. Humans can also recognize new problems and opportunities when they arise and react appropriately to them. Yet for the most part our machines cannot; they are like idiot-savants, very good at one narrow task and useless for anything else, even tasks which differ only slightly from the task they were designed for. This is the so- called brittleness problem (Lenat & Guha, 1989), a major stumbling block for AI. What we appear to need is the opposite of expert systems: machines that might not be experts at anything, but that can muddle through a wide range of circumstances and keep a strategic perspective. Yet more than 50 years of intense effort has failed to produce such machines. One approach to the problem of brittleness combines what we call goal-driven autonomy and the metacognitive loop. Here we will show how together they can address significant issues of performance in dynamic environments. In our view, the community has not fully exploited the computational power and promise of metacognition. We propose to test the hypothesis that metacognitive monitoring and control of reasoning can help confront the brittleness problem, leading to autonomous systems that are much more robust and long-lived than current systems, and that require significantly less domain-specific insight on the part of system designers. The Metacognitive Loop (MCL) (Anderson & Perlis, 2005) involves a three-step approach: First an agent notes any failure of match (a so-called anomaly ? whether in its reasoning, its KB, or its behavior) between what it expects and what it observes; second it assesses the anomaly in causal terms of the failure; and third it guides an appropriate 79 M. PAISNER, M. MAYNORD, M. T. COX, & D. PERLIS response to repair or learn from the failure. This is the Note-Assess-Guide procedure. A particularly useful instantiation of the procedure can be found in the methods of goal-driven autonomy. Goal-Driven Autonomy (GDA) is a unique conception that gives more independence to autonomous agents (Cox, 2007; Klenk, Molineaux, & Aha, 2013; Munoz-Avila, Jaidee, Aha, Carter, 2010). Here the MCL steps of Note-Assess-Guide map onto the tasks of recognizing a problem, explaining what causes the problem, and generating a goal to remove the problem. That is, rather than arbitrary anomaly-detection, the agent detects a problem (possibly relevant to its current goals and mission). Not all anomalies are problems, nor are all anomalies important enough to attend to. Rather than general assessment, the agent should abductively explain the causal factors giving rise to the problem situation. The response to the problem may be to generate a (possibly new) goal that solves the problem (e.g., by removing its supporting conditions). In these terms, GDA is as much about problem recognition as it is problem-solving (Cox, in press). 
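To make the Note-Assess-Guide mapping just described more concrete, the following minimal sketch wires the three steps together over toy predicate dictionaries. It is purely illustrative: the function names, the anomaly representation, and the arsonist example are our own assumptions and do not correspond to the actual MCL or MIDCA code.

    from dataclasses import dataclass

    @dataclass
    class Anomaly:
        description: str   # which expectation was violated, and how
        severity: float    # a toy measure of how serious the mismatch is

    def note(expected, observed):
        """Note: flag every mismatch between expected and observed facts."""
        return [Anomaly(f"{k}: expected {v}, observed {observed.get(k)}", 1.0)
                for k, v in expected.items() if observed.get(k) != v]

    def assess(anomaly):
        """Assess: produce a (toy) causal explanation for the anomaly."""
        return "arsonist-lit-fire" if "onfire" in anomaly.description else "unknown-cause"

    def guide(explanation):
        """Guide: map the explanation to a goal that removes the problem, if any."""
        return {"arsonist-lit-fire": "apprehend(arsonist)"}.get(explanation)

    expected = {"onfire(block-A)": False}
    observed = {"onfire(block-A)": True}
    for anomaly in note(expected, observed):
        goal = guide(assess(anomaly))
        if goal:
            print("generated goal:", goal)   # -> generated goal: apprehend(arsonist)

Under this reading, the Note step only detects mismatches; it is the Assess and Guide steps that decide whether a mismatch is a problem worth a new goal at all.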
Planning for the goal as a result of intending to achieve it would presumably result in actions that then bear on the problem. But we argue that the key to dealing effectively with anomalous situations in dynamic environments is for the agent to understand the problems that the world presents; the success of planning and acting will then follow (or not). But without recognition of a problem and of a goal to solve it, planning and acting make no sense at all. Consider a fire that breaks out at a construction site. This is a problem in many ways, not the least of which is that preconditions for actions (e.g., the integrity of building materials) will become unsatisfied. A subgoal to remove the burning condition is thus warranted in such cases, and the fire will be extinguished. However, a subsequent fire might prompt the recognition by a supervisor of a long-term threat to the construction site. An arsonist explanation would hypothesize the presence of criminal activity and license the goal of having the perpetrator in jail. Hence a direct reactive approach to fires would be to put them out; a GDA approach to this same situation would recognize the larger problem in terms of its threat to the future of the enterprise. This paper will examine the distinction between such approaches to intelligent reasoning and behavior in a metacognitive architecture called MIDCA and will report the results of a simple empirical study to evaluate these differences. In section 2, we present the MIDCA architecture containing an implemented instantiation of the GDA model. In section 3 we evaluate the performance of systems making use of three distinct goal generation methods: exogenous goals; statistically generated goals; and goals produced by a knowledge-rich explanation system. Section 4 presents an overview of future work, section 5 surveys related work, and section 6 concludes.

2. Goal-Driven Autonomy in a Cognitive Architecture
The Metacognitive, Integrated, Dual-Cycle Architecture (MIDCA) (Cox, Maynord, Paisner, Perlis, & Oates, 2013; Cox, Oates, & Perlis, 2011) consists of "action-perception" cycles at both the cognitive (i.e., object) level and the metacognitive (i.e., meta-) level. Figure 1 shows the implemented components of the object level with the meta-level abstracted. The output side of each cycle consists of intention, planning, and action execution, whereas the input side consists of perception, interpretation, and goal evaluation. A cycle selects a goal and commits to achieving it. The agent then creates a plan to achieve the goal and subsequently executes the planned actions to make the domain match the goal state. The agent perceives changes to the environment resulting from the actions, interprets the percepts with respect to the plan, and evaluates the interpretation with respect to the goal. At the object level, the cycle achieves goals that change the environment (i.e., the ground level). At the meta-level, the cycle achieves goals that change the object level. That is, the metacognitive "perception" components introspectively monitor the processes and mental state changes at the cognitive level. The "action" component consists of a meta-level controller that mediates reasoning over an abstract representation of the object-level cognition.

Comprehension starts with perception of the world in the attentional field via the Perceive component. The Interpret component takes as input the resulting Percepts and the expectations in memory to determine whether the agent is making sufficient progress. A GDA interpretation procedure implements the comprehension process. The procedure is to note whether an anomaly has occurred; assess potential causes of the anomaly by generating explanatory Hypotheses; and guide the system through a response. Responses can take various forms, such as (1) test a Hypothesis; (2) ignore and try again; (3) ask for help; or (4) insert another goal. Otherwise, given no anomaly, the Evaluate component incorporates the concepts inferred from the Percepts, thereby changing the world model, and the cycle continues. This cycle of problem-solving and action followed by perception and comprehension functions over discrete state and event representations of the environment. Here the blue meta-level executive shell (Figure 1) simply provides the goal set. In this capacity, the meta-level can add initial goals or new goals to the set. In problem solving (left side), the Intend component commits to a current goal from those available. The Plan component then generates a sequence of Actions (i.e., a hierarchical-task-net plan). The plan is executed by the Act component to change the actual world through the effects of the planned Actions. The agent stores the goal and plan in memory to provide it with expectations about how the world will change in the future. The agent will then use these expectations in the next cycle to evaluate the execution of the plan and its interaction with the world with respect to the goal.

Figure 1. The MIDCA_1.1 object level structure. Note that execution-time subgoaling (dashes) is not currently implemented. TFT stands for TF-Tree (see section 2.2.1), and XP stands for eXplanation Pattern (see section 2.2.2).

2.1 Metacognitive, Integrated, Dual-Cycle Architecture Version 1.1
MIDCA_1.1 is an early version of the architecture whose components are shown in the schematic of Figure 1. It implements each phase of the cognitive loop, allowing the MIDCA agent to notice, analyze and respond to events in a simple blocksworld domain. MIDCA_1.1 employs a central memory structure through which the components implementing each individual box shown in Figure 1 interact. Currently, memory is implemented as a simple key/value map through which individual data structures can be accessed. In future versions of MIDCA, we plan to add several capabilities to this, including a model of time and a system for reasoning about it, as well as a detailed accounting of the sources of the system's knowledge. These plans will be discussed in greater detail in the future work section.

2.1.1 Performance Domain
To evaluate the performance of MIDCA in goal generation, we place the system in a modified blocksworld domain. This version of blocksworld includes both rectangular and triangular blocks, which compose the materials for simplified housing construction. The initial goals for problems in this domain are to build houses consisting of towers of blocks with a roof (triangle) on each. Specifically, the housing domain goes through a cycle of three state classes in building new "houses."
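A minimal sketch of such a cycle of construction goals, using the three cycling goals that appear later in Table 2 (on(C,A), on(D,C), on(D,A)); the generator below is our own illustration, not MIDCA_1.1's actual goal list code.

    import itertools

    # The three goals that move the housing domain through its three state
    # classes (cf. Figure 2 and Table 2).
    CONSTRUCTION_CYCLE = ["on(C,A)", "on(D,C)", "on(D,A)"]

    def exogenous_goal_stream():
        """Yield the predetermined construction goals in an endless cycle."""
        return itertools.cycle(CONSTRUCTION_CYCLE)

    goals = exogenous_goal_stream()
    print([next(goals) for _ in range(5)])
    # -> ['on(C,A)', 'on(D,C)', 'on(D,A)', 'on(C,A)', 'on(D,C)']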
Figure 2 shows three intermediate states and the goals that transition the system between such states. We use a simple world simulator that simulates actions specified using predicate logic. The types of actions which can be performed are specified prior to startup in a domain file. Actions MIDCA produces during the act phase will be simulated, as well as actions performed by other agents and natural events. For the purpose of generating interesting anomalies for MIDCA to deal with, we have added the possibility that blocks may catch fire (set by a hidden arsonist). Furthermore, there are three additional actions available to MIDCA to deal with fires. The three new types of actions, then, are as follows.

- light-on-fire(block-b): if block-b is not on fire, lights it; performed only by the arsonist
- put-out-fire(block-b): if block-b is on fire, extinguishes it
- apprehend(arsonist-a): imprisons the arsonist, rendering him incapable of lighting any more fires

MIDCA_1.1's task is to build "towers" of blocks while also dealing appropriately with the fires. In the next subsection, we describe the techniques it uses in achieving this.

Figure 2. Three classifiers exist that recognize goals to get to the subsequent state.

2.1.2 MIDCA_1.1 Reasoning Components
MIDCA_1.1 is implemented by a series of components, centered around a core memory structure. Each of these components implements a single phase of the MIDCA cognitive loop shown in Figure 1. Running MIDCA_1.1 is equivalent to running each of these components in the order shown in Figure 1. This cycle repeats at each time step, beginning with the perceive component. Components interact by storing information in memory, where it can be accessed and used later in the cycle and in future cycles. The individual implementations are described below.

Perceive. The perceive phase is implemented very simply. The perceive component makes a copy of the current world state (defined in a predicate logic representation) and stores it in memory. As a result, MIDCA_1.1 always has a perfect, noise-free view of the current world state that is complete except for the direct actions of the arsonist. In the future work section, we describe plans to move away from this model to one in which perception may be flawed. In our simple blocksworld example, MIDCA_1.1 would copy to memory the same predicate representation of block relationships and attributes that the world simulator maintained as the current state.

Interpret. The interpret phase has been at the core of our research efforts. It is implemented as a GDA interpretation procedure that uses both a bottom-up, data-driven track and a top-down, knowledge-rich, goal-driven track (Cox, Maynord, Paisner, Perlis, & Oates, 2013). MIDCA_1.1 uses both of these processes to analyze the current world state and determine which, if any, new goals it should attempt to pursue. The details of this process are described below. In our example, this would be the phase in which MIDCA_1.1 noticed an anomaly in the blocksworld (e.g., a block was on fire) and decided what to do about it.

Evaluate. In the evaluate phase, the goal generated during the previous step is evaluated. The system searches through the world representation stored during the perceive phase, and checks if the goal predicate exists in the world state. If so, MIDCA_1.1 notes that the goal has been achieved. Additionally, during evaluate MIDCA_1.1 checks on the progress of its broader goals and updates the relevant performance metrics.
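As a rough illustration of this check (not MIDCA_1.1's actual code; the state and goal encodings below are assumptions made for the sketch), the evaluate step reduces to a membership test over the predicate state stored by the perceive phase:

    # Illustrative only: the world state as a set of ground predicate strings,
    # copied into the key/value memory by the perceive phase.
    memory = {
        "world_state": {"on(A,B)", "on-table(B)", "clear(A)", "onfire(C)"},
        "current_goal": "on(A,B)",
    }

    def evaluate(mem):
        """Return True if the current goal predicate holds in the stored state."""
        achieved = mem["current_goal"] in mem["world_state"]
        if achieved:
            # This is also where broader progress metrics would be updated.
            mem.setdefault("achieved_goals", []).append(mem["current_goal"])
        return achieved

    print(evaluate(memory))   # -> True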
In blocksworld, MIDCA_1.1 checks if its current goal, for example on(A, B) has been achieved. It then checks to see if a new tower has been built and if so how many blocks in it are on fire. All this data is stored in memory, and used later to score MIDCA_1.1?s success at achieving its goals in the face of problems. Intend. The intend component determines which goals to pursue. If the evaluate phase reports that the previous goal has been achieved, MIDCA_1.1 checks to see if a new goal was generated during the interpret phase. If so, it adopts that goal. Otherwise, it will do nothing until a new goal is generated. If the previous goal has not been achieved, it will also do nothing. In blocksworld, this component converts the goal that has been generated into a task that can be taken as input by the planner. For example, the goal not(onfire(A)) would be transformed into put-out-fire(A). Plan. For the planner, we use SHOP2 (Nau et al., 2003), a domain-independent task decomposition planner. If the intend component specified a new task, SHOP2 generates a plan to achieve that task given the current world state stored in memory. Otherwise, it does nothing. The actions and methods that are used to achieve each task in blocksworld are specified in a domain file that we supply. Act. MIDCA chooses the next action from the current plan, if one exists. Otherwise, it does not perform an action. If an action is chosen, it is sent to the world simulator, which uses it to compute the next world state. An example of such an action might be unstack(A, B) if SHOP2 had generated a plan containing that step. 83 M. PAISNER, M. MAYNORD, M. T. COX, & D. PERLIS Apart from the central memory structure, MIDCA is designed to be highly modular. Each phase of the MIDCA loop is implemented by a component which is passed to the system at startup, and these components can be easily swapped in and out. Therefore, the component behaviors described above for MIDCA_1.1 can be easily expanded or modified to use different algorithms or approaches. For example, in the experiments described in Section 3, we achieve the different experimental setups described simply by passing MIDCA_1.1 different components corresponding to the interpret phase. We will now describe the nature of those different components. 2.2 Interpretation The interpret phase of MIDCA has been the subject of much of our work, and is the focus of the experiments described below. It is implemented by two processes that combine to generate new goals based on the features of the world the agent observes. We call these processes the D-track, which is a data driven, bottom-up approach, and the K-track, which is knowledge rich and top- down. The D-track is implemented by a bottom-up GDA process as follows. A statistical anomaly detector constitutes the first step, a neural network identifies low-level causal attributes of the anomaly, and a goal classifier, constructed using methods from machine learning, provides the goal formulation. The K-track as it currently exists is implemented as a case-based explanation process. The representations for expectations significantly differ between the two tracks. K-track expectations come from explicit knowledge structures such as action models used for planning and ontological conceptual categories used for interpretation. Predicted effects form the expectations in the former and attribute constraints constitute expectation in the latter. D-track expectations are implicit by contrast. 
Here the implied expectation is that the probabilistic distribution from which observations are sampled will remain the same. When the difference between the expected and the perceived distribution is statistically significant, an expectation violation is raised. 2.2.1 D-Track Goal Generation The D-track interpretation procedure uses a novel approach for noting anomalies. We apply the statistical metric called the A-distance to streams of predicate counts in the perceptual input (Cox, Oates, Paisner, & Perlis, 2012; 2013). This enables MIDCA to detect regions whose statistical distributions of predicates differ from previously observed input. These regions are those where change occurs and potential problems exist. When a change is detected, its severity and type can be determined by reference to a neural network in which nodes represent categories of normal and anomalous states. This network is generated dynamically with the growing neural gas algorithm (Paisner, Perlis, & Cox, 2013) as the D-track processes perceptual input. This process leverages the results of analysis with A-distance to generate anomaly prototypes, each of which represents the typical member of a set of similar anomalies the system has encountered. When a new state is tagged as anomalous by A-distance, the GNG net associates it with one of these groups and outputs the magnitude, predicate type, and valence of the anomaly. Goal generation is achieved in MIDCA_1.1 using TF-Trees (Maynord, Cox, Paisner, & Perlis, 2013), a machine-learning classification structure that results from the conjunction of two algorithms both of which work over the predicate representation of the blocks world domain. The first of these algorithms is Tilde (Blockeel, & De Raedt, 1997), which is itself a generalization of the standard C4.5 (Quinlan, 1993) decision tree algorithm. The second algorithm is FOIL (Quinlan, 1990), an algorithm which, given a set of examples in predicate representation reflecting some 84 GOAL-DRIVEN AUTONOMY IN DYNAMIC ENVIRONMENTS concept, induces a rule consisting of conjunctions of predicates that identify the concept. Given a world state, a TF-Tree first uses Tilde to classify the state into one of a set of scenarios, where each scenario is associated with a rule generated by FOIL. Once that rule is obtained, groundings of the arguments of the predicates in that rule are permuted until either a grounding that satisfies the rule is found (in which case a goal is generated) or until all permutations have been eliminated as possibilities (in which case no goal is generated). This approach to goal insertion is na?ve in the sense that it constitutes a mapping between world states and goals which is static with respect to any context. The structure of a TF-Tree is a tree where in internal nodes are produced by Tilde and leaf nodes are rules produced by FOIL. Figure 3 depicts the structure of the TF-Tree MIDCA_1.1 uses in cycling through the 3 block arrangements. For example given the middle state of Figure 2, triangle-D is clear, it is on the table, and the table is a table. Thus we take the right branch labeled ?yes.? Now triangle-D is also a triangle, so again we take the ?yes? branch to arrive at the right- most leaf of the tree. The leaf rule then binds the variable Y to the clear square-C, and the resulting goal is to have triangle-D on square-C. The construction of a TF-Tree requires a training corpus consisting of world states and associated correct and incorrect goals. 
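The two-stage procedure just described (a Tilde-style classification followed by grounding a FOIL-style rule) can be sketched as follows. This is an illustration of the idea only: the stub classifier, the rules, and the predicate encoding are our assumptions, not the TF-Tree implementation.

    from itertools import permutations

    def classify_scenario(state):
        """Stand-in for the Tilde decision tree: map a state to a scenario label."""
        return "fire" if any(p.startswith("onfire(") for p in state) else "build"

    # Each scenario is paired with a FOIL-style rule: a goal template plus the
    # conjunction of predicates a grounding must satisfy.
    RULES = {
        "fire":  ("not(onfire({X}))", ["onfire({X})"]),
        "build": ("on({X},{Y})",      ["triangle({X})", "clear({X})", "clear({Y})"]),
    }

    def generate_goal(state, objects):
        """Permute groundings of the rule's arguments until one satisfies the body."""
        goal_template, body = RULES[classify_scenario(state)]
        for x, y in permutations(objects, 2):
            bindings = {"X": x, "Y": y}
            if all(pred.format(**bindings) in state for pred in body):
                return goal_template.format(**bindings)
        return None   # all permutations eliminated -> no goal is generated

    state = {"onfire(C)", "clear(C)", "clear(A)", "triangle(D)", "on-table(A)"}
    print(generate_goal(state, ["A", "B", "C", "D"]))   # -> not(onfire(C))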
In simple worlds TF-Trees can be constructed which have perfect or near perfect accuracy using small training corpora. Corpora have to be constructed by humans, as labels need to be attached to potential goals in various world states. For simple worlds corpora construction does not carry an excessive burden, the burden of corpora construction however increases with the complexity of the world. Because a TF-Tree is a static structure trained on the specifics of the world, when the world changes, even in minor ways, a new training corpus has to be constructed and a new TF-Tree trained. However, the corpus to create a simple tree for reacting to fires (see Figure 4) consisted of only four examples. Figure 3. Depiction of the TF-Tree used in cycling through the 3 block arrangements used in MIDCA_1.1 Figure 4. TF-Tree that generates goals to put out fires 85 M. PAISNER, M. MAYNORD, M. T. COX, & D. PERLIS 2.2.2 K-Track Goal Generation The K-track GDA procedure uses the XPLAIN system (Cox & Burstein, 2008), an explanation subsystem of the POIROT (Burstein, et al., 2008) multi-strategy learning system for Web Service procedures. XPLAIN is built on top of the Meta-AQUA introspective story understanding system (Cox and Ram 1999; Lee and Cox 2002) and is used in MIDCA to detect and explain problems in the input perceptual representations. The system?s interpretation task is to ?understand? input by building causal explanatory graphs that link the individual subgraph representations of the events into a coherent whole where an example measure of coherency is minimizing the number of connected components. The sub-system uses a multistrategy approach to comprehension. Thus, the top-level goal is to choose a comprehension method (e.g., script processing, case-based reasoning, or explanation generation) by which it can understand an input. When an anomalous or otherwise interesting input is detected, the system builds an explanation of the event, incorporating it into the preexisting model of the story. XPLAIN uses case-based knowledge representations implemented as frames tied together by explanation-patterns (Cox & Ram, 1999; Schank, 1986; Schank, Kass, and Riesbeck 1994) that represent general causal structures. XPLAIN relies on general domain knowledge, a case library of prior plan schemas and a set of general explanation patterns that are used to characterize useful explanations involving that background knowledge. These knowledge structures are stored in a (currently) separate memory sub-system and communicated through standard socket connections to the rest of MIDCA_1.1. XPLAIN uses an interest-driven, variable depth, interpretation process that controls the amount of computational resources applied to the comprehension task. For example an assertion that triangle- D is picked up generates no interest, because it represents normal actions that an agent does on a regular basis. But XPLAIN classifies block-A burning to be a violent action and, thus according to its interest criterion, interesting. It explains the action by hypothesizing that the burning was caused by an arsonist. An abstract explanation pattern (see Table 1), or XP, retrieved from memory instantiates this explanation, and the system incorporates it into the current model of the actions in the input ?story? and passes it as output to MIDCA. The ARSONIST-XP asserts that the lighting of the block caused heat that together with oxygen and fuel (the block itself) caused the block to burn. 
The arsonist did this action because he wanted the block's burning state that resulted from the burning. The objective is to counter a vulnerable antecedent of the XP. In this case, the deepest antecedent is the variable binding =l-o, the lighting-object action. This can be blocked by either removing the actor or removing the ignition-device. The choice is the actor, and a goal to apprehend the arsonist is generated.

Table 1. The arsonist explanation pattern

    (define-frame ARSONIST-XP
      (isa (value (xp)))
      (actor (value (criminal-volitional-agent)))
      (object (value (physical-object)))
      (antecedent
        (value (ignition-xp
          (actor (value =actor))
          (object (value =object))
          (ante (value (light-object =l-o
            (actor (value =actor))
            (instrumental-object (value (ignition-device))))))
          (conseq (value =heat)))))
      (consequent
        (value (forced-by-states
          (object (value =object))
          (heat (value =heat))
          (conseq (value (burns =b (object (value =object))))))))
      (heat (value (temperature (domain (value =object))
                                (co-domain (value very-hot.0)))))
      (role (value (actor (domain (value =ante)) (co-domain (value =actor)))))
      (explains (value =role))
      (pre-xp-nodes (value (=actor =consequent =object =role)))
      (internal-nodes (value nil.0))
      (xp-asserted-nodes (value (=antecedent)))
      (link1 (value (results (domain (value =antecedent))
                             (co-domain (value =consequent)))))
      (link2 (value (xp-instrumental-scene->actor
        (actor (value =actor))
        (action (value =l-o))
        (main-action (value =b))
        (role (value =role)))))
      (links (value (=link1 =link2))))

3. Evaluation: Autonomous Goal Formulation
The fires are problems because of their effect on housing construction and the supposed profits of the housing industry, and they pose threats to life and property. The approach to understanding fire problems is to ask why the fires were started and not just how. An explanation of how the fire started would relate the presence of sufficient heat, fuel, and oxygen with the combustion of the blocks. Generating the negation of the presence of the oxygen, for example, would result in the goal ¬oxygen and therefore put out the fire. But this does not get to the reason the fire started in the first place. To ask why the fire was started would result in possibly two hypotheses or explanations. Poor safety conditions can lead to fire, or the actions of arsonists can result in fire. In this latter case, the arsonist causes the presence of the heat through some hidden lighting action. Given this explanation, the agent can anticipate the threat of more fires and generate a goal to remove the threat by finding the arsonist. Apprehending the arsonist then removes the potential of fires in the future rather than just reacting to fires that started in the past.

We tested the effectiveness of three methods for goal generation under these conditions. The first method was a simple baseline using predetermined, exogenous goals. The second method used the statistical, D-Track GDA method described in Section 2.2.1. The third method combined the D-Track approach with additional analysis using K-Track GDA as described in Section 2.2.2. Details appear in Table 2. For each test, MIDCA was run for 1000 time steps (equivalent to executing 1000 actions). At each step, the arsonist would have a probability p of starting a fire unless he had previously been apprehended.
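A minimal sketch of one such simulation step, with the arsonist lighting fires probabilistically until apprehended; the state encoding and names are assumptions made for illustration, not the actual simulator.

    import random

    BLOCKS = ["A", "B", "C", "D"]

    def simulation_step(state, p, rng):
        """Advance the world one step: a free arsonist lights a fire with probability p."""
        state = set(state)
        arsonist_free = "imprisoned(arsonist)" not in state
        unburnt = [b for b in BLOCKS if f"onfire({b})" not in state]
        if arsonist_free and unburnt and rng.random() < p:
            state.add(f"onfire({rng.choice(unburnt)})")
        return state

    rng = random.Random(0)
    state = {"on-table(A)", "on-table(B)"}
    for _ in range(5):
        state = simulation_step(state, p=0.4, rng=rng)
    print(sorted(s for s in state if s.startswith("onfire")))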
The value of p in the experiments described below was 0.4, allowing for enough fires to be significant without precluding progress in the tower construction project.

Table 2. Methods for goal generation
1. Exogenous Goals: MIDCA used a predetermined goal list that cycled between the three states constructing towers. MIDCA did not deviate from this list in response to the appearance of fires. The goal list was simply [on(C,A), on(D,C), on(D,A), on(C,A), ..., on(D,A)], as in Figure 2.
2. D-Track GDA Goal Generation: MIDCA generated goals using TF-Trees. These trees were trained and implemented such that when no fire was present, they would generate the next goal in the 3-part cycle, but when a fire was present, they would instead generate a goal to put it out.
3. Dual-Track GDA Goal Generation: MIDCA generated goals using a combination of TF-Trees and a K-Track GDA approach using XPLAIN. XPLAIN contained knowledge about the possible role of arsonists in starting fires and therefore suggested a goal to search for and apprehend an arsonist when a fire started. TF-Trees generated other goals as in method 2.

We tracked three scoring metrics: the number of towers completed; the overall prevalence of fires; and a combined score measuring completion of fire-free towers. Details on each scoring metric are shown in Table 3. At each time step in which a tower was completed (e.g., a triangular block was placed on a stack of rectangular blocks), all fires were automatically put out, and the agent started on a new construction project.

Table 3. Scoring metrics for testing
Towers Completed: Total number of 3- and 4-block towers completed in 1000 cycles.
Fire Prevalence: The number of blocks on fire times the number of time steps they were on fire. So, if 3 blocks were allowed to burn for three time steps and then put out simultaneously, the fire prevalence score would be 3 * 3 = 9.
Overall Score: This score awarded 1 point for each block in a completed tower that was not on fire at the time the tower was completed. So, a 4-block tower in which two blocks were on fire would add 4 - 2 = 2 points to this score.

Preliminary empirical results show that GDA approaches using only the D-Track as well as using both the D-Track and K-Track perform significantly better than a baseline that does not use GDA. Also, the combined D- and K-Track implementation outperforms the purely statistical variant by a large margin. Figure 4 shows the detailed results of testing. The agent that used only exogenous goals completed the most towers, but, because it did not deal with fires in any way, most of the towers were burning as they were completed and received very low scores. Certainly, this baseline behavior does not seem to be sufficient for a fully autonomous house construction agent. The second agent used behavior dictated by TF-Trees trained using methods from machine learning to fight fires directly. It did not finish as many towers because it divided its attention between building towers and extinguishing fires, but those towers it did construct were much less likely to be on fire. Its total score was 367, 54.2% better than the baseline agent. Finally, the dual-track GDA agent analyzed the problem logically using XPLAIN, and thereby suggested an explanation of the fires as potentially caused by arson. As such, it generated a goal early in the process to apprehend the arsonist.
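The Table 3 metrics above reduce to simple arithmetic; the toy sketch below shows one way they might be computed from a run log (the log format is our own assumption, not the authors' scoring code).

    def fire_prevalence(burning_counts_per_step):
        """Sum, over time steps, of the number of blocks on fire at that step.
        E.g., 3 blocks burning for 3 steps -> 3 + 3 + 3 = 9."""
        return sum(burning_counts_per_step)

    def overall_score(completed_towers):
        """One point per block that was not on fire when its tower was completed.
        Each tower is recorded as (total_blocks, blocks_on_fire_at_completion)."""
        return sum(total - on_fire for total, on_fire in completed_towers)

    print(fire_prevalence([0, 3, 3, 3, 0]))    # -> 9
    print(overall_score([(4, 2), (3, 0)]))     # -> 2 + 3 = 5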
Apprehending the arsonist took some time, but afterwards the agent was able to devote its full attention to house construction without spending time on firefighting. It completed very nearly as many towers as the baseline agent, and did so with almost no incidence of fire, since no fires started after the arsonist was apprehended. The dual-track agent achieved a score of 584, 245.4% of the baseline agent's score. It should not be surprising that an agent that is capable of reacting to the unanticipated problem posed by fire performs better than one that heedlessly continues on a predetermined course of action. Perhaps more telling is the large advantage gained by the dual-track agent, which has the knowledge to identify and address the true source of the problem, rather than simply treating its symptoms. Though this example is too simple to easily generalize, these results at least suggest the importance of combining a knowledge-rich approach with low-level data analysis to achieve the best possible results. In the next section, we discuss the steps we intend to take in order to duplicate these results in a domain that allows for a more convincing role for the data side of this equation.

Figure 4. Results of testing using 3 methods (panels: Towers Completed, Fire Prevalence, Overall Score; conditions: Exogenous Goals, Statistical Goal Generation, GDA Goal Generation). Note that the value of GDA Goal Generation in the Fire Prevalence panel is 2, which is too small to show clearly in the graph.

Future Research
MIDCA_1.1 was designed with an eye towards expansion into both additional domains and added capabilities. We have several ideas about how to effect such expansion. The reader will likely have observed that the domain described in this paper is quite simplistic. In the future we plan to adapt MIDCA for a robotic domain. Specifically, we plan to incorporate the MIDCA architecture into a PR2 robot (Cousins, 2010) and attempt to replicate results like those described in this paper. Some changes will be necessary (for example, lighting blocks on fire in the lab might not be advisable), but we might have the robot attempt to stack blocks into towers and face the unexpected problems that occur when the stacked blocks fall. Solving this version of the block-stacking problem would require several additions to the current version of MIDCA. The PR2 would first need to find and identify the blocks. One of many ways to accomplish this could be by using blob detection combined with a set of rules that define what type of object a block is. Since the PR2 also has sensors and software that generate point-cloud distance measurements, size and location could also be used as determiners if, for example, we constrain blocks to be small enough to lift and to be located on the table. Once blocks are identified, the next step would be to differentiate between them, and to characterize simple predicate relationships like those in the modified blocks world domain of this paper, such as on or on-table. Given location information and a frame of reference, this might be accomplished using simple rules, so long as the environment was sufficiently uncomplicated. Once the robot has an idea of the objects in its environment, it can generate a plan to achieve a tower construction goal.
Each step of this plan must be translated from a predicate representation to individual actions the robot can perform. So, unstack(b1, b2) might become something like [locate(b1), locate(b2), move-arm(near b1), grasp(b1), move-arm(rest-loc)]. Additionally, the Evaluate phase of MIDCA would have to determine not only whether goals had been achieved, but also if individual actions had in fact had their anticipated effects. If not, for example if the robot accidentally knocked down a block while trying to pick it up, MIDCA would have to not only replan, but also consider generating a goal to relearn or modify its grasping algorithm. The transition being discussed above will not be a trivial one, and the methods described are only examples of approaches we might take. Our overall purpose is the integration of a low-level, data driven approach with a high-level, knowledge rich one in an environment that is amenable to both. These approaches have complimentary strengths. Low-level approaches tend to be robust with respect to noise but not in the face of fundamental shifts or unexpected variables. In contrast, reasoning-based approaches can provide insight into how to accommodate or (as in the fire example) counteract major changes or surprises, but are often difficult to apply to complex, noisy data. Uniting the two methodologies in increasingly complex, realistic and interesting ways and leveraging the enormous work that has been done on each individually will be a major focus of our research going forward. Another area of future research is expanding and refining the capabilities of MIDCA?s memory structures. First, we have begun work on adding a concept of time, so that each piece of knowledge that MIDCA acquires would be tied to two times ? the time it was learned, and the range of times it applies to. Using this information, we could leverage work in active logic (see Elgot-Drapkin, Kraus, Miller, Nirkhe, Perlis, 1999; Anderson, Gomaa, Grant, Perlis, 2008) to allow MIDCA to make inferences about events in its past or future. For an autonomous agent that is expected to reason effectively about the consequences of its actions, the inclusion of temporal information in that reasoning is essential. This is especially true if such an agent is expected to deal with humans, who build implicit and explicit temporal constraints into everything. For example, consider a robot which is attempting to choose between two goals it has been given: ?Fetch Mary a soda? and ?Make sure Joe gets this important document before he leaves.? In this case, the robot can only make an appropriate decision about which to pursue first by considering the time. If it is noon and Joe usually leaves at 5 PM, it 90 GOAL-DRIVEN AUTONOMY IN DYNAMIC ENVIRONMENTS should probably fetch the soda first. Not only does it have plenty of time to deliver the document, but it may know that at noon, people have a high likelihood of being at lunch and unavailable to receive documents. Mary, on the other hand, might appreciate a soda with her lunch, and the time might give the robot a clue about where to start searching for her. A second useful addition to the capabilities of MIDCA?s memory would be an account of the sources of its knowledge. This is useful in instances in which an agent discovers that something it believed was actually false. For example, an agent told that mice were ten feet tall might infer that mice lived only outdoors and had no fear of cats. 
Upon seeing a mouse, it could still, without any extra information on knowledge sources, eliminate the original faulty fact from its knowledge base. However, unless a connection persisted between the original fact about mouse size and the conclusions that the agent inferred from it, it would have no way of eliminating those facts as well. In a world in which even humans are constantly wrong and need to readjust to new data, the ability to connect a faulty assumption to series of faulty conclusions that spring from it is valuable. In addition to expansions of MIDCA?s memory architecture, it may be desirable to update MIDCA?s goal generation method to counter the current method?s limitations of scale and the requirement for a training corpus which covers most situations. By making use of explanation patterns (XP?s), a goal generation scheme which is in part logical in nature can be constructed based on the concept of precondition negation. An anomalous situation depends on various preconditions, and these causal relations can be represented in an XP. Negating one or more of these preconditions can then remedy the anomaly. The question of how to determine which precondition(s) expressed in the XP to negate is non-trivial, will require extensive research, and is highly sensitive to the number and nature of XP?s available for examination. 4. Related Work An initial introspective cognitive agent called INTRO (Cox, 2007) combines planning and understanding within a Wumpus World environment (Russell & Norvig, 2003) by integrating the PRODIGY planning and learning architecture (Veloso, Carbonell, Perez, Borrajo, Fink, & Blythe, 1995) with the Meta-AQUA story understanding and learning system (Cox & Ram, 1999). Rather than input all goals for the agent to achieve, the understanding component compares expected states and events in the world with those actually perceived to create an interpretation. When the interpretation discloses divergence from those expectations, INTRO generates its own goals to resolve the conflict. These new goals are then passed back to the PRODIGY component so that a plan can be generated and then executed. Work has been done to expand and better the capacities of agents by making use of goal manipulations. (Hanheide et al., 2010) created a framework for managing goals to be used by a robot exploring an unknown space which autonomously classifies rooms into categories. They ran the robot with and without the framework, and concluded that a framework for goal management increases the performance of the robot. Our work too demonstrates that by allowing an agent to dynamically alter its goal structure performance can be greatly improved. Schermerhorn, Benton, Scheutz, Talamadupula, & Kambhampati (2009) seek to use modification of a robot's goal structure to confront the challenges of a domain which is partially observable, non-deterministic, where prior knowledge about the domain is limited, knowledge acquisition is non-monotonic, planning is subject to real time constraints, and goals and utilities can dynamically change during execution. Counterfactuals are used to determine actions that could lead to goal opportunities, and when opportunities are detected, the goal structure can be modified. 91 M. PAISNER, M. MAYNORD, M. T. COX, & D. PERLIS Other work has taken advantage of the GDA model which we use in our work. 
For example, Munoz-Avila, Jaidee, and Aha (2010) merged the GDA framework with case based reasoning (CBR) and ran a comparison between a GDA system using CBR, a rule based variant of GDA, and a non-GDA based agent. The CBR based GDA system outperformed the others, and functioned by making use of a case base that mapped goals to expectations and a case base that mapped mismatches to new goals. Our work as well shows that a GDA based system outperforms non-GDA based systems. The ARTUE (Autonomous Response to Unexpected Events) system (Molineaux, Klenk, & Aha, 2010) instantiates the GDA model. Its purpose is to be a domain independent autonomous agent with the capacity to dynamically determine which goals to pursue in unexpected situations. ARTUE uses hierarchical task networks for planning, takes advantage of explanations, and manages goals. 5. Conclusion Full implementation of the MIDCA architecture will be a large project. In this paper we have introduced MIDCA_1.1, a working version of the cognitive layer of MIDCA in a very simple domain. Our experimental results to this point should be taken as illustrative rather than probative ? they are intended to demonstrate the kind of problem we intend for MIDCA to be able to solve and the ways in which it might do so. We hope to build on the potential for goal driven autonomy to increase the flexibility and adaptability of agents in complex, changing environments. The demonstrated success of this method in our simple example should be a building block for similar work in more challenging domains. The second major feature of our method is the synergy between D-track and K-track approaches. We have described the use of data-driven techniques in anomaly detection (A- distance), neural networks (growing neural gas), and machine learning (Tilde; FOIL) as well as a predicate logic state representation and techniques for explanation generation (Meta-AQUA) and planning (SHOP2) that rely on such high level formalisms. Both high level and low level approaches to AI have been used with great success in their individual spheres. We believe that the integration of these approaches is one of the most promising opportunities in modern AI, and one of the central focuses of MIDCA. Another factor in the relevance of this project is the emergence of robots like the PR2 with advanced built-in capabilities in image processing, localization, locomotion and object manipulation. Two-way translation between raw input data and reasonable formalisms, and the integrated analysis of both levels of information to generate specific commands a robot can follow, are now, while still difficult, much more approachable tasks than they have been in the past. Our hope is that MIDCA can effect this translation. Our vision is of an agent whose autonomy and usefulness grows gradually as we increase both the quality of the algorithms governing each of its individual functions and the level of integration between them. This paper has presented the first iteration of that process. Acknowledgements This material is based upon work supported by ONR Grants # N00014-12-1-0430 and # N00014- 12-1-0172 and by ARO Grant # W911NF-12-1-0471. We thank the anonymous reviewers for their comments and suggestions. 92 GOAL-DRIVEN AUTONOMY IN DYNAMIC ENVIRONMENTS References Anderson, M. L., Gomaa, W., Grant, J., & Perlis, D. (2008). Active logic semantics for a single agent in a static world. Artificial Intelligence, 172(8), 1045-1063. Anderson, M., & Perlis, D. (2005). 
Logic, self-awareness and self-improvement. Journal of Logic and Computation 15, 21?40. Blockeel, H., & De Raedt, L. (1997). Lookahead and discretisation in ILP. In N. Lavrac & S. D?eroski (Eds.), Proceedings of the seventh international workshop on inductive logic programming (pp. 77?84) LNAI: Vol. 1297. Berlin: Springer Burstein, M. H., Laddaga, R., McDonald, D., Cox, M. T., Benyo, B., Robertson, P., Hussain, T., Brinn, M., & McDermott, D. (2008). POIROT - Integrated learning of web service procedures. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (pp. 1274-1279). Menlo Park, CA: AAAI Press. Cousins, S. (2010). ROS on the PR2 [ROS Topics]. IEEE Robotics & Automation Magazine, 17(3):23-25. Cox, M. T. (2007). Perpetual self-aware cognitive agents. AI Magazine 28(1), 32-45. Cox, M. T. (in press). Question-based problem recognition and goal-driven autonomy. To appear in D. W. Aha, M. T. Cox, & H. Munoz-Avila (Eds.) , Proceedings of the 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning (Tech. Rep. No. CS-TR-5029). College Park, MD: University of Maryland, Department of Computer Science. Cox, M. T., & Burstein, M. H. (2008). Case-based explanations and the integrated learning of demonstrations. K?nstliche Intelligenz (Artificial Intelligence) 22(2), 35-38. Cox, M. T., Maynord, M., Paisner, M., Perlis, D., & Oates, T. (2013). The integration of cognitive and metacognitive processes with data-driven and knowledge-rich structures. In Proceedings of the Annual Meeting of the International Association for Computing and Philosophy. Cox, M. T., Oates, T., Paisner, M., & Perlis, D. (2012). Noting anomalies in streams of symbolic predicates using A-distance. Advances in Cognitive Systems 2, 167-184. Cox, M. T., Oates, T., Paisner, M., & Perlis, D. (2013). Detecting change in diverse symbolic worlds. In L. Correia, L. P. Reis, L. M. Gomes, H. Guerra, & P. Cardoso (Eds.), Advances in Artificial Intelligence, 16th Portuguese Conference on Artificial Intelligence (pp. 179-190). University of the Azores, Portugal: CMATI. Cox, M. T., Oates, T., & Perlis, D. (2011). Toward an integrated metacognitive architecture. In P. Langley (Ed.), Advances in Cognitive Systems: Papers from the 2011 AAAI Fall Symposium (pp. 74-81). Technical Report FS-11-01. Menlo Park, CA: AAAI Press. Cox, M. T., & Ram, A. (1999). Introspective multistrategy learning: On the construction of learning strategies. Artificial Intelligence, 112, 1-55. Elgot-Drapkin, J., Kraus, S., Miller, M., Nirkhe, M., & Perlis, D. (1999). Active logics: A unified formal approach to episodic reasoning (Tech. Rep. No. CS-TR-4072). College Park, MD: University of Maryland, Department of Computer Science. Hanheide, M., Hawes, N., Wyatt, J., G?belbecker, M., Brenner, M., Sj??, K., & Aydemir, A. (2010). A framework for goal generation and management. AAAI Workshop on Goal-Directed Autonomy. 93 M. PAISNER, M. MAYNORD, M. T. COX, & D. PERLIS Klenk, M., Molineaux, M., & Aha, D. (2013). Goal-driven autonomy for responding to unexpected events in strategy simulations. Computational Intelligence, 29(2), 187?206, 2013. Lee, P., & Cox, M. T. (2002). Dimensional indexing for targeted case-base retrieval: The SMIRKS system. In S. Haller & G. Simmons (Eds.), Proceedings of the 15th International FLAIRS Conference (pp. 62-66). Menlo Park, CA: AAAI Press. Lenat, D., & Guha, R. (1989). Building large knowledge-based systems. Menlo Park, CA: Addison- Wesley. Maynord, M., Cox, M. T., Paisner, M., & Perlis, D. 
(2013). Data-driven goal generation for integrated cognitive systems. In C. Lebiere & P. S. Rosenbloom (Eds.), Integrated Cognition: Papers from the 2013 Fall Symposium (pp. 47-54). Menlo Park, CA: AAAI Press. Molineaux, M., Klenk, M., Aha, D. (2010). Goal-driven autonomy in a Navy strategy simulation. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press. Munoz-Avila, H., Jaidee, U., Aha, D. (2010). Goal-Driven autonomy with case-based reasoning. In I. Bichindaritz & S. Montani (Eds.), Proceedings of the 18th international conference on Case- Based Reasoning Research and Development (ICCBR'10) (pp. 228-241). Berlin: Springer. Munoz-Avila, H., Jaidee, U., Aha, D. W., Carter, E. (2010). Goal-driven autonomy with case-based reasoning. In Case-Based Reasoning. Research and Development, 18th International Conference on Case-Based Reasoning, ICCBR 2010 (pp. 228-241). Berlin: Springer. Nau, D., Au, T., Ilghami, O., Kuter, U., Murdock, J., Wu, D., & Yaman, F. (2003). SHOP2: An HTN planning system. Journal of Artificial Intelligence Research 20, 379?404 Paisner, M., Perlis, D., & Cox, M. T. (2013). Symbolic anomaly detection and assessment using growing neural gas. In Proceedings of the 25th IEEE International Conference on Tools with Artificial Intelligence (pp. 175-181). Los Alamitos, CA: IEEE Computer Society. Quinlan, J. (1993). Programs for machine learning. San Francisco, CA: Morgan Kaufmann. Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning 5, 239-266. Russell, S., & Norvig, P. (2003). Artificial intelligence: A modern approach (2ed). NJ: Prentice. Schank, R. C. (1986). Explanation patterns: Understanding mechanically and creatively. Hillsdale, NJ: Lawrence Erlbaum Associates. Schank, R. C., Kass, A., & Riesbeck, C. K. (1994). Inside case-based explanation. Hillsdale, NJ: Lawrence Erlbaum Associates. Schermerhorn, P., Benton, J., Scheutz, M., Talamadupula, K., Kambhampati, S. (2009). Finding and exploiting goal opportunities in real-time during plan execution. In Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems (IROS'09) (pp. 3912-3917). Piscataway, NJ: IEEE Press. Veloso, M., Carbonell, J. G., Perez, A., Borrajo, D., Fink, E., & Blythe, J. (1995). Integrating planning and learning: The PRODIGY architecture. Journal of Theoretical and Experimental Artificial Intelligence, 7(1), 81-120. 94 2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning Hierarchical Goal Networks and Goal-Driven Autonomy: Going where AI Planning Meets Goal Reasoning Vikas Shivashankar SVIKAS@CS.UMD.EDU Department of Computer Science, University of Maryland, College Park, MD 20742 Ron Alford RONWALF@CS.UMD.EDU Department of Computer Science, University of Maryland, College Park, MD 20742 Ugur Kuter UKUTER@SIFT.NET SIFT, LLC, 211 North 1st Street, Suite 300, Minneapolis, MN 55401-2078 Dana Nau NAU@CS.UMD.EDU Department of Computer Science and Institute for Systems Research, University of Maryland, Col- lege Park, MD 20742 Abstract Planning systems are typically told what goals to pursue and cannot modify them. Some methods (e.g., for contingency planning, dynamic replanning) can respond to execution failures, but usually ignore opportunities and do not reason about the goals themselves. Goal-Driven Autonomy (GDA) relaxes some common assumptions of classical planning (e.g., static environments, fixed goals, no unpredictable exogenous events). 
This paper describes our Hierarchical Goal Network formalism and algorithms that we have been working on for some time now and discusses how this particular planning formalism relates to GDA and may help address some of GDA?s challenges. 1. Motivation In the real world, intelligent agents typically have to operate in environments that are open-world, partially observable and dynamic. Therefore, pure offline planning or even online plan-repair isn?t sufficient to handle unanticipated situations in such environments: external events often necessitate postponing or even abandoning your current goals in order to pursue newer goals of higher priority. Goal-driven autonomy (GDA) (Molineaux, Klenk, & Aha, 2010) is a conceptual model of such a goal reasoning process. As one can imagine, enabling agents to operate as described above requires advanced planning capabilities, in particular capabilities that are not well supported by the current state of the art automated planning formalisms and algorithms. In particular, most existing planning approaches assume inactive goals; i.e., goals for planning problems are specified at the start of the planning process and never change during the course of that process. Reasoning about active goals and 95 SHIVASHANKAR, ALFORD, KUTER, AND NAU planning for them is a core foundation for GDA, and we believe, there should be more research on integrating AI planning techniques with active goal reasoning. In this paper, we will firstly pin down some key capabilities for active goal reasoning in planning that would be useful in the GDA architecture. We will then describe our ongoing research on a knowledge-based planning formalism called Hierarchical Goal Network (HGN) planning that makes first steps towards achieving some of these capabilities. HGN planning already supports some of the features desirable from a GDA standpoint such as: Specification of complex domain-specific knowledge as HGN methods These are similar to methods in Hierarchical Task Network (HTN) planning (Erol, Hendler, & Nau, 1994), but decompose goals into subgoals (instead of tasks into subtasks). When multiple methods can be used to achieve the given goal, we can use domain-independent heuristics to automatically detect the most promising methods. Planning with Incomplete Models. Knowledge-based planning formalisms are often too de- pendent on the input domain-specific knowledge; in particular, any missing/incorrect knowl- edge can break the planner. HGN planning on the other hand allows specification of arbitrary amounts of planning knowledge: it uses the given knowledge whenever applicable and falls back to domain-independent planning techniques including landmark reasoning (Richter & Westphal, 2010) and action chaining to fill in the gaps. Managing Agenda of Goals. In the GDA model, managing the agenda of goals and deciding which goals to achieve next is outside the purview of the planner. However, generating plans for each candidate goal and comparing the results is often the best way to decide which goal should be pursued next. The core data structure in HGN planning is a goal network, which captures precisely this functionality. HGNs also have a potential to help achieve more advanced planning functionalities including: Goal-based Plan Repair. In dynamic environments, it is hard for the domain author to antici- pate and provide knowledge for all the various scenarios that the agent could face. 
Since HGN planning is robust to incomplete and incorrect domain models, it is much better equipped to plan in unanticipated situations for which the user has not provided any domain knowledge.
Temporal and Cost-Aware Planning. Since HGN planning uses goals, it is easier to adapt techniques from the domain-independent planning literature on cost-aware planning and temporal planning than it is in HTN planning.
Planning algorithms in the GDA model ought to be able to continuously assess their current situation, recognize and reason about both serendipitous and harmful events that might change the world around them exogenously, adapt their goals in accordance with these events, and finally generate plans to achieve these goals. HGNs provide a formal foundation for developing algorithms that provide these capabilities.
The remainder of this paper is structured as follows. Section 2 provides some background on GDA. Section 3 presents the HGN planning formalism (Shivashankar et al., 2012). Section 4 then reviews the Goal Decomposition Planner (GDP), the algorithm we proposed to solve HGN planning problems in (Shivashankar et al., 2012). Section 5 then presents the Goal Decomposition with Landmarks (GoDeL) algorithm, an extension of GDP that can work with partial domain-specific knowledge, which we proposed in (Shivashankar et al., 2013). Section 6 discusses the potential role of HGNs in providing planning capabilities within GDA. Finally, we conclude in Section 7.

2. GDA Preliminaries
Figure 1. Conceptual Model of GDA (Molineaux, Klenk, & Aha, 2010)
Figure 1 shows a schematic representation of the GDA model. It consists of three components:
Planner. This module takes in the current planning problem P = (M, s_c, g_c), where M is the current model of the environment and s_c and g_c are, respectively, the current state and goal, and computes a plan p for the agent to execute. It also returns a sequence of expectations X_c = [x_c, ..., x_{c+n}], which models constraints on the states [s_c, ..., s_{c+n}] that the planner expects the agent to pass through when executing p.
Controller. This takes the plan p, sends its actions one by one to the environment, and processes the resulting state observations as follows:
- the Discrepancy Detector compares actual state observations with its expectations and generates a set of discrepancies D,
- the Explanation Generator hypothesizes one or more explanations E for D,
- the Goal Formulator generates zero or more goals G in response to E,
- the Goal Manager updates G_pend, the set of pending goals, with G and optionally changes the set of active goals to be given to the planner.
Environment. This takes an action, executes it in the current state, and returns observations from the resulting state.
From this model, we can infer some key capabilities that the Planner module has to support:
Compatibility with Goals. Since problems in GDA are modeled as goals that need to be achieved, planners need to be able to solve goal achievement problems. In addition, planners should also support complex goal representations such as open-world quantified goals (Talamadupula et al., 2010) and maintenance-style goals such as those expressible in Linear Temporal Logic (LTL).
Plan Generation. Given a planning problem, the planner should be able to compute plans efficiently.
Online Plan Repair. In case exogenous events break the current plan or necessitate a change in goals, the planner should be able to repair the current plan.
Online Plan Optimization. The planner should continuously optimize the current plan during execution as more time becomes available. Additionally, it should be able to take advantage of serendipitous events to improve the quality of the plan.
Goal Management. In the original GDA model, goal management is done outside of the planner. However, it is often hard to decide which goal to plan for next among multiple alternatives without actually planning for these goals and comparing the plan qualities. Therefore, we believe that managing the set of goals and deciding which goal to pursue next should also be done by the planner.

2.1 Evaluating Suitability of Domain-Independent Planning for GDA
Domain-independent planning (Ghallab, Nau, & Traverso, 2004) algorithms in principle fit very well into the GDA model, since they work with goals. However, since they cannot exploit any domain-specific knowledge outside of the action models, the performance of these planners rarely scales to complex, real-world domains. Moreover, most domain-independent planners follow assumptions from classical planning that restrict goals to conjunctive boolean formulas.
There is some work on augmenting domain-independent planners with plan repair capabilities (Fox et al., 2006; Hammond, 1990) that perform better than simple replanning strategies. There also exist anytime planning algorithms, such as LAMA (Richter & Westphal, 2010), which compute initial solutions and then proceed to optimize these solutions if given additional time. However, similar scalability problems exist as with the plan generation algorithms.

2.2 Evaluating Suitability of Hierarchical Task Network Planning for GDA
HTN planning (Erol, Hendler, & Nau, 1994; Nau et al., 2003), in contrast to domain-independent planning, models planning problems as the accomplishment of complex tasks. Another difference from domain-independent planning is that HTN planning takes as input additional domain-specific knowledge in the form of HTN methods, which provide ways to decompose tasks into simpler subtasks. HTN planning algorithms can thus be finely tuned to achieve much better performance, especially in complex domains. However, HTN planning algorithms are extremely sensitive to the correctness and completeness of the input method set (Shivashankar et al., 2012): missing or incorrect methods can result in the algorithm not finding valid solutions.¹
Moreover, while there have been a number of attempts to combine HTN planning with domain-independent planning (Kambhampati, Mali, & Srivastava, 1998; McCluskey, 2000; Biundo & Schattenberg, 2001; Alford, Kuter, & Nau, 2009; Gerevini et al., 2008) to work with incomplete HTN method sets, the lack of correspondence between tasks and goals has interfered with these efforts, necessitating several ad hoc modifications and restrictions.
The disconnect between tasks and goals has also interfered with efforts to augment HTN planning with plan repair capabilities (Ayan et al., 2007; Warfield et al., 2007). Hawes (2001; 2002) provides HTN planning algorithms that work in an anytime manner. These algorithms, however, instead of generating a concrete solution and then iteratively improving it in the remaining time, return a plan at varying levels of abstraction based on when the planner is interrupted; this limits their usefulness, since applications typically require a concrete, executable plan.
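To make the interplay of the GDA components and planner capabilities discussed in this section concrete, the following is a minimal, illustrative Python sketch of the control loop implied by the GDA model (see Figure 1). All class roles and function names here are our own placeholders, not the API of any GDA implementation.

    # Illustrative GDA control loop; component names are placeholders, not an actual GDA API.
    def gda_loop(planner, environment, controller, model, state, goal):
        """One execute-monitor-reformulate cycle per action, as in Figure 1."""
        plan, expectations = planner.plan(model, state, goal)        # Planner: plan plus expected states
        for action, expected in zip(plan, expectations):
            observed = environment.execute(action)                   # Environment: act, then observe
            discrepancies = controller.detect(expected, observed)    # Discrepancy Detector
            if discrepancies:
                explanations = controller.explain(discrepancies, observed)  # Explanation Generator
                new_goals = controller.formulate(explanations, observed)    # Goal Formulator
                controller.manage(new_goals)                                 # Goal Manager updates pending goals
                active_goal = controller.select_active(goal)
                if active_goal != goal:
                    # Goal change: restart deliberation with the newly selected goal.
                    return gda_loop(planner, environment, controller, model, observed, active_goal)
            state = observed
        return state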
3. Hierarchical Goal Networks
As Sections 2.1 and 2.2 indicate, the domain-independent and HTN planning formalisms have different (and, in fact, complementary) advantages and disadvantages. We therefore attempted to devise a hybrid formalism that is hierarchical, but is based on goals rather than tasks. This idea is not new; SIPE (Wilkins, 1984) and PRS (Georgeff & Lansky, 1987; Ingrand & Georgeff, 1992) both use goal decomposition techniques to build planning systems. Rather, our contributions lie in (1) building a formal theory and analyzing the properties of a hierarchical goal-based planning framework, and (2) exploiting connections between HGNs and techniques in domain-independent planning such as state-space heuristics and landmark reasoning. Below we formalize HGN planning.
Classical planning. Following Ghallab, Nau and Traverso (2004), we define a classical planning domain D as a finite state-transition system in which each state s is a finite set of ground atoms of a first-order language L, and each action a is a ground instance of a planning operator o. A planning operator is a triple o = (head(o), precond(o), effects(o)), where precond(o) and effects(o) are sets of literals called o's preconditions and effects, and head(o) includes o's name and argument list (a list of the variables in precond(o) and effects(o)).
An action a is executable in a state s if s ⊨ precond(a), in which case the resulting state is γ(s, a) = (s − effects−(a)) ∪ effects+(a), where effects+(a) and effects−(a) are the atoms and negated atoms, respectively, in effects(a). A plan π = ⟨a1, ..., an⟩ is executable in s if each ai is executable in the state produced by ai−1; in this case we let γ(s, π) be the state produced by executing the entire plan.
¹ For example, SHOP (Nau et al., 2001) was disqualified from the 2000 International Planning Competition because of an error made by SHOP's authors when they wrote their HTN version of one of the planning domains.
Method for using truck ?t to move crate ?o from location ?l1 to location ?l2 in city ?c:
  Head: (move-within-city ?o ?t ?l1 ?l2 ?c)
  Pre: ((obj-at ?o ?l1) (in-city ?l1 ?c) (in-city ?l2 ?c) (truck ?t ?c) (truck-at ?t ?l3))
  Sub: ((truck-at ?t ?l1) (in-truck ?o ?t) (truck-at ?t ?l2) (obj-at ?o ?l2))
Method for using airplane ?plane to move crate ?o from airport ?a1 to airport ?a2:
  Head: (move-between-airports ?o ?plane ?a1 ?a2)
  Pre: ((obj-at ?o ?a1) (airport ?a1) (airport ?a2) (airplane ?plane))
  Sub: ((airplane-at ?plane ?a1) (in-airplane ?o ?plane) (airplane-at ?plane ?a2) (obj-at ?o ?a2))
Method for moving ?o from location ?l1 in city ?c1 to location ?l2 in city ?c2, via airports ?a1 and ?a2:
  Head: (move-between-cities ?o ?l1 ?c1 ?l2 ?c2 ?a1 ?a2)
  Pre: ((obj-at ?o ?l1) (in-city ?l1 ?c1) (in-city ?l2 ?c2) (different ?c1 ?c2) (airport ?a1) (airport ?a2) (in-city ?a1 ?c1) (in-city ?a2 ?c2))
  Sub: ((obj-at ?o ?a1) (obj-at ?o ?a2) (obj-at ?o ?l2))
Figure 2. HGN methods for transporting a package to its goal location in the Logistics domain.
A classical planning problem is a triple P = (D, s0, g), where D is a classical planning domain, s0 is the initial state, and g (the goal formula) is a set of ground literals. A plan π is a solution for P if π is executable in s0 and γ(s0, π) ⊨ g.
HGN planning. An HGN method m has a head head(m) and preconditions precond(m) like those of a planning operator, and a sequence of subgoals subgoals(m) = ⟨g1, ..., gk⟩, where each gi is a goal formula (a set of literals).
We define the postcondition of m to be post(m) = gk if subgoals(m) is nonempty; otherwise post(m) = precond(m). See Figure 2 for an example set of HGN methods for the Logistics domain.
An action a (or grounded method m) is relevant for a goal formula g if effects(a) (or post(m), respectively) entails at least one literal in g and does not entail the negation of any literal in g.
Some notation: if π1, ..., πn are plans or actions, then π1 ∘ π2 ∘ ... ∘ πn denotes the plan formed by concatenating them.
An HGN planning domain is a pair D = (D0, M), where D0 is a classical planning domain and M is a set of methods.
A goal network is a way to represent the objective of satisfying a partially ordered sequence of goals. Formally, it is a pair gn = (T, ≺) such that: T is a finite nonempty set of nodes; each node t ∈ T contains a goal gt that is a DNF (disjunctive normal form) formula over ground literals; and ≺ is a partial order over T.
An HGN planning problem is a triple P = (D, s0, gn), where D is an HGN planning domain, s0 is the initial state, and gn = (T, ≺) is a goal network.
Definition 1. The set of solutions for P is defined as follows:
Case 1. If T is empty, the empty plan is a solution for P.
Case 2. Let t be a node in T that has no predecessors. If s0 ⊨ gt, then any solution for P′ = (D, s0, (T′, ≺′)) is also a solution for P, where T′ = T − {t} and ≺′ is the restriction of ≺ to T′.
Case 3. Let a be any action that is relevant for gt and executable in s0. Let π be any solution to the HGN planning problem (D, γ(s0, a), (T′, ≺′)). Then a ∘ π is a solution to P.
Case 4. Let m be a method instance that is applicable to s0 and relevant for gt and has subgoals g1, ..., gk. Let π1 be any solution for (D, s0, g1); let πi be any solution for (D, γ(s0, π1 ∘ ... ∘ πi−1), gi), for i = 2, ..., k; and let π be any solution for (D, γ(s0, π1 ∘ ... ∘ πk), (T′, ≺′)). Then π1 ∘ π2 ∘ ... ∘ πk ∘ π is a solution to P.
In the above definition, the relevance requirements in Cases 3 and 4 prevent classical-style action chaining unless each action is relevant for either the ultimate goal g or a subgoal of one of the methods. This requirement is analogous to (but less restrictive than) the HTN planning requirement that actions cannot appear in a plan unless they are mentioned explicitly in one of the methods. As in HTN planning, it gives an HGN planning problem a smaller search space than the corresponding classical planning problem.

4. Goal Decomposition Planner
Algorithm 1 is GDP, our HGN planning algorithm. It works as follows. In Line 3, if gn is empty then the goal has been achieved, so GDP returns π. Otherwise, GDP selects a goal g in gn without any predecessors (Line 4). If g is already satisfied, GDP removes g from gn and calls itself recursively on the resulting goal network. In Lines 7-8, if no actions or methods are applicable to s and relevant for g, then GDP returns failure. Otherwise, GDP nondeterministically chooses an action or method u from U. If u is an action, then GDP computes the next state γ(s, u) and appends u to π. Otherwise u is a method, so GDP inserts u's subgoals in front of g in gn. Then GDP calls itself recursively on gn.
Algorithm 1: A high-level description of GDP. Initially, D is an HGN planning domain, s is the initial state, gn is the goal network, and π is ⟨⟩, the empty plan.
1   Procedure GDP(D, s, gn, π)
2   begin
3       if gn is empty then return π
4       g ← a goal formula in gn with no predecessors
5       if s ⊨ g then
6           remove g from gn and return GDP(D, s, gn, π)
7       U ← {actions and method instances that are relevant for g and applicable to s}
8       if U = ∅ then return failure
9       nondeterministically choose u ∈ U
10      if u is an action then append u to π and set s ← γ(s, u)
11      else insert subgoals(u) as predecessors of g in gn
12      return GDP(D, s, gn, π)
13  end
Figure 3. Average running times (in log scale) and plan lengths in the 3-City Routing domain, as a function of the number of locations per city. Each data point is an average of 25 randomly generated problems. FF could not solve problems involving more than 60 locations, while GDP and SHOP2 could not solve problems with more than 10 locations. SHOP2-r could not solve a single data point, because of which it was excluded from the figure.
Experimental Results. To evaluate the performance of GDP, we compared it with the state-of-the-art HTN planner SHOP2 (Nau et al., 2003) and the domain-independent planner FF (Hoffmann & Nebel, 2001) across five domains. GDP-h denotes GDP enhanced with a variant of the relaxed planning graph heuristic used in FF. GDP-r and SHOP2-r are identical to GDP and SHOP2, respectively, but with the input method sets randomly reordered. We did this to investigate the effect on the planners' performance of methods being provided in the wrong order. Here we present results from only one domain, 3-City Routing; the complete experimental study is detailed in (Shivashankar et al., 2012).
We constructed the 3-City Routing domain in order to examine the performance of the planners in a domain with a weak domain model. In this domain, there are three cities c1, c2 and c3, each containing n locations internally connected by a network of randomly chosen roads. In addition, there is one road between a randomly chosen location in c1 and a randomly chosen location in c2, and similarly another road between locations in c2 and c3. The problem is to get from a location in c1 or c3 to a goal location in c2. We randomly generated 25 planning problems for each value of n, with n varying from 10 to 100. For the road networks, we used near-complete graphs in which 20% of the edges were removed at random. Note that while solutions to such problems are typically very short, the search space has an extremely high branching factor, i.e., of the order of n.
For GDP and GDP-h, we used a single HGN method, shown here as pseudocode:
  To achieve at(b)
    precond: at(a), adjacent(c, b)
    subgoals: achieve at(c) and then at(b)
By applying this method recursively, the planner can do a backward search from the goal location to the start location. To accomplish the same backward search in SHOP2, we needed to give it three methods, one for each of the following cases: (1) the goal location is the same as the initial location, (2) the goal location is one step away from the initial location, and (3) there is an arbitrary distance between the goal and initial locations.
As Figure 3 shows, GDP and SHOP2 did not solve the randomly generated problems except the ones of size 10, returning very poor solutions and taking large amounts of time in the process.
GDP-h, on the other hand, solved all the planning problems quickly, returning near-optimal solutions. The reason for the success of GDP-h is that the domain knowledge specified above induces an unguided backward search in the state space, and the planner uses its domain-independent heuristic to select its path to the goal. GDP-r performed on par with GDP, since there was only one method and hence reordering did not affect it. SHOP2-r, on the other hand, could not solve a single problem, illustrating SHOP2's high sensitivity not only to the correctness of the methods, but also to the order in which they are provided. We believe that GDP is similarly sensitive to the method order; GDP-h, on the other hand, due to its domain-independent heuristic-based method ordering technique, is agnostic to the method ordering provided by the domain author.
FF was able to solve all problems up to n = 60 locations, after which it could not even complete parsing the problem file. This is because FF grounds all the actions at the very beginning, which it could not do for the larger problems.
Domain Authoring. When writing the domain models for our experiments, it seemed to us that writing the GDP domain models was easier than writing the SHOP2 domain models, so we made measurements to try to verify whether this subjective impression was correct. Figure 4 compares the sizes of the HGN and HTN domain descriptions of the planning domains. In almost all of them, the domain models for GDP were much smaller than those for SHOP2. This is because the HGN goal and method semantics obviates the need for the plethora of extra methods and bookkeeping operations needed in HTN domain models (see (Shivashankar et al., 2012) for a more detailed explanation).
[Figure 4 is a bar chart; the data it shows (number of Lisp symbols per domain model) are:]
                      GDP     SHOP2
  Logistics           239       342
  Blocks World        171       268
  Depots              360      1279
  Towers of Hanoi     311       162
  3-City Routing       39        87
  TOTAL              1120      2138
Figure 4. Sizes (number of Lisp symbols) of the GDP and SHOP2 domain models.

5. The GoDeL Planning Formalism and Algorithm
GDP is similar to SHOP/SHOP2 in that it is sound and complete only with respect to the methods provided to it. It therefore provides no completeness guarantees if the HGN methods are incomplete. GoDeL is an extension of GDP that does provide completeness guarantees even if given incomplete or empty method sets. It accomplishes this by interleaving method decomposition with subgoal inference using landmarks and action chaining.
Explanation of pseudocode. Algorithm 2 is the GoDeL planning algorithm. It takes as input a planning problem P = (D, s, G), a set of methods M, and π, the partial plan generated so far. Lines 3-6 specify the base cases of GoDeL. If these are not satisfied, the algorithm nondeterministically chooses a goal g with no predecessors and generates U, all method and operator instances applicable in s and relevant to g. It then nondeterministically chooses a u ∈ U to progress the search (Lines 7-9). If u is an action, the state is progressed to γ(s, u) (Line 10). Else, the subgoals of u are added to G, adding edges to preserve the total order imposed on subgoals(u) (Line 12). In either case, GoDeL is invoked recursively on the new planning problem (Lines 12 and 14).
If GoDeL fails to find a plan in the previous step, it then attempts to infer a partially ordered set of subgoals lm that are to be achieved en route to achieving g, using the Infer-Subgoals procedure (Line 15).
The subgoals are added to G, adding edges to preserve the partial order in lm. The algorithm is then recursively invoked on the new planning problem (Line 17). If this call also returns failure, then the algorithm falls back to action chaining (Lines 19-22), returning failure if no actions are applicable in s. Below, we motivate and describe the Infer-Subgoals procedure in greater detail.
Algorithm 2: A nondeterministic version of GoDeL. Initially, (D, s, G) is the planning problem, M is a set of methods, and π is ⟨⟩, the empty plan.
1   Procedure GoDeL(D, s, G, M, π)
2   begin
3       if G is empty then return π
4       nondeterministically choose a goal formula g in G without any predecessors
5       if s ⊨ g then
6           return GoDeL(D, s, G − {g}, M, π)
7       U ← {operator and method instances applicable to s and relevant to g}
8       while U is not empty do
9           nondeterministically remove a u from U
10          if u is an action then
11              res1 ← GoDeL(D, γ(s, u), G, M, π ∘ u)
12          else
13              res1 ← GoDeL(D, s, subgoals(u) ∘ G, M, π)
14          if res1 ≠ failure then return res1
15      lm ← Infer-Subgoals(D, s, g)
16      if lm ≠ ∅ then
17          res2 ← GoDeL(D, s, lm ∘ G, M, π)
18          if res2 ≠ failure then return res2
19      A ← {operator instances applicable to s}
20      if A = ∅ then return failure
21      nondeterministically choose an a ∈ A
22      return GoDeL(D, γ(s, a), G, M, π ∘ a)
23  end
Subgoal inference. Sometimes, the planner may have methods that tell how to solve some subproblems, but not the top-level problem. For instance, the first method in Figure 2 would not be applicable to problems involving transporting packages across cities, but it is applicable to the subproblems of moving the package between the start and goal locations and the corresponding airports. A natural question then arises: How can we automatically infer these subproblems for which the given methods are relevant?
To answer this question, we use landmarks. A landmark for a planning problem P (Hoffmann, Porteous, & Sebastia, 2004; Richter & Westphal, 2010) is a fact that is true at some point in every plan that solves P. A landmark graph is a directed graph whose nodes are landmarks and whose edges denote orderings between these landmarks: if there is an edge between two landmarks li and lj, then li is true before lj in every solution to P. A landmark for a planning problem P can therefore be thought of as a subgoal that every solution to P must satisfy at some point. We can, as a result, use any landmark generation algorithm (for example, (Hoffmann, Porteous, & Sebastia, 2004; Richter & Westphal, 2010)) to automatically infer subgoals (and orderings between them) for which the given methods are relevant.
Algorithm 3: Procedure to deduce possible subgoals for GoDeL to use. It takes as input a planning problem P = (D, s, g), and outputs a DAG of subgoals. It uses LMGEN, a landmark generation algorithm that takes P and generates a DAG (V, E) of landmarks for it.
1   Procedure Infer-Subgoals(D, s, g)
2   begin
3       (V, E) ← LMGEN(D, s, g)
4       L ← {v ∈ V : s ⊭ fact(v), g ⊭ fact(v), and ∃ a method m ∈ D s.t. goal(m) is relevant to fact(v)}
5       if L = ∅ then return ∅
6       EL ← {(u, v) : u, v ∈ L and v is reachable from u in (V, E)}
7       return (L, EL)
8   end
Algorithm 3 is Infer-Subgoals, the subgoal inference procedure. It uses a landmark generation procedure LMGEN (such as the ones in (Richter & Westphal, 2010; Hoffmann, Porteous, & Sebastia, 2004)) that takes as input a classical planning problem P and generates a DAG of landmarks.
Infer-Subgoals begins by computing LM_graph, the landmark graph for the input problem P. It then computes L, the set of nodes in LM_graph that have relevant methods (Line 4). It does not consider trivial landmarks such as literals that are true in the state s or that are part of the goal g. EL is the set of all edges between landmarks li, lj ∈ L such that there exists a path from li to lj in LM_graph (Line 6). Infer-Subgoals returns the resulting partial order lm = (L, EL).
Experimental Results. The main purpose of this experimental study was to investigate the performance of GoDeL with varying amounts of domain knowledge. In particular, we hypothesized that GoDeL would (1) perform on par with state-of-the-art domain-independent planners when given little or no knowledge, (2) improve its performance as more knowledge is given, and (3) perform on par with state-of-the-art hierarchical planners when given complete knowledge.
To verify this hypothesis, we compared (1) GDP-h, the heuristic-enhanced version of GDP from Section 4, (2) the state-of-the-art domain-independent planner LAMA (Richter & Westphal, 2010), and (3) GoDeL run with three different amounts of domain knowledge: complete knowledge (GoDeL-C), moderate knowledge (GoDeL-M), and low knowledge (GoDeL-L). We compared these planners across three domains; we present the results only for Depots. The full experimental study is reported in (Shivashankar et al., 2013).
The experimental results on the Depots domain are consistent with our hypotheses. As shown in Figure 5, GoDeL-C has the best running times in Depots, and is the only planner to solve all of the problems. Even GoDeL-M significantly outperformed GDP-h, which had access to the full method set. GoDeL-L, which was given only one Blocks-World method, solved significantly fewer problems. LAMA solved the fewest problems, having not solved any problems containing 24 blocks or more. With respect to plan lengths, GoDeL-C, GoDeL-M, and GDP-h produced plans of similar lengths for the problems they could solve. GoDeL-L and LAMA, however, generated significantly longer plans for the problems they could solve.
Figure 5. Average plan lengths and running times (both in log scale) in the Depots domain, as a function of the number of packages to be delivered. Each data point is an average of 25 randomly generated problems. GoDeL-M could not solve 80-package problems; GDP-h, GoDeL-L and LAMA could not solve problems of size > 64, 32 and 16, respectively.

6. Role of HGNs in GDA
We presented the HGN planning formalism in Section 3 and two HGN planning algorithms in Sections 4 and 5. However, a question still remains to be answered: Why use HGNs in GDA? In this section, we address this by discussing the main features of HGN planning algorithms and how they help address some of the planning challenges posed in GDA. We also discuss how HGN planning is in a unique position to solve some of the more advanced planning problems in goal reasoning.
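Before examining these features, it may help to see a small, purely illustrative Python sketch of the core HGN constructs used throughout Sections 3-5: goal formulas as sets of literals, the relevance test for actions and methods, and a goal network as a partially ordered set of goal nodes. The encoding below (literals as signed strings, node labels, and so on) is our own choice and is not taken from the GDP or GoDeL implementations.

    # Illustrative encoding of the HGN notions from Section 3; representation choices are ours.
    # A literal is an (atom_string, positive_flag) pair; a goal formula is a set of literals.

    def negate(literal):
        atom, positive = literal
        return (atom, not positive)

    def relevant(effects_or_postcond, goal):
        """An action (or method) is relevant for goal g if its effects (or postcondition)
        entail at least one literal of g and entail the negation of none."""
        achieves_some = any(lit in effects_or_postcond for lit in goal)
        clobbers_none = all(negate(lit) not in effects_or_postcond for lit in goal)
        return achieves_some and clobbers_none

    class GoalNetwork:
        """A goal network gn = (T, <): nodes labelled with goal formulas plus ordering pairs."""
        def __init__(self, goals, order):
            self.goals = dict(goals)   # node id -> goal formula (set of literals)
            self.order = set(order)    # (earlier_node, later_node) pairs

        def unconstrained(self):
            """Nodes with no predecessors, i.e., the goals GDP/GoDeL may work on next."""
            later = {b for (_, b) in self.order}
            return [t for t in self.goals if t not in later]

    # Example: deliver package p1 to l2 before returning truck t1 to the depot.
    gn = GoalNetwork(goals={"g1": {("obj-at p1 l2", True)},
                            "g2": {("truck-at t1 depot", True)}},
                     order={("g1", "g2")})
    print(gn.unconstrained())                                            # -> ['g1']
    print(relevant({("obj-at p1 l2", True)}, {("obj-at p1 l2", True)}))  # -> True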
6.1 Modeling Domain-Specific Knowledge using HGN Methods
One of the strengths of HTNs is that one can model complicated pieces of knowledge as HTN methods, thus not only speeding up search but also enabling fine-tuning of the types of solutions one wants the planner to return. HGNs provide the same amount of control, but with the added benefit of retaining the semantics of goals in the methods (see (Shivashankar et al., 2012) for a discussion of the incompatibilities between task and goal semantics). In fact, we show in (Shivashankar et al., 2012) that HGNs are equal in expressivity to total-order HTN planning, indicating that one does not lose any expressivity in moving from HTNs to HGNs.

6.2 Managing Agenda of Goals using HGNs
In the GDA model, managing the agenda of goals and deciding which goals to achieve next is outside the purview of the planner. As we mentioned previously, generating plans for the candidate goals and comparing them is often the best way to decide which goal should be pursued next. Moreover, certain goal orderings work much better than others; determining a feasible order in which goals should be achieved is a problem the planner can help solve. HGNs provide a framework for this reasoning: since problems are modeled as (state, goal network) pairs, the agenda of goals can be managed within the planner as a goal network, and the planner can use its heuristics, in combination with any other domain-specific goal choice strategy, to choose the next goals. In particular, the definition of a goal network could be extended to include utilities over the goals, which could even change over time (and thus potentially trigger replanning). Wilson, Molineaux, and Aha (2013) adopt a similar approach of letting the planner evaluate candidate next goals in a domain-independent fashion in the context of HTNs. They do not, however, factor in the implications that achieving a particular goal might have on the (un)solvability of another goal downstream.

6.3 Seamless Use of Incomplete HGN Domain Models
As mentioned previously, one of the main criticisms of HTN planning is the brittleness of the planners due to their over-reliance on the correctness and completeness of HTN methods. GoDeL, on the other hand, can work with arbitrary sets of HGN methods. This is possible mainly because HGN methods are expressed in terms of goals, which makes them immediately compatible with domain-independent planning. Therefore, domain engineers can provide domain knowledge only for the most complicated parts of the domain and rely on GoDeL to figure out the rest, thus easing the burden of domain authoring. This becomes even more crucial in complex real-world domains, where it is unrealistic to assume that a complete suite of domain knowledge can be made available to the planner.

6.4 Plan Repair in Dynamic Environments
The domain authoring issue is further exacerbated in dynamic environments, since other agents, exogenous events, and erroneous sensor and action models can bring about situations that the domain author cannot anticipate offline. It is therefore even more critical in such cases that the planner have a fallback mechanism when the provided knowledge is insufficient. Thus, GoDeL augmented with plan repair capabilities would be well suited to such use cases.
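A rough sketch may clarify the graceful-degradation behavior that Sections 5, 6.3, and 6.4 rely on: use the author's HGN methods when one applies, fall back to landmark-based subgoal inference when they do not, and resort to plain action chaining only as a last resort. This is our own simplified rendering of GoDeL's control flow (cf. Algorithm 2), not its actual code; the helper functions passed as parameters are assumed to be supplied by the caller.

    # Our simplified rendering of GoDeL's fallback order (cf. Algorithm 2); not the actual implementation.
    def godel_fallback(state, goal, methods, infer_subgoals, applicable_actions, recurse):
        """Return a plan for `goal` from `state`, preferring domain knowledge and
        degrading gracefully when that knowledge is missing or inapplicable."""
        # 1. Domain knowledge first: decompose with any applicable, relevant HGN method.
        for m in methods:
            if m.applicable(state) and m.relevant(goal):
                plan = recurse(state, list(m.subgoals) + [goal])
                if plan is not None:
                    return plan
        # 2. Knowledge gap: infer subgoals domain-independently from landmarks (Infer-Subgoals).
        landmark_subgoals = infer_subgoals(state, goal)
        if landmark_subgoals:
            plan = recurse(state, list(landmark_subgoals) + [goal])
            if plan is not None:
                return plan
        # 3. Last resort: classical-style action chaining over any applicable action.
        for action in applicable_actions(state):
            plan = recurse(action.apply(state), [goal])
            if plan is not None:
                return [action] + plan
        return None   # no methods, no landmark subgoals, no applicable actions: failure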
6.5 Temporal Planning and Cost-Aware Planning
One of the advantages of moving from task networks to goal networks is that we can adapt existing planning heuristics from the domain-independent planning literature to work with HGNs. We provided a proof of concept of this by augmenting GDP, our first HGN planner, with a variant of the relaxed planning graph heuristic used in the Fast-Forward (FF) planning system (Hoffmann & Nebel, 2001). This helps the planner in cases where multiple methods are applicable at a certain point in the search and the planner needs a way to decide which methods to try first. Along the same lines, we can adapt admissible heuristics (Karpas & Domshlak, 2009) that, if used with an algorithm like A*, can guarantee cost-optimal plans. We can also aim for a principled compromise via anytime planning techniques (Richter & Westphal, 2010), where we do not guarantee cost optimality but instead try to improve the current best solution by exploring promising, as-yet-unexplored parts of the search space in an anytime fashion, asymptotically approaching the optimal solution. We can also augment the HGN formalism to reason about temporal aspects of planning via durative actions and deadlines on goals. These extensions, we think, will be very useful from a GDA standpoint, since agents often have to reason about resource constraints, deadlines, and action costs; it will therefore be helpful to have planners that can actively optimize with respect to these metrics in a scalable manner.

7. Final Remarks and Future Work
We have described our planning formalism, called Hierarchical Goal Networks (HGNs), and our planning algorithms based on this formalism. We argued that HGNs are particularly relevant and useful for GDA because they provide flexibility and reasoning capabilities in goal reasoning, during both planning and execution.
We are currently developing a generalization of HGNs for temporal goal reasoning. This work involves extending the HGN formalism described here with durative tasks and deadlines on goal achievement. The GoDeL algorithm for HGN planning must be generalized for reasoning with such temporal constructs. One of the challenges in this work is to correctly model the temporal semantics between planning and execution so as to yield sound behavior in the generated plans.
Furthermore, to facilitate GDA, we want to extend HGN planning to include ways of reasoning and planning with goals under execution failures. Execution failures typically stem from discrepancies between the planning and goal models that a system has and the ground truth discovered during execution. We are interested in investigating how a planner can help reorder goal preferences during execution when such discrepancies occur, and repair the existing plans accordingly.
Finally, humans can reason about goals at very abstract, semantic levels, whereas automated planning and learning algorithms are computationally effective but must stay in the realm of logical formulations. This comparison is perhaps similar to comparisons between humans and AI in image processing tasks. For example, during their planning and acting, humans can identify a goal that may not be on their initial agenda but is a "low-hanging fruit." We are currently investigating how to model this concept in an AI planning and execution algorithm.
Acknowledgements. This work was supported in part by ARO grant W911NF1210471 and ONR grants N000141210430 and N000141310597.
The information in this paper does not necessarily reflect the position or policy of the funders; no official endorsement should be inferred.

References
Alford, R., Kuter, U., & Nau, D. S. (2009). Translating HTNs to PDDL: A small amount of domain knowledge can go a long way. IJCAI.
Ayan, N. F., Kuter, U., Yaman, F., & Goldman, R. (2007). HOTRiDE: Hierarchical ordered task replanning in dynamic environments. Proceedings of the ICAPS-07 Workshop on Planning and Plan Execution for Real-World Systems - Principles and Practices for Planning in Execution.
Biundo, S., & Schattenberg, B. (2001). From abstract crisis to concrete relief: A preliminary report on combining state abstraction and HTN planning. Proc. of the 6th European Conference on Planning (pp. 157-168).
Erol, K., Hendler, J., & Nau, D. S. (1994). HTN planning: Complexity and expressivity. AAAI.
Fox, M., Gerevini, A., Long, D., & Serina, I. (2006). Plan stability: Replanning versus plan repair. ICAPS (pp. 212-221).
Georgeff, M. P., & Lansky, A. L. (1987). Reactive reasoning and planning. AAAI (pp. 677-682).
Gerevini, A., Kuter, U., Nau, D. S., Saetti, A., & Waisbrot, N. (2008). Combining domain-independent planning and HTN planning. ECAI (pp. 573-577).
Ghallab, M., Nau, D. S., & Traverso, P. (2004). Automated planning: Theory and practice.
Hammond, K. J. (1990). Explaining and repairing plans that fail. Artif. Intell., 45, 173-228.
Hawes, N. (2001). Anytime planning for agent behaviour. Twelfth Workshop of the UK Planning and Scheduling Special Interest Group (pp. 157-166).
Hawes, N. (2002). An anytime planning agent for computer game worlds. Proceedings of the Workshop on Agents in Computer Games at The 3rd International Conference on Computers and Games, 1-14.
Hoffmann, J., & Nebel, B. (2001). The FF planning system. JAIR, 14, 253-302.
Hoffmann, J., Porteous, J., & Sebastia, L. (2004). Ordered landmarks in planning. JAIR, 22, 215-278.
Ingrand, F., & Georgeff, M. (1992). An architecture for real-time reasoning and system control. IEEE Expert, 6, 33-44.
Kambhampati, S., Mali, A., & Srivastava, B. (1998). Hybrid planning for partially hierarchical domains. AAAI (pp. 882-888).
Karpas, E., & Domshlak, C. (2009). Cost-optimal planning with landmarks. IJCAI (pp. 1728-1733).
McCluskey, T. L. (2000). Object transition sequences: A new form of abstraction for HTN planners. AIPS (pp. 216-225).
Molineaux, M., Klenk, M., & Aha, D. W. (2010). Goal-driven autonomy in a navy strategy simulation. AAAI. AAAI Press.
Nau, D. S., Au, T.-C., Ilghami, O., Kuter, U., Murdock, J. W., Wu, D., & Yaman, F. (2003). SHOP2: An HTN planning system. JAIR, 20, 379-404.
Nau, D. S., Cao, Y., Lotem, A., & Muñoz-Avila, H. (2001). The SHOP planning system. AI Mag.
Richter, S., & Westphal, M. (2010). The LAMA planner: Guiding cost-based anytime planning with landmarks. J. Artif. Intell. Res. (JAIR), 39, 127-177.
Shivashankar, V., Kuter, U., Nau, D. S., & Alford, R. (2012). A hierarchical goal-based formalism and algorithm for single-agent planning. AAMAS (pp. 981-988).
Shivashankar, V., Kuter, U., Nau, D. S., & Alford, R. (2013). The GoDeL planning system: A more perfect union of domain-independent planning and hierarchical planning. IJCAI.
Talamadupula, K., Benton, J., Kambhampati, S., Schermerhorn, P. W., & Scheutz, M. (2010). Planning for human-robot teaming in open worlds. ACM TIST, 1, 14.
Warfield, I., Hogg, C., Lee-Urban, S., & Muñoz-Avila, H. (2007). Adaptation of hierarchical task network plans. FLAIRS (pp. 429-434). AAAI Press.
Wilkins, D. E. (1984). Domain-independent planning: Representation and plan generation. Artif. Intell., 22, 269-301.
Wilson, M. A., Molineaux, M., & Aha, D. W. (2013). Domain-independent heuristics for goal formulation. FLAIRS Conference. AAAI Press.

2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning

Breadth of Approaches to Goal Reasoning: A Research Survey

Swaroop Vattam SWAROOP.VATTAM.CTR.IN@NRL.NAVY.MIL
Naval Research Laboratory, NRC Postdoctoral Fellow, Washington, DC 20375
Matthew Klenk KLENK@PARC.COM
Palo Alto Research Center, Embedded Reasoning Area, Palo Alto, CA 94304
Matthew Molineaux MATTHEW.MOLINEAUX@KNEXUSRESEARCH.COM
Knexus Research Corporation, Springfield, VA
David W. Aha DAVID.AHA@NRL.NAVY.MIL
Naval Research Laboratory, Navy Center for Applied Research in AI, Washington, DC 20375

Abstract
Goal-directed behavior is a hallmark of intelligence. While the majority of artificial intelligence research assumes goals are static and externally provided, many real-world applications involve unanticipated changes in the environment that may require changes to the goals themselves. Goal reasoning, which emphasizes the explicit representation of goals, their automatic formulation, and their dynamic management, is considered an important aspect of high-level autonomy. Building from these three basic requirements, we describe and apply a framework for surveying research related to goal reasoning that focuses on triggers and methods for goal formulation and goal management. We also summarize current research and highlight potential areas of future work.

1. Introduction
It is generally acknowledged that goal-directed behavior is a hallmark of intelligence (Newell & Simon 1972; Schank & Abelson 1977). Goal-directed behavior has usually been interpreted as autonomy of actions: an intelligent agent should be able to reason about actions in an autonomous manner in order to change the state of the world (including itself) as a means to satisfying a given goal. On the one hand, this interpretation has provided a clear focus, guiding much AI research from early problem solvers to modern-day automated planners. On the other hand, it has also limited the reach and richness of AI systems by ignoring goals; it is often assumed that an external user or system is responsible for providing goals that remain static over a problem-solving episode.
Goal reasoning (e.g., Norman & Long, 1996; Cox, 2007; Hawes, 2011; Klenk, Molineaux, & Aha, 2013; Jaidee, Muñoz-Avila, & Aha, 2013) challenges this interpretation and strives for autonomy of goals: in addition to autonomy of actions, an intelligent agent should be aware of its own goals and deliberate upon them. As we start to consider designs for intelligent systems that are more autonomous and use multiple interacting competencies to solve a wider variety of problems in the real world, it becomes increasingly difficult to ignore the issue of goal reasoning.
To illustrate the importance of goal reasoning for intelligent behavior, consider a fishing craft in the Gulf of Mexico. While carrying out a plan to achieve the goal of catching fish, the fishermen receive reports of an explosion on a nearby offshore oil rig. Upon hearing the reports, the fishermen change their goal from "catch fish" to "rescue the rig's workers". This goal change results in a far superior outcome (rescued workers), but one that is outside the scope of the original mission (catching fish).
In this paper, we present a preliminary analysis of research related to goal reasoning in the context of planning and problem solving. (Due to space limitations, we do not also examine research on the role of goals in human and machine learning (e.g., Leake 1991; Leake & Ram 1995).) We begin by describing the Goal Reasoning Analysis Framework (GRAF) and use it to focus on the tasks of goal formulation and goal management. Next we survey approaches and techniques for these tasks in terms of this framework. Finally, we briefly discuss current goal reasoning research and highlight potential areas for future work.

2. Goal Reasoning Analysis Framework (GRAF)
Because the notion of goal reasoning is polymorphous and often interpreted and applied differently in different research contexts, it is productive to think about a common framework for analyzing and comparing the various techniques and approaches related to goal reasoning. We propose the Goal Reasoning Analysis Framework (GRAF) as a first step in this direction. We develop this framework by first identifying the following three minimum requirements for goal reasoning.
Explicit goals: First, the system should explicitly represent and reason about goals.
Goal formulation: Second, the system should be able to formulate goals. Once we require an intelligent system to have explicit goals, we require processes that can generate or identify and select them dynamically. We shall refer to these processes as goal formulation processes. Where goals come from is often overlooked in intelligent systems, which motivated us to address it in this survey.
Goal management: Third, the system should manage goals and select the ones that should be acted upon. An independent goal formulation process can lead to multiple goals. Therefore we require some form of management system that accepts goals produced by goal formulation processes, selects which goal(s) should be pursued (with reference to any ongoing goal-directed behavior), and triggers the appropriate plan generation mechanism to achieve the selected goal. If the goal formulation processes produce goals dynamically, asynchronously, and in parallel, the management system must accept and manage new goals in this manner too. It should not block the operation of the goal formulation processes, as this would interfere with the system's ability to respond to new situations.
This set of requirements is consistent with those proposed by Hawes (2011). There is a fourth core requirement: the system should generate goal-directed behavior from a collection of goals and available resources. However, to simplify, we will ignore this requirement and assume that it is fulfilled by a planner with its execution system.
Our framework, GRAF (Table 1), is obtained by applying the five questions What, Where, Why, When, and How to the three requirements of explicit goals, goal formulation, and goal management.

Table 1. A tabular representation of GRAF.
  Requirements        Questions
                      What            Where    Why        When      How
  Explicit goals      Representation  Source
  Goal formulation                             Rationale  Triggers  Methods
  Goal management                                                   Methods

What is a goal? This applies to the requirement of explicit goals and refers to the nature and representation of a goal. Explicit goals can be of two kinds: a declarative goal is a description of the state of the world that is sought, and a procedural goal is a set of intended tasks to be carried out. Consensus has it that most declarative goals are attainment goals.
These are states an agent should achieve through plan execution. Declarative goals can also include maintenance and prevention goals, which refer to states to maintain over time or to prevent from occurring. Given our assumption that the required process which translates goals into behavior is a planning system, the nature of a goal and how it is explicitly represented in a system depend on that planner.
Where does a goal come from? This also applies to the requirement of explicit goals and refers to a goal's source. We identify three sources of goals: external, self, and hybrid. Goals can be supplied to the intelligent system by an external source in the environment (e.g., a user or peer agents). Goals can also be self-initiated by the goal formulation process. While a majority of intelligent system designs assume the former, goal reasoning architectures focus on the latter. For the sake of completeness, we also envision a hybrid situation where goals can be both externally and internally initiated.
Why self-formulate a goal? This is applicable to the requirement of goal formulation. One reason to formulate goals is rational anomaly response: to better respond to developing situations that threaten an agent's interests. A second reason is graceful degradation: while the current goals may no longer be achievable, intelligent action may still be achieved by degrading them (e.g., "submitting a full report" is predicted to fail given the time constraints, but "submitting a draft report" may be achievable). A third reason for goal formulation is better future performance: we want intelligent systems to avoid dead-ends with respect to the current goals, and also to avoid states that jeopardize goal achievement in the future. Furthermore, it may be desirable to take actions that increase the system's capabilities for more actions and more potential goals. A fourth reason for goal formulation is societal norms: as the scope of the agent's operation becomes broader and its lifespan longer, humans who interact with autonomous agents will have expectations about their behavior. Goals have to be accommodated to meet those expectations.
When is a goal formulated? This also applies to the requirement of goal formulation and refers to triggers for goal formulation. Typically, goal formulation is considered when an anomaly is detected and/or the system is self-motivated to explore its actions in the world.
How are goals formulated? This applies to the requirement of goal formulation and refers to methods for achieving the function of goal formulation.
How are goals managed? This applies to the requirement of goal management and refers to methods for achieving the function of goal management.
In this survey, we primarily focus on the questions of When and How. That is, our emphasis is on triggers of goal formulation, methods for goal formulation, and methods for goal management.

3. Triggers for Goal Formulation
Typically, goal formulation can occur when an anomaly is detected and/or the system is self-motivated to explore its actions in the world. In most current implementations, a goal is formulated either when no active goal exists and the intelligent system is self-motivated to pursue additional goals, or when an active goal exists but an anomaly is detected and pursuing alternative goals is considered advantageous in light of the anomaly. Because a majority of existing approaches are anomaly-driven, we will focus on the latter.
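As a minimal illustration of this trigger condition (the predicate and parameter names below are ours, purely for exposition, and not taken from any surveyed system):

    # Illustrative goal-formulation trigger; names are ours, not taken from any surveyed system.
    def should_formulate_goals(active_goals, anomalies, self_motivated, alternatives_advantageous):
        """Formulate new goals when the agent is idle but self-motivated, or when an
        anomaly makes pursuing alternative goals look advantageous."""
        idle_and_motivated = not active_goals and self_motivated
        anomaly_driven = bool(active_goals) and bool(anomalies) and alternatives_advantageous(anomalies)
        return idle_and_motivated or anomaly_driven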
A non-exhaustive list of anomalies could include:
- An active plan fails (or is predicted to fail or to perform suboptimally) and no contingency plan exists.
- An affordance is perceived (i.e., pursue a better goal that the agent was considering but hadn't been able to pursue).
- An opportunity is detected (i.e., pursue a better goal that the agent wasn't planning to pursue).
- An internal drive of the system requires attention (e.g., a battery's energy level is low and the system has an internal drive to maintain its energy level).
Anomaly-triggered goal formulation requires a discussion of how anomalies are detected. Anomaly detection typically relies on various kinds of monitoring processes, including the following:
- Plan monitoring: One source of information for detecting anomalies comes from the plan itself. Changes in the environment may prevent the execution of a plan's future actions. In plan monitoring, the agent monitors the plan's execution by assessing whether its remaining actions' preconditions are satisfied in the current state or achievable as an effect of a preceding planned action. If not, the plan fails. Similarly, plans may also fail because an agent's actions do not achieve their intended effects. Action monitoring algorithms ensure that the last action was successfully executed (i.e., that the effects of the action are true in the environment).
In addition to monitoring the validity of plans during execution, research has identified methods for monitoring plan optimality during execution. Fritz & McIlraith (2007) define plan optimality and describe a state-space planner that monitors the utility of the current plan with respect to alternatives using a variant of A* search. In this context, the agent should replan when it predicts that the plan will fail or execute sub-optimally.
Plan failure has been the subject of replanning and plan repair in traditional AI planning research from the beginning (Russell & Norvig, 2003). For example, Darmok implements action monitoring in an online case-based planner for a real-time strategy (RTS) game (Ontañón et al., 2010). If an action fails, Darmok extends its current plan with new actions to satisfy the failed action's goal. Also focusing on replanning, HOTRiDE employs action monitoring in simulated noncombatant evacuation operation planning (Ayan et al., 2007). When an action fails, HOTRiDE uses a dependency graph to determine which task decompositions are no longer valid and must be replanned.
When a plan fails or is predicted to fail (or be suboptimal), replanning systems try to generate new plans or repair existing plans using the original goal. In contrast, goal reasoning systems reason about their goals and try to formulate new goals. For example, ARTUE (Klenk et al., 2013) finds discrepancies (for discrete states) using a set difference operation between the expected and observed literals. For continuous states, the observed and expected value of each fluent is compared; a discrepancy is considered to occur whenever their values differ by more than 0.1% of the (absolute) observed value. When a discrepancy is detected, ARTUE's anomaly response mechanism performs anomaly explanation and goal formulation.
- Periodic monitoring: Instead of focusing solely on the current plan and its execution, agents may monitor the entire environment to determine if new goals should be considered. In periodic monitoring, the agent considers the current state at set intervals.
Periodic monitoring is frequently used in systems that perform real-time response. For example, Burkhard et al. (1998) illustrate how Belief-Desire-Intention (BDI) agents (Rao & Georgeff, 1995) monitor the environment for changes in their beliefs. Their RoboCup soccer agents receive new sensor information every 300 ms. PROSOCS uses a sensing, revision, planning, and execution cycle to periodically monitor the environment (Mancarella et al., 2005). At the start of each cycle, new sensor information is received that can inform execution, plan revision, and future planning. A final example is the cognitive architecture ICARUS, which executes periodic monitoring during its recognize-act cycle (Langley & Choi, 2006).
- Expectation monitoring: Expectations are driven by experience from problem solving or interacting with an environment. Problem-solving experience can set expectations that can be monitored. A change in expectations can then trigger changes in behavior. For example, Veloso, Pollack and Cox (1998), in their rationale-based plan monitoring architecture, showed that plan rationales often include expectations that result in the adoption of the current plan at the expense of an alternative plan. Such expectations lead to (1) generating monitors that represent environmental features which affect the plan rationale, (2) deliberating, whenever a monitor fires, about whether to respond to it, and (3) transforming plans as warranted by modifying goals. Expectation-driven, goal-oriented behavior based on problem-solving experience is a hallmark of Schank's approach to intelligent systems (Schank 1982; Schank & Owens 1987), which is highly relevant to goal reasoning.
Agents can also learn a model of how the environment changes through experience from interacting with their environment. Expectation monitoring uses this model to assess the nature and relevance of a discrepancy. In robotic navigation, Bouguerra, Karlsson, and Saffiotti (2008) used semantic knowledge to generate expectations concerning objects that may be encountered during plan execution. For example, when moving into a living room, the robot expects to see objects typical to that location (e.g., a TV, a sofa). From a cognitive science perspective, INTRO uses a rule-based model to generate expectations and detect discrepancies in a Wumpus World environment (Cox, 2007). Kurup et al. (2012) introduce a cognitive model of expectation-driven behavior in ACT-R. It generates future states called expectations, matches them to observed behavior, and reacts when a difference exists between them.
Expectation monitoring can be implemented using anomaly recognition techniques. Typically, these approaches can be divided into three groups: (1) signature detection, which matches the current situation to known deviant patterns, (2) anomaly detection, which compares the current situation to baseline patterns, and (3) hybrid methods, which include both (Patcha & Park, 2007).
- Domain-specific monitoring: Monitoring for expectation failures is difficult in environments whose future states are difficult to predict. Therefore, some agents utilize domain-specific monitoring strategies, which periodically test the values of specified state variables during plan execution. Many researchers use domain-specific monitoring to directly link unanticipated states to new goals.
In a simulated rover domain, MADBot uses motivations to monitor specified values in the environment (e.g., when the battery's charge falls below 50%, a new goal is created to recharge it) (Coddington, 2006). M-ARTUE (Wilson, Molineaux, & Aha, 2013) similarly represents drives to direct goal formulation. While MADBot uses domain-specific drives, M-ARTUE does not represent motivations using domain knowledge, and is not limited to generating goals for achieving threshold values. Dora the Explorer (Hawes et al., 2011) encodes motivators that formulate goals related to exploring space and determining the function of rooms, similar to M-ARTUE's exploration motivator. However, Dora's functions are also domain-specific. Finally, Hawes's (2011) survey of motivation frameworks defines goal management and goal formulation in terms of goal generators or drives. It relates many systems in terms of these concepts, and proposes a design for future "motive management frameworks".
- Object-based monitoring: In domain-specific monitoring, the monitors specify particular state variables. Object-based monitoring also considers the set of objects in the environment. The detection of new objects may interrupt plans or cause the creation of new goals. Object-based monitoring systems specify which types of new objects to consider as discrepancies. Goldman (2009) describes an HTN planner with universally quantified goals that uses loops and other control structures to plan for sets of entities whose cardinality is unknown at planning time. Similarly, Cox and Veloso (1998) and Veloso, Pollack, and Cox (1998) also discuss and implement universally quantified goals where some objects (and hence goals) are not known. Dora generates a goal to explore each newly detected room (Hanheide et al., 2010). Open-world quantified goals extend these approaches to include knowledge about how new objects may be detected (Talamadupula et al., 2010). For example, in an urban search and rescue task, plans must be generated to locate objects that are unknown prior to execution (i.e., the victims). In real-time games like GRUE (Gordon & Logan, 2004), a more typical approach for this kind of monitoring is to author the game AI using a teleo-reactive program (TRP) (Benson & Nilsson, 1995). TRPs dictate which actions to take in specific world states (e.g., if the agent is running past a weapon it does not have, then it should pick up the weapon).

4. Methods for Goal Formulation
We identify six types of goal formulation methods based on the knowledge they use.
- State-Based Goal Formulation: The most straightforward method for generating goals is to pre-specify links between specific state variables and specific goals. Consider a helicopter's low-fuel indicator light: when it flashes, the agent pilot may generate a goal to refuel. The new goal depends solely on a single variable in the current state (i.e., the low-fuel indicator). These approaches are typically applied in fully observable environments. For example, game designers who have complete access to the environment can use behavior trees (Champandard, 2007) to control non-player characters; this is done in many modern video games. To increase reusability and make plans interruptible, Cutumisu & Szafron (2009) use multiple behavior trees to control characters interacting in a restaurant.
Working with the internal state of the rover, AgentSpeak-MPL (Meneguzzi & Luck, 2007) uses motivations to formulate new goals when the value of particular state variables drops below individual thresholds. ICARUS (Choi 2010) uses a reactive goal management procedure to nominate and prioritize new top-level goals in which pairs in long-term goal memory are considered for nomination at every reasoning step. This resembles rule-based goal formulation, as used in ARTUE (Klenk et al. 2013). M-ARTUE (Wilson et al. 2013) includes a motivation subsystem that formulates goals based on the psychological notion of drives, which constitute a hierarchy of heuristic functions representing both external and internal needs. M-ARTUE differs from ARTUE only in the way goals are formulated; instead of using reactive rules, it uses domain-independent heuristics to evaluate potential goals. This approach is similar in spirit to CLARION's goal formulation mechanism (Sun, 2009), where drives are represented sub-symbolically and they set the level of activation for explicit goals according to the world state. The primary difference between M-ARTUE and CLARION is that the representations of internal needs are domain independent and domain dependent, respectively.

• Interactive Goal Formulation: In realistic domains it is often infeasible to provide goal formulation knowledge for every situation. To address this, T-ARTUE (Powell, Molineaux, & Aha, 2011) and EISBot (Weber, Mateas, & Jhala, 2012) learn this knowledge from humans: T-ARTUE learns from criticism and answers to queries, while EISBot learns from human demonstrations. Each provides a domain-independent method for acquiring formulation knowledge, but neither system reasons about internal needs alongside external goals. Although based on the GDA model, GDA-C (Jaidee et al. 2013) differs substantially from ARTUE and M-ARTUE. GDA-C learns its goal selection function using Q-learning. While this increases autonomy, it employs a domain-dependent reward function; indirectly, GDA-C's goal selection strategy is guided by a human.

• Object-Based Goal Formulation: While specifying a goal for each state provides an agent designer with considerable control over an agent's actions, these methods are inflexible and difficult to author. To promote reuse and flexibility, several systems rely on rules or schemas that specify how to formulate goals for a range of possible states. One important problem this solves is the generation of goals in response to the discovery of new objects in the environment that were unknown at planning time. Consider a robot on a search and rescue mission. Prior to plan execution, the number of rooms to search is unknown. Goal formulation allows the robot to formulate an initial plan to detect rooms, and then assert new goals to search the rooms as they are located. Recently, several researchers have proposed extensions to goal specifications to account for unknown objects. For example, goal generators produce goals when new objects are detected that satisfy a set of conditions (Hanheide et al., 2010). For example, when a new region is detected by a mobile robot, a goal will be generated to identify that region. In addition to generating goals based on newly detected objects, open-world quantified goals provide information about sensing actions for planning (Talamadupula et al., 2010). Each of these approaches extends the goal specification to specify the importance of the newly generated goal.
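A rough sketch of such a goal generator, in the spirit of (Hanheide et al., 2010) but with invented object types and conditions, pairs a detection condition with a goal template and instantiates it once for each newly detected object:

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Goal:
        predicate: str
        obj: str

    @dataclass(frozen=True)
    class DetectedObject:
        id: str
        kind: str

    @dataclass
    class GoalGenerator:
        condition: callable          # does this new object warrant a goal?
        template: callable           # object -> goal
        seen: set = field(default_factory=set)

        def process(self, detected_objects):
            """Return one goal per newly detected object that satisfies the condition."""
            new_goals = []
            for obj in detected_objects:
                if obj.id not in self.seen and self.condition(obj):
                    self.seen.add(obj.id)
                    new_goals.append(self.template(obj))
            return new_goals

    # Illustrative use: generate an "explored" goal for every newly detected room.
    explore_rooms = GoalGenerator(
        condition=lambda o: o.kind == "room",
        template=lambda o: Goal("explored", o.id))

    print(explore_rooms.process([DetectedObject("room3", "room"),
                                 DetectedObject("door7", "door")]))
    # -> [Goal(predicate='explored', obj='room3')]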
• Belief-Based Goal Formulation: In addition to the observed state, an agent may formulate goals using its beliefs about the current state. Representing knowledge about the environment that is not directly observed, beliefs are generally output by an inference process such as explanation or state elaboration. For example, on observing a lightning strike, an agent might infer a belief that a storm is approaching. This belief could lead to the formulation of a goal to seek shelter. Recent work has demonstrated the effectiveness of this approach in dynamic environments. After using explanation to update its beliefs, ARTUE uses rules to specify how to formulate goals based on the observed state and the agent's beliefs (Molineaux et al., 2010). An alternative method for generating beliefs is through state elaboration. Using forward inference rules over the observed state, ICARUS creates a set of beliefs, which are used by reactive goal management to nominate goals from long-term memory for use in a simulated driving task (Choi, 2010).

• Case-Based Goal Formulation: Case-based goal formulation stores applicable goals in cases. During goal formulation, a case matching the cue is retrieved and the associated goal is reused in the current situation. For example, when a submarine disappears, an agent pilot might remember a previous situation in which searching for the submarine with a helicopter was a useful goal to pursue. Case-based goal formulation methods differ in their retrieval cues and types of goals generated. In EISBot (Weber, Mateas, & Jhala, 2010), the current state is used as a cue to retrieve a gameplay trace, which is a state sequence recorded from an expert's game play. EISBot selects a future state from the trace as the current goal. It performed well in StarCraft games against the built-in AI and human players. In another strategy game, CB-gda uses observed discrepancies as a retrieval cue to generate task goals (Muñoz-Avila et al., 2010). Each of these approaches requires minimal knowledge engineering as the retrieved cases may be automatically collected by observing human-provided traces of activities.

• Explanation-Based Goal Formulation: While the methods described above require knowledge engineering for each possible goal, an alternative approach focuses on explaining a discrepancy when generating a goal. When the observed discrepancy may prevent the agent from achieving its goals, the agent can generate a new goal by reasoning over its explanation. Consider a helicopter that is losing fuel. An agent pilot might explain this anomaly by inferring a leak in the fuel tank. Using this explanation, a goal could be generated to stop this leak. Explanation-based methods use the explanation to generate goals. For example, INTRO (Cox, 2007) generates a goal by negating the antecedent of the explanation. In the Wumpus World domain, the discrepancy of the screaming wumpus would yield a goal to negate its hunger. In pervasive diagnosis, goals are generated to collect information based on the current diagnosis of faults in the device (Kuhn et al., 2008). The purpose is to generate plans to achieve production goals while refining the explanation for any faults. By focusing on the syntax of the explanation, these approaches can be easily applied to new domains.
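As a rough illustration of antecedent negation (a hypothetical sketch; the rule, discrepancy, and goal representations below are ours, not those of Cox, 2007), the agent explains the surprising observation with a causal rule and then adopts the negation of that rule's antecedent as a new goal:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Rule:
        antecedent: str     # e.g., "hungry(wumpus)"
        consequent: str     # e.g., "attacks(wumpus, agent)"

    def explain(discrepancy, rules):
        """Return a rule whose consequent matches the surprising observation, if any."""
        return next((r for r in rules if r.consequent == discrepancy), None)

    def formulate_goal(discrepancy, rules):
        """Explanation-based formulation: negate the antecedent of the explanation."""
        explanation = explain(discrepancy, rules)
        if explanation is None:
            return None
        return f"not {explanation.antecedent}"

    rules = [Rule("hungry(wumpus)", "attacks(wumpus, agent)")]
    print(formulate_goal("attacks(wumpus, agent)", rules))   # -> not hungry(wumpus)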
Here we discuss four types of methods for explanation generation in response to an anomaly.

a. Propositional Causal Models: In such models, p causes q implies that p is always followed by q. A causal model is typically encoded as a set of rules, provided by a domain expert, which is used to infer the cause underlying a set of observations. This approach is exemplified by expert systems, such as the MYCIN medical diagnosis system (Shortliffe, 1976). Another deterministic approach uses truth-maintenance systems (Forbus & de Kleer, 1993), where facts are either assumptions provided to the system or consequences computed by a set of rules. For any consequence, it is possible to trace the rules and assumptions that support it. Intelligent agents have used deterministic causal models to improve their performance in problem-solving domains and simulated environments. For example, using explanation-based learning (DeJong, 1993), CASCADE applied overly-general rules to model human learning in physics problem solving (VanLehn et al., 1992). The goal reasoning agent ARTUE uses an abductive explanation (Josephson & Josephson, 1994) process to assume hidden facts that could cause a discrepancy (Molineaux et al., 2010). Using the environment model, ARTUE selects assumptions that, if true in the prior state, would predict the discrepancy d and the current state.

b. Probabilistic Explanation Models: Unlike deterministic models, probabilistic explanation models explicitly quantify uncertainty. In probabilistic models, p causes q implies that the occurrence of q increases the probability of p. Probabilistic explanation typically uses graphical models, such as Bayesian networks (Pearl, 2000), to determine the likely causes of individual propositions. These models rely on conditional independence between causes, and the subjective probabilities can be learned by applying Bayes' rule with experience and a given prior probability. A probabilistic model of a ship explosion would include facts describing the likelihood of an explosion given a gas leak (or a fuel leak) as high, and the prior probability of a gas leak as higher than the prior probability of a torpedo. An agent would reason from this model that both a gas leak and a torpedo are possible explanations, with a gas leak being more likely. Probabilistic models have been adopted in many AI subfields. In planning under uncertainty, the environment is frequently modeled as a partially observable Markov decision process (Kaelbling, Littman, & Cassandra, 1998). A typical agent using this model will update an internal belief state after each action, which characterizes the probability of the agent being in each possible environment state. The update of this belief state is a form of explanation in which the observations are explained to result from a given state trajectory. From a goal reasoning perspective, pervasive diagnosis maintains a set of probabilities indicating the likelihood that each potential system fault has occurred based on prior observations (Kuhn et al., 2008).

c. Qualitative Explanation Models: This kind of model provides an alternative approach for describing uncertainty by allowing an agent to reason about changes to continuous quantities without using precise quantitative measurements. Quantity q1 is qualitatively proportional to quantity q2 if, all things being equal, an increase in q1 causes an increase in q2 (Forbus, 1984).
A qualitative model may explain a ship explosion as the result of a decrease in the engine oil pressure that caused its temperature to rise above its flashpoint. Qualitative models are useful in domains where numerical models are unknown, inaccurate, or computationally expensive. For example, MAYOR (Fasciano, 1996) explains its expectation failures in managing a simulated city using a qualitative economic model (e.g., high crime decreases housing demand). Using a different qualitative economic model for cities, Hinrichs and Forbus (2007) use qualitative explanations to overcome local maxima in a worker placement task in the Freeciv turn-based strategy game.

d. Example-specific Explanation Models: Due to the difficulty of obtaining complete and correct models from domain experts (Watson, 1997), another approach is to rely on example-specific models, which are easier to elicit from experts. An expert may state that p causes q for a particular situation(s), and this knowledge may be used inductively to infer p as a cause for q in a new situation. For example, when faced with a new situation, case-based reasoning (Leake & McSherry, 2005) and analogical reasoning (Falkenhainer et al., 1989) approaches retrieve a similar example and reuse its example-specific explanation. Examples may be labeled with a cause, which can allow supervised learning approaches to infer causes for new instances (Mitchell, 1997). To explain a ship's explosion, an agent may recall another ship that was sunk by a submarine's torpedo and conclude that an enemy submarine is within range of the ship. The transfer of example-specific models has been used to improve the performance of AI systems. PHINEAS (Falkenhainer, 1988) creates analogies between qualitative behaviors to transfer explanatory models in physical domains. META-AQUA uses explanation patterns (Cox 2007), which are a type of case for explaining expectation violations. Muñoz-Avila and Aha (2004) define a taxonomy of explanation types pertinent to case-based planning for games.

5. Methods for Goal Management

In goal reasoning, agents may need to consider many goals. Given a set of pending goals, goal management selects which goal(s) should be pursued. Goal management can be a continuous ongoing process or triggered by certain events. For example, Veloso, Pollack and Cox (1998) discuss the use of rationale-based planning monitors as triggers for goal change, while Jones et al. (1999) represent goals as operators which are triggered at run-time by rules that match predefined states and sensor readings. We identify seven types of plan-invariant methods (i.e., approaches that focus solely on pending goals) for goal management. They differ in how they store pending goals and how they select which goals to pursue. Shapiro et al. (2012) provide formal semantics for goal management by dropping or modifying intentions in the context of BDI agents, some of which are applicable to the methods discussed below.

• Replacement: Replacement remembers and plans for one goal at a time; if a new goal arises, it immediately replaces the existing goal. These approaches are useful when the set of goals is small, and the agent actively switches between them. For example, in Baltes's (2002) RoboCup soccer agent, the agent switches frequently between offense and defense based on the state of the field.

• Stack (consider execution history): In lieu of strict replacement, an agent may use a stack to manage its goals.
In this approach, the execution history is taken into account: a newly generated goal is accomplished first, after which the agent pursues the pending goals beginning with the goal that was being pursued when the most recent goal was generated. This is a common approach in cognitive architectures and other agents focused on long-term execution. For example, both SOAR (Laird, 2008) and ACT-R (Anderson & Lebiere, 1998) agents have used this strategy to manage their goals in a wide range of domains. The same strategy is employed by the rover agents discussed previously (Coddington, 2006).

• Rule-based (consider the state): In rule-based goal management systems, a set of rules is used to change the system's active goals. Each rule is a condition-action pair, where a condition is a statement about an event or a world state that, if true, results in an action to modify (e.g., add, drop, change) the current goals. Rule-based approaches have been used in reactive-planning agent architectures. While typical BDI agents (Rao & Georgeff, 1995) change their procedural goals as a result of observed events, CANPlan illustrates how observed events can trigger declarative goals that can be reasoned about using planning (Sardina & Padgham, 2010). Extending the semantics, the abstract agent language CAN specifies abstract goal states (pending, waiting, active, and suspended) for three different types of goals (achievement, task, and maintenance) and transitions among them (Harland et al., 2010).

• Oversubscription planning (consider quantitative goals): Classical planning focuses on generating plans that achieve a conjunctive set of goals. If no such plan exists, then classical planning fails. Oversubscription planning (Smith, 2004) relaxes this all-or-nothing constraint, and instead focuses on generating plans that achieve the "best" subset of goals (i.e., the plan that gives the maximum trade-off between total achieved goal utility and total incurred action cost). While rule-based approaches do not include quantitative information in the goals themselves or how they are evaluated in a given state, oversubscription planning includes quantitative information in goals. This goal management strategy requires that each goal have an associated utility and each action have an estimated cost. While this greatly increases the computational complexity of finding an optimal plan, some heuristic approaches have been used for oversubscription planning. For example, heuristic Partial Satisfaction Planning approaches have been shown to generate plans of similar quality to optimal plans (van den Briel et al., 2004). Much of the research in this area has focused on describing the soft constraints that impact action costs and goal utilities. For example, goal dependencies (Do et al., 2007) involve constraints among goals (e.g., mutually exclusive goals), further complicating the goal selection process. While most oversubscription approaches do not consider changes to the agent's goals during execution, Han & Barber (2005) introduce a desire-space framework that accounts for goal dependencies. A desire-space is a Markov decision process (Sutton & Barto, 1998) in which each node is a set of achieved goals and the links between them are costs of a macro-operator that achieves the goals in the destination node. This enables the application of decision theory to determine which goals are worth the cost of achieving.
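The core trade-off can be made concrete with a small brute-force sketch (purely illustrative; practical partial satisfaction planners use heuristics rather than enumerating goal subsets, and the utilities and costs below are invented): choose the subset of goals whose total utility, minus the estimated cost of a plan achieving them, is maximal.

    from itertools import combinations

    def best_goal_subset(goals, utility, plan_cost):
        """Brute-force oversubscription-style selection: maximize utility minus cost."""
        best, best_value = frozenset(), 0.0
        for k in range(1, len(goals) + 1):
            for subset in combinations(goals, k):
                subset = frozenset(subset)
                value = sum(utility[g] for g in subset) - plan_cost(subset)
                if value > best_value:
                    best, best_value = subset, value
        return best, best_value

    # Illustrative numbers only: two survey goals share setup cost; recharging is cheap.
    utility = {"survey_A": 10.0, "survey_B": 8.0, "recharge": 3.0}

    def plan_cost(subset):
        base = {"survey_A": 6.0, "survey_B": 6.0, "recharge": 1.0}
        total = sum(base[g] for g in subset)
        if {"survey_A", "survey_B"} <= subset:
            total -= 3.0          # shared-setup discount when both surveys are selected
        return total

    print(best_goal_subset(list(utility), utility, plan_cost))
    # -> (frozenset({'survey_A', 'survey_B', 'recharge'}), 11.0)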
Cushing, Benton, and Kambhampati (2008) describe an extension of oversubscription planning that includes replanning, which is cast as a process of reselecting goals. Each top-level goal is associated with rewards and penalties. Rewards are accrued when objectives are achieved and penalties otherwise. Newly arriving goals are modeled as rewards while existing plan commitments are modeled as penalties. The planner continually improves its current plan in an anytime fashion, while monitoring to see if any selected goal is still appropriate. Replanning occurs whenever a situation deviates significantly from the model, causing the selection of a new set of objectives.

• Spreading activation (consider execution history and state): While the prior methods use only the time of the goal's formulation to determine the planner's goals, spreading activation methods determine the most relevant goals using the current context of the agent's working memory. In this approach, goals are associated with concepts in a semantic network. The concepts currently in working memory spread activation through the network to individual goals. The goal with the highest activation is selected for consideration by the agent. Motivated by psychological results which indicate that a goal stack insufficiently models human goal processing, some researchers have extended ACT-R's goal management system to select goals based on spreading activation in its declarative memory (Anderson & Douglass, 2001; Altmann & Trafton, 2002). Activation is spread between goals and cues based on associative links, which are formed when they enter working memory at the same time. This view of goal reasoning emphasizes the role of the environment to supply cues that activate the appropriate goals.

• Priority queue (domain-specific methods that incorporate execution history and state to prioritize goals): Priority queues generalize spreading activation to allow the ordering of goals along any preference metric (i.e., for each goal a number can be generated by some method using the current beliefs about the environment). The highest scoring goal is the one that should be pursued. Unlike the priority queue data structure, these approaches allow the priority of goals to change after being added to the queue. Therefore, each time an agent selects new goals, it must recompute the existing goals' priorities using its current beliefs about the environment. This approach has been used in research systems in robotics and game AI, some of which reason with learned priorities. For example, goal intensity allows a simulated rover agent to order its goals using the goals themselves and its beliefs about the environment (Meneguzzi & Luck, 2007). In robotics, the affective goal management method (Scheutz & Schermerhorn, 2009) maintains a recent history of previous successes and failures for each action type and uses these to estimate the expected utility for each goal. Instead of focusing solely on successes and failures, some systems incorporate appraisal theories (Roseman & Smith 2001). For example, the FearNot! framework selects goals related to the strongest emotions (Aylett, Dias, & Paiva, 2006), and SOAR 9 uses appraisals for intrinsically motivated reinforcement learning (Marinier, van Lent, & Jones, 2010).
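Sketched abstractly (a hypothetical illustration, not the design of any cited system), priority-queue goal management re-scores every pending goal against the agent's current beliefs each time a goal must be selected, rather than trusting priorities computed when the goals were added:

    def select_goal(pending_goals, beliefs, priority):
        """Re-score all pending goals under the current beliefs and pick the best.

        pending_goals: list of goal identifiers
        beliefs:       dict describing the agent's current beliefs about the world
        priority:      function (goal, beliefs) -> numeric priority
        """
        if not pending_goals:
            return None
        return max(pending_goals, key=lambda g: priority(g, beliefs))

    # Illustrative priority function: recharging dominates when the battery is low,
    # otherwise the survey goal closest to completion is preferred.
    def priority(goal, beliefs):
        if goal == "recharge":
            return 10.0 if beliefs["battery"] < 0.2 else 1.0
        return 5.0 * beliefs["survey_progress"].get(goal, 0.0)

    beliefs = {"battery": 0.15, "survey_progress": {"survey_A": 0.6, "survey_B": 0.1}}
    print(select_goal(["survey_A", "survey_B", "recharge"], beliefs, priority))  # -> recharge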
In game AI, GRUE (Gordon & Logan 2004) allows for concurrent goals to be pursued, but does so in a non-compensatory manner (i.e., goals with higher priorities receive preference for resources over all other goals). Similarly, the multi-queue approach to behavior trees (Cutumisu & Szafron, 2009) makes use of qualitative priorities between types of goals, and uses quantitative distinctions within each grouping to select the current goals. Young and Hawes's (2012) work on using evolutionary approaches to determine the priorities of high-level tasks in QUORUM also fits into this approach.

• Goal transformation: Goal transformation involves changing the current goals to enable plan generation (Cox & Veloso, 1998). Research on this topic has focused on defining the space of transformations and methods for applying them. For example, Cox & Veloso (1998) create a taxonomy of 13 goal transformations and demonstrate how they allow for graceful performance degradation in an air superiority planning task (e.g., in air combat planning, if insufficient resources are available to destroy a bridge, a new goal to damage the bridge can be generated). GoalMorph introduces costs and utilities to goal transformations in a web service composition application (Vukovic & Robinson, 2005). After constraining the space of applicable transformations using the context, GoalMorph applies the transformation that yields the goals with the highest utility.

6. Discussion

Goal formulation determines how an agent responds to an explained discrepancy. Many discrepancies do not require goal change. That is, the agent may continue executing the same plan, or it may generate a new plan for the same goals. While pure replanning approaches, such as FF-Replan, have been effective in many domains, they are susceptible to failures due to execution dead-ends (i.e., states from which the current goals cannot be achieved) (Yoon, Fern, & Givan, 2007). In addition to providing information about the environment, discrepancies may present threats to current or future plans, opportunities, or obligations. One reason to formulate goals is to respond to developing situations that threaten the agent's interests, similar to the function of maintenance goals (Dastani, van Riemsdijk, & Meyer, 2006). There are other reasons for formulating goals: (1) graceful degradation, (2) improved future performance, and (3) societal norms. These other reasons have not been investigated sufficiently in goal reasoning research, which provides opportunities for future work. With its focus on dynamic, uncertain, and open environments, goal reasoning seeks to increase autonomy through a knowledge-intensive process. Therefore, goal formulation should not rely solely on the observed state, but also on the agent's beliefs about the environment, as in (Molineaux et al., 2010). In addition, it is difficult to specify all potential goals for an agent. Therefore, an important area of future research is to reduce the knowledge engineering burden by learning goal formulation methods, as in (Weber et al., 2010). The need to consider competing goals is a primary motivation for goal reasoning. Simple replacement and stack approaches are well understood, but are too inflexible for more complex tasks.
When planning failures occur, autonomous behavior requires a graceful degradation of performance, which may be achieved (at least partially) through existing oversubscription planning and goal transformation approaches. While oversubscription planning endows an agent with a rational method for selecting goals based on utility, it is insufficient when the set of goals is dependent on the agent's continuing observations of the environment (i.e., goals are subject to change at plan execution time). Approaches combining goal transformations with a definition of goal utility captured in a priority queue appear to be promising for handling larger classes of problems.

Future research should also investigate the interaction of goal reasoning components with traditional planning systems. Due to the separation of goal reasoning from planning, it should be possible to integrate a single goal reasoning method with multiple planners. Given that a state, a goal, and an environment model constitute a planning problem, it is worth exploring whether particular goal reasoning methods favor particular planners. In conducting this survey, we observed that the same or similar goal reasoning components may be used with the tasks of HTN planning (Molineaux et al., 2010) and the state-based goals used in many planning approaches (Hanheide et al., 2010). This suggests that goal reasoning is a distinct process worthy of independent investigation.

Evaluating goal reasoning systems is inherently difficult. AI researchers have produced many discussions on agent evaluation strategies (Kaminka & Burghart, 2007). In ablation experiments (e.g., Molineaux et al., 2010), a system's performance is evaluated through a series of trials during which components are removed to measure their contribution to the entire system. While there has been some research on discrepancy detection, explanation, goal formulation, and goal management, evaluating how each component performs within integrated intelligent systems will inform the design of future systems. Alternatively, Cassimatis, Bello, and Langley (2008) suggest comparing intelligent systems via metrics for capabilities, breadth, and parsimony. These metrics can provide evaluations based on a different view. Given the scope of the claims made about goal reasoning agents, a wide array of evaluation methodologies is needed to assess them.

7. Conclusion

Goal reasoning is motivated by four challenges to traditional planning approaches:

• Nondeterministic, partially observable environments: An agent's observations of the current state are incomplete and the results of its actions are not deterministic. Furthermore, the environment may exhibit unbounded indeterminacy: it is not possible to fully enumerate the future states as a result of an agent's actions.
• Dynamic environments: The environment changes as a result of actions executed by the agent, events in the environment, or actions executed by other agents.
• Incomplete knowledge: In complex real-world domains, contingencies arise frequently but the knowledge of those contingencies may be limited. Furthermore, during execution, environment changes may present unidentifiable world states.
• Knowledge engineering: Capturing complete planning knowledge in complex real-world domains may require capturing wickedly large models for exogenous change, a prohibitively large number of contingencies, and probabilistic effects of actions. These can each present tremendous knowledge engineering challenges.
To enable intelligent action in these types of situations, we propose that agents should formulate and reason about their goals based on environmental changes. Goal reasoning is expected to provide two benefits to intelligent agents. First, goal reasoning should enable agents to better respond to unexpected circumstances. Second, goal reasoning should decrease the knowledge engineering burden in complex real-world domains for a given system by shifting the burden from capturing knowledge for exhaustive planning to that of coding models used by goal reasoning, which we conjecture to be an inherently simpler task. While there is some initial evidence supporting each claim (Hanheide et al., 2010; Molineaux et al., 2010; Muñoz-Avila et al., 2010; Weber et al., 2010), further investigations are required.

As intelligent systems execute for longer periods without human intervention on a wide range of tasks, it becomes increasingly difficult to pre-specify all their possible goals and contingencies. Therefore, the current state-of-the-art relies on human operators to oversee an agent's execution on narrower tasks. But due to the proliferation of robotic and software agents in work, social, and residential environments, utilizing omnipresent human operators is not a viable option. Also, creating many systems for narrower tasks is inefficient and poses a usability challenge as people interact with each new system. Advances in goal reasoning should alleviate these bottlenecks to promote intelligent system development and deployment by increasing an agent's autonomy.

Acknowledgements

Thanks to OSD ASD (R&E) for sponsoring this research and to Michael Cox for his extensive recommendations. Swaroop Vattam performed this work while an NRC post-doctoral researcher located at NRL. The views and opinions contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of NRL or OSD.

References

Altmann, E. M., & Trafton, J. G. (2002). Memory for goals: An activation-based model. Cognitive Science, 26, 39-83.
Anderson, J.R., & Douglass, S. (2001). Tower of Hanoi: Evidence for the cost of goal retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1331-1346.
Anderson, J.R., & Lebiere, C. (Eds.) (1998). The atomic components of thought. Hillsdale, NJ: Erlbaum.
Ayan, N.F., Kuter, U., Yaman, F., & Goldman, R. (2007). Hotride: Hierarchical ordered task replanning in dynamic environments. In F. Ingrand & K. Rajan (Eds.) Planning and Plan Execution for Real-World Systems - Principles and Practices for Planning in Execution: Papers from the ICAPS Workshop. Providence, RI.
Aylett, R.S., Dias, J., & Paiva, A. (2006). An affectively-driven planner for synthetic characters. Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling (pp. 2-10). Cumbria, UK: AAAI Press.
Baltes, J. (2002). Strategy selection, goal generation, and role assignment in a robotic soccer team. Proceedings of the Seventh International Conference on Control, Automation, Robotics and Vision (pp. 211-214). Singapore: IEEE Press.
Benson, S., & Nilsson, N. (1995). Reacting, planning and learning in an autonomous agent. In K. Furukawa, D. Michie, & S. Muggleton (Eds.) Machine Intelligence 14. Oxford, UK: Clarendon Press.
Bouguerra, A., Karlsson, L., & Saffiotti, A. (2008). Active execution monitoring using planning and semantic knowledge. Robotics and Autonomous Systems, 56(11), 942-954.
van den Briel, M., Sanchez Nigenda, R., Do, M.B., & Kambhampati, S. (2004). Effective approaches for partial satisfaction (over-subscription) planning. Proceedings of the Nineteenth National Conference on Artificial Intelligence (pp. 562-569). San Jose, CA: AAAI Press.
Burkhard, H. D., Hannebauer, M., & Wendler, J. (1998). Belief-desire-intention deliberation in artificial soccer. AI Magazine, 19(3), 87-93.
Cassimatis, N., Bello, P., & Langley, P. (2008). Ability, breadth and parsimony in computational models of higher-order cognition. Cognitive Science, 32(8), 1304-1322.
Champandard, A. (2007). Behavior trees for next-gen game AI. In Proceedings of the Game Developers Conference, Lyon, France.
Choi, D. (2010). Coordinated execution and goal management in a reactive cognitive architecture. Doctoral dissertation, Department of Aeronautics and Astronautics, Stanford University, Stanford, CA.
Coddington, A.M. (2006). Motivations for MADbot: A motivated and goal-driven robot. Proceedings of the Twenty-Fifth Workshop of the UK Planning and Scheduling Special Interest Group (pp. 39-46). Nottingham, UK: [http://www.cs.nott.ac.uk/~rxq/PlanSIG/PlanSIG06.htm].
Cox, M.T. (2007). Perpetual self-aware cognitive agents. AI Magazine, 28(1), 32-45.
Cox, M.T., & Veloso, M.M. (1998). Goal transformations in continuous planning. In M. desJardins (Ed.), Proceedings of the Fall Symposium on Distributed Continual Planning (pp. 23-30). Menlo Park, CA: AAAI Press.
Cushing, W., Benton, J., & Kambhampati, S. (2008). Replanning as a deliberative re-selection of objectives (Technical Report). Computer Science and Engineering Department, Arizona State University, Tempe, AZ.
Cutumisu, M., & Szafron, D. (2009). An architecture for game behavior AI: Behavior multi-queues. Proceedings of the Fifth AAAI Artificial Intelligence and Interactive Digital Entertainment Conference (pp. 20-27). Stanford, CA: AAAI Press.
Dastani, M., van Riemsdijk, B., & Meyer, J.-J. (2006). Goal types in agent programming. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (pp. 1285-1287). Hakodate, Japan: ACM Press.
DeJong, G. (1993). Investigating explanation-based learning. Norwell, MA: Kluwer Academic Publishers.
Do, M.B., Benton, J., van den Briel, M., & Kambhampati, S. (2007). Planning with goal utility dependencies. Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (pp. 1872-1878). Hyderabad, India: Professional Book Center.
Falkenhainer, B. (1988). Learning from physical analogies (Technical Report UIUCDCS-R-88-1479). Department of Computer Science, University of Illinois, Urbana-Champaign, IL.
Falkenhainer, B., Forbus, K.D., & Gentner, D. (1989). The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41(1), 1-63.
Fasciano, M.J. (1996). Real-time case-based reasoning in a complex world (Technical Report TR-96-05). Computer Science Department, the University of Chicago, Illinois.
Forbus, K. (1984). Qualitative process theory. Artificial Intelligence, 24, 85-168.
Forbus, K., & de Kleer, J. (1993). Building problem solvers. Cambridge, MA: MIT Press.
Fritz, C., & McIlraith, S. A. (2007). Monitoring plan optimality during execution. Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (pp. 144-151). Providence, Rhode Island: ACM Press.
Goldman, R.P. (2009). Partial observability, quantification, and iteration for planning: Work in progress. In C. Fritz, S. McIlraith, S. Srivastava, & S. Zilberstein (Eds.) Generalized Planning: Macros, Loops, Domain Control: Papers from the ICAPS Workshop. Thessaloniki, Greece: [http://www.cs.berkeley.edu/~siddharth/genplan09/].
Gordon, E., & Logan, B. (2004). Game over: You have been beaten by a GRUE. In D. Fu & J. Orkin (Eds.) Challenges in Game AI: Papers of the AAAI'04 Workshop (Technical Report WS-04-04). San José, CA: AAAI Press.
Han, D., & Barber, K. (2005). Desire-space analysis and action selection for multiple dynamic goals. In Computational Logic in Multi-Agent Systems (pp. 249-264). Berlin: Springer.
Hanheide, M., Hawes, N., Wyatt, J., Göbelbecker, M., Brenner, M., Sjöö, K., Aydemir, A., Jensfelt, P., Zender, H., & Kruijff, G-J. (2010). A framework for goal generation and management. In D.W. Aha, M. Klenk, H. Muñoz-Avila, A. Ram, & D. Shapiro (Eds.) Goal-directed autonomy: Notes from the AAAI Workshop (W4). Atlanta, GA: AAAI Press.
Harland, J., Thangarajah, J., Morley, D., & Yorke-Smith, N. (2010). Operational behaviour for executing, suspending, and aborting goals in BDI agent systems. In A. Omicini, S. Sardina, & W. Vasconcelos (Eds.) Declarative Agent Languages and Technologies: Papers from the AAMAS Workshop. Toronto, CA: [http://goanna.cs.rmit.edu.au/~ssardina/DALT2010].
Hawes, N. (2011). A survey of motivation frameworks for intelligent systems. Artificial Intelligence, 175(5-6), 1020-1036.
Hawes, N., Hanheide, M., Hargreaves, J., Page, B., Zender, H., & Jensfelt, P. (2011). Home alone: Autonomous extension and correction of spatial representations (pp. 3907-3914). In Proceedings of the IEEE International Conference on Robotics and Automation. Shanghai, China: IEEE Press.
Hinrichs, T.R., & Forbus, K.D. (2007). Analogical learning in a turn-based strategy game. Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (pp. 853-858). Hyderabad, India: Professional Book Center.
Jaidee, U., Muñoz-Avila, H., & Aha, D.W. (2013). Case-based goal-driven coordination of multiple learning agents. Proceedings of the Twenty-First International Conference on Case-Based Reasoning (pp. 164-178). Saratoga Springs, NY: Springer.
Jones, R.M., Laird, J.E., Nielsen, P.E., Coulter, K.J., Kenny, P., & Koss, F.V. (1999). Automated intelligent pilots for combat flight simulation. AI Magazine, 20(1), 27-41.
Josephson, J.R., & Josephson, S.G. (1994). Abductive inference. Cambridge, UK: Cambridge University Press.
Kaelbling, L.P., Littman, M.L., & Cassandra, A.R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134.
Kaminka, G.A., & Burghart, C.R. (Eds.) (2007). Evaluating architectures for intelligence: Papers from the AAAI Workshop (Technical Report WS-07-04). San Mateo, CA: AAAI Press.
Klenk, M., Molineaux, M., & Aha, D.W. (2013). Goal-driven autonomy for responding to unexpected events in strategy simulations. Computational Intelligence, 29(2), 187-206.
Kuhn, L., Price, B., de Kleer, J., Bo, M., & Zhou, R. (2008). Pervasive diagnosis: The integration of diagnostic goals into production plans. Proceedings of the Twenty-Third Conference of the Association for the Advancement of Artificial Intelligence (pp. 1306-1312). Chicago, IL: AAAI Press.
Kurup, U., Lebiere, C., Stentz, A., & Hebert, M. (2012). Using expectations to drive cognitive behavior. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. Ontario, Canada: AAAI Press.
Laird, J.E. (2008). Extending the Soar cognitive architecture. Proceedings of the First Artificial General Intelligence Conference (pp. 224-235). Memphis, TN: IOS Press.
Langley, P., & Choi, D. (2006). A unified cognitive architecture for physical agents. Proceedings of the Twenty-First National Conference on Artificial Intelligence. Boston, MA: AAAI Press.
Leake, D. (1991). Goal-based explanation evaluation. Cognitive Science, 15, 509-545.
Leake, D. B., & Ram, A. (1995). Learning, goals, and learning goals: a perspective on goal-driven learning. Artificial Intelligence Review, 9(6), 387-422.
Leake, D., & McSherry, D. (2005). Introduction to the special issue on explanation in case-based reasoning. Artificial Intelligence Review, 24(2), 103-108.
Mancarella, P., Sadri, F., Terreni, G., & Toni, F. (2005). Planning partially for situated agents. In Computational Logic in Multi-Agent Systems (pp. 230-248). Berlin: Springer.
Marinier, B., van Lent, M., & Jones, R. (2010). Applying appraisal theories to goal directed autonomy. In D.W. Aha, M. Klenk, H. Muñoz-Avila, A. Ram, & D. Shapiro (Eds.) Goal-directed autonomy: Notes from the AAAI Workshop (W4). Atlanta, GA: AAAI Press.
Meneguzzi, F.R., & Luck, M. (2007). Motivations as an abstraction of meta-level reasoning. Proceedings of the Fifth International Central and Eastern European Conference on Multi-agent Systems (pp. 204-214). Leipzig, Germany: Springer.
Mitchell, T. (1997). Machine learning. Columbus, Ohio: McGraw-Hill.
Molineaux, M., Klenk, M., & Aha, D.W. (2010). Goal-driven autonomy in a Navy strategy simulation. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. Atlanta, GA: AAAI Press.
Muñoz-Avila, H., & Aha, D.W. (2004). On the role of explanation for hierarchical case-based planning in real-time strategy games. In D. McSherry & P. Cunningham (Eds.), Explanation in Case-Based Reasoning: Papers from the ECCBR Workshop (Technical Report 142-04). Madrid, Spain: Universidad Complutense de Madrid, Departamento de Sistemas Informáticos y Programación.
Muñoz-Avila, H., Jaidee, U., Aha, D.W., & Carter, E. (2010). Goal directed autonomy with case-based reasoning. Proceedings of the Eighteenth International Conference on Case-Based Reasoning (pp. 228-241). Alessandria, Italy: Springer.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Norman, T.J., & Long, D. (1996). Alarms: An implementation of motivated agency. In Intelligent Agents II: Agent Theories, Architectures, and Languages (pp. 219-234). Berlin: Springer.
Ontañón, S., Mishra, K., Sugandh, N., & Ram, A. (2010). On-line case-based planning. Computational Intelligence, 26(1), 84-119.
Patcha, A., & Park, J.-M. (2007). An overview of anomaly detection techniques: Existing solutions and latest technological trends. Computer Networks, 51, 3448-3470.
Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge, UK: Cambridge University Press.
Powell, J., Molineaux, M., & Aha, D.W. (2011). Active and interactive learning of goal selection knowledge. In Proceedings of the Twenty-Fourth Florida Artificial Intelligence Research Society Conference. West Palm Beach, FL: AAAI Press.
Rao, A., & Georgeff, M. (1995). BDI agents: From theory to practice. Proceedings of the First International Conference on Multi-agent Systems (pp. 312-319). Menlo Park, CA: AAAI Press.
Roseman, I., & Smith, C. A. (2001). Appraisal theory: Overview, assumptions, varieties, controversies. In K. R. Scherer, A. Schorr, & T. Johnstone (Eds.) Appraisal Processes in Emotion: Theory, Methods, Research (pp. 3-19). New York and Oxford: Oxford University Press.
Russell, S., & Norvig, P. (2003). Artificial intelligence: A modern approach (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Sardina, S., & Padgham, L. (2010). A BDI agent programming language with failure handling, declarative goals, and planning. Autonomous Agents and Multi-Agent Systems, 23(1), 18-70.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schank, R. C. (1982). Dynamic memory: A theory of reminding and learning in computers and people. Cambridge, MA: Cambridge University Press.
Schank, R. C., & Owens, C. C. (1987). Understanding by explaining expectation failures. In R. G. Reilly (Ed.), Communication Failure in Dialogue and Discourse. New York: Elsevier Science.
Scheutz, M., & Schermerhorn, P. (2009). Affective goal and task selection for social robots. In J. Vallverdú & D. Casacuberta (Eds.) The Handbook of Research on Synthetic Emotions and Sociable Robotics. Hershey, PA: IGI Publishing.
Shapiro, S., Sardina, S., Thangarajah, J., Cavedon, L., & Padgham, L. (2012). Revising conflicting intention sets in BDI agents. Proceedings of the Eleventh International Conference on Autonomous Agents and Multiagent Systems (pp. 1081-1088). Valencia, Spain: International Foundation for Autonomous Agents and Multiagent Systems.
Shortliffe, E.H. (1976). Computer-based medical consultations: MYCIN. New York: Elsevier/North Holland.
Smith, D.E. (2004). Choosing objectives in over-subscription planning. Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (pp. 393-401). Whistler, British Columbia, Canada: AAAI Press.
Sun, R. (2009). Motivational representations within a computational cognitive architecture. Cognitive Computation, 1(1), 91-103.
Sutton, R.S., & Barto, A.G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Talamadupula, K., Benton, J., Kambhampati, S., Schermerhorn, P., & Scheutz, M. (2010). Planning for human-robot teaming in open worlds. ACM Transactions on Intelligent Systems and Technology, 1(2), Article 14.
VanLehn, K., Jones, R. M., & Chi, M. T. H. (1992). A model of the self-explanation effect. Journal of the Learning Sciences, 2(1), 1-59.
Veloso, M. M., Pollack, M. E., & Cox, M. T. (1998). Rationale-based monitoring for continuous planning in dynamic environments. In R. Simmons, M. Veloso, & S. Smith (Eds.), Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems (pp. 171-179). Menlo Park, CA: AAAI Press.
Vukovic, M., & Robinson, P. (2005). GoalMorph: Partial goal satisfaction for flexible service composition. International Journal of Web Services Practices, 1(1-2), 40-56.
Watson, I. (1997). Applying case-based reasoning: Techniques for enterprise systems. San Francisco, CA: Morgan Kaufmann.
Weber, B., Mateas, M., & Jhala, A. (2010). Applying goal-driven autonomy to StarCraft. In Proceedings of Sixth Artificial Intelligence and Interactive Digital Entertainment. Stanford, CA: AAAI Press.
Weber, B., Mateas, M., & Jhala, A. (2012). Learning from demonstration for goal-driven autonomy. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. Toronto, Canada: AAAI Press.
Wilson, M., Molineaux, M., & Aha, D.W. (2013). Domain-independent heuristics for goal formulation. In Proceedings of the Twenty-Sixth Florida Artificial Intelligence Research Society Conference. St. Pete Beach, FL: AAAI Press.
Yoon, S., Fern, A., & Givan, B. (2007). FF-replan: A baseline for probabilistic planning. Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (pp. 352-359). Providence, Rhode Island: ACM Press.
Young, J., & Hawes, N. (2012). Evolutionary learning of goal priorities in a real-time strategy game. In Proceedings of the Eighth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. Stanford, CA: AAAI Press.

2013 Annual Conference on Advances in Cognitive Systems: Workshop on Goal Reasoning

Towards Applying Goal Autonomy for Vehicle Control

Mark Wilson MARK.WILSON@NRL.NAVY.MIL
Naval Research Laboratory, Navy Center for Applied Research in AI, Washington, DC 20375
Bryan Auslander BRYAN.AUSLANDER@KNEXUSRESEARCH.COM
Knexus Research Corporation, 9120 Beachway Lane, Springfield, VA 22153
Benjamin Johnson BLJ39@CORNELL.EDU
Cornell University, School of Mechanical and Aerospace Engineering, Ithaca, NY 14853
Thomas Apker THOMAS.APKER@EXELISINC.COM
Exelis Inc., 2560 Huntington Ave, Alexandria, VA 22303
James McMahon JAMES.MCMAHON@NRL.NAVY.MIL
Naval Research Laboratory, Physical Acoustics, Code 7130, Washington, DC 20375
David W. Aha DAVID.AHA@NRL.NAVY.MIL
Naval Research Laboratory, Navy Center for Applied Research in AI, Washington, DC 20375

Abstract

Unmanned vehicles have been the focus of active research on autonomous motion planning, both deliberative and reactive. However, they are fundamentally limited in their autonomy by an inability to independently reason about, prioritize, and change the goals they pursue. We describe two new projects in which we are incorporating goal autonomy on unmanned vehicle platforms. We will apply the Goal-Driven Autonomy (GDA) model to permit our vehicles to reason about their objectives and discuss how properties of the domains affect the application of GDA.

1. Introduction

Unmanned vehicles are often used to explore and act in regions that are dangerous or otherwise undesirable for humans to visit. Many unmanned vehicles are remotely operated: Rather than acting autonomously using onboard control systems, they act directly on control commands from human operators to execute their missions. Remote operation may be desirable in some circumstances (e.g., to maximize control over the safety of an unusually valuable vehicle, such as a Mars rover). However, in many instances we would prefer that unmanned vehicles operate without human input, which would reduce operator load, avoid human error in operating the vehicles, and allow the vehicles to continue pursuing their missions when out of contact with human operators. Most efforts to provide greater autonomy for unmanned vehicles have focused on a problem we refer to as motion autonomy, the primary example of which is to navigate autonomously to a desired location or to follow a prescribed route (e.g., Tan, Sutton, & Chudley, 2004; Wooden et al., 2010).
Although motion autonomy techniques are broadly adaptable and allow robotic vehicles to autonomously accomplish many desired tasks, they do not allow vehicles to dynamically self-select goals to pursue or to re-prioritize their existing goals. This limits motion autonomy to predictable environments, as changes in the environment or previously unobserved facts may require an agent to select new objectives or mission parameters to act correctly. To address this, we describe two new efforts to enrich unmanned vehicles' reasoning with goal autonomy: the ability to dynamically formulate, prioritize, and assign goals¹. Enabling the vehicle to decide what goal it should accomplish in any given situation, in addition to existing techniques for achieving those goals autonomously, allows the vehicle to act correctly in a broader range of situations without supervision. This is especially valuable in long-duration missions in dynamic environments, where the vehicle is likely to encounter a variety of situations too complex to enumerate a priori. For instance, a maritime vehicle on a long mission may encounter a broad range of underwater hazards and opportunities for investigation in unpredictable configurations.

¹ We use "goal autonomy" rather than "goal reasoning" throughout, to distinguish from "motion autonomy."

To provide the ability to select appropriate goals in a wide range of situations, we will apply Goal-Driven Autonomy (GDA), a model for responding to unexpected occurrences by formulating and reprioritizing goals (Molineaux, Klenk, & Aha, 2010a). In one project, Autonomous Behavior Technology for Unmanned Underwater Vehicles, we will apply the GDA model to an underwater vehicle, providing it the decision-making ability necessary to conduct long-duration, independent missions with varying objectives. In another project, Autonomous Systems Integration, we will apply the GDA model to the task of plume-tracking, in which ground and air vehicles must cooperate to discover the source of an airborne contaminant, while also collecting and transferring power to avoid disruption of activity from loss of battery reserves. GDA has previously been applied in several simulated test domains inspired by real-world scenarios (Molineaux et al., 2010a) as well as game environments (Weber, Mateas, & Jhala, 2012; Jaidee, Muñoz-Avila, & Aha, 2013). However, the projects presented here, although currently in simulation, will be our first application of GDA on real-world robots or vehicles. In this paper, we present an overview of GDA, discuss the parameters of the application domains, present initial architectures for both projects, and discuss aspects of applying goal autonomy to situated agents and integrating goal autonomy with motion autonomy in two very different problem domains.

2. An Overview of Goal-Driven Autonomy

Goal-Driven Autonomy (GDA) (Figure 1) is a model for online planning with reasoning about goal formulation and management (Molineaux et al., 2010a). It extends Nau's (2007) model of online planning, using the Controller to create and pursue new goals when unexpected events occur in complex environments (e.g., stochastic, partially-observable). The GDA Controller uses the Planner to create a plan to achieve the current goal g from the current state s0. The Planner outputs to the Controller a sequence of actions <a1, ..., an> to execute, and a corresponding sequence of expected states <x1, ..., xn>, where xn is a goal state for g.
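A schematic sketch of this Controller-Planner interface, and of the four-step cycle described next, might look as follows (the function and parameter names are ours, not those of the cited GDA implementations):

    def gda_controller(planner, environment, initial_goal, state,
                       detect, explain, formulate, manage):
        """Schematic GDA loop: plan, execute, and revise goals on discrepancies."""
        pending_goals = [initial_goal]
        while pending_goals:
            goal = manage(pending_goals, state)            # goal management
            actions, expectations = planner(goal, state)   # plan <a1..an>, expectations <x1..xn>
            for action, expected in zip(actions, expectations):
                state = environment.execute(action)
                discrepancy = detect(state, expected)      # discrepancy detection
                if discrepancy:
                    explanation = explain(discrepancy, state)       # discrepancy explanation
                    pending_goals += formulate(explanation, state)  # goal formulation
                    break                                  # return to goal management
            else:
                pending_goals.remove(goal)                 # goal achieved
        return state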
As the Controller executes the plan in the state transition environment, it performs a four-step cycle to manage goals in response to unexpected events:

1. Discrepancy detection: After the Controller executes action ai, the Discrepancy Detector compares the new observed state si to the corresponding expectation xi. If they differ, a discrepancy has occurred and the GDA model attempts to explain and resolve it.
2. Discrepancy explanation: If discrepancies between the new state and the expectation are detected, the Explanation Generator attempts to create an explanation of the discrepancies.
3. Goal formulation: The Goal Formulator creates new goals that are appropriate given the explanation.
4. Goal management: Finally, the Goal Manager prioritizes and selects among the Pending Goals, including new goals from the Goal Formulator. The selected goal is then given to the Planner to generate a new plan and expectations.

Figure 1: The Goal-Driven Autonomy (GDA) conceptual model.

3. Related Work

Related work on autonomy focuses on the areas of goal autonomy, which addresses management of the agent's objectives, and motion autonomy, which addresses tasks such as safely moving a vehicle from one position to another. Although the projects presented here represent our first efforts to use the GDA model on situated vehicles, GDA has been used in the past to control simulated agents. The ARTUE agent has been used to guide simulated vehicles inspired by Mars rovers (Wilson, Molineaux, & Aha 2013) as well as teams of simulated naval vessels (Molineaux et al., 2010a), but has never been integrated with dynamic motion controllers for real robots. EISBot (Weber et al., 2012), GRL (Jaidee, Muñoz-Avila, & Aha, 2012), and GDA-C (Jaidee et al., 2013) have all been used to successfully control all or part of a player's forces in real-time strategy games, a form of centralized direction for multi-agent systems. We present an architecture for centralized direction, but our system must interface with group control algorithms designed to prevent collisions while allowing several agents to work toward a common goal.

Other types of goal autonomy have also been used to control simulated agents. The ICARUS cognitive architecture (Choi, 2011) has been applied to simulated car-driving domains with a reactive goal-management component that introduces new goals taken from a long-term goal memory, given general and domain-specific conditions. Coddington's (2006) MADBot architecture, which can introduce new goals when domain-specific motivational thresholds are exceeded, has been used to control simulated ground-vehicle robots.

Goal autonomy systems have also received attention on robotic platforms. Dora the Explorer (Hawes et al., 2011) is a robot with goal autonomy capabilities, but is limited to goals focused on exploring and categorizing its environment. The SapaReplan planner has been used in the DIARC robotic-control architecture (Schermerhorn et al., 2009) to allow a robotic agent to optionally pursue soft goals by taking advantage of ungrounded opportunities in the environment, which it models using simulated objects called counterfactuals. However, SapaReplan can pursue such soft goals only temporarily and must not allow them to interfere with its required hard goals. This contrasts with our use of GDA, which permits the indefinite suspension of goals.
An alternative means of encoding multiple objectives onto an autonomous platform is the use of correct-by-construction controller synthesis. Kress-Gazit, Fainekos, and Pappas (2009) present a technique for specifying multiple goals and the conditions required to achieve them as Linear Temporal Logic (LTL) formulas. These formulas are used to generate a Finite-State Automaton (FSA) controller that is guaranteed to eventually accomplish all specified goals, assuming the required conditions are met and the environment meets defined expectations. However, the computational cost of constructing the FSA grows exponentially with the number of goals and conditions, and requires pre-specification of goals for all situations in which the robot must act. Thus, for large problems this framework requires a goal manager to provide a receding horizon for the controller as in (Wongpiromsarn, Topcu, & Murray, 2009). Livingston, Murray, and Burdick (2012) and Sarid, Xu, and Kress-Gazit (2012) introduce limited forms of goal formulation that respond competently to unexpected states and surprising opportunities, respectively, for synthesized controllers. Using controllers generated from LTL formulas will allow a task planner to plan atomic actions that can be decomposed into multiple LTL-level goals, and ensure that agents that are assigned complex, multi-stage tasks will complete them or provide information about unexpected states in the environment.

Approaches to autonomous control for underwater vehicles can be broadly classed into deliberative and reactive motion planning. Deliberative approaches variously use, among others, genetic algorithms (Alvarez, Caiti, & Onken, 2004), rapidly-exploring random trees (Tan et al., 2004), A* search over discretized environments (Garau, Alvarez, & Oliver, 2005), and gradient-descent optimization over cost functions (Kruger, Stolkin, Blum, & Briganti, 2007). Plaku and McMahon (2013) address simultaneous task and motion planning for underwater vehicles using LTL task specifications with sampling-based deliberative methods to avoid the complexity of guaranteed correctness. Reactive, or local, planning approaches are particularly useful in regions that are large or not well-mapped. Virtual potential fields (Khatib, 1985) are a common reactive system. Antonelli et al. (2001) alleviate the risk of this approach "trapping" a vehicle in local minima by adding a supervisor module to modify the vehicle's behavior based on the environment's geometry. While most of these approaches assume holonomic vehicle models, Apker and Potter (2012) describe a means of encoding a vehicle's dynamic constraints to improve performance and reliability. However, unlike our work, these systems address motion autonomy rather than the problem of goal autonomy. The IvP Helm (Benjamin et al., 2010) provides a reactive UUV controller based on multi-objective optimization rather than potential fields, and exhibits limited goal autonomy by changing modes based on the state. However, it does not reason about goals the vehicle should accomplish in the environment.

Research on autonomy for individual air and ground vehicles is more mature than for underwater vehicles, and recent work has focused on guiding groups of vehicles to accomplish given tasks. Several authors have explored combining potential fields with FSAs to allow their systems to react to state changes by changing agent objectives.
Mather and Hsieh (2012) apply this approach to robots engaged in surveillance tasks. Worcester, Rogoff, and Hsieh (2011) develop a finite-state representation of a construction task, and use a centralized system to partition its components among a team of robots. Martinson and Apker (2012) describe a physics-inspired FSA that operates in the robots' behavior space, changing the way they generate motion commands from potential fields depending on their proximity to a target and their navigation quality. In contrast to this body of work, we focus on goal autonomy, and discuss applications of these methods to teams of unmanned vehicles in Section 5.

4. Application Domains

4.1 Long-Duration Underwater Autonomy

Autonomously-controlled unmanned underwater vehicles (UUVs) have been used for underwater exploration (Antonelli et al., 2001), observation and inspection of underwater structures (Antonelli et al., 2001), scientific observation (Binney, Krause, & Sukhatme, 2010), and mine countermeasures (LePage & Schmidt, 2002). However, these missions typically are of short duration (at most eight to sixteen hours) and operate over a small region.

In our first project we will apply GDA to autonomously direct a UUV on unsupervised long-duration missions. These missions could eventually last weeks or months. Long-term missions may require the vehicle to pursue different goals at different times, such as goals related to transiting to a region, avoiding other vessels, surveying oceanic geography, detecting mines and other manufactured obstacles, and taking oceanographic measurements.

The ocean environment is highly unpredictable, and a UUV on a long-duration mission must be able to react intelligently to unexpected events and objects. Throughout the course of a mission a UUV may need to change its objectives, or even abort its mission, due to unforeseen environmental hazards, underwater barriers, encounters with other vehicles, or failures of onboard systems. These missions therefore motivate goal autonomy. Although motion autonomy could correctly guide the vehicle on any task selected in response to such anomalies, goal autonomy provides the ability to select goals generally and dynamically without reference to a human operator. Because an at-sea UUV has very limited communication with human operators, the vehicle must make goal decisions autonomously.

For example, consider a UUV taking oceanographic measurements (e.g., water salinity) over a region when a surface vessel enters its area and stops. If the measurements are being taken near the ocean surface, attempting to take them at or near the new vessel's position may risk collision. While motion autonomy systems can likely minimize risk and maximize data quality, they cannot consider the broader implications of the vessel's arrival and how best to respond. If it is a friendly vessel, it may be appropriate for the UUV to surface, broadcast that scientific measurements are being taken, and request that the vessel vacate the area. If the UUV is a military vehicle operating in contested or unfriendly waters, and the vessel is not friendly, it may be appropriate to halt and silence the UUV to avoid detection. If in open waters, the UUV may be correct to abort the data-collection mission and notify its operator of the surface vessel's approach.
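To illustrate the kind of goal formulation this scenario calls for, the following minimal sketch maps a hypothetical explanation of the vessel's arrival to a candidate goal; the predicate and goal names are invented for exposition and do not correspond to our implemented knowledge base.

def formulate_goal_for_vessel(vessel_is_friendly, in_contested_waters, in_open_waters):
    """Map an explanation of a surface vessel's arrival to a new goal for the UUV."""
    if vessel_is_friendly:
        # Friendly vessel: surface, announce the survey, and ask it to vacate the area.
        return "surface_and_request_vacate"
    if in_contested_waters:
        # Unfriendly or unknown vessel in contested waters: go quiet to avoid detection.
        return "halt_and_silence"
    if in_open_waters:
        # Open waters: abort data collection and notify the operator of the approach.
        return "abort_mission_and_notify_operator"
    # Otherwise, keep the current goal and let the Goal Manager re-prioritize later.
    return "continue_current_mission"

For example, formulate_goal_for_vessel(False, True, False) returns "halt_and_silence", matching the contested-waters case above.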
Goal-driven autonomy is a general model for generating appropriate responses to unplanned situations, and it is therefore well-suited to the control of unmanned vehicles at sea. Key challenges in this domain include:

- Unpredictable environments: Existing deliberative motion autonomy techniques for UUVs require advance knowledge of the environment in which the path will be planned, while existing reactive motion autonomy techniques respond to unknown environments unpredictably. Both present challenges in long-duration missions, where a UUV may venture into waters that are not well-charted or for which there are no reliable data on currents. Furthermore, deliberative techniques have difficulty planning for dynamic obstacles whose motion may not be well understood, while reactive techniques can complicate the task of detecting discrepancies that occur during motion plan execution.

- Computational constraints: The CPUs that our agent will use to control the UUV are not powerful, and they necessitate an emphasis on computationally efficient solutions.

- Uncertain environment state: The lack of many sensors often found on ground vehicles and other robots (e.g., for localization, visual inspection, range-finding), combined with noisy readings from the sensors that are available, presents unique challenges.

4.2 Airborne Contaminant Detection

Unmanned air vehicles (UAVs) are used in remote sensing, scientific research, and search-and-rescue applications. Unmanned ground vehicles (UGVs) can be used to explore and act in situations that are dangerous to humans, such as contaminated waste cleanup and explosive ordnance disposal missions, and to provide logistics support, such as carrying equipment. In our second project, we will apply GDA to direct a team of UAVs equipped with aerosol sensors and UGVs with support equipment that includes landing pads, UAV rechargers, and solar panels. We know that the environment is bounded and that autonomous navigation is possible, but we make no assumptions about initial plume locations, the availability of traversable paths for the UGVs, or the locations of brightly lit areas for solar recharging.

This problem combines motion planning, task scheduling, and resource allocation in an unknown environment. Conventional motion autonomy methods require a complete output specification for each vehicle given its possible sensor inputs. In our scenario this is computationally intractable given the potential number of vehicles, sensors, and actions. Using GDA to make goal- and task-level decisions permits the synthesis of controllers that encode a limited number of relevant responses given the current goal, thus making the motion autonomy problem tractable.

Unlike the UUV domain, in the UAV/UGV domain we must control several vehicles to cooperatively achieve goals. However, if goal decisions were decentralized among vehicles, each vehicle would need to model all its teammates' possible goals and plans, or risk interference with teammates pursuing different goals. By centralizing GDA to coordinate the vehicles, we can guarantee that all vehicles will pursue the same goal at any given time, and that the goal will be achieved based on guarantees offered by lower-layer controllers. This leads to the key challenges for GDA implementation in this domain:

- Motion abstraction: The GDA Controller must direct multiple autonomous vehicles to accomplish tasks requiring solutions to continuous-motion problems.
Multiple vehicles must autonomously carry out these tasks without interfering with each other, a problem too computationally intensive to solve at the GDA level. Hence, we require abstract representations of the continuous-motion problems that are suitable for computation at the goal autonomy layer, while supporting goal decisions that can be used as a basis for planning and controller synthesis for individual vehicles.

- Individual discrepancies: Although vehicles are directed as coordinated teams to achieve goals, discrepancies can still occur at the individual level (e.g., one vehicle's battery may run low due to a malfunction). Our solution must manage goals and vehicle task assignments to permit responses to each vehicle's discrepancies, while using abstracted representations of goals as team activities that can be continued in spite of individual discrepancies.

5. Applying Goal-Driven Autonomy

GDA is well-equipped for its usual role of providing goal autonomy in task-planning domains. However, applying GDA in robotic vehicle domains requires appropriate abstractions from motion guidance to task-level actions. In this section we describe different approaches to this multi-layered abstraction in our underwater autonomy and airborne contaminant domains.

Factors such as environment predictability and the need for cooperation affect how GDA should be implemented and applied in a given domain. For single vehicles operating in dynamic or poorly specified environments (e.g., Mars rovers or singleton UUVs), each sense-act cycle represents an opportunity to reevaluate and adjust the agent's goals with respect to the most recent state. Loosely coordinated teams, particularly those working closely with humans, benefit from a concurrent control and planning architecture in which the system's goals are drawn from a limited set of easily interrupted goals whose supporting tasks can be learned offline (Talamadupula et al., 2011). In contrast, tightly coordinated teams require team members to behave in a predictable manner so that their teammates can respond appropriately. In this context, each individual's behaviors for achieving goals should be guaranteed; hence, such systems can benefit from correct-by-construction controller synthesis (per team member). In this case, goal interruption must occur safely, which requires extra time to make sure that each team member can safely interrupt its current goal and start another. This delay decreases the reactivity of the goal autonomy layer.

The granularity of atomic actions available to the GDA Controller can vary from simple (e.g., "go to x, y, z") to complex (e.g., "supply landing sites for the UAVs and recharge their batteries"). This granularity depends on properties of the underlying control layers, which in turn depend on environment predictability and the degree of team coordination required. We present examples at opposite extremes of these domain properties, and note how they impact the granularity of the goals used.

5.1 Autonomous Behavior Technology for UUVs

While there is a large body of work on UUV motion autonomy, current approaches do not have the ability to reason about goals. In our planned approach, GDA will allow a UUV to respond with appropriate actions to unexpected situations whenever the vehicle's current set of goals is no longer satisfactory.
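As an illustration of how the four-step GDA cycle of Section 2 might drive such a controller, consider the following minimal sketch; the component interfaces and method names are hypothetical stand-ins for the elements of Figure 1, not our implementation.

def gda_control_loop(planner, detector, explainer, formulator, manager, env, goal):
    # Plan for the current goal and record the expected state after each action.
    plan, expectations = planner.plan(goal, env.observe())
    step = 0
    while step < len(plan):
        observed = env.execute(plan[step])                             # execute action a_i
        discrepancy = detector.compare(observed, expectations[step])   # 1. discrepancy detection
        if not discrepancy:
            step += 1
            continue
        explanation = explainer.explain(discrepancy, observed)         # 2. discrepancy explanation
        new_goals = formulator.formulate(explanation)                  # 3. goal formulation
        goal = manager.select(new_goals)                               # 4. goal management over pending goals
        plan, expectations = planner.plan(goal, observed)              # replan for the selected goal
        step = 0
    return goal

The loop simply restarts plan execution whenever goal management selects a new goal, which is the behavior the rest of this section assumes.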
5.1.1 Integration with Motion Autonomy Systems

Deliberative motion autonomy techniques for UUVs require advance knowledge of the environment in which the path will be planned, of any currents that must be taken into account, and of the future motion of dynamic obstacles. In a long-duration mission, a UUV may venture into waters that are not well-charted or for which there are no reliable data on currents. Dynamic obstacles may include other vessels that are engaged in unpredictable maneuvering, or whose motion is not well understood at the time of planning because sensor data are not conclusive. Without such constraints on the guidance problem, deliberative path planning alone may not be appropriate for a UUV on a long-duration mission.

We will apply the MOOS-IvP autonomy architecture (Benjamin et al., 2010) to provide suitable path guidance. MOOS is a message-passing middleware system with a centralized publish-subscribe model. IvP Helm is a behavior-based MOOS application that chooses a desired heading, speed, and depth for the vehicle in a reactive manner to generate collision-free trajectories. Unlike potential field methods, IvP Helm uses an interval programming technique that optimizes over an arbitrary number of objective functions to generate desired heading, speed, and depth values and to activate or deactivate sensor payloads.

We developed a new GDA agent architecture based on ARTUE (Molineaux et al., 2010a), are using it to control a UUV in simulation, and will later apply it to control our UUV. The GDA Controller will direct the vehicle to perform various tasks (e.g., sensing, navigation) while preserving its ability to navigate partially unknown or poorly mapped environments. It will accomplish this by activating and deactivating specified IvP Helm behaviors and altering the parameters of active behaviors. While IvP Helm can make these decisions independently, it is a reactive mechanism and cannot deliberate about what goal the vehicle should pursue, which is the focus of GDA. Figure 2 depicts our agent architecture, in which GDA will direct goal autonomy, IvP Helm will provide motion guidance, and Bluefin's Huxley control architecture will execute low-level control.

Figure 2: The GDA agent architecture for controlling a UUV with MOOS-IvP.

The UUV domain has few constraints on the environment, which distinguishes it from the contaminant detection domain, where we will use a constrained environment and abstractions to provide guarantees of motion controller correctness. The ocean is large, sparsely mapped, and dynamic. Therefore, it is not possible to provide guaranteed-correct motion control (Kress-Gazit et al., 2009). Furthermore, unlike the controllers we use on the UAVs, IvP Helm cannot independently recognize that a navigational failure has taken place. To allow IvP Helm independent control over motion while preserving the GDA Controller's ability to recognize anomalous situations, we are developing an abstraction that replaces the expected states in our Discrepancy Detector with semantically richer expectations. This will allow our agent to ignore certain values or to expect values in some range between actions, and to resolve intervals between actions by checking conditions during execution rather than computing the expected duration of a process from a domain model.
This would allow the goal reasoner to, for example, expect position values to fall within some range until a motion is completed or until some other unexpected event (e.g., a barrier) triggers a discrepancy. Using this technique affords a better separation of responsibilities between the goal autonomy layer and the motion autonomy layer. It also offers improved performance, both by eliminating discrepancies caused by allowing the motion autonomy layer to independently execute motion tasks and by obviating precise modeling of vehicle motion and other lower-level processes during planning.

5.1.2 Modeling Uncertainty

Our current model of discrepancies assumes that observations are not noisy. This assumption does not hold in real-world environments, where sensors are noisy and sometimes faulty, which can cause uncertainty in observations and in the estimated state. The discrepancy model also assumes that observations occur at precise times relative to the actions taken (i.e., either immediately after one action or immediately after the amount of time necessary for an event to occur as predicted by the domain model). This second assumption is also unrealistic: the sampling rate of the sensors may not correspond precisely to the timeline of the expected states, and the transmission and reception of the data by asynchronous processes that lack maximum-update-time guarantees may interfere with the timely delivery of the state observation. Hence, when detecting discrepancies, observations may not correspond exactly to the expected states generated by a planner, though they may be closely correlated. To address these issues, we intend to improve our new expectations model by introducing a probabilistic model that assigns a distribution to each value or range in an expectation. This will allow us to compute a likelihood value for each observed state, which can be used to detect discrepancies (i.e., under some conditions, a low likelihood for an observation may indicate a high probability that it is anomalous).

5.2 Autonomous Systems Integration

In this project, we will apply GDA to the problem of controlling a team of UAVs and UGVs to locate the source of a plume of airborne particles. While the maneuvering of sensors for plume source location has been previously studied (Spears, Thayer, & Zarzhitsky, 2009), little work has been done on providing autonomous support for such a team. We will apply goal autonomy to simultaneously coordinate search operations and logistics support, including safe landing zones and recharging stations.

5.2.1 Integration with Motion Autonomy Systems

We use a hierarchical approach for implementing team motion autonomy that involves three decision layers. The highest layer uses GDA to select mission goals. The GDA Controller uses a SHOP2PDDL+ planner (Molineaux, Klenk, & Aha, 2010b) to produce a sequence of actions and associated safety conditions. The bounded nature of the UAVs' flight envelope guarantees that this planner will generate achievable plans, which are executed by an FSA on each vehicle to allow local trajectory planning, execution, and discrepancy detection. To increase robustness to agent failure and to reduce the size of the FSA, we are employing the Physicomimetics swarm control algorithm (Apker & Potter, 2012) to reactively generate vehicle trajectories. To bridge between high-level goals and low-level tasks in the GDA Controller, we will use LTL as a translation mechanism between the decision layers.
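The three decision layers might be wired together roughly as in the following sketch; all object and method names are hypothetical placeholders for the GDA Controller, the LTL synthesis step, and the Physicomimetics layer, and this is not our implemented interface.

def mission_step(gda, ltl_synthesizer, swarm, world_state):
    # Layer 1: GDA selects a mission goal and plans task-level actions
    # with their associated safety conditions.
    goal = gda.select_goal(world_state)
    plan = gda.plan(goal, world_state)
    # Layer 2: each team's task and safety conditions become an LTL specification,
    # from which a correct-by-construction FSA controller is synthesized.
    for team, task, safety in plan.team_assignments():
        fsa = ltl_synthesizer.synthesize(task, safety)
        swarm.load_controller(team, fsa)
    # Layer 3: Physicomimetics generates the vehicles' trajectories; execution
    # runs until an FSA reports a discrepancy, which is returned to GDA.
    return swarm.run_until_discrepancy()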
LTL controller synthesis has been used to automatically produce verifiable FSA controllers that accomplish complex tasks on autonomous robots (Kress-Gazit et al., 2009). In this approach, the GDA Controller will generate a set of complex actions and constraints for each agent's motion autonomy system, and the LTL Controller will generate simpler actions (e.g., "go to x, y, z") for the agent's guidance system. This contrasts with previous approaches, which required LTL tasks to be pre-specified, or required pre-specified templates that can assign newly discovered areas of interest as new destination goals (Sarid et al., 2012).

For a group of collaborating robots, the LTL controller synthesis problem quickly becomes infeasible. We address this state-space explosion by using goal autonomy to supplement the mission goal with smaller, short-term goals that respect the mission constraints. That is, we will use goal autonomy to decompose the complete task specification into smaller, local specifications for individual UxVs or small teams of UxVs, thus limiting the goals that fall within the scope of each task. This can reduce an otherwise infeasible task into smaller, more computationally tractable tasks for the LTL synthesizer.

The FSA that LTL synthesis creates can also be used by the GDA Controller to detect unexpected events during operation. Discrepancies can be detected by comparing the FSA's expected state with the agent's observed state. Finally, the FSA is guaranteed to satisfy its underlying task specification, which provides a valuable check to ensure that the goals selected by the GDA Controller do not conflict with each other or with the mission's safety constraints.

This guarantee on the FSA's behavior assumes that the environment acts as expected, and that the robot's sensors and actuators operate without error. We can relax these assumptions by using Johnson and Kress-Gazit's (2012; 2013) method for analyzing the behavior of an LTL-synthesized controller, which tolerates errors in the sensing and actuation of the robot. After creating a probabilistic model of the robot's interaction with the environment, their method uses model checking to find the probability that the robot exhibits a particular behavior (defined by an LTL formula). This will be used by the Discrepancy Explainer to diagnose the perceived discrepancy.

5.2.2 Controlling a Team of Vehicles

In the contaminant detection domain, several UAVs and UGVs must coordinate to locate the contaminant's source. While the vehicles are expected to execute maneuvers independently, their efforts should be centrally coordinated so that the mission is completed quickly and with minimal mutual interference. Therefore, the GDA Controller must coordinate the vehicles' efforts. Our strategy for solving this problem assigns the UAVs to follow plumes of contaminants to their source and uses the UGVs in a support role.

Figure 3 depicts our prototype architecture, which uses the MASON simulation toolkit (Luke, 2005) to simulate vehicle motion and chemical-plume dynamics. The mission goal is to detect possible plume locations. Initially, the planner assigns all UAVs to small groups and directs each group to investigate a possible plume or to remain in reserve. Each group's plume assignment is passed to a separate intermediate-level planner, which creates a lawnmower search pattern to follow. (In our future work, we will replace this planner with LTL-synthesized controllers.)
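As a purely illustrative example (not drawn from our actual task specification), an LTL formula for a lawnmower-search-and-report task of this kind could be written as

    G F visit(w_1) ∧ ... ∧ G F visit(w_n) ∧ G (plume_detected → F report_to_base),

where G ("always") and F ("eventually") require each waypoint w_i of the search pattern to be revisited indefinitely and any plume detection to eventually be reported; a synthesized FSA would then realize this specification over the group's motion primitives.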
All of the UAVs use Physicomimetics motion planning to jointly investigate each location in the pattern for evidence of a plume. The discrepancies that we currently model concern unexpectedly low UAV battery states, suspected plume locations, and task completion signals from groups or individual agents. When a discrepancy is encountered, the GDA Controller reassesses its goals and forms new plans. For instance, if an agent's battery charge becomes critically low, the GDA Controller will assign a new goal for that agent to recharge its battery, and will change the group's composition by tasking other vehicles to continue searching for plumes. Later, we will model anomalies such as opportunities to deploy solar panels, which may interfere with UAV transport or landing operations, and winds that interfere with UAV flight and aerosol sensor performance.

We will also integrate UGVs into this domain. They will transport UAVs to contaminated regions, harvest energy for battery power, and recharge the UAVs' batteries during operations. Launch, landing, search patterns, and battery charging involve precise, coordinated motion control that can be achieved only in favorable conditions. This requires guarantees on the agents' behavior throughout a maneuver, which is an ideal application of LTL control. The GDA Controller complements this by managing higher-level goals, scheduling these operations, and determining their locations.

Figure 3: The GDA architecture for controlling the UAVs.

6. Discussion

We based our implementation decisions on the degree of predictability in each environment and on the need for agent cooperation. These vary substantially between our two projects.

6.1 Environment predictability

Ocean currents, ship traffic, and underwater features are generally unknown in advance of deployment. As a result, any motion autonomy algorithm that makes specific guarantees is bound to fail in the UUV domain. There is little benefit in the UUV domain to synthesizing a guidance system more complex than a MOOS-IvP behavior, as the GDA Controller may frequently select new goals when more accurate state information becomes available.

In contrast, the plume detection environment can be observed and accurately predicted over short time scales, allowing the synthesis of controllers that are guaranteed to perform well in those conditions. At longer time scales, much of the environment is static or repetitive (e.g., areas of sun vs. shade), allowing a planner to schedule complex tasks with a high probability of success. The GDA Controller will detect fewer discrepancies in this environment and will be more focused on managing the team's resources. The plume detection mission benefits from abstractions of the environment and agent behavior that are possible in predictable environments. These abstractions allow goal autonomy to largely ignore issues of motion autonomy.

6.2 Need for cooperation

The UUV domain involves a single vehicle that has little or no interaction with other agents, and it reasons about only a few constraints (e.g., to avoid goal oscillations). This frees GDA to make highly independent decisions about the vehicle's activity by selecting the best available goal for its current state. This level of independence permits a direct connection between GDA and the guidance systems, with no need for a controller-synthesis step.

Cooperation is the defining feature of the plume detection domain.
As a result, no individual agent can be allowed to replan its actions in a way that interferes with its peers. This forces goal autonomy to a central node whose role is restricted to issuing clearly defined instructions that are used to synthesize low-level controllers (FSAs) for each team member. These extra layers of abstraction will allow goal autonomy to coordinate the team's behaviors to ensure that no hardware will be lost unexpectedly, although they will introduce delays between selecting and implementing new goals.

Architecture decisions involving cooperative agents need to balance closeness of cooperation against the agents' ability to respond to new information quickly. A continuum of cooperation options exists, varying from agents that cluster closely (to form coherent arrays) to fully independent agents. With less cooperation, fewer abstractions are required between GDA and low-level control, while close cooperation requires more abstractions and, implicitly, a more predictable environment to support those abstractions.

7. Conclusion

In this paper we described initial architectures and proposed models for projects in which goal autonomy (i.e., the GDA model) will be used to control unmanned vehicles. We identified different modeling requirements in the application of GDA to situated agents depending on certain domain properties, which affect the capabilities afforded to GDA by lower-level layers in the autonomy architecture. In particular, the granularity of actions that are atomic for the GDA Controller varies widely according to the computational complexity of motion and the guarantees provided by lower-level systems.

In the contaminant detection domain, the motion of a team of vehicles toward a location where sensing will take place must be carefully coordinated so as to avoid collisions and other interference. Solving this guidance problem (i.e., finding waypoints that each individual should follow) in the goal autonomy layer would be computationally infeasible. However, specialized guidance techniques combined with domain-specific controllers can reduce this computational complexity. Hence, in the contaminant detection domain, the abstraction level of the GDA Controller's actions must be at least as high as instructions for each team of vehicles to follow.

In contrast, we do not require coordination of many individual agents in the UUV domain. Therefore, the GDA Controller's plans can be more concrete (e.g., they can specify a sequence of waypoints for the vehicle to follow). Furthermore, the unpredictability of the ocean environment requires that GDA detect discrepancies without the aid of guarantees such as those provided by the LTL controllers in the contaminant detection domain. To support GDA discrepancy detection, the behaviors implemented by lower-level systems should be as predictable as possible. This reinforces our belief that the GDA Controller's actions should be simpler in this domain.

Thus, when designing a goal autonomy robotic controller, the required granularity of the actions will be dictated by the available reactive and abstraction layers. Fine-grained actions improve predictability but impose a higher computational burden on the GDA Controller. More abstract actions reduce this computational burden, but they generally require more time to safely coordinate goal changes, reducing system reactivity. They also require more predictable environments for the low-level controllers.
As we progress to more complex tasks and to the control of non-simulated vehicles, we will develop and implement new models for GDA that address the issues of real-world situated agents. We have argued that these models are needed (e.g., probabilistic expectation models for discrepancy detection). We expect to create compelling demonstrations of goal autonomy for controlling unmanned robotic vehicles once these models are in place.

Acknowledgements

Thanks to NRL for supporting this research. Thanks also to NRL's Brian Houston and Ben Dzikowicz for their insightful comments and corrections. The views and opinions contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of NRL or the DoD.

References

Alvarez, A., Caiti, A., & Onken, R. (2004). Evolutionary path planning for autonomous underwater vehicles in a variable ocean. IEEE Journal of Oceanic Engineering, 29(2), 418-429.

Antonelli, G., Chiaverini, S., Finotello, R., & Schiavon, R. (2001). Real-time path planning and obstacle avoidance for RAIS: An autonomous underwater vehicle. IEEE Journal of Oceanic Engineering, 26(2), 216-227.

Apker, T., & Potter, M. (2012). Physicomimetic motion control of physically constrained agents. In W. Spears & D. Spears (Eds.), Physicomimetics: Physics-based swarm intelligence. New York: Springer.

Benjamin, M., Schmidt, H., Newman, P., & Leonard, J. (2010). Nested autonomy for unmanned marine vehicles with MOOS-IvP. Journal of Field Robotics, 27(6), 834-875.

Binney, J., Krause, A., & Sukhatme, G.S. (2010). Informative path planning for an autonomous underwater vehicle. Proceedings of the 2010 IEEE International Conference on Robotics and Automation (pp. 4791-4796). Anchorage, AK: IEEE Press.

Choi, D. (2011). Reactive goal management in a cognitive architecture. Cognitive Systems Research, 12(3), 293-308.

Coddington, A. (2006). Motivations for MADBot: A motivated and goal directed robot. Proceedings of the Twenty-Fifth Workshop of the UK Planning and Scheduling Special Interest Group (pp. 39-46). Nottingham, UK: University of Nottingham.

Garau, B., Alvarez, A., & Oliver, G. (2005). Path planning of autonomous underwater vehicles in current fields with complex spatial variability: An A* approach. Proceedings of the International Conference on Robotics and Automation (pp. 194-198). Barcelona, Spain: IEEE Press.

Ghallab, M., Nau, D., & Traverso, P. (2004). Automated planning: Theory and practice. San Francisco, CA: Morgan Kaufmann.

Hawes, N., Hanheide, M., Hargreaves, J., Page, B., Zender, H., & Jensfelt, P. (2011). Home alone: Autonomous extension and correction of spatial representations. Proceedings of the International Conference on Robotics and Automation (pp. 3907-3914). Shanghai, China: IEEE Press.

Ho, C., Mora, A., & Saripalli, S. (2012). An evaluation of sampling path strategies for an autonomous underwater vehicle. Proceedings of the International Conference on Robotics and Automation (pp. 5328-5333). St. Paul, MN: IEEE Press.

Jaidee, U., Muñoz-Avila, H., & Aha, D.W. (2013). Case-based goal-driven coordination of multiple learning agents. Proceedings of the Twenty-first International Conference on Case-Based Reasoning (pp. 164-178). Saratoga Springs, NY: Springer.

Johnson, B., & Kress-Gazit, H. (2012). Probabilistic guarantees for high-level robot behavior in the presence of sensor error. Autonomous Robots, 33(3), 309-321.
Johnson, B., & Kress-Gazit, H. (2013). Analyzing and revising high-level robot behaviors under actuator uncertainty. To appear in Proceedings of the International Conference on Intelligent Robots and Systems. Tokyo, Japan: IEEE Press.

Khatib, O. (1985). Real-time obstacle avoidance for manipulators and mobile robots. Proceedings of the International Conference on Robotics and Automation (pp. 500-505). St. Louis, MO: IEEE Press.

Kress-Gazit, H., Fainekos, G.E., & Pappas, G.J. (2009). Temporal-logic-based reactive mission and motion planning. IEEE Transactions on Robotics, 25(6), 1370-1381.

LePage, K.D., & Schmidt, H. (2002). Bistatic synthetic aperture imaging of proud and buried targets from an AUV. Journal of Ocean Engineering, 27(3), 471-483.

Livingston, S.C., Murray, R.M., & Burdick, J.W. (2012). Backtracking temporal logic synthesis for uncertain environments. Proceedings of the International Conference on Robotics and Automation (pp. 5163-5170). St. Paul, MN: IEEE Press.

Mather, T., & Hsieh, M. (2012). Ensemble synthesis of distributed control and communication strategies. Proceedings of the International Conference on Robotics and Automation (pp. 4248-4253). St. Paul, MN: IEEE Press.

Molineaux, M., Klenk, M., & Aha, D.W. (2010a). Goal-driven autonomy in a Navy strategy simulator. In Proceedings of the Twenty-fourth AAAI Conference on Artificial Intelligence. Atlanta, GA: AAAI Press.

Molineaux, M., Klenk, M., & Aha, D.W. (2010b). Planning in dynamic environments: Extending HTNs with nonlinear continuous effects. In Proceedings of the Twenty-fourth AAAI Conference on Artificial Intelligence. Atlanta, GA: AAAI Press.

Nau, D. (2007). Current trends in automated planning. AI Magazine, 28(4), 43-58.

Nau, D., Cao, Y., Lotem, A., & Muñoz-Avila, H. (1999). SHOP: Simple hierarchical ordered planner. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (pp. 968-973). Stockholm, Sweden: Morgan Kaufmann.

Plaku, E., & McMahon, J. (2013). Motion planning and decision making for underwater vehicles operating in constrained environments in the littoral. To appear in Towards Autonomous Robotic Systems. Oxford, UK: Springer.

Powell, J., Molineaux, M., & Aha, D.W. (2011). Active and interactive learning for goal selection knowledge. In Proceedings of the Twenty-Fourth Florida Artificial Intelligence Research Society Conference. West Palm Beach, FL: AAAI Press.

Sarid, S., Xu, B., & Kress-Gazit, H. (2012). Guaranteeing high-level behaviors while exploring partially known maps. In Proceedings of Robotics: Science and Systems. Sydney, Australia: MIT Press.

Schermerhorn, P., Benton, J., Scheutz, M., Talamadupula, K., & Kambhampati, S. (2009). Finding and exploiting goal opportunities in real-time during plan execution. Proceedings of the International Conference on Intelligent Robots and Systems (pp. 3912-3917). St. Louis, MO: IEEE Press.

Spears, D., Thayer, D., & Zarzhitsky, D. (2009). Foundations of swarm robotic chemical plume tracing from a fluid dynamics perspective. International Journal of Intelligent Computing and Cybernetics, 2(4), 745-785.

Talamadupula, K., Kambhampati, S., Schermerhorn, P., Benton, J., & Scheutz, M. (2011). Planning for human-robot teaming. In G. Cortellessa, M. Do, R. Rasconi, & N. Yorke-Smith (Eds.), Scheduling and Planning Applications: Papers from the ICAPS Workshop. Freiburg, Germany: AAAI Press.

Tan, C.S., Sutton, R., & Chudley, J.
(2004). An incremental stochastic motion planning technique for autonomous underwater vehicles. Proceedings of the IFAC Control Applications in Marine Systems Conference (pp. 483-488). Ancona, Italy: Elsevier.

Warren, C.W. (1990). A technique for autonomous underwater vehicle route planning. IEEE Journal of Oceanic Engineering, 15(3), 199-204.

Weber, B.G., Mateas, M., & Jhala, A. (2012). Learning from demonstration for goal-driven autonomy. In Proceedings of the Twenty-sixth AAAI Conference on Artificial Intelligence. Toronto, Ontario, Canada: AAAI Press.

Wilson, M., Molineaux, M., & Aha, D.W. (2013). Domain-independent heuristics for goal formulation. In Proceedings of the Twenty-sixth International FLAIRS Conference. St. Pete Beach, FL: AAAI Press.

Wongpiromsarn, T., Topcu, U., & Murray, R.M. (2012). Receding horizon temporal logic planning. IEEE Transactions on Automatic Control, 57(11), 2817-2830.

Wooden, D., Malchano, M., Blankespoor, K., Howard, A., Rizzi, A., & Raibert, M. (2010). Autonomous navigation for BigDog. Proceedings of the International Conference on Robotics and Automation (pp. 4736-4741). Anchorage, AK: IEEE Press.

Worcester, J., Rogoff, J., & Hsieh, M. (2011). Constrained task partitioning for distributed assembly. Proceedings of the International Conference on Intelligent Robots and Systems (pp. 4790-4796). Vilamoura, Algarve, Portugal: IEEE Press.

Author Index

Aha, David W. 64, 111, 127
Alford, Ron 95
Apker, Thomas 127
Auslander, Bryan 127
Bobrow, Robert 1
Brinn, Marshall 1
Burstein, Mark 1
Cordier, Amélie 26
Cox, Michael T. 10, 79
Forbus, Kenneth D. 34
Garnier, Joseph P. 26
Georgeon, Olivier L. 26
Hinrichs, Thomas R. 34
Johnson, Benjamin 127
Karg, Michael 43
Kirsch, Alexandra 43
Klenk, Matthew 111
Kuter, Ugur 53, 95
Laddaga, Robert 1
Maynord, Michael 79
McMahon, James 127
Molineaux, Matthew 64, 111
Muñoz-Avila, Héctor 53
Nau, Dana 95
Paisner, Matt 79
Perlis, Don 79
Shivashankar, Vikas 95
Vattam, Swaroop 111
Wilson, Mark 127