ABSTRACT
 Title of dissertation: CREATING AN OBJECTIVE
 METHODOLOGY FOR HUMAN-ROBOT
 TEAM CONFIGURATION SELECTION
 Sharon Michelle Singer
 Doctor of Philosophy, 2012
 Dissertation directed by: Professor David L. Akin
 Department of Aerospace Engineering
 As technology has been advancing and designers have been looking to future
 applications, it has become increasingly evident that robotic technology can be used
 to supplement, augment, and improve human performance of tasks. Team members
 can be combined in various combinations to better utilize their capabilities and
 skills to create more e cient and diversi ed operational teams. A primary obstacle
 to integrating new robotic technology has been the inability to quantitatively com-
 pare overall team performance between very di erent team con gurations without
 limiting the analysis to a few metrics. To-date, mission designers have arbitrarily
 assigned importance to mission parameters, subjectively limiting the search space.
 While this has been e ective at evaluating individual mission plans, the arbitrary
 evaluation criteria has made a straightforward comparison between di erent research
 projects and ranking scales impossible. The question then becomes how to select
 an objective set of criteria for any given problem.
 It is this  nal question that this research sought to answer. A methodology was
 developed to facilitate performance comparison amongst heterogeneous human and
robot teams. This methodology makes no assumptions about mission priorities or
 preferences. Instead, it provides an objective, generic, quantitative method to reduce
 the complexity of the mission designer?s decision space. It employs an heuristic,
 greedy objective reduction algorithm to reduce problem complexity and a multi-
 objective genetic algorithm to explore the design space.
 The human-robot team con guration selection problem was utilized as the ap-
 plication that motivated this research. The methodology, however, will be applicable
 to a wider domain of research. It will provide a structure to enable broader search
 of the design space, exploration of the di erences between performance metrics, and
 comparison of optimization models that facilitate evaluation of the design options.
CREATING AN OBJECTIVE METHODOLOGY
 FOR HUMAN-ROBOT TEAM
 CONFIGURATION SELECTION
 by
 Sharon Michelle Singer
 Dissertation submitted to the Faculty of the Graduate School of the
 University of Maryland, College Park in partial ful llment
 of the requirements for the degree of
 Doctor of Philosophy
 2012
 Advisory Committee:
 Professor David L. Akin, Chair/Advisor
 Professor Shapour Azarm
 Professor Linda Schmidt
 Professor Raymond Sedwick
 Professor Norman Werely
c Copyright by
 Sharon Michelle Singer
 2012
Acknowledgments
 There have been so many supportive people who have helped me along this
 long and windy road. Foremost, I would like to thank Dr. Dave Akin for all of his
 help over the years. As my academic and research advisor, he encouraged me to
 pursue my own vision and research interests. Dr. Akin has an astounding wealth
 of knowledge on everything space related and was a great resource for framing any
 outlandish ideas into a more concrete experimental reality.
 I would like to thank all of the members of my dissertation committee for
 their time, support and contributions over the years. Each member has brought
 their own expertise to the table, creating a constructive, diverse sounding board for
 my research ideas.
 I would like to recognize the phenomenal aid of Dr. Dimo Brockho . Not
 only does his objective reduction algorithm form the backbone of this dissertation,
 but he was very receptive and patient when I reached out to him for assistance in
 understanding a portion of his algorithm. Without his time, aid and collaboration,
 this dissertation would not exist. I am very thankful for all of the support he has
 given me.
 I would also like to recognize and thank Dr. Mary Bowden for her compas-
 sionate advice, advocacy, support, and encouragement over the years. She was the
  rst representative of the Aerospace Engineering Department that I met when I  rst
 came to the University of Maryland, and she has always been warm and welcoming,
 always willing to listen to my new challenges and triumphs.
 ii
To the Space Systems Laboratory at large: Through his collaborative lead-
 ership style, Dr. Akin has fostered a wonderfully supportive research environment
 within the Space Systems Laboratory. Students are encouraged to work together, to
 come up with creative solutions to problems and to continue exploring and asking
 \what if?". In my tenure at the lab, the SSL has drawn several generations of the
 most dedicated, enthusiastic, and inspiring group of graduate students. Whether
 serving as knowledge experts and consultants for each other, sharing their own expe-
 riences as lessons learned, or providing the levity and comic relief necessary to keep
 us all sane through the long hours and long years, the students of the SSL have been
 amazingly supportive. I could not have made it to where I am today without you
 all. Whether it was attending a Viking funeral for the lab?s pet craw sh, competing
 with the other scuba divers for the most time spent underwater in a given year (the
 lab?s \Idle Hands" award), or the retreats away from the lab and out into the real
 world to enjoy sunshine and good company, I thank you all.
 I would especially like to recognize the phenomenal support of a few fellow
 graduate students. To Martin Stolen, for taking me under his wing, teaching me
 all he knew about human and robot interaction, and encouraging me to continue
 into the PhD program. To John Mularski for sharing his love of scuba diving and
 Barrett Dillow for convincing me to take the plunge and become certi ed! To Joe
 Galante, Barrett Dillow, and Craig Lewandowski for making me feel like a part of
 the lab, and for insisting that I have a life outside of it, too! To Kate McBryan
 for her exuberance, friendship, enthusiastic crafting, technical expertise, and for the
 hours she spent helping me disect an algorithm ... To Angela Peace, for aid with
 iii
the aforementioned algorithm and as my math knowledge expert reference. There
 are so many other friends who have helped me along the way, it would take another
 chapter to name them all.
 It goes without saying that my friends and family have been wonderfully sup-
 portive over the years. Your love, con dence, well-wishes, and belief in me have
 carried me through graduate school.
 Finally, I would like to thank Rick Barnard, my soon-to-be husband. He has
 been so understanding and patient of all of my frustrations in the  nal months of
 this dissertation e ort, always believing in me and my abilities. Whether available
 to listen to my most recent mini-problems or mini-triumphs, for insisting on hearing
 the long version of all of my stories, or going for great walks around the lake on
 weekends, he has always been there for me, with love and support for all of my
 endeavors.
 Thank you all for your esteem and support. It has meant the world to me.
 iv
Table of Contents
 List of Figures viii
 List of Abbreviations x
 1 Introduction 1
 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
 1.2 Statement of the Problem . . . . . . . . . . . . . . . . . . . . . . . . 4
 1.2.1 Di erentiating HRT Con gurations . . . . . . . . . . . . . . . 4
 1.2.2 Challenges to Comparing HRT Con gurations - Performance
 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
 1.2.3 Incorporating Optimization to Improve Team Performance . . 7
 1.3 Unique Dissertation Contributions . . . . . . . . . . . . . . . . . . . . 8
 2 HRT Performance Analysis Literature Review 10
 2.1 Characterizing HRT Con gurations . . . . . . . . . . . . . . . . . . . 11
 2.1.1 When to Analyze . . . . . . . . . . . . . . . . . . . . . . . . . 11
 2.1.2 Scenario Perspective . . . . . . . . . . . . . . . . . . . . . . . 14
 2.1.3 Hierarchical and Distributed Decision Making . . . . . . . . . 16
 2.1.4 Autonomy Level . . . . . . . . . . . . . . . . . . . . . . . . . 19
 2.1.5 Situational Awareness . . . . . . . . . . . . . . . . . . . . . . 23
 2.1.6 Communication Architecture . . . . . . . . . . . . . . . . . . . 24
 2.2 Challenges to Comparing HRT Con gurations . . . . . . . . . . . . . 26
 2.2.1 How to De ne and Assess Task Performance? . . . . . . . . . 26
 2.2.2 Team Performance Metrics in Related Fields . . . . . . . . . . 29
 2.2.3 Quantitatively Comparing HRT Task Performance . . . . . . . 31
 2.2.4 Building Performance Metrics into Quantitative Models . . . . 36
 2.2.5 Parasuraman?s Model Categorization . . . . . . . . . . . . . . 37
 2.2.6 Implementation Challenges . . . . . . . . . . . . . . . . . . . . 45
 2.3 Incorporating Optimization Into a Team Performance Model . . . . . 47
 2.4 Conclusions from HRT Performance Analysis Literature Review . . . 49
 3 Research Methodology and Algorithms 51
 3.1 Background to This HRT Research Problem . . . . . . . . . . . . . . 52
 3.1.1 HRT Con guration Problem Decomposition . . . . . . . . . . 53
 3.1.2 Performance Metrics for the HRT Con guration Problem . . . 54
 3.1.3 Top-Level Research Outline . . . . . . . . . . . . . . . . . . . 55
 3.1.3.1 Representing Metrics and Problem Details . . . . . . 58
 3.1.3.2 Additional Experiment Questions . . . . . . . . . . . 60
 3.1.4 Conceptual Simpli cation { The Knapsack Problem . . . . . . 61
 3.2 MOGA Algorithmic Approach . . . . . . . . . . . . . . . . . . . . . . 64
 3.2.1 Use of Matlab?s MOGA Toolbox . . . . . . . . . . . . . . . . 66
 3.2.2 Stochastic Analysis of a MOGA . . . . . . . . . . . . . . . . . 67
 3.2.3 Multi-Objective Constraint Handling Technique . . . . . . . . 68
 v
3.2.4 Hypervolume Indicator Pareto Set Quality Metric . . . . . . . 69
 3.2.5 Many-Objective Optimization Extension . . . . . . . . . . . . 72
 3.3 O ine Objective Reduction . . . . . . . . . . . . . . . . . . . . . . . 74
 3.3.1 Selecting an Objective Reduction Method . . . . . . . . . . . 76
 3.3.2 Brief Overview of Critical De nitions and Concepts . . . . . . 79
 3.3.3 The  -MOSS Objective Reduction Algorithm . . . . . . . . . . 82
 3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
 4 Objective Reduction Validation and Knapsack Application Experiment 86
 4.1 Objective Reduction Algorithm Validation . . . . . . . . . . . . . . . 86
 4.2 Knapsack Test Problems Experimentation . . . . . . . . . . . . . . . 88
 4.2.1 Knapsack Simulation Setup and Initial Results . . . . . . . . . 89
 4.2.2 Solving the Knapsack Test Problems . . . . . . . . . . . . . . 92
 4.2.2.1 Results From the 5-Objective Knapsack Problem In-
 stances . . . . . . . . . . . . . . . . . . . . . . . . . 93
 4.2.2.2 Results from the 15-Objective Knapsack Problem In-
 stances . . . . . . . . . . . . . . . . . . . . . . . . . 101
 4.2.2.3 Results from the 25-Objective Knapsack Problem In-
 stances . . . . . . . . . . . . . . . . . . . . . . . . . 106
 4.2.3 Hypervolume Quality Metric Results for the Knapsack Problem113
 4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
 5 Case Study: Robotic Reconnaissance Rovers 121
 5.1 Summary of Tunstel?s Method and Results . . . . . . . . . . . . . . . 122
 5.2 Summary of Schreckenghost?s Method and Results . . . . . . . . . . . 126
 5.3 Using Tunstel?s Method to Compare the Four Rovers . . . . . . . . . 128
 5.4 Application of this Dissertation?s Methodology to the Robotic Recon-
 naissance Rover Comparison . . . . . . . . . . . . . . . . . . . . . . . 137
 5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
 6 Large HRT Application Demonstration 147
 6.1 Initial Setup: Hubble Space Telescope Servicing Mission 3A . . . . . 147
 6.1.1 Task Primitive Data Preparation . . . . . . . . . . . . . . . . 150
 6.1.2 Team Characterization and Scheduling . . . . . . . . . . . . . 152
 6.1.3 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . 155
 6.1.3.1 Task Load Model Performance Metrics . . . . . . . . 157
 6.1.3.2 Expected Value Statistics Performance Metrics . . . 161
 6.2 Large HRT Con guration Selection Experiment . . . . . . . . . . . . 165
 6.2.1 Inputs: Database Development . . . . . . . . . . . . . . . . . 165
 6.2.2 De ning the MOGA Fitness Function . . . . . . . . . . . . . . 166
 6.2.3 Specifying MOGA Parameters . . . . . . . . . . . . . . . . . . 167
 6.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 168
 6.3.1 Objective Reduction for the HRT Con guration Selection Prob-
 lem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
 6.3.2 Pareto Results for the HRT Demonstration Problem . . . . . . 172
 vi
6.3.3 Team Con guration Analysis . . . . . . . . . . . . . . . . . . 179
 6.3.4 Discussion of Reproducibility . . . . . . . . . . . . . . . . . . 183
 6.3.5 Discussion of Complexity Reduction in Pareto Decision Space 185
 6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
 7 Conclusions and Future Work 188
 7.1 Discussion of Important Results . . . . . . . . . . . . . . . . . . . . . 189
 7.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
 7.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
 A Previous Cooperative HRT Research 199
 A.1 Preliminary Dissertation Research and Results . . . . . . . . . . . . . 199
 A.1.1 Methodology to Assess HRT Task Performance . . . . . . . . 199
 A.1.1.1 Case Study: Hubble Space Telescope Servicing Mis-
 sion 3A . . . . . . . . . . . . . . . . . . . . . . . . . 200
 A.1.1.2 Methodology Assumptions: Relevant Attributes and
 Criteria . . . . . . . . . . . . . . . . . . . . . . . . . 201
 A.1.1.3 Characterizing a Robot?s Role in a Servicing Team . 202
 A.1.1.4 Relative Robot Task Performance . . . . . . . . . . . 203
 A.1.1.5 Role of the Third Agent . . . . . . . . . . . . . . . . 204
 A.1.2 Results from the HRT Scheduling Research . . . . . . . . . . . 205
 A.1.3 Analysis of Previous Scheduling Work . . . . . . . . . . . . . . 206
 A.1.3.1 Methdology?s Quantitative Model . . . . . . . . . . . 207
 A.1.4 Challenges Yet to be Addressed . . . . . . . . . . . . . . . . . 209
 B Large HRT MOGA Implementation Details 210
 B.1 Fitness Function Inputs . . . . . . . . . . . . . . . . . . . . . . . . . 210
 B.2 Fitness Function Outline . . . . . . . . . . . . . . . . . . . . . . . . . 211
 B.3 Fitness Function Output . . . . . . . . . . . . . . . . . . . . . . . . . 215
 B.4 Stochastic MOGA Usage . . . . . . . . . . . . . . . . . . . . . . . . . 216
 C Large HRT Con guration Selection Problem - Initial Details and Results 219
 C.1 Initial Setup: Hubble Space Telescope Servicing Mission 3A . . . . . 219
 C.1.1 Task Primitive Data Preparation . . . . . . . . . . . . . . . . 220
 C.1.2 Team Characterization and Scheduling . . . . . . . . . . . . . 221
 C.1.3 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . 221
 C.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
 C.2.1 Analysis of Bias in Solution Selection . . . . . . . . . . . . . . 230
 C.2.2 Pareto Decision Space Analysis . . . . . . . . . . . . . . . . . 234
 C.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
 Bibliography 238
 vii
List of Figures
 3.1 The Three Stages of the Collaborative HRT Problem . . . . . . . . . 54
 3.2 Top Level Outline of Proposed Methodology . . . . . . . . . . . . . . 56
 3.3 Methodology Development for Performance Metric Selection . . . . . 59
 3.4 Methodology Validation in the Knapsack Problem Domain . . . . . . 64
 3.5 Outline of the Algorithmic Approach of This Research . . . . . . . . 65
 3.6 Illustration of Hypervolume Quality Metric . . . . . . . . . . . . . . . 71
 3.7 Brockho ?s Greedy  -MOSS Algorithm [12] . . . . . . . . . . . . . . 83
 4.1 Objective Reduction Algorithm Validation . . . . . . . . . . . . . . . 88
 4.2 Objective Reduction Results From the Knapsack Problems . . . . . . 91
 4.3 KP100 Items, 5 Objectives: Pareto Population Spread . . . . . . . . . 96
 4.4 KP100 Items, 5 Objectives: MOGA Termination Reason and Run
 Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
 4.5 KP100 Items, 5 Objectives: MOGA Termination Reason and Run
 Time for 30 Runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
 4.6 KP100 Items, 15 Objectives: Pareto Population Spread . . . . . . . . 104
 4.7 KP100 Items, 15 Objectives: MOGA Run Time and Termination
 Reason . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
 4.8 KP100 Items, 25 Objectives: Pareto Population Spread . . . . . . . . 110
 4.9 KP100 Items, 25 Objectives: MOGA Run Time and Termination
 Reason . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
 4.10 KP100 Items, 5 Objectives: Hypervolume Quality Metric . . . . . . . 114
 4.11 KP100 Items, 15 Objectives: Hypervolume Quality Metric . . . . . . 115
 4.12 KP100 Items, 15 Objectives: Hypervolume Quality Metric - Englarged117
 4.13 KP100 Items, 25 Objectives: Hypervolume Quality Metric . . . . . . 118
 4.14 KP100 Items, 25 Objectives: Hypervolume Quality Metric - Enlarged 119
 5.1 Tunstel?s Performance Metrics for MERs . . . . . . . . . . . . . . . . 124
 5.2 Tunstel?s Metric Weightings for MERs . . . . . . . . . . . . . . . . . 125
 5.3 Rover Case Study: Initial Performance Metrics . . . . . . . . . . . . . 130
 5.4 Expanded Performance Metrics for Rover Case Study . . . . . . . . . 133
 5.5 Rover Case Study: Reduced Metrics . . . . . . . . . . . . . . . . . . . 134
 5.6 Rover Case Study: Redundant Metrics . . . . . . . . . . . . . . . . . 136
 5.7 Rover Case Study  -MOSS Metric Sets . . . . . . . . . . . . . . . . . 138
 5.8 Rover Con guration Convergence Success . . . . . . . . . . . . . . . . 140
 5.9 Rover Con guration Selection . . . . . . . . . . . . . . . . . . . . . . 141
 5.10 Rover Con guration Selection: Hypervolume . . . . . . . . . . . . . . 143
 6.1 HRT Problem: Reduced Objective Sets . . . . . . . . . . . . . . . . . 169
 6.2 HRT Problem: Pareto Set Composition . . . . . . . . . . . . . . . . . 173
 6.3 HRT Problem: MOGA Termination Reason and Run Time . . . . . . 175
 6.4 HRT Problem: Hypervolume Quality Metric . . . . . . . . . . . . . . 177
 viii
B.1 Detailed Diagram of the MOGA Fitness Function Call . . . . . . . . 213
 C.1 HRT Problem: Reduced Objective Sets . . . . . . . . . . . . . . . . . 225
 C.2 HRT Problem: Pareto Set Composition . . . . . . . . . . . . . . . . . 226
 C.3 HRT Problem: MOGA Termination Reason and Run Time . . . . . . 227
 C.4 HRT Problem: Hypervolume Quality Metric . . . . . . . . . . . . . . 229
 ix
List of Abbreviations
 C4T Controller-to-Controller Communication and Coordination
 Taxonomy: model of team communication (FAA?s software
 package)
 CD coordination demand (task performance metric)
 CTR critical time ratio (task performance metric)
 EVA extra-vehicular activity
 FAA Federal Aviation Administration
 FAT false alarm time (task performance metric)
 FO fan out (task performance metric)
 HRI human robot interaction
 HRI/OS Human Robot Interaction Operating System (JPL
 software package)
 HRT human and robot team that cooperatively completes
 a mission
 HST SM Hubble Space Telescope servicing mission
 HURON Human-Robot Task Network Optimization (JPL software
 package)
 IE interaction e ort (task performance metric)
 MaOO many-objective optimization: more than three objective
 functions
 MITPAS Mixed Initiative Team Performance Assessment System
 (Perceptronics, Inc. software package, US Army)
 MOGA multi-objective genetic algorithm
 MTBI mean time between interventions (task performance
 metric)
 MTCI mean time completing an intervention (task
 performance metric)
 NASA-TLX model to provide workload values for human mental
 tasks (NASA software package)
 NT neglect time (task performance metric)
 OT occupied time: di erentiates between CD and demands
 of interacting with several independent robots (task
 performance metric)
 RAD robot attention demand (task performance metric)
 RASCAR-CS Robot-Assisted Search and Rescue Coding System:
 urban search and rescue HRI performance model (University
 of South Florida software package)
 RMS remote manipulator system (Space Shuttle)
 RPP robot performance parameter (task performance metric)
 SC situation coverage: percentage of total tasks in which a robot
 understands the directive (task performance metric)
 UAV unmanned aerial vehicle
 WEI work e ciency index (task performance metric)
 x
Chapter 1
 Introduction
 1.1 Background
 As technology has been advancing and designers have been looking to future
 applications, it has become increasingly evident that robotic technology can be used
 to supplement, augment, and improve human performance of tasks. With more
 options developing for how robotic technology can be used in an integrated human
 and robot team (HRT), new methods to assess individual agent performance and
 overall team performance have been developed.
 Whether for on-orbit servicing missions or planetary exploration and habitat
 building, employing a cooperative human and robotic crew in future space missions
 will enable a larger volume of tasks to be completed. Team members can be com-
 bined in various combinations to better utilize their capabilities and skills to create
 more e cient and diversi ed operational teams.
 At the present, robotic technologies have been included in space operations to
 aid in performing tasks that would be di cult for humans to perform. The Space
 Shuttle?s remote manipulator system (RMS) was used to grapple with and maneuver
 large pieces of equipment (including the Hubble Space Telescope) and as a mobile
 platform to anchor astronauts. Dextre, the newest addition to the International
 Space Station?s robotic system, is a much smaller robotic agent that provides similar
 1
support with the added ability to perform a small set of manipulative activities.
 Robonaut was speci cally designed to perform extra-vehicular activity (EVA) tasks
 and to serve as a cooperative team agent with a human crew [73, 36]. Several
 generations of Mars rovers have demonstrated success at employing robotic arms
 for instrument-driven operational tasks in an environment currently uninhabitable
 by humans.
 While robotic technologies have furthered human knowledge of and exploration
 of space, use of robotics in space has been limited to a few operational scenarios to
 replace the activities of the human crew. Interaction between these robotic tech-
 nologies and the human crew has previously relegated the robots to the role of
 supplemental tools (as in the case of the RMS as a maneuvering platform).
 There have been several investigations into the utility of human and robot
 partnerships for space activities. The Ranger Telerobotic Shuttle Experiment was
 developed to test the ability of an on-orbit dexterous robot to perform servicing ac-
 tivities [67]. Analysis has been performed to assess what roles in team collaboration
 robotic technology would be best suited to  ll [3, 68]. Di erent ways to combine
 robotic technologies and hardware to aid in the human crew?s productivity have
 been examined [2].
 The diversity of robotic technology available creates a multitude of new oppor-
 tunities in task performance. Utilizing the new capabilities for hardware, software,
 sensors and system integration, and communication architecture could lead to a
 greater level of mission diversity, and facilitate scenarios that had previously been
 impossible.
 2
Future mission designers will have a large heterogeneous group of distinct
 agents (both human and robotic) from which to select the most productive or e -
 cient team members. How can a designer select the most e ective agents to com-
 plete a series of tasks? An overall team performance analysis would be bene cial
 and would enable quantitative comparison between disparate teams.
 It is this  nal question that this research sought to answer. A methodology
 was developed to facilitate performance comparison amongst heterogeneous robot
 teams. This methodology makes no assumptions about mission priorities, prefer-
 ences, nor importance of performance criteria. Instead, it provides an objective,
 generic, quantitative method to reduce the complexity of the mission designer?s
 decision space.
 In section 1.2, the problem domain addressed by this research e ort will be laid
 out. Three distinct domains will be identi ed to guide the research. Each will be
 brie y outlined in section 1.2, and discussed in much more depth in the Literature
 Review of chapter 2. After discussing the state of the  eld, the third chapter will
 discuss the methodology used, and detail my unique contribution to this problem
 domain. Chapters 4, 5, and 6 present the test problems and applications used
 to demonstrate the utility of the methodology presented in this dissertation, and
 chapter 7 presents a discussion of the results and implications, and o ers concluding
 remarks and suggestions for future work.
 3
1.2 Statement of the Problem
 The problem of selecting the best set of team members for a given task scenario
 can be divided into three orthogonal areas of research. The  rst area of research
 has been in de ning and developing new technology to enable diverse human and
 robot teams (HRT con gurations). Additionally, there has been signi cant research
 on methods to measure and quantify di erences between each agents? performance
 in various aspects of task completion (performance metrics). Third, there has been
 work to optimize various parts of mission design and team selection. The research
 described in this dissertation will create synthesis between these three areas. It
 will develop a methodology that will facilitate comparison between distinct groups
 of team members, and will facilitate selection an optimal team for a given task
 scenario.
 1.2.1 Di erentiating HRT Con gurations
 Substantial research has been done to develop new technology that increases
 the variety of tasks and the precision with which robots can perform them. Integrat-
 ing these technologies into human agent activities has the potential to increase the
 diversity of possible HRT con gurations. As the variety increases, it becomes even
 more necessary to concretely de ne the di erences between teams and to specify the
 performance of each participating agent.
 The primary con guration di erences between HRTs can be grouped into sev-
 eral broad categories. Whether the team?s planning and performance analysis occurs
 4
o ine or in real-time greatly in uences the parameters to be monitored to capture
 the important team performance characteristics. The scenario perspective of the
 mission planners can determine whether the humans or robots are favored to reduce
 workload. For real-time mission performance, the decision and control authority
 structure determines where critical decisions and work allocation is made. The type
 of communication network and the information it is capable of passing also char-
 acterizes a HRT. The autonomy level of each robotic agent determines the amount
 of human intervention potentially needed in robotic tasks. How each agent (hu-
 man and robot) receives information about their surrounding environment can vary
 greatly between di erent con gurations. Each of these categories are described in
 more detail in Chapter 3.
 1.2.2 Challenges to Comparing HRT Con gurations - Performance
 Metrics
 A primary obstacle to integrating new robotic technology into familiar mis-
 sion scenarios is the inability to quantitatively compare overall team performance
 between very di erent team con gurations. Without limiting the analysis to a few
 important metrics, is it possible to objectively and conclusively demonstrate whether
 a standard two-human International Space Station EVA team, a combined human
 and semi-autonomous robot team, or an autonomous robot performs an entire mis-
 sion scenario better than the other con gurations? Can the bene t derived from
 using one team con guration over another be quanti ed? As a designer, is it im-
 5
portant to retain knowledge of the tasks completed by each of the agents in a given
 scenario separately? Or is the interest primarily in the overall team?s performance
 of a scenario? These are the questions that drive the development of an overall team
 performance assessment.
 There are numerous performance metrics in the literature to assess di erent
 aspects of task performance. Determining which criteria to select and how to use
 them to evaluate a team?s overall performance has been much debated in the liter-
 ature (described in more detail in section 2.2). The general approach has been to
 assess the performance ability of each agent, characterized by the important perfor-
 mance parameters (completion time, reliability of completion, mean time between
 failures, resource utilization and cost, mental workload, etc.).
 There has also been great diversity in the computational methods used to
 aggregate the results of the di erent individual task performance metrics. After the
 results for each task are tabulated, the next step is to rate the performance of the
 team as a whole in completing the overall scenario?s goals. Several model types
 will be discussed in section 2.2.5 that demonstrate how to incorporate multiple
 performance metrics into an overall score.
 The purpose of the team analysis is to determine the quality and e ciency
 of the team?s overall performance, with the ultimate goal of enabling quantitative
 comparison between teams.
 6
1.2.3 Incorporating Optimization to Improve Team Performance
 Creating a mission plan for any space operations quickly turns into an over-
 constrained multi-objective optimization problem. There are many di erent kinds
 of constraints on the crew activity scheduling problem [52], including di erences in
 crew performance capabilities (dependent on the crew member?s physical support to
 a structure during the tasks), relational, topographical, and precedent constraints
 between tasks, and time window constraints for nominal EVA operations.
 Team members can be used in various combinations to better utilize their
 capabilities and skills to create more e cient and diversi ed operational teams.
 This involves allocating tasks to provide the most bene t from the partnership, and
 creating the planning, scheduling, and software interfaces to support these e orts.
 To simplify the overall problem into one that has a tractable solution, mis-
 sion designers to-date have arbitrarily assigned importance to mission parameters,
 subjectively limiting the search space. While this has been e ective at evaluating
 individual mission plans, the arbitrary evaluation criteria makes a straightforward
 comparison between two plans based on di erent ranking scales impossible. To facil-
 itate comparison between di erent analyses, being able to specify an objective set of
 criteria would be immensely valuable. How could this be obtained? The intractable
 problem must be simpli ed to enable feasible solutions. The question then becomes
 how to select an objective set of criteria for any given problem.
 Fong [29] described a proof-of-concept experiment to show that robotic recon-
 naissance missions could supplement human exploration. The concept was to send
 7
a mobile robot to scout out the terrain to aid in selecting a traversal path for a two
 human EVA team on the moon. The scout would collect environmental data that
 could be used to improve each day?s mission plan and improve human crew per-
 formance by reducing the operational risk and increasing its situational awareness.
 Individual task metrics for the scout were reported in real-time to the operator,
 were compiled into overall metrics and the data was used by operators to adjust
 operations.
 This type of overall mission analysis to di erentiate team con gurations could
 be an inordinately valuable tool for mission designers. The challenge becomes cre-
 ating a quantitative, overall model to measure a team?s performance for a generic
 mission, to enable broad use of the analysis tool (and remove the need for mission-
 speci c models).
 1.3 Unique Dissertation Contributions
 A generic, objective methodology to select and aggregate performance criteria
 to aid in determining an optimal HRT con guration for a mission scenario will
 transcend these three domains. The HRT con guration selection problem is utilized
 as the application that motivates the problem. The methodology, however, will
 provide a much broader search of the design space between performance metrics
 and optimization models that facilitate evaluation of the design options.
 The unique contributions of this dissertation are brie y itemized here, and
 discussed in more detail in Chapter 3:
 8
 Decomposition of the HRT con guration selection problem into three distinct
 realms, decoupling their analysis: classi cation of agent and mission details,
 planning and scheduling, and selection of metrics and teams.
  Select performance metrics to preserve underlying dominance structure of a
 problem - this is a new approach for how to choose performance metrics.
  Demonstration of a method to compute overall performance evaluation that
 does not rely on aggregating multiple performance metrics into a single per-
 formance function.
  Create a generic, rigorous, objective, quantitative methodology for the HRT
 domain to both select the performance metrics to be used in an overall team
 performance analysis and to reduce the complexity of a mission designer?s
  nal decision space. This type of methodology is novel for the domain and a
 valuable contribution to further the use of human and robot teams.
 9
Chapter 2
 HRT Performance Analysis Literature Review
 The literature review section of this dissertation contains an extensive survey
 of the  eld of collaborative HRT performance metrics, structured to answer the
 challenges laid out in the three domains of the problem statement (section 1.2). It
 assesses the di erent priorities, assumptions, and methodologies incorporated into
 the leading quantitative models, with speci c emphasis on determining which are
 most applicable in space ight applications. It discusses the challenges of imple-
 menting the methods, the shortcomings of the methods, and possible adaptations
 to increase the generality of the methods.
 This is a necessary  rst step to enable analysis of the e ect of the di erent
 evaluation metrics on the design of team participation. In a unique contribution
 to the  eld, this survey (originally published at the 2011 International Conference
 on Environmental Systems [91]) sought to identify the conceptual pieces of existing
 HRI methods that would best synthesize into a universal, objective quantitative
 team task performance evaluation model, valid for a wide range of applications.
 It is this  nal purpose - creating an objective performance evaluation method and
 model - that is the aim of this dissertation.
 This portion of the literature review focuses on monitoring and measuring
 the overall team performance of a human and robot system. For more information
 10
on research about measuring and improving human workload in human and robot
 teams, the author suggests reference [72] which contains a review of human workload
 models based on cognitive resource utilization.
 2.1 Characterizing HRT Con gurations
 Three fundamental aspects that categorize a HRT are the ratio of humans to
 robots, the required amount of human e ort to command and monitor the robots,
 and the capabilities of the robotic agents utilized. These greatly in uence the types
 of coordinated tasks that can be done, and creates bounds on the communication
 architecture needed to provide e ciency in the team?s cooperative operations. The
 realm of possibilities and combinations is virtually endless. In each of the following
 subsections (sections 2.1.1 through 2.1.6), the component pieces of a HRT archi-
 tecture are considered, and the performance metrics that characterize the relevant
 parameters are included for coherency. Properties of performance metrics and how
 to use them in an overall team performance analysis will be discussed in section 2.2.
 2.1.1 When to Analyze
 Task performance analysis can be a useful tool during several di erent periods
 of mission operations. Experimental testing before and during the mission design
 phase provides mission planners with data on how well each agent could perform a
 wide variety of tasks in a future mission. The data evaluates each task separately
 rather than as part of a full mission. The data can be used during the design phase of
 11
a mission to determine which agent should perform each task (task allocation), and
 can facilitate arranging each agent?s schedule to ensure that all tasks are completed
 during the mission and that the workload is distributed between all of the agents.
 A priori analysis can also be performed to assess a team?s ability to resolve
 anomalies during task performance. It is at this point in the design process that
 many di erent options can be considered, and comparisons can be drawn between
 di erent combinations of agents to determine the best performing con guration.
 This in turn will objectively select the  nal team members (agent selection), task
 allocation, mission objectives and task ordering.
 Additionally, task performance analysis can be evaluated in real-time during a
 mission to assess how well the tasks are being completed. Real-time analysis requires
 a signi cantly di erent hardware architecture to support the quantity of computing
 and processing of sensor information that must keep up with the real-time operations
 task data. The advantage of real-time analysis is that details of team performance
 (including task allocation and anomaly resolution) can be addressed and altered as
 needed to improve the team performance for the rest of the mission.
 Whether the mission scenario is in real-time or decided a priori has a signif-
 icant e ect on the task allocation, planning, and scheduling that occurs. Done a
 priori, these can be cycled through software to optimize the scenario over the entire
 mission length. Computational e ciency may still be an issue, but it is removed
 from being mission critical. In real-time, a team?s task allocation, planning, and
 scheduling can be reassessed and changed depending on immediate performance
 needs and capabilities. If a task is taking longer than expected, other subsequent
 12
tasks can be reassigned as necessary to balance out the schedule to maintain e -
 ciency. With distributed levels of intelligence, there may not be an initial team plan
 to deviate from. Decisions could be made as each task-need arises. In operations
 with supervisor decisions made a priori, none of these performance qualities can be
 altered mid-task, and must wait through the end of the task or scenario.
 When to Analyze: Performance Metrics
 A human supervisor can make decisions in real time, but there are limits to
 human information retention, which depends on the quantity to be sifted through,
 and the complexity and heterogeneity of the tasks. These lead to a limit to the
 number of robots that an operator can control simultaneously (\fan out" (FO) [61])
 without degradation of team e ciency or under-utilization of team resources. An
 overloaded supervisor will develop a backlog of actions and decisions. If a robot has
 to wait for an operator?s instructions, an anomaly resolution plan, or con rmation
 of any kind, this reduces the e ciency of the robotic agent.
 Rather than relying on task completion times to measure how well an agent
 performs a task (as done in HURON), Schreckenghost [83] developed the concept of a
 work e ciency index (WEI). The WEI metric represents the ratio of productive time
 to overhead time that occurs in an agents? task schedule. Schreckenghost notes that
 WEI is more valuable in a priori analysis because it utilizes total productive time
 and total overhead time for a mission. Using these metrics, however, is not practical
 for real-time task performance analysis. The values could vary greatly between each
 13
real-time increment and could give a false representation of the overall e ciency of
 an agent.
 2.1.2 Scenario Perspective
 Whether a scenario is designed from a human-centered or robot-centered per-
 spective in uences which variables and resources are the most mission-critical and
 can change the assumptions behind the scenario development. For example, a
 human-centered model might assume that human extra-vehicular activity (EVA)
 time is the most critical commodity because of environmental exposure risks to the
 human outside of the spacecraft. This type of scenario would be planned such that
 the human agent is only used when absolutely needed. Alternatively, in a robot-
 centered model, the robot could be designed to be fully capable and independent of
 the human. Any evaluation of task or team performance would be geared towards
 evaluating e ciency and e ectiveness with respect to the robot.
 Scenario Perspective: Performance Metrics
 One perspective to assess a team?s e ciency, e ectiveness, and productivity is
 by analyzing its resiliency to failure. Kannan [48] de ned a metric for calculating
 how useful fault tolerance is for a multi-robot team. The paper provided a practical
 method to calculate the redundancy of a system. Shah [87] examined the produc-
 tivity of a HRT through the mean time between interventions (MTBI), mean time
 completing an intervention (MTCI), and the probability that an intervention would
 14
be needed. It described the e ect of unplanned interventions on a team?s produc-
 tivity, and demonstrated that the team?s productivity was much more sensitive to
 MTBI than to MTCI.
 Within each scenario perspective, the roles of the humans and robots can vary.
 The roles have signi cant e ect on the way tasks are completed and the workload
 distributed. Scholtz [80] discussed  ve di erent roles that a human can have when
 working together with a robot on a task: supervisor, operator, mechanic, peer,
 and bystander. As a supervisor, the human gives direct instructions to the robot.
 The detail of the instructions (high level commands versus primitive level scripted
 tasks) depends on the robot?s abilities to interpret and implement instructions. In
 addition, as a supervisor the human is responsible for forming and creating a plan to
 pursue the overall goals. A human could also be an operator who directly in uences
 the robot?s actions. As a mechanic, the human is collocated with the robot and
 participates physically with the robot?s hardware and actions. Humans as peers to
 the robot are also collocated. In this role the human gives commands or information
 aid to the robot, but does not participate in robot upkeep. The  nal role of a
 human interacting with a robot is as bystander. In this role the human does not
 participate in any tasks with the robot. The human is only an obstacle in the robot?s
 environment.
 In Singer [88], the e ect of rede ning a robot?s role on a HRT was analyzed.
 The roles were di erentiated by the safety considerations that determined the ability
 of a robot to work within the human crew?s proximity. In each role the robot
 exercised di erent physical capabilities that determined the portion of mission tasks
 15
that it could perform. With the overall objective of minimizing human EVA time,
 the role de nitions were traced through the task allocation and scheduling process
 to determine how the di erent perspectives changed the overall team?s performance
 and e ciency.
 2.1.3 Hierarchical and Distributed Decision Making
 Decision making for a HRT can either be hierarchical or distributed. Most
 HRTs to date have used hierarchical decision making, where a supervisor oversees
 the team and there can be several levels of authority. In distributed decision making,
 a team is made up of individuals who can each make task decisions for themselves,
 relying on their sensory information of the world around them, and relevant infor-
 mation passed to them by a neighboring agent.
 Decision Making: Performance Metrics
 In Fong [31, 32], the Peer-to-Peer Human-Robot Interaction Project was de-
 scribed. In the model, a task executive allocated tasks to agents which were both
 capable of performing the work and were not actively working on another task. A
 feature of this tool was that the task executive had the hierarchical decision author-
 ity to interrupt an agent?s task and send it to perform another (preemption). Once
 a task was assigned, however, decision making was passed to the individual agent
 to plan and schedule its own task.
 Ponda [71] described the development of a real-time decision making frame-
 16
work that allocated tasks for a heterogeneous HRT. The predictive model developed
 schedules for all of the agents based on agent availability, workload, and any coordi-
 nation requirements between the humans and robots needed to complete the given
 task. As the number and diversity of agents in the combinatorial problem increased,
 centralized planning became computationally prohibitive. Using a decentralized ap-
 proach (such as decentralized auction algorithms) to facilitate the task allocation
 reduced this challenge, and facilitated the generation of a more e cient and e ective
 architecture.
 Billman [8] presented an extensive table of performance metrics, their relevant
 parameters, and the human factors concerns associated with each. These were used
 in several experiments to evaluate a mixed-initiative (both robot and human can
 initiate communication and tasks) human-autonomous unmanned vehicle teams for
 the Navy?s Intelligent Autonomy program.
 Saleh [76] expanded Crandall?s model [17, 15] to include two factors that rep-
 resent the level of trust for both the human and the robot during cooperative inter-
 action intervals. Trust in a HRT is representative of the human?s belief that a robot
 understood its instructions, that it will correctly assess its situation, and that it will
 perform tasks correctly without requiring assistance. In Saleh?s [76] work, a human?s
 trust level was proportional to the human?s fan-out (FO), and indirectly proportional
 to the robot attention demand (RAD). RAD was de ned to be a function of both
 direct interaction time and indirect interaction time. Indirect interaction time was
 explained to be the interval when the robot is working autonomously but the hu-
 man, due to reduced trust, monitors the robot?s progress rather than applying the
 17
human?s full attention to another task. Another factor gaged the level of human
 trust in the autonomous system making correct choices and following through with
 those choices. Decreased trust increased the time duration of the interaction.
 Freedy [33] presented a methodology to assess trust levels for a HRT during
 operations. Expected value statistics were used to decide whether to allocate control
 to a robot. This work used a tool developed by Visser [21], the Mixed Initiative
 Team Performance Assessment System (MITPAS), which calculated a compound
 \goodness" score as a relative expected loss score, derived from observed human
 task allocation decision behavior, risk and observed robot performance. This was
 used to maximize the performance of the overall team. The MITPAS system was
 developed and validated for operator training on unmanned vehicles.
 Marble [57] described the level of trust that the human had both for a robot
 making a correct decision, and in the robot completing a decided path e ectively.
 This work assessed trust limitations in mixed-initiative systems. The operator?s
 situational awareness and trust within an experiment in which operators directed
 mobile robots to perform tasks was measured for a given scenario under  ve di erent
 autonomy modes.
 VanWissen [97] conducted a study to investigate how trust and fairness fac-
 tored into human interaction with other humans, and with robotic agents. His
 results demonstrate how trust a ects human decisions, preferences, and ideas of
 reward in a collaborative human and robot setting.
 18
2.1.4 Autonomy Level
 Robots have been developed with increasing levels of autonomy, such that they
 are able to not only carry out scripted tasks by themselves, but identify anomalies,
 and come up with their own resolution plans. Technology has developed not only
 to create distinct autonomy levels for a given robotic system, but also to allow ad-
 justable, or sliding autonomy (allows the autonomy level to change as needed within
 a scenario). Miller [58] has done research with varying levels of robot autonomy to
 determine their e ect on human counterparts. Heger [40] demonstrated that the
 concept of sliding autonomy could reduce the probability of an irrecoverable failure
 and would increase overall team e ciency.
 Due to safety and reliability concerns, the majority of robots used to date in
 space mission applications (as peers to human agents) have been controlled by a
 supervisor (e.g. the Space Shuttle?s remote manipulator system). For this reason,
 much of the literature presumes supervisory control of robots in a scenario that
 actively involves a human presence. In the scenarios where the human supervisor
 is o -scene, robots are designed with a higher level of autonomy (e.g. Mars rovers),
 allowing them to better analyze their own situation and pursue goals and way-points
 independently.
 In Goodrich [37], design principles were developed that would guide the devel-
 opment of human-robot autonomy architectures to make interactions more e cient.
 More recently in Goodrich [38], the success of two operator management styles were
 described for a team of robots with adjustable autonomy. Usually supervised robots
 19
would each be given orders sequentially - the operator?s attention would switch to a
 new robot only when  nished with the current one. Goodrich proposed an alterna-
 tive style which would direct the robots in a method similar to a sports playbook.
 After a play was announced, each robot would determine the course of its own
 actions to facilitate achievement of the team goal.
 Howard [43] examined four di erent human-team leadership styles to identify
 their de ning characteristics and adapt them to HRTs to create more e ective con-
  gurations. Her analysis suggested more e ective teams would be formed if humans
 used a directive style or a transactional style of leadership with their robotic peers.
 In a directive style, the human is the supervisor, giving commands and establishing
 the overall goals. In a transactional style, the human plays the role of the operator,
 monitoring the robot and its task completion, and giving commands at a lower level.
 Metrics for Autonomy
 Glas [35] discussed two new metrics to represent task di culty for a human
 monitoring and controlling a multi-robot team. Situation coverage (SC) is the per-
 centage (of total tasks) in which a robot understands the directive. In multi-robot
 situations, SC measures the upper bound on the ability of the system to operate
 autonomously. Critical time ratio (CTR) is a ratio of the time that a robot is per-
 forming mission-critical tasks to the total amount of time that a robot is actively
 engaged in tasks. When two robots have high CTR, an operator will have a higher
 workload. A human can improve performance by reducing the number of con icts
 20
that occur between robot attention demands.
 A new way of thinking about HRTs was developed by Olsen and Crandall to
 research the most e ective ways for humans and robots to work together on tasks,
 speci cally addressing autonomy levels. They developed a series of metrics that
 addressed the speci c unique human and robot interactions at a generic level. In
 Olsen [61, 62, 63] and Goodrich [37], they introduce several new metrics. Olsen and
 Goodrich concretized neglect time (NT) as a metric to measure robot autonomy [61]
 to reduce interaction e ort (IE) for both a human operator and the robot without
 reducing the e ectiveness of the team.
 This metric, however, considers human attention and does not consider the
 physical abilities, limitations, or usage of the human. NT is the amount of time
 a robot can function independently of the human. This interval is demarcated by
 a user-de ned drop in the e ectiveness of the robot?s task performance to below a
 threshold, which can only then be raised by human intervention. NT and a quantity
 representing a human?s interaction e ort (IE) combine to calculate a quantity for
 robot attention demand (RAD), which is the fraction of the total task time that the
 robot requires attention. Fan out (FO, de ned as the inverse of RAD), represents
 the number of robots a human can monitor and control before the decrease in overall
 team performance drops past a threshold. Crandall [18] used these metrics to predict
 the performance of a multi-robot team and validate the method with experiments
 taxing the operator?s attention. In addition, this method found the performance
 thresholds that maximize team performance, given a team size [17].
 Crandall [15] solidi ed these concepts into a methodology that has signi -
 21
cantly a ected the  eld. Wang [99] extended Crandall?s NT metric to include the
 coordination demands (CD) and the e ects of robot heterogeneity. Occupied time
 (OT) di erentiates between wait times resulting from an operator having low sit-
 uational awareness and wait times resulting from a queue of needy robots. Wang
 [98] expanded the estimation of an operator?s FO limit to allow analysis for N-robot
 teams.
 Elara [25] extended Crandall?s model to include the possibility that a robot
 did not correctly interpret or respond to a user?s command. In a false positive, a
 robot rejects a \correct" interaction. In a false negative, a robot fails to reject an
 \incorrect" interaction. False alarm time (FAT) represents the time that is spent
 identifying a false alarm and recovering from the delay.
 In the two papers by Schreckenghost [83, 84] the authors described metrics
 and a quantitative model to assess real-time adjustable autonomy. They suggested
 measuring the degree of robot independence (to decrease human intervention time)
 based on the time spent on unplanned interventions. The computed performance
 metric results were used in real time by controllers. It should be noted, however, that
 results were based on percentage of mission time but that there was no indication
 given to relate robot to human time. If a robot performed its share of the tasks at a
 slower rate than in the previous scenario run, then all else being the same, it would
 register as spending more time in an autonomous mode. This could skew the results
 because performing tasks slower does not require a di erent level of autonomy, and
 should not be perceived as such.
 22
2.1.5 Situational Awareness
 Situational awareness, as discussed in the literature, can refer to three di erent
 perspectives: that of a human supervisor referring to knowledge of overall mission
 operations, that of a robot control operator referring to knowledge of the robot?s
 immediate environment and obstacles, and of the robot?s awareness of its own envi-
 ronment. Quantifying situational awareness has been a challenge because the idea
 has many de nitions and contributing factors.
 Situational Awareness: Performance Metrics
 Bruemmer [13] proposed using human workload, error, and overall perfor-
 mance as quantities that gage the e ectiveness of robots in a mixed-initiative en-
 vironment. The amount of human error could be seen as a re ection of operator
 fatigue, but more dominantly a lack of human situational awareness of the task en-
 vironment. The model created a text-based dialogue between the human and robot,
 and produced a 3-D representation for shared understanding about the task and the
 environment. While this research was intended to be a proof-of-concept argument
 that increased autonomy and better operator interfaces could improve robot navi-
 gation, it also demonstrated collaborative control where robots were e ectively used
 as trusted peers.
 Scholtz [81] described the implementation of a tool that evaluates the situa-
 tional awareness provided to an operator through a user interface used for supervi-
 sory control of autonomous vehicles. Each interface was evaluated to  nd how well
 23
each facilitates a user?s situational awareness.
 Lampe [53] described a metric representing environmental complexity and a
 robot?s information of it as a measure of robot autonomy. Nehmzow [60] presented
 a novel quantitative model of a robot?s interaction with its environment based on
 the robot?s trajectory over time.
 Nehme [59] created a model to evaluate the performance of supervisory control
 of multi-unmanned vehicles. Included in the model were both operator variables and
 variables that pertain to the entire team. A unique contribution of this model was
 that it modeled operator limits by accounting for wait time due to loss of situational
 awareness. The integrated model facilitated comparing design development and was
 capable of searching the large design space to obtain the overall goal of maximizing
 team e ciency.
 Hwang [45] developed a model that used each agent?s knowledge of the other
 agent?s state and of the environment?s state to measure the changing interactions
 between the agents. This model can be used to measure the situational awareness
 of each agent, and the interaction e ort required at each state of the cooperation.
 2.1.6 Communication Architecture
 For a heterogeneous HRT, creating a communication architecture that facili-
 tates passing relevant information when needed, and avoiding information overload
 (passing unneeded sensory data) is imperative for an e ective team. There are
 many di erent architectures that can be utilized in the communication framework,
 24
depending on the agents involved in a mission. In most supervisory control sce-
 narios, human operators or peers communicate to the robots, but the robots are
 not capable of passing con rmation of task completion messages to the humans,
 and are not capable of querying their supervisors with questions or to address task
 di culties.
 Kaupp [50] presented a robot-centric model to facilitate bidirectional commu-
 nication between a human peer and a mobile robot. Within this paradigm, robots
 passed images of the environment to the human to increase the human?s situational
 awareness and to process details from the images. Humans were viewed as a resource
 which could be queried for information and observations about the environment, but
 as with all resource usage, it came at a cost. The cost of querying operators was
 traded o within the architecture against the expected bene t of the new infor-
 mation. This model could be used to determine what and when to communicate
 between a human and robot engaged in collaborative tasks.
 Approaching the di culties of team communication from a di erent direction,
 Trafton [95] presented a cognitive architecture to facilitate perspective-taking for
 collaborative human and robot interaction. The goal of the model was to produce
 intelligent robots that were capable of reasoning from a human-perspective by mod-
 eling how humans integrate multiple information and environmental representations
 into a world model. This type of reasoning allowed a robot to spatially interpret
 relative commands from a human, e.g. \give me the wrench on the right". To select
 the speci ed object correctly, the robot needed to be able to simulate the perspective
 of the human, which required implicit rotations of the world environment.
 25
The Human Robot Interaction Operating System (HRI/OS) software [31, 32]
 was created with several distinct objectives. In addition to facilitating perspective-
 taking, the overall goal was to both maximize the amount of work done by the
 entire team and to reduce the number and duration of EVAs needed to complete
 the task list. To further reduce the human?s workload in the scenarios, human
 and robot communication was designed to have as natural (for humans) interaction
 mechanisms as possible. This includes having a text-to-speech agent that verbal-
 ized a robot?s responses for a human to hear, and a speech recognition agent that
 translated a human?s verbal response into text for a robot to parse.
 2.2 Challenges to Comparing HRT Con gurations
 2.2.1 How to De ne and Assess Task Performance?
 An innumerable variety of performance metrics have been identi ed to help
 de ne the phenomena that occur between a human and robot working cooperatively
 on a task. Some of the metrics are application speci c, and some are more general.
 These individual metrics canvas the range of activities done by each agent. Common
 metrics include task completion time, reliability of task completion, mean time
 between failures, resource utilization and cost, mental workload, and a transition or
 switching cost for transferring mental attention from one task to another.
 Burke [14] described application-speci c HRI metrics for urban search and
 rescue that could be applied to more generic scenarios (search, rescue (extrication),
 structural evaluation, medical assessment and treatment, information transfer, com-
 26
mand and control, and logistics). The project used existing models and software
 systems (Robot-Assisted Search and Rescue Coding System (RASCAR-CS) and the
 FAA?s Controller-to-Controller Communication and Coordination Taxonomy (C4T),
 which capture what is communicated and how it is communicated for a team) to
 examine the robot?s e ects on human task performance within the context of human
 work. In essence, it logs how the robot a ects and aids human performance.
 Keller [51] presented a straightforward example of a human driving a car while
 talking on the phone to break down a multi-part task list to develop metrics that
 characterize the activities. The human?s performance in the example was evaluated
 by considering a task as several simultaneous actions (visual, auditory, cognitive,
 psychomotor, all measured in relative rating scales). Activity in all of these pieces
 could lead to excessive workload demands, leading to performance errors (note,
 without any other agents taking part). This paper measured the scaled values of the
 number of human resource components needed for each task to simulate quantitative
 predictions of the human?s workload over the entire task list.
 Most models that attempt to measure the performance of a team including
 humans have a common metric for measuring human mental workload. The model
 that is commonly used is the NASA-TLX [39], which provides workload values for
 various human mental tasks. This results in subjective data from a post-experiment
 questionnaire to gage test subjects? estimated workload during di erent parts of an
 experiment.
 An innumerable quantity of software planning packages have been developed
 to plan a HRT?s operations. In general, each research group working on a mission
 27
scheduling problem has used their own software packages, and there is no community
 consensus on which is preferred. While each package is unique, most can be applied
 to the same types of scenarios. For example, an overall lunar mission planning
 software package (HURON (Human-Robot Task Network Optimization) [26]) has
 been developed at JPL to facilitate task allocation, planning, and scheduling for
 combined human and robotic activities. It provides the architecture to develop
 optimal task allocation and scheduling for a scripted multi-agent lunar mission. The
 input problem description in this software package is decoupled from the planning
 and agent assignment. This indicates that this software, like many of the other
 packages developed, can be applied more broadly than it was originally designed.
 Arnold?s work [4] created an analytical framework to compare the advantages
 and disadvantages of di erent human-robot systems. Given a set of metrics, the goal
 of the framework was to lead towards optimal task performance. Although tasks
 and schedules were primarily scripted, the framework provided a guide for how to
 incorporate unplanned interventions into the schedule.
 Each metric that quanti es a portion of a HRT interaction provides useful in-
 formation to a mission designer or supervisor. Although there are numerous metrics
 to describe components of a HRT architecture and assess an agent?s performance
 on individual tasks, it is clear that one single metric will not su ciently explain the
 team?s task performance. Quite the contrary. Each of the metrics mentioned in this
 discussion cover a di erent aspect of a HRT?s performance, but successful perfor-
 mance in one category does not necessarily entail good performance in the other
 categories. Several metrics would need to be selected to comprehensively determine
 28
a HRT?s performance by analyzing the di erent facets of the team?s con guration.
 The problem that remains, however, is how to integrate the results from multiple
 metrics into a meaningful overall team performance score. It is only with this type of
 rating system that two di erent HRTs could be compared to determine their relative
 performance on an overall mission, rather than on single tasks or other criteria.
 2.2.2 Team Performance Metrics in Related Fields
 There are many applications that use humans and robotic technology cooper-
 atively to complete a task. Bechar [6, 7] and Oren [64] developed a methodology
 for a performance analysis of human-robot collaboration in visual target recognition
 tasks. It was based on a quantitative model [6] with four levels of human-robot
 collaboration, ranging from manual to fully autonomous. The metrics used were
 from signal detection theory: hit, false alarm, miss and correct rejection. These
 quantify the in uence of the robot, human, environment, and task to determine the
 optimal operational level based on input parameters. Bechar has developed algo-
 rithms to facilitate real-time switching between di erent collaboration levels for the
 human-robot teams [94].
 The development of new computer user interfaces (UI) encounter similar chal-
 lenges to those of HRTs. Stanton [92] demonstrated three experiments to record
 the usability of the UI and how it a ected the user?s performance in the application
 of human supervisors directing urban search and rescue mobile robots. This paper
 collected the data and left selection of an evaluation method for future research.
 29
Yanco [104] demonstrated a proof-of-concept method for evaluating user control
 interfaces from rovers at an arti cial intelligence search and rescue competition.
 Yanco categorized the evaluation of UIs into six categories of methods: e ective-
 ness, e ciency, user satisfaction, inspection methods, empirical methods, and formal
 methods. Analysis of the competition runs and scoring results in several concrete
 UI suggestions for the type of information that, if present to the user, would have
 improved task performance. Bruemmer [13] measured the usability of an interface
 based on how the human workload and human error a ected the performance of the
 autonomous robot.
 Glas [35] developed a UI to increase the performance of multiple social robots
 by designing the UI to provide information to the operator in an intuitive way,
 increasing the human?s ability to monitor several robots while speci cally controlling
 a di erent one. Task di culty, as used in this paper, referred to a task being more
 di cult if a robot required a user?s attention to help with the task. The UI can be
 used to reduce the amount of attention each robot needs caused by task di culty
 (to be di erentiated from attention needed to relay a directive).
 Mobile robots have been used in several experiments to observe their interac-
 tions with autistic children [20]. As seen in the experiments, autistic children have
 found that interacting with a mobile robot is less confrontational and less threat-
 ening than interacting with a human. The rovers passively sat in the room with
 the children, until they were comfortable and curious enough to approach the rover.
 They played tag with the children, and carried on simple conversations. In this case,
 the robots and humans did not have a speci c agenda besides exploring their inter-
 30
action. Assessing the overall performance of the combined child and robot would
 require integration of several di erent metrics that gaged various aspects of their
 combined activities.
 2.2.3 Quantitatively Comparing HRT Task Performance
 The large variety of options available for selecting and organizing team mem-
 bers in HRTs raises the question of how to evaluate their performance in methods
 that transcend the individual details of a team: in essence, how to compare apples
 and oranges. One approach is to select an individual metric and apply it to di erent
 teams. This is not su cient, however, because the di erent priorities of the teams
 cannot be re ected in a single metric. Additionally, if the robot of team A excels in
 a mission, but it is the human of team B that excels in the same mission, how do
 you objectively weigh the two options?
 An integral step in producing the most e cient task allocation and team sched-
 ules depends on the selection of an objective set of task performance metrics to dis-
 tinguish between each agent?s performance, and to measure the e ect of changing
 task allocation or task ordering on the completion of the overall mission objectives.
 Frequently, the metrics used to assess performance are built into the software plan-
 ning and scheduling packages that automatically develop feasible mission plans.
 To reduce the computational load, most software packages create schedules
 that observe all constraints, but often the software is written to only consider a single
 mission objective (e.g. reduce overall mission duration, minimize human time). A
 31
much smaller set of task performance metrics is required to analyze each agent?s
 contribution to meeting the mission objective (e.g. minimize task completion time).
 While this analysis will produce good results, the other performance parameters that
 are neglected (workload, resource depletion) in the analysis might have signi cant
 a ects that are not brought to light.
 Categorization of HRT Performance Metrics
 There have been several attempts recently to create categorizations for in-
 dividual performance metrics. Categorizing the metrics would aid the search for
 commonality between them. It could then be anticipated that a selection of metrics
 could be drawn that span each of the categories and facilitates a wider understand-
 ing of a team?s overall performance. One approach has been to create taxonomies for
 human-robot interaction to specify categories for relevant details. These taxonomies
 have emphasized creating a rubric for comparing and contrasting the design deci-
 sions of di erent HRTs.
 Gerkey [34] presented a taxonomy of task allocation for multi-robot systems
 based on the relative utility of one robot performing a task. This paper provided
 formal categorization of the di erent application problems that can occur (single-
 task robots versus multi-task robots, single-robot tasks versus multi-robot tasks, and
 instantaneous assignment of task allocation versus time-extended assignment, which
 assumes some predictive knowledge of the tasks that will need to be completed in the
 future). Several di erent algorithms were proposed to e ciently calculate the task
 32
allocation for each of these types of problems, and the algorithms were compared
 for computation requirements, communication requirements, and solution quality.
 While this paper focused on purely robotic teams, its methodology can easily be
 extended from the heterogeneous robot teams considered to a HRT.
 Yanco [103] created a taxonomy that emphasized the physical characteristics of
 a team?s dynamics. Team composition (de ned as the ratio of humans to robots and
 including details about the di erent types of robots) was a de ning characteristic
 - a team of one human and one robot will vary signi cantly in its overall task
 performance from a team of one human and a two-robot team. The degree to
 which a robotic agent is dependent on a human decision maker (autonomy level)
 will also signi cantly in uence performance. Does each robotic agent require human
 intervention at some point? Can it resolve problems by itself? Can a human operate
 more than one robot in a given scenario? Does a robot need to resolve con icting
 instructions received from di erent human agents? How much required interaction
 is necessary between team agents to complete tasks?
 Yanco also described the information type, variety, and level provided to op-
 erators to facilitate situational awareness in decision making (including sensor in-
 formation available, sensor fusion, and data pre-processing). Yanco cites Ellis [27]
 for the time and space part of the taxonomy, di erentiating into four categories:
 humans and robots functioning at the same time (synchronous) or at di erent times
 (asynchronous), and physically located in the same place (collocated) or at a sep-
 arated distance (non-collocated). This can clearly be seen to di erentiate between
 a robot teleoperated from a control room and a robot peer working along-side its
 33
human contemporary.
 Yanco?s taxonomy applies to a broad variety of applications. It also includes
 a subjective category to indicate the relative importance of a task?s performance,
 termed its criticality. This allows di erentiation between mission critical perfor-
 mance applications (urban search and rescue) distinguished from applications with
 less severe consequences for failing a task (robot soccer team).
 Steinfeld [93] provided a generalized categorization for common metrics that
 apply to human-robot interaction. Metrics were split into three categories: human,
 robot, and overall system. Each of these categories contains metrics in  ve additional
 categories (navigation, perception, management (including task allocation, resource
 allocation, and coordination), manipulation (interaction with the environment), and
 social).
 These three taxonomies highlight di erent aspects of a HRT. Gerkey?s tax-
 onomy facilitates task assignment depending on team con guration, but it does
 not include categories for the planning and scheduling portion of the design prob-
 lem, including workload, environmental knowledge, completion times, etc.. Yanco?s
 taxonomy emphasizes the interaction mechanisms between the humans and robots
 much more highly, but provides little guidance for task allocation metrics or the
 e ect of a HRT with distributed decision making. In Steinfeld?s taxonomy, the de-
 tails of each agent?s task performance are clari ed, and the interaction mechanisms
 between them are accounted for, but does not o er suggestions on how to com-
 bine a selection of metrics (including overall team performance metrics and agent
 performance metrics) into an overall picture. This includes incorporating multiple
 34
overall team performance metrics into a single ranking. Each of these taxonomies
 has been successfully used independently, and can apply to a broad range of hetero-
 geneous teams composed of humans and robots. A future analysis might  nd value
 in integrating them for a more comprehensive analysis of what occurs during HRT
 interactions.
 A unique approach to categorize team metrics has been developed recently by
 researchers at the Massachusetts Institute of Technology. Building on their previous
 work [15, 24, 19, 70, 69], researchers developed the concept of metric classes under
 the application of human supervisory control. They addressed the problem of which
 metrics should be selected to completely assess team performance. The more metrics
 were included in an analysis, the more computationally complex the analysis became
 and, as pointed out in Donmez [24], using metrics that correlate to the same data
 can result in  nding false signi cant e ects in the data.
 Categories of di erent types of metrics were created in Donmez [24] with the
 goal of facilitating selection of metrics to be applied to a HRT. The resulting guiding
 principle is that to e ciently and comprehensively measure the overall team task
 performance, at least one metric from each category should be selected. Crandall
 [16] applied the formal metric classes framework to the application of UAVs to create
 a methodology for developing a predictive model of the interaction between humans
 and unmanned vehicles.
 Although there are several di erent methods for categorizing performance met-
 rics, the same goal is sought by each: a logical progression between groups of metrics
 would allow a mission designer to select the most relevant from a group and facilitate
 35
including a wider range of metrics to re ect the overall scenario more accurately.
 2.2.4 Building Performance Metrics into Quantitative Models
 Individual performance metrics for components of a HRT provide insight into
 the di erent qualities of team con gurations. What is lacking, however, is a method
 or framework to incorporate these metrics into quantitative models designed to de-
 termine the quality of and e ciency of a team?s overall task performance. This
 type of system-level analysis could determine which team would perform better for
 a given mission scenario without needing to run additional simulations or experi-
 ments, and will result in a better, more comprehensive picture of what is actually
 occurring. Each model will have built into it assumptions, priorities, and di ering
 methodologies. Di erentiating which model would be preferred in di erent scenar-
 ios can be a di cult problem. The following discussion seeks to categorize existing
 models and describe their characteristics.
 Without a formal framework, a designer seeking to compare di erent team con-
  gurations would need to compare numeric results from a multitude of performance
 metrics to discover which con guration is best across the board. Schreckenghost
 [83, 84] described a real-time sliding autonomy robot assessment tool that provided
 several streams of individual performance metric data to an operator, covering the
 important components of robot operations. This type of analysis of several metric
 data streams becomes much more complicated with the increase in metric set size.
 The calculation of composite task scores are used in other models to break
 36
down the complicated analysis such that comparison between team con gurations
 can be done by assessing a single value for each. Rodriguez [74] provided a justi-
  cation for using a composite task score by comparing the analysis to the scoring
 of an athletic competition. A composite numerical score for the overall competition
 was obtained from the scores on each of the individual events. If di erent sets of
 events were selected, the results were di erent. If all events were weighted equally,
 the individual scores were summed. If some events were deemed more valuable than
 others, a weighting factor was used to represent the importance. The  nal result
 from the competition was then a single score for each participant.
 2.2.5 Parasuraman?s Model Categorization
 There are many implicit choices made in deciding which model to use, includ-
 ing whether the level of automation is assessed (including the coordination demands
 placed on other agents), and which components of the mission schedule are most im-
 portant to the analyst. In a frequently cited paper, Parasuraman [66] presented two
 evaluation criteria to be used for this decision. The primary criterion was the e ect
 of a given design selection on human performance, e.g. how was the human?s per-
 formance in uenced by this selection? The secondary evaluative criterion included
 several additional important pieces of a model, including reliability and cost. These
 criteria were for a human-centered model of HRTs, and were intended as a frame-
 work to guide an objective selection of a HRT for a given application. Parasuraman
 [66] identi ed four categories that represent existing quantitative HRT performance
 37
models: task load models, expected value statistics models, cognitive systems mod-
 els, and state transition network models. Each of these types of models combine task
 performance data for each agent and results in an overall scenario-level assessment
 of team performance. It is with these types of quantitative models that it becomes
 possible to compare, in essence, apples and oranges - it facilitates comparison of
 di erent team con gurations to select the best overall team for a mission.
 Task Load Models
 Task load models evaluate the e ort required for each agent to complete a
 task. The goal of task load models is to examine the e ect of the tasks themselves
 on system performance, operator demands, and how task performance responds to a
 range of autonomy levels. This can be in terms of time as a resource (task completion
 time), resources used (either an agent?s or cumulative for the team, e.g. power), or
 overall, repetitive, and fatiguing workload levels for a human. Selecting this model,
 the designer?s goal is to distribute the total workload across the agents such that
 the total task list could be completed in the minimum amount of time, constrained
 by the  nite amount of resources and agents available. This type of model can be
 used to compare which agent should perform portions of a task list, and to compare
 which agent contributes most to the entire mission. It therefore can also be used
 as a task allocation schema, and a method to objectively select amongst a set of
 possible agents for the  nal team con guration.
 A common method to combine metrics for a task load model involves a pair-
 38
wise comparison of e ort and execution time for each agent. This approach has
 been used by many researchers, particularly to determine the  nal task allocation
 amongst a HRT. Howard [41] validated using a composite task score (termed the
 sequence execution parameter) as a task allocation scheme instead of individual
 task performance metrics in an experiment to guide a robot to a target location.
 The overall composite task score was run through a genetic algorithm in which the
 weightings in the  tness function changed with user feedback. In another model,
 Howard [42, 43] used a composite task score for each agent to balance performance
 score and mental workload for teleoperated and autonomous robot control of an au-
 tonomous rendezvous and docking scenario. All of these works used the composite
 task score for the  nal task allocation decisions.
 Task load models can be the most straight-forward of the model types to
 implement because the data required for standard-type missions, primarily execution
 time and workload quantities, will be readily available from previous missions or
 testing. No extra experimentation will be needed to obtain the required inputs to
 the model. Each agent of the HRT will have known capabilities, and the quantized
 e ort required to complete tasks could be estimated based on similar tasks.
 These types of models are preferred in space applications that are planned
 from a human-centered perspective. As an a priori analysis, task load models can
 provide veri cation of the task allocation schema used, and the mission objectives
 can be easily interpreted into the model to produce desired results. It can be much
 more challenging, however, to use task load models if a wealth of performance data
 for an agent is not available, e.g. for a new robotic system. Task completion time
 39
is one of the primary criterion in the model. New technology or tasks that agents
 have not attempted before will require either signi cant experimental testing before
 it can be used in the model, or a method to estimate performance used instead.
 While pausing in the analysis to perform experiments to obtain necessary input
 data is not the most expedient solution to the challenge, it is the most rigorous
 answer. Estimating models are often limited by their level of accuracy, which can
 feed a sizable amount of uncertainty into the task load model, and could propagate
 through in unknown ways.
 Expected Value Models
 The second type of model relies on the expected value to be gained from an
 agent performing a given task. This is a common type of quantitative model and
 can utilize a nearly limitless number of individual metrics. Each task performed
 by each agent is assessed to determine the bene t gained from that task-agent
 combination, and the cost of that combination. The tabulated di erence between
 the bene t and the cost is the expected value. Evaluation is often represented in the
 form of statistics, where the probability of bene t and cost is used, with reference to
 probability of component or task failure. Parasuraman [65] contains a more detailed
 set of di erent analysis schemes for expected value models.
 Rodriguez [74] was the  rst to use an expected value model to compute a
 composite task score as a relative value with respect to a reference. According to
 his model, a ?reference? could be an agent, or a team con guration. The model can
 40
be used to compare performance between agents to aid in comparing heterogeneous
 agents or between team con gurations to evaluate overall team performance. In this
 model, tasks and agents were assessed for relative task completion time and relative
 resource cost. Dissimilar metrics could be used because the model resulted in a
 matrix of dimensionless parameters. Each agent?s performance ratios and resource
 cost ratios were summed for the entire scenario considered. The performance ratios
 were summed for each agent such that an agent would have a single score that
 represented the di erence between the relative bene t and the relative di erence in
 cost of using the resources.
 Rodriguez?s model was structured such that each task primitive emphasized a
 di erent aspect of human performance (cognitive, motor, and sensory skills). This
 model calculated the value added by using a given team instead of the reference team.
 The selection of the reference does not a ect the results - the analysis is relative, and
 the same relations would be achieved if a di erent reference was selected. In this
 paper, task di culty included aptitude of a given system with respect to a reference,
 and a relative amount of power, mass, or other resource needed to implement the
 candidate system. The composite task score represented the ratio of value added
 by using a speci ed system or team instead of the reference.
 Tunstel [96] applied Rodriguez?s methodology for composite task scores to
 monitor the navigation performance of the Mars Exploration Rovers, Spirit and Op-
 portunity. Mann [56] suggested using MacKenzie?s modi cation [55] to Fitts? law
 in information theory to use a less simpli ed version of Shannon?s law of commu-
 nication theory in Rodriguez?s performance ratio equation. This equation change
 41
allowed ratios in Rodriguez?s model that were nearly equal to and less than one
 (desensitizing the equation such that the relative value of an agent did not mathe-
 matically result in zero), which could feasibly occur in practical applications.
 Kaupp [49, 50] used a composite task score?s value of information to determine
 an appropriate level of autonomy for navigating a maze. Using the HURON software
 [31, 32] to plan a lunar mission, Elfes [26] sought to maximize the value per cost of
 a mission by analyzing the required input e ort to the expected output bene t of
 each task.
 Expected value models are structured to facilitate comparison between di er-
 ent candidate systems. They require approximately the same input data as task
 load models, but the data is used as relative values rather than directly. This pro-
 vides more robustness to uncertainty in the data itself. These models can also be
 run with only a rough estimation of how systems or agents perform in relation to
 each other (if task and resource data does not presently exist), removing the need to
 run more experiments before beginning analysis. The application of the models are
 more  exible than task load models: rather than necessitating a quantized overall
 score (in which case the range of scores would need to be analyzed to determine
 if a gap between two values is signi cant or negligible), the scores are immediately
 referenced for relative comparison. Expected value models are best applied when
 the designer seeks to select a con guration from a set of possibilities. On the other
 hand, if any other summary data is sought (comparison of the types of workload,
 distribution of the workload, or duration of larger workload quantities), further anal-
 ysis capabilities would need to be built into the models, or a di erent model would
 42
need to be used.
 Cognitive Systems
 The goal of the third type of model is to evaluate the e ect of tasks on human
 mental processes. The methods and systems of information processing fall into this
 domain. This model applies to HRTs because it is well adapted to cover not only
 human information processing about individual tasks, but also the coordination
 demands of working on a cooperative task (either with a human or a robot).
 These models are best used for detailed analysis of the e ect of di erent au-
 tonomy levels, situation awareness, and communication architectures on a human?s
 mental workload. Due to their emphasis (if not exclusively) on cognitive processes
 only, these models do not assess physical performance. System level HRT over-
 all performance evaluations generally do not bene t from this type of model, but
 these models can be invaluable in the development and veri cation that a mission
 scenario?s required workload level is feasible.
 State Transition Networks
 This type of model attempts to frame a sequence of tasks into the agent
 states required to perform them (e.g. observer, active physical participant, standby,
 etc.). Agents only change from one state to another when a task assigned to them
 requires a di erent form of involvement. Built into the model is the assumption that
 minimizing the number of transitions will reduce the workload required to perform a
 43
task list in its given order and improve the e ciency of overall mission performance.
 This type of analysis is valuable for tracing what causes the changes to an agents?
 state throughout a scenario, and the direct e ect between the actions. Heger [40]
 used a state transition matrix to map the probabilities of each agent?s success or
 failure at a given task, with a composite task score accumulating for each task
 attempted. The transitions in this model referred to the operator yielding control
 to the robot, adjusting the autonomy of a HRT to have the greatest probability of
 success, and a timing metric was computed to obtain the expected duration of the
 task list. Yagoda [102] modeled human-robot interactions using Petri nets to map
 transitions between a UAV operator?s states.
 State transition models facilitate analysis of human attention and mental re-
 quirements for task performance. Rather than assessing from a workload perspective
 (as done in cognitive systems models), these models can easily incorporate physical
 requirements and coordination requirements into the performance analysis. Com-
 posite scores can be obtained to compare how often and what type of mental and
 physical mode transfer must occur during a mission, but the emphasis on the num-
 ber of transitions does not necessarily correlate to better overall HRT performance.
 The scores instead reveal the in uence of switching states on each di erent agent
 during task performance. A di erent type of model would be needed to represent a
 team?s overall task performance.
 Generating quantitative overall performance models for a HRT has been a
 signi cant challenge. Designers have o ered frameworks to direct analysis to a set
 of common metrics to facilitate comparison between disparate team con gurations.
 44
The majority of these frameworks lead to developing a quantitative composite score
 model to evaluate a HRT?s overall performance.
 Shah [86] described the initial development of a di erent kind of framework
 to walk a designer through the process of selecting a HRT and the task allocation
 process. Rather than using a composite score, the evaluation of task performance
 relied on speci c methods to link together common metrics for a space exploration
 task scenario. Each section of the paper provided a literature review of commonly
 used methods. The framework proposed unifying the process of selecting a team
 design and assessing a common set of task-based metrics to allow comparison of
 disparate team performance. This is a generalized framework that is an option
 to be used either instead of the quantitative models described in this paper, or in
 addition to.
 2.2.6 Implementation Challenges
 Of the four types of models described by Parasuraman [66, 65], deciding which
 best re ects the desired analysis perspective for a given application can be di cult.
 The di erent types of quantitative models are not mutually exclusive, however. In
 fact, it can be productive to combine features from several of them to fully charac-
 terize a HRT?s interactions. It may be advantageous to analyze the expected value
 of a team con guration based on a cognitive model, or to structure a task-load
 model into a state transition matrix. Kaupp [49] presented a method to select the
 autonomy level prior to deployment of a HRT that produced the highest team e ec-
 45
tiveness for task-oriented information exchange on a HRT. This analysis included
 actual robot performance data, and resource costs. A composite task score was
 developed that included execution time, pairwise comparison of e ort, value added,
 and a weighting for the  nal composite. Selecting the type of model or combination
 of models will have a signi cant a ect on the results, and can either highlight or
 obscure signi cant interactions in an application.
 Once the type of model has been decided, the kind of input data and level
 of accuracy of the data required should be addressed. The level of accuracy of
 the input data could have a signi cant e ect on the predictive performance of the
 model. Low- delity estimations feed uncertainty into the model which, depending
 on dependencies and correlations between parameters, could propagate through the
 model in unknown ways.
 To determine the relevant input data to the model, it is necessary to select the
 set of metrics that will comprehensively re ect the overall performance of the HRT.
 A designer must select a large enough set of metrics to cover relevant aspects of the
 agent interaction, but must also avoid selecting too many metrics, which runs the
 risk of generating false correlations by analyzing the same e ect from multiple angles.
 It is at this point in the model creation process that the designer must input details
 about the application, including the relevant subtask performance parameters, and
 the architecture details that facilitate the HRT?s task completion.
 The next challenge to implementing a quantitative model is explicitly framing
 the performance metrics in mathematical expressions that do not over-simplify the
 situation. For example, it would greatly simplify the modeling process to exclude
 46
environmental parameters from an analysis of how well one agent reliably follows
 the cooperative team plan. It could be assumed that an agent?s comprehension
 of the task plan, and ability to direct its own e orts to achieve the desired goal
 would have a greater in uence on the agent?s successful performance than whether
 the terrain is grass, cement, or had vertical displacement. This simpli ed analysis
 would be fairly accurate in the majority of modeled cases, but it would diverge from
 the true behavior in cases where the environment has a signi cant e ect on the agent
 performance. It is that very divergence that would be invaluable to have re ected in
 the model. It is necessary to ensure that the model includes all relevant dependent
 correlations and parameters to accurately re ect the true system performance.
 2.3 Incorporating Optimization Into a Team Performance Model
 After selecting the metrics to be included in the model, a designer must then
 link them to facilitate ease of comparison between results from di erent team con-
  gurations. Most of the researchers who utilized a composite task score as their
 overall team performance ranking [26, 42, 49, 50, 56, 74, 96] computed this value
 by a linear summation of the individual performance scores (or ratios, or expected
 values) for subtasks. It is possible to use weightings in the summation to include
 a measure of relative importance between the performance metrics. In cases where
 weightings were not speci ed, each metric had equivalent importance in the sum-
 mation. These weightings will make a signi cant di erence on the optimal designs
 returned from the system analysis. In essence, these calculations use the individ-
 47
ual performance metrics as objective functions, transforming the evaluation of the
 HRT?s performance into a constrained multi-objective optimization problem.
 By combining the metrics with weightings into a single summation, however,
 the designer applies a common optimization technique to transform a multi-objective
 optimization problem into a single-objective optimization problem, which is much
 simpler to solve. To do this, knowledge of designer preferences between metrics or
 an estimate of relative weightings between the metrics is needed.
 The weighted-sum method of aggregating objective functions is a specializa-
 tion of a larger, more general  eld in optimization theory: utility function or value
 function methods. These methods combine multiple objective functions by develop-
 ing relations between the objectives (or interactions between the objectives). These
 relations can be linear or nonlinear, but do require setting parameters to concretely
 de ne these relations.
 There are several limitations of this method that make it less than ideal for
 application to the HRT con gurtion selection problem domain of this dissertation.
 A utility (or value) function will return only one single answer (or one solution
 at a time). The function itself would need to be altered (the parameter values
 and relations) to  nd other Pareto-optimal solutions. In other words, the existence
 of multiple solutions on the Pareto front would not be detected by this method
 (multiple equally-performing team con gurations would not be found by using this
 method - only one arbitrarily selected team would be chosen). Furthermore, the
 resulting single solution has no guarantee to be any better than other solutions not
 selected. The selection of a solution would be entirely dependent on the utility or
 48
value parameters used to de ne the relations.
 Additionally, the utility (or value) function method requires users to develop
 relations between the objectives that apply for the entire design space. In the
 past, these have either been subjectively selected or all metrics assumed to be of
 equivalent value. There has been signi cant e ort to make weightings selection more
 objective, but it has been di cult to extrapolate these methods into the practical
 HRT con guration domain. Rohrmuller [75] presented a quantitative method to map
 the probabilistic interdependencies between individual performance metrics, and
 use them and their provided data to determine relationships between the metrics,
 and to compile them into a composite score that could be used to predict system
 performance and optimize performance parameters.
 For further guidance on creating this objective method, the author suggests
 the extensive literature search by Bobko [9] which provided a cross discipline analysis
 that considered the validity of using weightings to aggregate subscores to represent
 a data group. Several methods were described and the reader was directed towards
 other references for more detail. If an objective method of selecting these weightings
 could be found, it would add rigor to the methodology of using composite scores to
 optimize a team con guration.
 2.4 Conclusions from HRT Performance Analysis Literature Review
 Future space exploration will involve humans working much more closely with
 robotic technologies, both as tools to ease the humans? workload, and as peers to
 49
expedite mission scenarios. Design options for creating the architecture of future
 HRTs have been explained, with sample performance metrics given for each category.
 With the numerous options available, frameworks have been described to guide
 selection of the most relevant metrics for each speci c application. Incorporating
 several metrics into a quantitative method to facilitate comparison between di erent
 HRT con guration solutions will be invaluable for future mission design. To this end,
 the signi cant work in the research community to facilitate and develop quantitative
 models to calculate the overall performance of a HRT was analyzed.
 The purpose of this dissertation is to continue to prepare the groundwork for
 the synthesis of existing methods to compare HRTs and measure the di erences in
 their interactions. The presence of optimization theory beyond manipulating the
 multi-objective optimization problem into a single-objective optimization problem
 in the domain of human-robot interaction research is sparse. It is this area in
 particular that this proposed dissertation research will contribute signi cantly. A
 universal quantitative team performance model with a wide range of applications
 still eludes the research community, but work continues to progress towards this
 goal.
 50
Chapter 3
 Research Methodology and Algorithms
 In my approach to the HRT con guration selection problem, I have identi ed
 three orthogonal axes of research. The  rst area of research has been in character-
 izing and increasing the diversity of human and robot teams. The state of this  eld
 was described in detail in section 2.1. Second, there has been signi cant research
 on methods to measure and quantify di erences between each agents? performance
 and between overall team performance in various aspects of task completion. A
 substantial review of this body of research was provided in section 2.2. As described
 in section 2.3, there has been sparse application of advanced optimization theory in
 solving the HRT con guration selection problem - the third research axis.
 This dissertation research sought to create a methodology that would increase
 the ability to quantitatively compare solutions in the design space of the HRT
 con guration selection problem. This was achieved by synthesizing optimization
 techniques to treat performance metrics as objective functions in a multi-objective
 optimization problem. The methodology will facilitate comparison between distinct
 groups of team members, and will facilitate selection a pseudo-optimal team for a
 given task scenario.
 51
3.1 Background to This HRT Research Problem
 For the last several years, I have been researching various aspects of HRT coop-
 eration. The research has been along the HRT con guration and performance met-
 rics orthogonal axes. My previous published research ([89], [88], [90], [91]) present
 a persuasive argument and supporting evidence that future space operations would
 bene t from involving cooperative robotic team agents in addition to the human
 crew. The papers were based on the same assumptions and the same framing of
 the HRT problem, allowing direct comparisons between the results in each paper.
 These papers de ne and assess how robots and humans can work together cooper-
 atively to complete tasks, and address several critical issues about their combined
 performance.
 In terms of addressing the challenges to comparing HRT con gurations, the
 methodology developed in these papers provides a guideline for comparing HRTs
 before the mission design phase and facilitates quantitative comparison based on
 crew time as the primary criterion. This selection was guided by consideration of
 team e ciency from a task load model perspective. For a summarized discussion
 of the methodology assumptions, procedure, and results from this research, refer to
 the Appendix section A.1.
 Three questions remain from previous research and have been the guiding
 structure of this dissertation. Is there an objective quantitative method to determine
 if a neglected parameter would have produced signi cantly di erent results? Is
 it possible to objectively reduce the problem design space to a core of important
 52
parameters? With an in nite number of ways to reduce the complexity of a problem,
 how could a designer objectively determine which is an optimal HRT con guration
 for any given mission? These are the questions that this dissertation research sought
 to answer.
 3.1.1 HRT Con guration Problem Decomposition
 In a unique approach to the HRT problem domain, I have conceptually par-
 titioned designing a human-robot team into three distinct, sequential components,
 as seen in Figure 3.1. The  rst realm (\Classi cation" in the Figure) is where prob-
 lem de nition occurs. Mission goals and objectives are identi ed, and agent details,
 task details, and performance estimates and resource usage are collated. The second
 grouping (\Planning" in the Figure) encompasses all of the planning that occurs to
 build a mission from the requirements and constraints laid out in the  rst stage.
 Tasks are allocated between participating agents, schedules are developed, and re-
 source limitations are imposed. This can be done intelligently using preferences, or
 this could be done by brute force to generate all possible options.
 It is in the third stage (\Selection" in Figure 3.1) that this dissertation provides
 a unique contribution to the  eld. With the multitude of task allocation, team
 composition, and scheduling options, how could a mission designer objectively decide
 which is the optimal team for a given application? Fundamentally, this stage answers
 two questions: 1) How to objectively select an objective function set to use in
 evaluating team options? And 2) How to objectively select the best team for a
 53
Figure 3.1: The Three Stages of the Collaborative HRT Problem
 mission based on these objective functions?
 3.1.2 Performance Metrics for the HRT Con guration Problem
 The  rst step in the HRT con guration selection problem is to enumerate
 possible performance metrics that are relevant to the problem or mission scenario.
 It would be a Herculean e ort to enumerate all possible objectives and constraints
 for the problem. It has been common practice in application to arbitrarily select
 those that the designers judge most relevant. As would be expected, the solutions
 from the problem will vary with the selected objective functions. Any decision to
 leave o a metric could potentially reshape the problem domain.
 In the HRT con guration selection problem, there are many objectives that
 could be selected by a mission designer, and it is not necessarily straight forward to
 select some and neglect others. This research methodology will provide criteria to
 select a set of performance metrics to describe an overall team?s performance and
 how this will a ect the resulting team selection for a mission. This in turn will lead
 54
to a reduced set of performance metrics (objective functions) with which to compare
 di erent con gurations of the same mission scenarios. A more detailed discussion
 of the rigor behind this set reduction can be found in section 3.3.
 If a mission designer has a preference between agents or has knowledge of the
 relative importance of the performance metrics, the problem can be reduced to a
 single-objective optimization with known weightings. There are many established
 optimization methods to solve this type of problem (refer to section 2.3, so this
 variation was not assessed in this research).
 3.1.3 Top-Level Research Outline
 It is unrealistic to use all possible metrics in any analysis. To reduce this set
 to a tractable problem, this research sought a general, objective method to select a
 subset of the available performance metrics that have the most in uence over the
 problem. It is this component of the process that is a unique contribution to the
  eld.
 Figure 3.2 provides a top-level overview of the di erent components of the
 methodology proposed in this dissertation research. Each of the components out-
 lined in a bold-face color will be discussed in greater detail in sections 3.2 and 3.3.
 The green boxes represent algorithms from the literature that were implemented in
 Matlab for this dissertation research. The blue-outlined oval represents an existing
 software tool (a Matlab toolbox) that was used extensively in this research. It is
 the synthesis of these four components that represent the proposed methodology
 55
Figure 3.2: Top Level Outline of Proposed Methodology
 56
for how to objectively reduce the problem complexity of the HRT con guration
 selection problem and forms the unique contribution of this dissertation research.
 The synthesis of the components itself is not what makes this research dissertation-
 worthy - it is what is accomplished in the HRT domain that provides the valuable
 contribution.
 There has been signi cant research in the area of reducing the dimensionality
 (or complexity) of multi-objective optimization problems (see section 3.3 for a more
 thorough discussion). Following the example of this  eld, an objective reduction
 algorithm was selected and implemented to enable selection of the most in uential
 objective functions in a quantitative manner. The implementation was validated
 in the knapsack problem domain. The resulting solution set was compared to that
 obtained by a more traditional multi-objective optimization solution technique for
 several knapsack problem instances.
 This methodology (using the objective reduction algorithm to reduce the ob-
 jective functions of a multi-objective optimization problem, and an analysis of the
 resulting solutions) was then used in two di erent applications of current HRT
 research. In both cases, the goal was to analyze if this research?s proposed method-
 ology achieved di erent results than the more traditional solution methods. The
  rst, a simple case study of reconnaissance rovers (modeling this research?s analysis
 on the sample problem described in [96]), was designed to test the common per-
 formance analysis methods of other researchers and compare the results to those
 achieved using this methodology. After demonstrating its utility on smaller-scale
 HRT problems, the methodology was used on a demonstration large-scale HRT con-
 57
 guration selection problem. This  nal test case demonstrated how the methodology
 could help a priori decision makers evaluate and select a  nal con guration from a
 complex set of possible HRTs.
 This steps beyond the question of how to pick the best team from the multiple
 options without more information. Is it possible, and how could a designer claim
 that the described optimization problem and its solution set represent an objectively
 better de nition of the problem than others? Which are the best overall solutions
 to the problem { those resulting from objective set A or from objective set B?
 Fundamentally, this is a question of how to create an objective methodology
 to select HRT performance metrics (objective functions to gage task performance).
 This involves evaluating how the selection of a set of objective functions and con-
 straints a ect the resulting solution set with the intent of providing a framework to
 simplify the problem without losing important problem information. It is this type
 of objective methodology that this dissertation research has created.
 3.1.3.1 Representing Metrics and Problem Details
 The  rst step in applying the developed methodology (as seen in the outline
 in Figure 3.2 and more detailed in Figure 3.3) is to enumerate the possible metrics.
 This requires representation of real-world concepts as mathematical expressions de-
 pendent on a common set of design parameters. The concepts Rodriguez [74] devel-
 oped for using performance and resource ratios in a combined task load model and
 expected value model will be used in this analysis. Following Rodriguez?s exam-
 58
Figure 3.3: Methodology Development for Performance Metric Selection
 ple, mathematically representing di erent performance quantities can be simpli ed
 by using non-dimensional parameters rather than keeping track of and comparing
 di erent units. This also facilitates using di erent quantities within the same ex-
 pression (rather than adding a time unit with a distance unit, a ratio of time to a
 reference and distance to a reference allows the unit-less values to be added).
 Resources required for task completion should be represented both as an objec-
 tive function (seeking to minimize resource usage to enable a larger range of tasks)
 and as a constraint (limited resources available). Other objectives such as mental
 workload of the EVA astronauts, will require analysis of input task performance
 data.
 Performance metrics often measure similar variables or e ects. The likelihood
 that some of them will overlap in their analysis is fairly high. The performance
 59
metrics should be evaluated to see if there is redundancy in the criteria used to
 estimate performance. As described by Donmez [24], using metrics that correlate to
 the same data can result in  nding false signi cant e ects in the data. Performance
 metrics should be analyzed to di erentiate between similar (acceptable if they assess
 di erent qualities) and redundant objectives (potential trouble). Very di erent per-
 formance metrics can also have concurrent goals such that improving performance
 in one metric directly correlates to improved performance in another.
 The goal of this analysis is to reduce the complexity of a problem by pruning
 the performance metric set to a minimal set. It is with the reduced metric set that
 each candidate team con guration can be objectively compared.
 3.1.3.2 Additional Experiment Questions
 There are several additional experiment questions that will be considered in the
 analysis. Is there a relation between the number or kind of metrics and either their
 convergence rate or convergence success? At what point (with how many metrics)
 does the problem fail to converge? How does including redundant or overlapping
 metrics change either the team selection or the magnitude of variation between the
 teams? A comparison of team combinations that achieved the same solution results
 would also be illuminating. All of these questions seek more information about
 the interplay of the design space and would lead to more information that could
 in uence a mission designer?s decisions.
 60
3.1.4 Conceptual Simpli cation { The Knapsack Problem
 To aid in an initial analysis of the proposed methodology, the human-robot
 team selection problem (an over-constrained multi-objective optimization problem
 as de ned by Kurtzman [52]) can be compared to the knapsack problem. This
 simpli es the concepts involved without losing details in the analogy, and will be used
 in this research to simplify the real-world problem during methodology development
 and validation. The knapsack problem can be expressed as a set of problems with
 known solutions. This research methodology can therefore be veri ed in a domain
 with known solutions before being applied to a more complex problem domain.
 Referring back to the three orthogonal axes that represent the HRT con g-
 uration selection problem (see Figure 3.1), the \classi cation" stage represented
 all of the problem details. In this conceptual simpli cation, the complexity of the
 HRT con guration selection problem is removed from consideration, and the known
 knapsack problem is dropped into its place.
 The knapsack problem provides a conceptual simpli cation that maps easily to
 the large-scale HRT con guration selection problem. The knapsack problem has a
 list of candidate items that could be selected to be placed into the knapsack (a  xed
 quantity of and types of items to be placed into the knapsack). Similarly, the HRT
 con guration selection problem has a  xed quantity of candidate teams that can be
 selected. Imposing volume and weight constraints on the knapsack problem maps
 to implementing resource and usage constraints in the HRT con guration selection
 problem on each of the agents and available consummables.
 61
For the knapsack problem, the goal is to identify which items to place in the
 knapsack to maximize overall pro t. For the large-scale HRT con guration selection
 problem, the goal is to identify the candidate team con guration that maximizes
 overall team performance. Rather than being concerned with multiple di erent kinds
 of objectives (such as task allocation scheme di erences between candidate teams),
 the knapsack problem?s simpli cation considers only overall pro t values for each
 item to be packed (or the performance capability of each team agent), which can be
 summed in a linear manner to aggregate into an overall perspective on total pro t
 (or overall team performance).
 A limitation of analyzing the HRT con guration selection problem from a
 knapsack problem approach, however, is the limitation in the variety of analysis
 that can be performed. The knapsack problem weighs each of the objectives equally
 and sums their results for an overall pro t value. In the HRT con guration selection
 problem, it cannot be assumed that each of the performance metrics (objective func-
 tions) can be weighed equally. The nature of the performance metrics is not additive
 - they should each be optimized for their own merits, and an overall team analysis
 evaluated to determine how well each team con guration meets the requirements.
 The knapsack problem will be used to illustrate the interdependencies of the
 objective functions (performance metrics in the HRT problem) and their e ect
 throughout the design space search. The knapsack problem will be used as a reduced
 order application domain to verify that the methodology works and to calibrate the
 technique before returning to the HRT con guration selection problem.
 The knapsack problem description has many similarities to the HRT con gu-
 62
ration selection problem. Including more or less detail changes not only the problem
 description, but can have vastly di erent e ects on the resulting solution set. For
 example, in the standard knapsack problem representation, when selecting items to
 take from the wealthy house, the thief assumed that it would take no e ort on his
 part to sell all of the items that he can  t in the knapsack. His workload for col-
 lecting the pro t was negligible. If, however, the problem description was expanded
 to include the e ort the thief must expend to take the items to di erent locations
 to sell them, and any and all arrangements that must be made, it is likely that an
 entirely di erent set of items would end up in the knapsack.
 This highlights the importance of fully describing all relevant portions of the
 problem. This dissertation?s goal is to address this very problem { how to ensure
 that all of the relevant problem pieces are included in the objective functions and
 constraints while not overloading the problem with irrelevant data and detail.
 A remaining issue to be discussed is how to know if the solution set resulting
 from a problem description is a good, representative set. Assuming that the returned
 Pareto set from a multi-objective genetic algorithm (MOGA) was accurate and
 complete would be faulty. It is for this reason that the methodology development
 will occur in the knapsack problem domain, as explained in Figure 3.4. Problems
 with established, documented solution techniques will be used to compare how the
 reduced objective function sets a ect the solution sets.
 It will be productive to compare the solution sets that the methodology gen-
 erates for the knapsack problems to their established solutions. It is not necessarily
 the case that the methodology-generated solutions will be identical to the solution
 63
Figure 3.4: Methodology Validation in the Knapsack Problem Domain
 sets generated by a more traditional solution technique. The methodology will be
 solving the problems from an entirely di erent perspective. If the methodology-
 generated solution sets can be veri ed by a more traditional solution technique,
 that would be ideal. If they are not, the methodology will still produce meaningful
 solutions.
 3.2 MOGA Algorithmic Approach
 To solve the constrained multi-objective optimization problem, Matlab?s multi-
 objective genetic algorithm toolbox was used to come up with the optimal con g-
 urations for a given application and speci c scenario. Matlab?s multi-objective ge-
 netic algorithm toolbox was a tool used to generate new populations of data. A
 64
Figure 3.5: Outline of the Algorithmic Approach of This Research
 user-de ned  tness function contains all of the problem-speci c parameters. Figure
 3.5 outlines the algorithms implemented in this research to combine with Matlab?s
 MOGA routine. Several existing algorithms from optimization literature have been
 implemented to incorporate multiple constraints into the multi-objective function
 evaluation and to account for potential search problems. Each component will be
 discussed in this section. Each algorithm is discussed below.
 The results from this type of analysis yield a Pareto set of non-dominated solu-
 tions. This solution set represents di erent design options that are equally capable,
 65
according to the de ned objectives, at completing the speci c mission scenario. This
 solution set, however, only applies to the speci c problem described by the mission
 requirements. If other objectives had been selected, it is highly possible that an
 entirely di erent solution set would be the result.
 3.2.1 Use of Matlab?s MOGA Toolbox
 The role of Matlab?s MOGA toolbox in this research e ort should be put
 in perspective to clarify how it contributes as a tool to the overall methodology,
 but does not in itself represent anything more than number crunching. Figure
 3.5 provides a visual representation to demonstrate where in the overall MOGA
 implementation the toolbox is used. Each run begins with the  tness function?s
 evaluation of a generation of design points, which includes the objective functions,
 constraints, and the user-speci ed MOGA evolution parameters.
 These  tness values are fed into Matlab?s gamultiobj.m function to create a
 new generation based on the  tness of the previous generation?s design points. This
 is iterated either for a speci ed number of generations, or until a tolerance between
 generation  tness is reached. In the former the algorithm does not necessarily con-
 verge, while in the latter the algorithm?s convergence is the condition that terminates
 the MOGA. If the algorithm converges, the resulting generation is the Pareto set
 of solutions, where each solution is just as optimal as any other solution in the set,
 with none being better than any other.
 It should be emphasized that Matlab?s built-in toolbox is used solely to create
 66
a new generation of solutions for consideration. All other portions of this analysis
 process represent synthesized algorithms from other researchers.
 3.2.2 Stochastic Analysis of a MOGA
 A single run through a genetic algorithm might not search the entire design
 space. It is possible that the algorithm could become stuck in a local minima,
 resulting in non-reproducible results and sub-optimal solutions. To avoid this type
 of narrow-search complication, a stochastic wrapper function was written for this
 research to run the MOGA multiple times, keeping track of the non-dominated
 solutions across all of the runs. The wrapper function has the ability to call the
 multi-objective algorithm any number of user-speci ed times (10 is recommended
 as a general rule-of-thumb for statistical relevance). This stochastic solution set
 better describes the design space. Over the iterative runs, a larger swath of the
 design space will be searched, and the algorithm has a better chance of converging
 on a steady state solution set.
 The stochastic wrapper assesses if any of the solutions from each run of the
 MOGA dominate each other. To achieve this goal, this research implemented the
 continuous update method for non-dominated solutions from Deb [22]. For use
 in this research, this algorithm was expanded to accommodate multiple objectives
 (n-dimensional) during the domination evaluation (the original continuous update
 method was speci ed only for the simple 2-dimensional case with two objectives).
 The continuous update method is a faster computational method that does
 67
not check all solutions before deciding domination. Instead, it keeps a running list
 of non-dominated solutions. When an item in this list is dominated, it is removed
 from the list. When a new design point is demonstrated to be non-dominated by
 any other solution in the data set, then that point is added to the non-dominated
 list. After each stochastic run, the new Pareto set is compared with the existing
 non-dominated set, and the overall set of non-dominated points results.
 3.2.3 Multi-Objective Constraint Handling Technique
 Single-objective optimization problems have straight-forward options for in-
 corporating a penalty function into the objective to handle the constraints simulta-
 neous with solving the problem. A more nuanced technique is required for the case
 of multi-objective optimization. The main di erence between all of the available
 techniques is how infeasible solutions are treated during the optimization process.
 The simplest approach is a static penalty function that assigns a constant
 penalty to the  tness of all infeasible solutions. Frequently, this results in the
 infeasible solutions being disregarded in the evolutionary algorithm. It could be
 desirable, especially in situations where there are few if any feasible solutions, to use
 the information contained within infeasible solutions to further the algorithm. When
 too much emphasis is put on infeasible solutions, however, the time to convergence
 for an algorithm increases.
 Woldesenbet?s [101] adaptive multi-objective constraint handling technique has
 been implemented for this research, and is called from within the MOGA?s  tness
 68
function. The primary advantage of this algorithm is that the relative importance
 of a given solution?s  tness and the value of constraint violations is altered in each
 generation based on the number of feasible individuals in the given generation.
 In this way, the algorithm adapts to use infeasible solutions (by de-emphasizing
 constraint violation) when there are few feasible solutions. This adaptive penalty
 function does not require problem-speci c parameter tuning.
 3.2.4 Hypervolume Indicator Pareto Set Quality Metric
 As seen from the convergence success and run time analysis from the previous
 sections, from a purely computational time perspective, the methodology proposed
 in this research does not seem bene cial. However, an analysis of the resulting
 Pareto solution sets will bring the utility of the method into greater light. It is an
 established practice in optimization theory to use quality metrics to gage the merits
 of a Pareto set according to two parameters: the convergence of the solutions to the
 Pareto front and the diversity of the solutions in the Pareto set. This type of metric
 enables comparison between two entirely di erent Pareto sets to gage the goodness
 for the solution set.
 To evaluate the distribution of solutions along the Pareto front, n-dimensional
 quality metrics were used. A common quality metric in this regime is the hyper-
 volume indicator (originally proposed by Zitzler and Thiele [107]). Bader [5] states
 that this is the only quality indicator that can be demonstrated to be completely
 sensitive (and correspond to) Pareto dominance, such that a higher hypervolume
 69
value of a  rst Pareto front with respect to a second Pareto front indicates that the
  rst front dominates the second.
 An additional advantage of using the hypervolume indicator as a Pareto set
 quality metric is that each front can be assessed independently of other sets as
 easily as its relative coverage can be compared (a valuable quantity both relative
 and absolute).
 Zitzler [105] proposed using two complementary quality measures for the hy-
 pervolume calculation: the S and D functions. The S function is a measure of the
 volume of the objective space weakly dominated by a Pareto front (the size of the
 dominated space). The S value of two di erent Pareto fronts cannot be used to
 determine which set entirely dominates the other. A second measure is needed for
 the relative comparison.
 The D function represents volume coverage di erence of two Pareto sets, and
 allows for a relative comparison between the two sets. The function D is de ned
 by D(A,B):= S(A+B) - S(B). The calculated value represents the volume that is
 weakly dominated by front A but not by front B. A value of D(A,B) = 1 means that
 the front B is entirely weakly dominated by A. A value of D(A,B) = 0 means that no
 points in front B are dominated by front A. This metric is not commutative. D(A,B)
 does not necessarily equal 1-D(A,B), so both must be calculated for comparison.
 Figure 3.6 (originally published in [108]) illustrates the concepts behind the
 S and D volume calculations. It considers two di erent Pareto fronts (represented
 in 2-D for visual clarity, although the metrics apply to n-D problems). For the
 maximization problem represented in the left image, front 1 and front 2 clearly
 70
Figure 3.6: Illustration of the Hypervolume Quality Metric for Assessing Pareto
 Front Distribution. Figure was published in [108]
 overlap, but neither is obviously better across both objectives. From the  gure,
 S(front 1) is equal to the region that is the sum of  and  and S(front 2) is equal
 to the region that is the sum of  and  . The region of the  gure covered by both
 front 1 and front 2 is S(front1 + front 2) =  +  +  . D(front 1, front 2), then,
 is equal to  and D(front 2, front 1) is equal to  .
 Zitzler has provided his C-language implementation of the hypervolume indi-
 cator as open source, and it has been used in this research [106]. This code is called
 with a speci ed number of objective functions and either one or two text  les. The
 text  les contain the Pareto data (where columns correspond to separate objective
 functions) for a given Pareto set. If only one data  le is provided to the hypervol-
 ume code, it calculates only the S metric for the Pareto data. If two text  les are
 provided, the hypervolume code outputs the S metrics for each of the data  les, and
 also D(data1, data2) and D(data2, data1).
 71
However, it should be noted that to use this hypervolume code, the input
 data  les must have the same number of objectives. In each of the hypervolume
 calculations in the following section, it will become evident how this fact was used. In
 essence, the objective numbers from the reduced objective set were used to identify
 which objectives from the control objective set to use. This is detailed further when
 the algorithm is used in Chapter 4.
 3.2.5 Many-Objective Optimization Extension
 It is not surprising that for large, complicated problems, the standard MOGA
 algorithms do not converge. It is well documented in optimization research that
 as the number of objective functions and constraints increases, it becomes more
 likely that progressive generations fail to improve the solution set by a signi cant
 degree. Brockho [11] explained that as the objective space is widened by an in-
 creasing number of objectives, the probability that design points are non-dominated
 increases. In other words, the probability that one design point is better across all
 objectives than another design point becomes statistically less likely. In this case,
 a larger number of Pareto solutions would be kept between generations to maintain
 the diversity. This would result in a lower selection pressure to  nd new solutions.
 In essence, the  tness between the early generations of a MOGA might not change
 signi cantly, and the algorithm may stop searching. According to Ishibuchi [46], the
 number of solutions required to approximate the Pareto set increases exponentially
 with the dimensionality (number of objective functions) of the problem.
 72
There is a signi cant quantity of research in the literature devoted to alleviat-
 ing these problems. The  eld is called Many-Objective Optimization (MaOO), and
 generally refers to problems with more than two objective functions. Ishibuchi [46]
 identi es  ve di erent categories of MaOO research: selection pressure, indicator-
 based search, preference based search, dimension/objective reduction, and visual-
 ization techniques.
 There are many existing algorithms from journal papers that could be in-
 tegrated into a MOGA (or other solution technique) to aid in algorithm conver-
 gence. These algorithms can be integrated to increase selection pressure (relax the
 Pareto dominance criterion [1], [77], [28]), change from a domination algorithm to
 an indicator-based search (use a di erent method besides domination for ranking
 [44], [109]), or use dimension and objective reduction ([79], [11], [70]). If applying
 one algorithm from this  eld does not work for a given problem domain, a di erent
 algorithms could be used to reach convergence.
 Of these categories, objective reduction was selected to be utilized in this re-
 search. The  nal application domain used in this research will involve an objective
 function set consisting of 15 objectives - clearly in the realm of MaOO. This re-
 search?s methodology proposes using an objective reduction algorithm to reduce the
 objective set prior to running Matlab?s MOGA and to observe how well this im-
 proves convergence. Objective reduction has a straightforward conceptual analogue
 to the intended application of this research (HRT con guration selection through
 performance metric analysis), and analysis of the omitted objectives and remaining
 objectives will increase knowledge about the problem domain.
 73
3.3 O ine Objective Reduction
 Solving a problem with a large number of objectives can encounter several
 standard challenges. First, the large number of objectives makes a larger set of
 trade-o s between solutions. In other words, there is a large set of design variables
 that can be varied between solutions, and the size of the solution space increases.
 Second, a designer seeking to select a single solution from an optimized set has a
 much larger volume of traits to consider between the solutions. This can be im-
 possible for a human mind to maintain in a logical way because of the di culty
 in visualizing high-dimensional problems. Third, as the number of objective func-
 tions under consideration increases, the amount of computational time drastically
 increases.
 Omitting objective functions from consideration has the potential to greatly
 improve computation time, and greatly simplify a problem. It is important to know
 quantitatively, however, how omitting an objective a ects the problem characteris-
 tics. If the addition of an objective function to the set appears to have no e ect
 on the resulting solution set, then the problem domain is insensitive to this ob-
 jective. An objective that does not a ect the solution does not contribute to the
 optimization of the problem and can be logically neglected from further analysis.
 Alternatively, if including an objective function in the analysis greatly changes
 the solution set, then the problem is sensitive to this objective. Logically, this objec-
 tive should be included in further analysis. In other words, as with any optimization
 problem, it would be expected that as the objective functions for a given problem
 74
are changed (either by weighting or by a di erent set of functions), the resulting
 solution set will be correspondingly changed.
 In application, this concept will be used to specify a reduced set of objective
 functions to be applied iteratively with a speci c mission scenario to observe the
 e ect on the generated solution sets. This will provide both validation that the
 entire design space has been searched, and that the  nal solution set represents the
 best overall solutions.
 A signi cant body of literature exists that is directly related to this problem
 of reducing the number of objective functions in a multi-objective optimization
 problem. It is sometimes called dimensionality reduction, and is a common problem
 in the  elds of pattern recognition and image processing. In general, these methods
 can be divided into two realms: feature selection, and feature extraction. Feature
 selection involves downsizing from an existing set of objectives, creating a subset.
 Feature extraction draws new, non-redundant relationships and information from
 an existing body of data. Feature extraction can involve constructing combinations
 of variables. Both can be very useful in analyzing a large volume of data.
 In the optimization literature, feature extraction is best represented as princi-
 ple component analysis and feature selection by con ict-based approaches. Feature
 extraction allows synthesis of problem characteristics to form new relationships,
 while feature selection works with the existing problem characteristics. Both meth-
 ods use di erent principles to preserve the properties of the underlying optimization
 problems. Both of these approaches can be used either a posteriori or in real-time
 during a search algorithm.
 75
3.3.1 Selecting an Objective Reduction Method
 Feature selection was utilized in this research because of the primary applica-
 tion domain (the HRT con guration selection problem). The set of relevant perfor-
 mance metrics of the HRT con guration selection problem resulting from objective
 reduction will be most useful if it clearly re ects the input objectives. Rather than
 coming up with a composite numeric answer that represents a combination of per-
 formance metrics, feature selection will allow easy traceability between selected and
 omitted objectives, and provide valuable information about the important problem
 components.
 Objective reduction can be performed at two di erent places within a problem
 domain. In real-time (online objective reduction) often occurs within an iterative
 evolutionary algorithm, updating and reducing the objective function set between
 generations. O ine objective reduction is performed on an existing body of data,
 after a solution set has been found.
 O ine objective reduction can specify which objectives are redundant within
 a set. In other words, o ine objective reduction can determine which objectives
 do not contribute new information to the problem structure. It can also be used
 to illuminate various aspects of the problem domain to assist in a posteriori deci-
 sion making and solution selection. O ine objective reduction is utilized in this
 methodology.
 Three primary approaches to objective reduction have been greatly examined
 in the literature. Saxena and Deb [23, 78, 79] developed several online objective
 76
reduction methods based on principle component analysis, and have focused on
 incorporating constraint reduction into the multi-objective optimization. Jaimes
 and Coello Coello [54, 47] integrated con ict-based objective reduction into a multi-
 objective evolutionary algorithm by using correlation between nondominated vectors
 to estimate the con ict between each pair of objectives. Their approach scheduled
 the reduction stages during a MOEA to decrease the number of objectives that
 needed to be evaluated during the search. This work relied on several arbitrary
 variables that required an adaptive algorithm.
 The third approach to objective reduction was deemed most valuable for this
 research. Brockho and Zitzler?s con ict-based approach [10, 11, 12] to objective
 reduction sought to  nd the minimum objective set for a given optimization problem
 that preserves the underlying dominance structure. The con ict-based approach to
 objective reduction follows the feature selection framework and can be implemented
 both during a MOEA and a posteriori. Solution pairs are compared separately.
 Brockho developed both an exact algorithm and several heuristic algorithms for
 objective reduction.
 The heuristic algorithms, based on a greedy algorithm approach, calculate
 a minimal objective subset that has at most a  -error in the resulting dominance
 structure. It is possible to run both the exact algorithm and the heuristic algorithms
 for the case of  -error = 0%. The exact algorithm will  nd a smaller subset of
 objectives, but it will be overwhelmed with the computational complexity in the
 case of a larger number of objectives. The exact algorithm?s run time is polynomial
 with respect to the size of the solution set, but is exponential in the number of
 77
objectives. The greedy heuristic will have a much smaller run time, with run time
 only becoming an issue for problems with more than 50 objectives and more than
 200 solution pairs [12]. The main disadvantage of the greedy methods is that they
 are not guaranteed to  nd the minimum objective set, merely a greatly reduced set
 (minimal set). The resulting set size of the minimal objective set is not signi cantly
 larger with the heuristic algorithm than with the exact algorithm [12]. For the HRT
 con guration selection problem, this performance of the greedy heuristic algorithm
 was deemed su cient.
 For application in this dissertation, objective reduction will be used in two
 di erent types of applications: in the knapsack problem domain and in the human-
 robot team con guration selection domain, both as a posteriori analysis. It is highly
 unlikely that any of the objectives in either of these domains will be wholly redun-
 dant. In other words, it is anticipated that there will be some level of con ict
 between each pair of objectives, and some new information provided by each. Ob-
 jective reduction will be used to  nd the di erent degrees of con ict. Brockho ?s
 greedy  -MOSS algorithm was implemented for this research to facilitate work in
 these two domains.
 The minimum size of an objective set will increase with the number of ob-
 jectives up to a speci c point, dependent on the number of solutions [10]. It is
 logical that the larger the search space, the more information that will be needed
 to characterize it, and therefore a larger minimum set size will be needed.
 This research implemented Brockho ?s heuristic greedy  -MOSS algorithm
 [12] to be used in a posteriori analysis of multi-objective optimization problems. It
 78
is anticipated that application to both the knapsack and HRT problem domains will
 reduce problem complexity and reduce the amount of information that needs to be
 looked at during the solution decision-making process.
 3.3.2 Brief Overview of Critical De nitions and Concepts
 There are several critical de nitions and concepts that are necessary to un-
 derstand the implementation of the objective reduction methodology used in this
 dissertation. These are reviewed brie y below, with de nitions from Brockho . For
 a more detailed description of the de nitions and their in uence on the algorithms,
 please refer to [10, 12].
 Fundamentally, the objective reduction algorithm compares all solution pairs
 across the objectives. One solution (~x) dominating another (~y) means that it is
 better in all objectives than the second. With each of the following de nitions, the
 objective function set represented by F is notional: it can represent either a single
 objective function or a set of objective functions. A Pareto front resulting from a
 genetic algorithm is the set of points that are either non-dominated or incomparable,
 i.e. no solutions are better than the others in the set across all objectives. A strict
 de nition of weak dominance is as follows:
 De nition. A solution ~x weakly dominates a solution ~y if and only if ~x is not
 worse than ~y in all objectives in the set F, such that
 ~x  F ~y :, 8fi 2 F : fi(~x)  fi(~y) (3.1)
 De nition. A solution ~x and a solution ~y are comparable if either ~x  F ~y or ~y
 79
 F ~x.
 ~x and ~y are incomparable if neither ~x  F ~y nor ~y  F ~x.
 A Pareto optimal solution set for an arbitrary optimization problem contains all
 solutions that either weakly dominate or are incomparable to any other solution.
 In the con ict-based approach to objective reduction of the  -MOSS algo-
 rithm, the underlying domination structure according to each objective function
 is determined (which points dominate each other according to each objective). A
 short-hand notation is adopted to represent the dominance relations of an objective
 function by the symbol  F for each di erent objective or set. If two di erent objec-
 tives, or two di erent objective subsets, do not have the same dominance relations,
 they are said to be con icting.
 De nition. Two objective function sets F1 and F2 are called con icting if their
 weak Pareto dominance relations di er, that is,
  F1 6= F2
 . The objective sets are called noncon icting otherwise
  F1= F2
 .
 When attempting to reduce the complexity of a problem, it is often desirable
 to identify any redundant features or objective functions. These add no additional
 information. In the context of con ict-based analysis of objective functions, redun-
 dant objectives contain the same dominance relations.
 80
De nition. A set F 0  F of objectives is called redundant if and only if there
 exists an objective subset F 00  F 0 that is noncon icting with F 0.
 In other words, if there is a subset of objectives that contains the same dominance
 relations as the whole set, then the redundant objectives can be omitted without
 altering the underlying dominance structure of the whole set.
 In many practical applications, the dominance relations are too strict to allow
 for objective reduction. In other words, removing any objectives from the set would
 change the dominance structure of the problem. Instead, an acceptable error margin
 can be used to de ne a  -noncon icting reduced objective set.
 De nition. Based on the weak  -dominance relation shown in the following equa-
 tion, a  -noncon icting set has a maximum error of  when it is wrongly assumed
 that ~x  F 0 ~y. Then ~x is not worse than ~y in all objectives by the additive term  .
   F 0 := (~x; ~y) j ~x; ~y 2 X ^ 8i 2 F
 0 : fi(~x)   fi(~y) (3.2)
 The term  -noncon icting can also be used to compare the dominance relations
 of two objective sets as follows.
 De nition. Let F1 and F2 be two objective sets. We call F1  -noncon icting with
 F2 if and only if both ( F1   
 
F2) and ( F2   
 
F1) holds; otherwise F1 and F2 are
 denoted as  -con icting.
 In the  rst case from the de nition, the inclusion of the error margin means
 that F1 and F2 have become noncon icting sets even when their strict dominance
 relations might have made them con icting.
 81
De nition. Let  1,  2 2 < and F1, F2 be two objective subsets. The ( 1, 2)-
 dominance relation
   1; 2F1;F2
 on  is de ned as
 ~x   1; 2F1;F2 ~y :, F1(~x)  1  F1(~y) ^ F2(~x)  2  F2(~y);8(~x; ~y) 2 X: (3.3)
 where the ^ symbol represents a logical \and" operator. If both of the relations
 for F1 and F2 exist, then the point pair (~x, ~y) satis es the ( 1, 2)-dominance relation.
 3.3.3 The  -MOSS Objective Reduction Algorithm
 Brockho has de ned three types of closely-related problems. In the minimum
 objective subset (MOSS) problem, the goal is to  nd a minimum objective set while
 preserving the dominance structure of the original problem. It is highly likely,
 however, that this approach could be too restrictive, resulting in no reduction of the
 problem complexity.
 In the  -minimum objective subset ( -MOSS) problem, the goal is to compute
 a reduced objective set that is  -noncon icting with the original problem. The
 special case when  = 0 is the MOSS problem. Given a maximum-tolerable  -error,
 this problem formulation will determine a reduced objective subset that has at most
 that  -percentage error in the underlying dominance structure caused by omitting
 objectives.
 82
Figure 3.7: Brockho ?s Greedy  -MOSS Algorithm [12]
 In the third problem type, the minimum objective subset of size k with mini-
 mum error (k-EMOSS), the goal is to  nd an objective subset that is  -noncon icting
 with the original problem but with at most k-objectives in the resulting subset. This
 specialization of the problem can be useful if a speci c desired  nal set size is known
 a priori.
 For this research, Brockho ?s heuristic greedy  -MOSS algorithm (algorithm
 2 from [12]) was implemented. The algorithm is reproduced in Figure 3.7.
 As Brockho described [12], the in uence of a  -error on the size of the result-
 ing objective set depends greatly on the problem itself. The impact of the objective
 reduction algorithm on a problem will vary greatly depending on properties of the
 problem itself. While this algorithm has been demonstrated to be a phenomenal
 tool, it might fail to yield a reduced set in some instances, based solely on proper-
 ties of the multi-objective problem itself. This should be kept in mind for future
 83
application of the algorithm.
 3.4 Summary
 Referring back to Figure 3.2, the  rst two steps of the research process have
 been here described. The methodology that underlies the reduction in the perfor-
 mance metrics for the HRT problem was described in section 3.1.3 and the objective
 reduction method selected was discussed in 3.3.3.
 After the methodology was veri ed in the knapsack problem domain (see Chap-
 ter 4), it was applied to the HRT con guration selection problem (see the experi-
 ments of Chapter 5 and Chapter 6). Reinserting the original application domain, a
 dataset of robotic agents and humans of various capabilities was used to run a sam-
 ple case through the methodology. The resulting solution set was a pseudo-optimal
 assignment of which con guration would be best to perform the speci ed mission.
 The methodology that was developed in this chapter represents a unique con-
 tribution to the  eld of HRT performance analysis. Given a set of agents and given
 a diverse set of potential metrics, this methodology allows a designer to rigorously
 choose a team con guration and task allocation within the team to satisfy one or
 more metrics in an optimal manner. A designer will not have to make a priori
 decisions about which are the critical metrics.
 Four di erent algorithms were synthesized and incorporated through a MOGA
 to enable use of this research?s methodology. Implementation and application of
 each of these algorithms has been done previously in the research. The unique
 84
contribution of this methodology is using the synthesis of the algorithms in the
 HRT domain to achieve superior and novel performance analysis and evaluation of
 results.
 This type of methodology has not been done in the research  eld, and will be a
 valuable, unique contribution. Using the three orthogonal axes analogy, this research
 will be on the plane that is de ned by the performance metrics and optimization
 techniques. This allows the substitution of the knapsack problem for the HRT
 con guration problem because the application domain will be used to feed data
 sets into metrics. In other words, the solution technique is not limited to a speci c
 application domain. This methodology can be used for a much wider range of
 applications. Having a generic, objective methodology like this will be a necessary
 step to advancing the goal of using cooperative human and robot cooperative teams
 in future space activities.
 85
Chapter 4
 Objective Reduction Validation and Knapsack Application
 Experiment
 4.1 Objective Reduction Algorithm Validation
 The greedy  -MOSS objective reduction algorithm described in section 3.3 was
 implemented in Matlab for this research. Before the objective reduction algorithm
 can be utilized, however, problem information must be available. Ideally, an initial
 Pareto front representing feasible solutions would be input into the algorithm. A
 Pareto front contains valuable information - correlations between variables, trends
 of each objective, and an evaluation of the trade-o between the objectives.
 An example from Brockho ?s paper [12] was used to validate the algorithm?s
 implementation. The example?s data has been reproduced in Figure 4.1. In this
 example, there are four separate objective functions, all to be minimized, each with
 values for four variables. Without allowing for an error in the underlying dominance
 structure of the problem, no reduction of the objective set was possible for this
 example. When the problem data was input to the  -MOSS algorithm with a tight
  -error of 0, it returned the same results as anticipated from Brockho ?s description
 - that no reduction of the objective set size was possible.
 With a  -error of 0.5, the algorithm returned a reduced objective set of F 0
 86
= fF1, F3, F4g. In other words, the algorithm identi ed that there is at most an
 error of 0.5 in the dominance relations when the entire objective set is replaced by
 this subset. It should be recalled that the algorithm does not guarantee to  nd the
 minimal objective set, nor the optimal objective set. The optimal reduced objective
 set for this example (from inspection of the data, as described in [12]) is F = fF2,
 F4g when a  -error of 0.5 is tolerated. This smaller reduced objective set is the
 0.5- -minimum with respect to the entire objective set.
 The di erence between the  -minimal and the  -minimum reduced objective
 sets stems from the algorithm?s method of selecting amongst a set of equal-scoring
 (after considerable computation) objective functions to place in the reduced set. The
 algorithm selects the  rst objective function in the list of equal-scoring. This caused
 the placement of F1 into the reduced set rather than F2. On the next iteration loop
 within the algorithm, the fact that F1 had already been selected in uenced the rest
 of the decisions regarding the dominance relations. If the objective functions F1 and
 F2 were switched in order before being input into the algorithm, a di erent reduced
 objective set would result.
 As part of the algorithm implementation, a subfunction was written to cal-
 culate the  -error that would result from using a speci ed objective subset rather
 than the full objective set. This subfunction was used to complete the validation of
 the algorithm implementation. Inputting F = fF2, F4g as the reduced objective set
 yielded the result that the  -error would be 0.5. Similarly, inputting F = fF1, F4g
 as the reduced set yielded the result that the maximum  -error would be 2.5. Both
 of these results also match those described in Brockho ?s work [12].
 87
Figure 4.1: Objective Reduction Algorithm Validation Test Problem (reproduced
 from a similar  gure in [12])
 The goal of using this objective reduction algorithm is to reduce an unwieldy-
 sized full objective set to a more manageable version without changing the under-
 lying dominance structure of the optimization problem. The  -MOSS algorithm
 performs this task very well. It is not required to come up with the optimal so-
 lution to these complex over-constrained HRT-con guration selection problems. A
 reduced-complexity problem is the desired result.
 4.2 Knapsack Test Problems Experimentation
 The  rst set of experiments run in this dissertation were to demonstrate the
 utility of using the new methodology laid out in Chapter 3. It was anticipated that
 the  -MOSS objective reduction algorithm would signi cantly reduce the complexity
 88
of the well-known multi-objective knapsack problem, depending on the error toler-
 ated in the underlying dominance structure. Please see section 3.1.4 for a thorough
 description of the reasoning behind using the knapsack application to conceptually
 simplify the HRT con guration selection problem, and for a complete explanation
 of the goals and reasoning behind this experiment.
 In brief summary, the multi-objective knapsack test problems allow a thor-
 ough application of the methodology developed in Chapter 3. As seen in Figure
 3.4, an initial population was generated and input to the  -MOSS objective reduc-
 tion algorithm. The resulting reduced objective set (if reduction was possible) was
 used within one of the established solution techniques for a knapsack problem: a
 multi-objective genetic algorithm. Using the full set of objectives yielded the ex-
 perimental control solution set, while using the reduced objective set yielded the
 experimental solution set. Analysis was performed on the two separate solution sets
 to determine how well the experimental method reduced the problem complexity
 while maintaining the important traits of the underlying problem.
 4.2.1 Knapsack Simulation Setup and Initial Results
 Brockho ?s seminal paper [12] used large-scale knapsack problem examples
 to demonstrate the power of the objective reduction algorithms. Some of his test
 problems were used for these simulations. The distinction between Brockho ?s anal-
 ysis of the knapsack problems and the analysis that follows in this chapter, is that
 Brockho analyzed the reduction in the objective function sets for each of his test
 89
problems. In this research, the analysis was taken a step further to analyze the re-
 sulting Pareto solution sets to see how well the reduced sets modeled the underlying
 problem?s dominance structure.
 Three di erent knapsack test problem instances were used in this research
 e ort. All three had 100 possible items for selection to place into the knapsack.
 Each problem instance contained pro t values for these 100 items rated for 5, 10,
 and 15 objectives, respectively. The objectives within a knapsack problem were all
 represented as pro t values. It was desired to have 100, 200, and 300 solutions,
 respectively, in the three problem instances.
 The generic knapsack problem de nition used as the basis of each problem
 instance di ered between this implementation and Brockho ?s. This research used
 the version of the problem as de ned by Zitzler [108], such that each item pro t pi;j
 and each item weight wi;j were randomly chosen integers in the range of (10,100),
 inclusive. The capacity constraint was de ned to be half of the sum of the item
 weights. This version of the knapsack problem was easily resizable for large-scale
 problem applications.
 A script was written to follow the methodology laid out by Brockho . The
 script initially called a MOGA to evaluate a test problem instance and come up with
 an estimated Pareto front after 100 generations (with the corresponding required
 number of solutions in the Pareto front).
 Brockho [12] recommended using  -errors as relative percentages in more
 complex applications, in addition to using scaled objective functions, to remove or-
 der of magnitude over-simpli cation of the problem. This approach was used in the
 90
Figure 4.2: Comparison of the E ect of  -Error on the Resulting Reduced Objective
 Set Size Across Di erent Knapsack Problem Instances. The Numbers in the Table
 Correspond to the Number of Objectives in the  -Minimal Set.
 following experiments. To achieve this goal, the Pareto front for each problem in-
 stance was scaled such that the di erence between the largest and smallest objective
 function value in the population would yield a  -error equal to 1. This scaled Pareto
 front information was input to the  -MOSS objective reduction algorithm.
 Four di erent relative  -error values were used: 0%, 10%, 20%, and 40% (the
 same as in [12]) to observe the e ect of allowing errors in the underlying dominance
 relations on the resulting reduced objective set size and the quality of the resulting
 Pareto set. The results of the objective set size are tabulated in Figure 4.2.
 The algorithm performed as expected. Due to the complex nature of these
 problems, a reduction in the objective function set size was not possible in the
 5-objective and 15-objective problem instances without introducing error into the
 underlying dominance structure of the test problem. In the 5-objective test problem,
 a 40% error had to be introduced before reduction of the problem was possible.
 91
Depending on the application of the objective reduction method, this level of error
 would be intolerable. This demonstrates one of the known limitations of the  -MOSS
 algorithm - its ability to reduce the complexity of a problem is highly problem-
 dependent.
 The 25-objective knapsack test problems demonstrated that they were much
 more conducive to objective reduction. With an increasing  -error, fewer objectives
 were required to de ne the underlying dominance structure of each problem instance.
 This demonstrated similar performance of the  -MOSS algorithm on the knapsack
 problem to that described in [12]. The value of this algorithm?s performance should
 not be underestimated. Examination of Figure 4.2 reveals that the problem instance
 initially described by 25 objective functions was reduced to 19 objectives if a 10%
 error were acceptable, and down to 2 objectives if a 40% error were tolerable. While
 in practical applications a 40% error would be unlikely to be acceptable, allowing a
 10% error could be entirely possible.
 4.2.2 Solving the Knapsack Test Problems
 Once the objective reduction algorithm had reduced the objective function
 set for each of the four error tolerances, each of the  ve test problems was input
 into the MOGA to solve the problem instance. Two di erent types of cases were
 computed here. In the control case, the full-size knapsack problems were run through
 the MOGA again. In the second case, the MOGA was run on each of the knapsack
 problems but with the four reduced objective sets found from the  -MOSS algorithm.
 92
All four  values were run through the MOGA for each of the three test problems.
 For both of these cases (control and experiment), the MOGA was set to run 10
 times, keeping the nondominated solutions across all of the runs.
 When running the MOGA for this set of simulations, the built-in MOGA
 control options were altered. The MOGA options used previously had been intended
 to run through 100 generations and then stop the search process. The changes made
 for this set of experiments was to have the MOGA run through its entire search
 process and come up with its best Pareto front for the given problem. The MOGA
 was set to run for 400 generations in each of the problem instances, maintaining
 a population of 200 individuals, until either the maximum generation counter was
 reached or the average spread in the Pareto front was on the order of 10 6 (tolerance
 to determine that the Pareto front has converged).
 4.2.2.1 Results From the 5-Objective Knapsack Problem Instances
 Table 4.1 itemizes the initial results from each of the resulting  nal Pareto
 populations for the 5-objective knapsack test problems. Referring back to Figure
 4.2, the 5-objective knapsack test problem required 5 objectives for all of the  -error
 values except the 40% error test case, when the objective set size was reduced to 3
 objectives.
 It is most illuminating to compare the control column to the  = 40% column
 from Table 4.1. The same maximum number of items was found for both cases.
 The maximum total knapsack pro t was slightly higher in the control case than in
 93
KP 100 Items 5 Objs Control  = 0%  = 10%  = 20%  = 40%
 Problem Instance
 Final population size 791 814 783 805 401
 Minimum # of items 50 51 50 51 51
 Maximum # of items 64 64 64 63 64
 Total pro t maximum 4094 4070 4094 4012 3942
 Pro t minimum 2482 2576 2657 2580 2649
 Average pro t 3416 3421 3405 3465 3461
 Weight maximum 3036 3036 3036 3036 3036
 Weight minimum 2747 2727 2783 2804 2882
 Average weight 2999 2999 3002 3006 3009
 Weight constraint 3036 3036 3036 3036 3036
 Table 4.1: Final Population Data for the 5 Objective Knapsack Problem Instances
 the  = 40% case (3% higher  nal value in the control case), although the reverse
 was true for the minimum total knapsack pro t. Even with including a 40% error
 into the underlying dominance structure, the best solution from the experimental
 MOGA run only di ered from the control best solution by 3%. The experimental
 MOGA?s worst solution (minimum pro t value from the Pareto set) was 6.7% higher
 than the worst solution from the control case. From this  rst Pareto set analysis,
 it appeared that the experimental solution set well represented the control solution
 set even though it had been derived from a greatly reduced complexity problem
 description.
 It is intriguing to note that all  ve test problems had (approximately) the
 94
same minimum and maximum number of items packed in the knapsack, and that
 they all found a way to pack the knapsack to reach the maximum weight constraint.
 The constraint handling algorithm appears to be working well.
 While the data in Table 4.1 contains useful information about the resulting
 populations from each problem instance, it would also be helpful to analyze the
 spread and dispersion of the data points. Figure 4.3 displays the spread of solution
 set pro t values for each of the  ve test problems as a boxplot. Each of the objective
 functions are displayed separately in the  gure to aid in assessing how well the
 MOGA run optimized each individually. The x-axis for each of the subplots has
 been labeled with the objective function number that the data corresponds with. It
 should be noted that the ordering of the objectives in each of these plots corresponds
 with each objective function?s selection by the  -MOSS algorithm as being crucial
 to preserving the underlying dominance structure (left to right represents most to
 least important).
 There are several points of interest that can be gleaned from Figure 4.3. For
 the  -error values of 0%, 10%, and 20%, the algorithm determined that all  ve
 objectives were required to maintain the dominance structure. In other words, even
 with a 20% error allowance, removing an objective would change the dominance
 structure of the test problem by more than 20%. Additionally, for each of these
 three error values the  -MOSS algorithm ordered the objective functions the same.
 Even with an increasing error tolerance up to 20%, the amount that each objective
 function contributed to the dominance structure remained the same.
 This entirely changed with the  = 40% test problem instance. In this case,
 95
Figure 4.3: Final Pareto Population Data Resulting from Full MOGA Run for Each
 Knapsack Problem Instance with 5 Objectives
 96
Figure 4.4: Termination Reason and Run Time for each Knapsack Problem Instance
 with 5 Objectives
 the  rst objective was deemed most in uential to the dominance structure, and was
 selected  rst. Allowing the large error removed the second and third objectives from
 consideration.
 There are several low-value outliers for each of the objectives in all of the test
 cases. This means that not all of the solutions to the knapsack problem have pseudo-
 optimized solutions even after the MOGA options had been altered to facilitate
 convergence. This indicated that analysis was needed to assess the reason that each
 test problem?s MOGA runs was terminated. Figure 4.4 concisely displays this data.
 At the end of every MOGA run, the reason for the termination of that run
 had been stored. It had been desired that each run would only terminate when the
 97
tolerance of the average change in the Pareto spread was reached, indicating that
 the solution set had converged to a  nal set. As can be seen in Figure 4.4, this was
 the case for the majority of the MOGA runs across all  ve test problems.
 However, a second termination reason turned up in all but the  = 40% case
 that indicated that not all of the MOGA runs reached convergence (the upper
 portions of the bar plot). When the pre-speci ed maximum number of generations
 had been reached (400 in this case), the MOGA stopped searching for solution
 convergence and ended the run with the current population. For both the control
 and the  = 0% case, this occurred once in the ten stochastic runs. In the  = 10%
 case, two out of the ten MOGA runs failed to converge. In the  = 20% case, four
 out of the ten MOGA runs failed to converge. All ten stochastic runs of the  = 40%
 test problem reached convergence. This is the anticipated result - with the problem
 complexity reduced from  ve objectives down to three objectives, it was easier for
 the MOGA to reach convergence.
 The convergence results from the  rst four cases seem counter-intuitive. It
 would be anticipated that the larger the set of objective functions, the more di cult
 it would be for the MOGA to reach convergence and the more likely the runs would
 be to hit the maximum number of generations  rst. In this 5-objective test problem,
 however, each of the  rst four test problems had the same number of objectives being
 tested.
 The reason for this divergence in termination reason can be thought of as the
 probability of  ipping heads or tails with a coin. Each  ip has a 50% chance of
 turning up heads, yet it is entirely possible to  ip the coin 5 or 10 times and get
 98
tails every time. The small sampling of the dataset creates the appearance that tails
 has a greater than 50% chance of turning up. However, if a much larger dataset
 were run (if the coin was  ipped 100 times), it becomes much more likely that tails
 will turn up only 50% of the time.
 Further analysis was run on the 5-objective knapsack test problems to deter-
 mine whether the transient probability behavior was exhibited in the convergence
 success of the MOGA. Figure 4.5 displays the results from this analysis. Instead of
 running the MOGA 10 times, the MOGA was run a total of 30 times, collating the
 non-dominated solutions from each.
 In the control case and the  = 0% test case in Figure 4.5, it can be seen
 that Pareto front convergence was reached 66% of the time. A full third of the
 stochastic 30 runs failed to reach Pareto convergence. The other two test cases with
 5 objectives (10% and 20%  -error) had slightly better average convergence success,
 but still not as high as expected. In fact, this convergence rate was worse than that
 exhibited with the 10 MOGA runs.
 It is also concerning that this convergence rate was achieved with the 5-
 objective knapsack problems because these were the least complex of the problems
 to be analyzed. This convergence failure can be attributed to two di erent factors.
 On the one hand, the objectives for the knapsack problem were based on randomly
 selected integer values. Not only were the distinct objectives not seeking concurrent
 goals (nor, for that matter, contradictory goals), there might not have been any or-
 der within them whatsoever. Additionally, a known challenge of MaOO is reaching
 solution convergence. There are just so many contradictory objectives and variables
 99
Figure 4.5: Termination Reason and Run Time for the 5-Objective Knapsack Cases,
 with 30 MOGA Runs
 that convergence can become di cult. This convergence rate was noted for further
 comparison with the more complex test problems.
 The second plot displayed in Figure 4.4 indicates the run time required for the
 two di erent components of this experiment: the run time of the greedy  -MOSS
 objective reduction algorithm and the total run time of the 10 stochastic MOGA
 iterations.
 It would be anticipated that the run time of the MOGA with the entire ob-
 jective function set would be considerably longer than the time required to run the
 reduced objective function set, and that the resulting Pareto fronts would not be
 signi cantly di erent. This does not appear to be the case. The run time for the
 100
MOGA stayed roughly the same for the  = 0% and 10% cases (this was the antic-
 ipated result because there was no di erence in the number of objectives between
 these two cases and the control), but the  = 20% case (which also had  ve objec-
 tives) took twice as long as the control. This re ects the fact that roughly 25% of the
 MOGA runs in the  = 20% test case did not converge. Instead, these MOGA runs
 churned until the maximum number of generations was reached, greatly increasing
 the duration of those MOGA runs.
 For the 5-objective problem instances, the run time for the greedy objective
 reduction algorithm increased slightly as the error-tolerance was increased. A reason
 for this is that with the error tolerance, more of the dominance relations overlapped
 resulting in a larger number of combinations to be rechecked.
 4.2.2.2 Results from the 15-Objective Knapsack Problem Instances
 The results for the 15-objective knapsack problem followed similar patterns
 to those obtained from the 5-objective function knapsack test problems. Table 4.2
 itemizes the initial results for the 15-objective knapsack test problems. Referring
 back to Figure 4.2, the 15-objective knapsack test problems required all 15 objectives
 to maintain the underlying dominance structure of the problem if no  -error was
 allowed. If a 10% error in the dominance structure was feasible, one objective
 function could be removed from the objective function set. All of the other 14
 objectives were required to maintain problem detail.
 Allowing a 20%  -error in the dominance structure of the 15-objective knapsack
 101
test problem allowed a reduction in objective set size to 10 objectives, signi cantly
 reducing the problem complexity. As can be seen in Table 4.2, introducing a 40%
  -error reduced the objective set to 4 objectives. It is possible that this substantial
 reduction of the size of the objective set would cause considerable alteration to
 problem detail. A detailed examination of the solutions in the Pareto front for this
 test problem (as seen in Figure 4.6 and discussed below) revealed that this was not
 the case.
 For the knapsack problem instances with 15 objectives, it is illuminating to
 compare across all  ve of the test problems for analysis of the initial results. Com-
 paring the results tabulated in Table 4.2, several trends become apparent. All  ve
 test problems found a maximum total pro t value within 2% of each other. In ad-
 dition, reducing the objective set size resulted in an increase in the average solution
 pro t value. This would lead to the conclusion that even the 40%  -error Pareto
 set provided a good representation of the original 15-objective knapsack problem.
 From this initial Pareto set analysis, it appeared that the all of the experimental
 solution sets well represented the control solution set even though they had been
 derived from a greatly reduced complexity problem description.
 While the data in Table 4.2 contains useful information about the resulting
 populations from each problem instance, it would also be helpful to analyze the
 spread and dispersion of the data points. Figure 4.6 displays the spread of solution
 set pro t values for each of the  ve test problems in the 15-objective knapsack prob-
 lem instance as a boxplot. The x-axis for each of the subplots has been labeled with
 the objective function number that the data corresponds with, and rearranged to re-
 102
KP 100 Items 15 Objs Control  = 0%  = 10%  = 20%  = 40%
 Problem Instance
 Final population size 857 924 900 826 535
 Minimum # of items 49 49 49 49 51
 Maximum # of items 64 65 65 65 65
 Total pro t maximum 4010 4077 4051 4097 4086
 Pro t minimum 2289 2223 2098 2400 2442
 Average pro t 3350 3365 3362 3369 3398
 Weight maximum 2685 2685 2685 2685 2685
 Weight minimum 2431 2485 2480 2451 2448
 Average weight 2650 2652 2651 2655 2656
 Weight constraint 2685 2685 2685 2685 2685
 Table 4.2: Final Population Data for the 15-Objective Knapsack Problem Instances
 103
Figure 4.6: Final Pareto Population Data Resulting from Full MOGA Run for Each
 Knapsack Problem Instance with 15 Objectives
  ect the order of importance of each objective function in preserving the underlying
 dominance structure of the 15-objective problem.
 There are several interesting trends to note from these boxplots. As had been
 the case with the 5-objective knapsack problem instance, the importance ordering of
 the objective functions resulting from the  -MOSS algorithm changed signi cantly
 depending on the level of error allowed within the underlying dominance structure.
 In this case, the 10th objective was rated to be least important by the  -MOSS
 algorithm when restricted to 0% error, yet this same objective jumped to most
 important once an error was introduced (in the 10%, 20% and 40% cases). This is
 the most extreme example of the objective function reordering. It provides insight
 104
into how dependent the selection of objective functions was to the error value.
 Another trend that was apparent in the 5-objective knapsack problem instance
 and is evident here is that there were several low-value outliers for each of the
 objectives in all of the test cases. This means that not all of the solutions found for
 the knapsack problem were pseudo-optimized. In other words, it is likely that these
 outliers were not on the Pareto front and, with further generations of the MOGA,
 could have been improved.
 Figure 4.7 displays that the MOGA toolbox struggled to converge the 15-
 objective knapsack test problem instance. In the control case, less than half of the
 stochastic 10 runs through the MOGA reached convergence - instead, those runs
 were automatically stopped after 400 generations had failed to converge. It would
 be intuitive that with the decreased objective set size of the other cases in this
 problem instance, that Matlab?s MOGA would have a higher convergence success
 rate. The  = 10% test case had one fewer objective function, and had a 20% higher
 success rate at reaching convergence.
 This trend, however, was not seen in the  = 20% test case. With 5 fewer
 objectives than the control, it would be intuitive to assume that it would have a
 greater convergence success. As seen in Figure 4.7, this was not the case. Only 2
 out of the 10 stochastic MOGA runs converged.
 The second plot in Figure 4.7 collated the run time for the 5 test cases for
 both the  -MOSS algorithm and the run time for 10 MOGA runs with the reduced
 objective function sets. It is interesting to note that while the MOGA run time for
 the control,  = 0%, and  = 10% cases remained relatively constant, there was
 105
both an increase in run time for the  = 20% test case and a substantial decrease
 in run time for the  = 40% case, the former an initially non-intuitive result and
 the latter more aligned with predictions. The reason for the longer run time in the
  = 20% test case was the low convergence rate of the MOGA runs. Rather than
 reaching convergence and moving on to the next run, each of the 10 MOGA runs
 iterated through the full 400 generations.
 Another feature worth noting from the time plot in Figure 4.7 was that the run
 time for the  -MOSS algorithm appeared to increase with each subsequent increase
 in the allowed error in the dominance structure. The  -MOSS algorithm performs
 a pairwise comparison of each of the Pareto solutions across all of the objective
 functions. With an increased error tolerance, more of the pairwise relations could
 move in either direction - the dominance relations became much more  uid. This
 required a substantial increase in the number of pairs that needed to be compared.
 4.2.2.3 Results from the 25-Objective Knapsack Problem Instances
 The results for the 25-objective knapsack problem follow similar patterns to
 those obtained from the 5 and 15-objective function knapsack test problems. Table
 4.3 itemizes the initial results for the 25 objective knapsack test problem. Referring
 back to Figure 4.2, the  -MOSS algorithm indicated that one of the objectives
 from the 25-objective knapsack test problem could be removed with 0% error in
 the underlying dominance structure of the problem. In other words, one of the
 25 objectives was fully redundant and provided no additional information to the
 106
Figure 4.7: Run Time and Termination Reason for each Knapsack Problem Instance
 with 15 Objectives
 problem. If a 10% error in the dominance structure were feasible,  ve objective
 functions could be removed from the objective function set - a substantial reduction
 in problem complexity facilitated by a small induced error.
 Allowing a 20%  -error in the dominance structure of the 25-objective knap-
 sack test problem resulted in reducing the objective set size by more than half. In
 other words, allowing a 20% error e ectively turned 15 of the objectives into redun-
 dant objectives that provided no new information about the underlying problem.
 Introducing an error of 40% reduced the problem to a two objective test case, e ec-
 tively making 23 of the objectives contain wholly redundant information about the
 dominance structure.
 The signi cant reduction in problem complexity caused by allowing a 20%
 and a 40%  -error should lead to questions regarding the validity of the resulting
 107
experimental Pareto solution sets. These questions will be answered with Figure 4.8
 and its discussion.
 For an initial analysis of the quality of the solution sets from each of the  ve
 test cases for the 25-objective knapsack problem, it is illuminating to compare the
 Pareto set results tabulated in Table 4.3.
 The di erence between the maximum total pro t values for the 25-objective
 knapsack problem instances was twice what it had been in the 5 and 15-objective
 knapsack problem instances: a di erence of 6% between the maximum pro t derived
 from the control solution set and that from the 40% error solution set. However,
 the reduced objective set size for the 40% test case was already a cause for concern
 about its validity in replicating the original test problem. When the 40% error
 test case was removed from consideration, the percentage di erence of the maxima
 pro t values was within 3% of each other, the same trend observed with the 5 and
 15-objective knapsack problem instances. In reducing the objective set down to two
 objectives in the 40% error test case, it appeared that the ability of the MOGA to
 reach the maximum pro t value of the control problem was removed.
 The trends for the average solution pro t value are more intriguing. Reducing
 the objective set size by allowing a 10% error raised the average solution pro t by
 1%. Increasing that error to 40% raised the average solution pro t by 8%. This
 would lead to the conclusion that even though incorporating a 40% error value into
 the dominance structures of the 25-objective knapsack problem resulted in a lower
 maximum pro t value for the solution set, the average pro t value of the solution
 set had increased by 8%. In other words, the 40% error solution set did not  nd the
 108
KP 100 Items 25 Objs Control  = 0%  = 10%  = 20%  = 40%
 Problem Instance
 Final population size 935 956 943 819 820
 Minimum # of items 50 47 50 51 56
 Maximum # of items 66 66 66 66 64
 Total pro t maximum 4165 4138 4110 4036 3920
 Pro t minimum 2214 2142 2309 2392 3026
 Average pro t 3351 3333 3390 3433 3614
 Weight maximum 2766 2766 2766 2766 2766
 Weight minimum 2414 2483 2543 2535 2624
 Average weight 2728 2731 2732 2740 2739
 Weight constraint 2766 2766 2766 2766 2766
 Table 4.3: Final Population Data for the 25-Objective Knapsack Problem Instances
 extrema point that had been found by the other test cases, but it appears that the
 average overall solution had a higher pro t. With the signi cantly reduced problem
 complexity represented by the reduced objective set of the 40% error test case, it
 could be argued that this Pareto set provided a good representation of the original
 25-objective knapsack problem.
 While the data in Table 4.3 contains useful information about the resulting
 populations from each problem instance, it would also be helpful to analyze the
 spread and distribution of the data points, as had been done in the previous knapsack
 problem instances. Figure 4.8 displays the spread of solution set pro t values for
 each of the  ve test problems as a boxplot. The x-axis for each of the subplots has
 109
Figure 4.8: Final Pareto Population Data Resulting from Full MOGA Run for Each
 Knapsack Problem Instance with 25 Objectives
 been labeled with the objective function number that the data corresponds with, and
 rearranged to re ect the order of importance of each objective function in preserving
 the underlying dominance structure of the 25-objective problem.
 There are several interesting trends to note from the boxplots in Figure 4.8.
 As had been the case with the two previous knapsack problem instances (5 and 15
 objective), the importance ordering of the objective functions resulting from the  -
 MOSS algorithm changed signi cantly depending on the level of error allowed within
 the underlying dominance structure. In this case, there appeared to be much more
  uidity in the importance of each objective depending on the error level. The 14th
 objective was deemed least important in the  = 0% test case, jumped to the most
 important in the  = 10% test case, was found to be wholly redundant in the  =
 20% test case and was not even present in the reduced objective set, yet the same
 110
objective jumped back to the most important slot in the  = 40% test case.
 The same  uidity of objective function importance can be seen with the second
 objective function. It was deemed wholly redundant and was not present in the
 reduced objective sets of the  = 0%, 10%, and 20% test cases, yet was the second
 most important objective in the  = 40% test case. This was the most extreme
 example of the objective function reordering in the 25 objective knapsack problem
 instance. Combined with the transience of the 14th objective, it provides insight
 into how dependent the selection of objective functions was to the error value and
 to the nature of the underlying problem.
 As with the two previous knapsack problem instances (5 and 15 objective),
 there were several low-value outliers for each of the objectives in all but the last of
 the test cases. This is unsurprising in the 25-objective test case. As with the other
 problem instances, this means that not all of the solutions found for the knapsack
 problem were pseudo-optimized. In other words, the MOGA runs did not reach full
 convergence.
 Figure 4.9 displays that the MOGA toolbox struggled to converge the 25-
 objective knapsack test problem instance, although it appeared to have more success
 than with the 15-objective test cases. In the control case, 7 out of the 10 stochastic
 MOGA runs converged to a Pareto front. This was a very surprising result, consid-
 ering how the MOGA struggled with the 15 objective. It can be assumed that the
 objectives in the 25-objective test problem were less in con ict than the 15-objective
 test case. The two were entirely separate problem instances with no overlap, both
 highly dependent on randomized objective values.
 111
Equally surprising was the convergence failure of the  = 20% test case. Zero
 of the 10 stochastic MOGA runs reached convergence. All ran the full 400 possible
 generations without converging. This fact was also evident in the time plot from
 Figure 4.9 for the 20% case. The MOGA run time was greater for this test case
 than for any of the other test cases. It appears that the MOGA runs failed to  nd
 the extrema points for this test case, although the average solution values closely
 resembled those found in the other problem test cases.
 In the 25-objective knapsack problem instance, the contrast between MOGA
 run time and  -MOSS algorithm run time became glaringly obvious. For the greater
 complexity problem, the  -MOSS algorithm took on average twice the stochastic
 MOGA run time (even the control MOGA case with 25 objectives) to reduce the
 objective set. If computation time is a limiting factor for a set of analysis, it appears
 that running the methodology proposed in this dissertation research (the  -MOSS
 algorithm before the MOGA) is not only not bene cial, but costly.
 It should be remembered, however, that the decision space that a designer has
 at the end of the methodology had signi cantly reduced complexity, facilitating the
 designer?s selection of a  nal candidate solution. The following section applies a
 Pareto set quality metric to determine which of the 5 test cases for each of the 3
 knapsack test problem instances resulted in a better decision space for the designer.
 112
Figure 4.9: Run Time and Termination Reason for each Knapsack Problem Instance
 with 25 Objectives
 4.2.3 Hypervolume Quality Metric Results for the Knapsack Problem
 Figure 4.10 shows the results of applying the hypervolume indicator to the
 Pareto set resulting from the 5-objective knapsack problem instance. The require-
 ment of the Pareto sets being compared to have the same number of objectives
 resulted in assessing the control Pareto set only across the objectives it had in com-
 mon with each of the reduced objective sets. The four test cases each have four
 di erent hypervolume quality metric values displayed.
 Examining  rst the S metric, the absolute Pareto volume coverage was cal-
 culated to be similar for the  = 0%, 10% and 20% cases. Any discrepancy in the
 values for these cases suggests that the ordering of the objectives themselves when
 input into the MOGA aided Pareto convergence. This was the anticipated result
 113
Figure 4.10: Hypervolume Quality Metric Calculations for Knapsack Problem In-
 stance with 5 Objectives)
 because in the 5-objective knapsack problem instance, the only test case that had a
 reduced objective set was the 40% error case. In the  = 40% test case, the S metric
 value was substantially higher than the control?s. This indicates that the reduced
 objective set?s Pareto set had better absolute coverage of the design space than the
 control Pareto set did.
 Examination of the D metric clearly highlights the advantage of this disserta-
 tion?s methodology in  nding a better design space representation for the mission
 designer. In the 40%  -error test case (the only one in this problem instance that
 truly re ects the reduction methodology), the value for D(Control, Delta) was zero.
 In other words, the control Pareto set had zero volume coverage that the  Pareto
 set did not have. This by itself states that the Pareto set resulting from the reduced
 114
Figure 4.11: Hypervolume Quality Metric Calculations for Knapsack Problem In-
 stance with 15 Objectives)
 objective set had at least an equivalent volume coverage as the control case. Looking
 at the value for D(Delta, Control), however, it becomes apparent that the Pareto
 set resulting from the  = 40% error case actually had 20% better volume coverage
 than the control Pareto.
 The results from these quality metrics reveal that the solution set derived
 from application of this research?s methodology had an overall better demonstrated
 representation of the Pareto front. The diversity of the solutions and the coverage
 of the front itself was better. If computational time is not an issue in a designer?s a
 priori analysis, implementation of this research?s methodology resulted in an across-
 the-board improvement in the  nal solutions to the 5-objective knapsack problem
 instance.
 115
Figure 4.11 displays the hypervolume indicator quality metric results for the
 15-objective knapsack problem instance. Recall that the reduced objective set sizes
 for the four test cases were 15, 14, 10, and 4, respectively. The  = 40% test case
 is most visible in this Figure. The S metric value from the control Pareto set was
 3% better than that from the  test case. Additionally, the relative volume coverage
 metric, D(control, delta), was approximately 3% better than the D(delta, control)
 value. In other words, in both the S and D metric, the control Pareto set had better
 coverage of solutions than the  case.
 This result should be put in perspective. The reduced objective set for the  =
 40% test case (which had 11 fewer objective functions than the control case) resulted
 in a Pareto solution set that was only 3% worse according to the quality metrics.
 That result is has great consequence. A drastic decrease in problem complexity
 was a orded at only a 3% decrease in solution quality. Considering the ease with
 which a mission designer could examine the  ?s 4 objective reduced objective set and
 its Pareto solutions, it appears that, for this test case, this research?s methodology
 proved to be very bene cial.
 Figure 4.12 displays the same data as Figure 4.11 except without the  = 40%
 test case to allow for more granular detail to be visible. The same trends can be seen
 in this plot. Even though the control Pareto set?s quality metric values for the  =
 20% test case were better than those derived from the  ?s Pareto set, the di erence
 in solution quality was only 2%. Removing 5 objectives and their corresponding
 complexity from the problem description only resulted in a 2% di erence in solution
 quality.
 116
Figure 4.12: Enlarged Figure of Hypervolume Quality Metric Calculations for Knap-
 sack Problem Instance with 15 Objectives)
 The quality metric values shown in Figure 4.13 contain the results from the
 25-objective knapsack problem instance. Recall that the reduced objective set sizes
 for the 25-objective knapsack problem instance for the  = 0%, 10%, 20%, and 40%
 test cases were 24, 19, 9, and 2, respectively. The data in Figure 4.13 is even more
 persuasive for the utility of this research?s methodology. The quality of the  = 20%
 Pareto solution set is approximately equivalent to the quality of the control solution
 set even though it represented a decrease in problem complexity of reducing the 25-
 objective control problem down to 9 objectives. To decrease complexity by nearly
 70% and still maintain the same solution quality proves the merit of this research?s
 methodology.
 The  = 40% test case contains an even larger data contrast. The Pareto
 117
Figure 4.13: Hypervolume Quality Metric Calculations for Knapsack Problem In-
 stance with 25 Objectives)
 solution set resulting from the reduced objective set had nearly 30% better quality
 than the control! This suggests that the MOGA was unable to converge the two ob-
 jectives represented in this control data set because it had so many other con icting
 objectives to be concerned with. Using the reduced objective set, however, resulted
 in a much more optimized Pareto set.
 For completeness, Figure 4.14 contains a close-up view of the  = 0% and 10%
 test cases for the 25 objective knapsack problem instance. Note the y-axis scale in
 this  gure: 10 3. The Pareto solution sets resulting from these error tolerances were
 approximately equivalent in quality to the control for each test case.
 118
Figure 4.14: Enlarged Figure of Hypervolume Quality Metric Calculations for Knap-
 sack Problem Instance with 25 Objectives)
 4.3 Summary
 Using the conceptually simpler knapsack problem domain, the utility of this
 dissertation?s methodology has been demonstrated. The methodology does not
 prove its worth across the board. One very clear disadvantage to the methodol-
 ogy is the (potentially substantial) increase in computation time required to reach a
 Pareto solution for a given design problem. As seen from the 25-objective knapsack
 problem instance, if computation time is a limiting factor for a set of analysis, us-
 ing this dissertation?s proposed methodology would be a hindrance to the analysis
 e ort.
 However, if computation time is not an issue, the merit of this research?s pro-
 119
posed methodology has been clearly demonstrated. If a mission designer is not
 concerned with producing results as fast as possible, the Pareto solution sets result-
 ing from this methodology have been demonstrated to have been at least equivalent,
 if not better, than the control case for the knapsack problem application.
 Recall the reduced objective set for the  = 40% test case from the 15-objective
 knapsack problem instance (which had 11 fewer objective functions than the control
 case) resulted in a Pareto solution set that was only 3% worse according to the
 quality metrics. A drastic decrease in problem complexity was a orded at only a
 3% decrease in solution quality. Considering the ease with which a mission designer
 could examine the 4-objective reduced set, this research?s methodology proved to
 be very bene cial.
 The results from these quality metrics reveal that the solution set derived from
 application of this research?s methodology had an overall better demonstrated rep-
 resentation of the Pareto front. With the demonstrated utility of the methodology
 in hand, the subsequent chapters will proceed to apply this methodology to  rst a
 simple robotic scenario (Chapter 5) and  nally to a large demonstration HRT co-
 operative team con guration selection scenario (Chapter 6). It is anticipated that
 the methodology will continue to prove its utility, though the role that the di erent
 application domain will play in the level of di erentiation of the results is yet to be
 seen.
 120
Chapter 5
 Case Study: Robotic Reconnaissance Rovers
 This experiment sought to analyze the performance analysis schemes utilized
 by several researchers to compare and contrast di erent candidate systems in an
 overall performance analysis. Speci cally, the work by Tunstel [96] and Schreck-
 enghost [85] were analyzed in great detail for this experiment. Through this analysis,
 the weaknesses and drawbacks of these methods were identi ed. By comparing four
 di erent robotic reconnaissance rovers, it was demonstrated that the state-of-the-
 art is insu cient to produce a meaningful, quantitative, comprehensive performance
 analysis. This dissertation?s performance analysis methodology was applied to this
 robotic reconnaissance rover case study to observe how it a ected the design space
 and the resulting solutions.
 The idea of utilizing a composite task score to aggregate performance metrics
 was  rst proposed in the HRI realm by Rodriguez [74]. This topic was discussed
 in detail in section 2.2.4. Rodriguez?s highly cited work proposed using relative
 measures for each of the performance metrics to allow comparison between metrics
 of di erent scales and units, and combining each of these ratios in a linear uniform
 summation to create an overall performance score.
 The works by Tunstel [96] and Schreckenghost [85] were analyzed in great de-
 tail for this experiment. Tunstel?s paper used Rodriguez?s approach to a composite
 121
task score to assess overall operational performance of the Mars Exploration Rovers
 (MERs) Spirit and Opportunity during a speci ed window of activity. Schreck-
 enghost?s work focused on reporting the results of Earth-analogue  eld testing of
 the K10 mobile robotic systems designed for robotic reconnaissance work. Each of
 these works will be brie y summarized in the sections to follow.
 It was the stated goal of both of these works to demonstrate that performing
 reconnaissance tasks with robotic technology would be helpful for future space mis-
 sions. Ideally, it should be possible to compare the performance of both the MERs
 and the K10 rovers to assess their relative performance and success at a variety of
 tasks. However, the metrics used in the two papers were vastly di erent and do
 not overlap. This is an example of a major short-coming of the reporting of per-
 formance metrics from the HRT realm. The data provided in the papers, however,
 was su cient to allow relative comparison in this case study and to facilitate the
 application of this dissertation?s methodology.
 5.1 Summary of Tunstel?s Method and Results
 Tunstel [96] applied Rodriguez?s [74] relative performance ratios and composite
 task score method to assess the operational performance of the Mars Exploration
 Rovers (MERs) Spirit and Opportunity. He de ned  ve performance metrics for
 use in his analysis: total autonomous traverse distance, terrain-based autonomous
 navigation speed, approachability, instrument placement position accuracy, and in-
 strument placement repeatability. Tunstel de ned approachability as the distance
 122
traveled (dapp) and time (sols) it took to reach a speci ed number of targets (N ), or
 Approachability(m=sol) =
 1
 N
 NX
 i=1
 dapp
 sols
 : (5.1)
 While Tunstel?s  rst three metrics are maximization metrics (higher value
 is better), the last two metrics are minimization metrics (lower value is better).
 To accommodate for this, Tunstel changed Rodriguez?s performance ratio formula
 such that for maximization metrics it would still be calculated as performance
 score/reference score, but for minimization metrics the ratio would be inverted:
 reference score/performance score.
 Rather than using a strict linear summation of the performance ratios, Ro-
 driguez and Tunstel both preferred to use a base-2 logarithm summation of the
 performance ratios represented as
 Score(r) =
 1
 2
 flog2[P
 2(m1; r)] + log2[P
 2(m2; r)] + +log2[P
 2(mN ; r)]g (5.2)
 The resulting value from the base-2 logarithm is in bits, allowing the score to be
 read as \rover A performed  ve times better than required".
 As can be seen in equation 5.2, each performance metric in Rodriguez?s formula
 is represented as a ratio of a system?s score to the reference score in that metric. The
 reference system used in Tunstel?s calculations were the MER?s operational require-
 ments. For the MERs, these requirements were an autonomous traverse distance
 of 1000 meters, an autonomous navigation speed of 37.45 meters/hour, approach-
 ability of 2 meters/sol (where a sol is a Martian day), an instrument placement
 123
Figure 5.1: Performance Metrics, Performance Ratios, and Overall Scores for Mars
 Exploration Rovers Spirit and Opportunity. Data reproduced from [96].
 position accuracy of 10 mm, and an instrument placement repeatability of 4 mm
 [96]. The results from each of the  ve metrics and the  nal score in bits as derived
 from Tunstel?s research is reproduced in Figure 5.1 for both Spirit and Opportunity.
 Tunstel acknowledged a shortcoming of Rodriguez?s method was the equal
 weighting of the performance metrics, which presumes equal importance of the met-
 rics in the performance analysis. In a  rst attempt to re ect a di erence in mission
 priorities between the metrics and between the MERs, Tunstel used weightings to
 specify the di erent mission characteristics of Spirit and Opportunity. These weights
 (shown in Figure 5.2) were used to re ect the di erent environments in which the
 rovers were placed, and how this a ected the overall mission objectives and de-
 sirable performance characteristics. Tunstel did not provide justi cation for these
 scalar values. This was an arbitrary assignment of weightings. Using the weightings,
 124
Figure 5.2: Weightings for each of the Performance Metrics and Weighted Scores
 for Mars Exploration Rovers Spirit and Opportunity. Data reproduced from [96].
 Tunstel claimed that although Opportunity received only a slightly higher overall
 score in the unweighted case, Opportunity should have received a signi cantly higher
 overall score, as seen in Figure 5.2.
 Tunstel provided additional metrics in [96] that he did not use in his perfor-
 mance analysis of the MERs. Tunstel did not provide justi cation for the selection
 of some metrics and not others. The  ve metrics described but not used in Tunstel?s
 analysis are summarized here for consideration in section 5.3. Tunstel de ned au-
 tonomous traverse speed ratio to be the ratio of the average autonomous navigation
 speed to the maximum autonomous navigation speed for a given system. In other
 words, this metric would create a scaled value of autonomous navigation speed that
 re ects the relative performance of a vehicle with reference to its own capabilities
 125
rather than to an external reference.
 Navigation step time was de ned to be the time required for the autonomous
 components to perceive the local terrain, detect hazards, and drive 35 cm along
 a chosen path. Percent autonomous traverse would provide a metric to gage how
 much of a mission was performed autonomously. Mean self-localizations per sol
 represents the number of position updates of the rover in each Martian day, and
 mobile manipulability is the ratio of robotic arm workspace volume to the mobile
 platform volume.
 5.2 Summary of Schreckenghost?s Method and Results
 Schreckenghost [85] summarized the  ndings of  eld trials of the NASA Ames
 K10 rovers performed in 2008 at Black Point Lava Flow, AZ. One of the limit-
 ing factors to including robotics in space operations to date has been the need to
 provide custom technology for each mission. The K10 rovers were intended as a
 demonstration of commercial o -the-shelf technology, to provide an example of the
 fact that a rover not speci cally designed for the mission could still be very useful
 in surface mapping and other reconnaissance activities. The K10 rovers are 80 kg,
 4-wheel drive rovers with all wheel steering and a passive rocker suspension, capable
 of carrying an additional 15 kg of payload each. They are capable of driving up to
 human walking speeds (90 cm/s) [30].
 In [85], the  ndings from the Black Point Lava Flow  eld trials were averaged
 for each day of operations, such that mission time was represented as a percentage.
 126
Daily productive time was assessed to determine the percentage of each day spent
 performing autonomous operations (average of 34% for each day, and a total of 85%
 of the productive time in each day), and both scheduled and unscheduled manual
 operations. Daily overhead time was the percentage of each day spent out of plan, in
 plan but inactive, or waiting for a plan to start. Daily task time was the percentage
 of each day during which tasks were completed, tasks were failed, or tasks from the
 mission plan were never attempted. This last metric requires a better explanation.
 During the mission, the K10 rover completed 324 of its required 449 tasks. It aborted
 115 tasks before they were started (assessed that it would be unable to complete
 the task). Ten tasks were attempted and then not completed.
 Although it was acknowledged that environmental conditions had a signi cant
 impact on the rover?s performance, these e ects were not quantized in Schreck-
 enghost?s analysis. The rover was given a pass/fail designation for each task at-
 tempted with no individual analysis of how well a task was performed.
 To facilitate comparison with the MERs, more performance data for the K10
 rovers was sought. NASA Ames has used the K10 rovers in several other  eld trials.
 Fong [30] described a three week  eld trial at Haughton Crater in Canada during
 which the research team completed 200 hours of robotic survey operations. Roughly
 10% of this time was conducted autonomously. Two di erent models of the K10
 were used during these trials - a red one, equipped with a Lidar for imaging, and a
 black one equipped with ground penetrating radar for subsurface mapping.
 In the Haughton Crater  eld trials [30], the red K10 performed 9 days of
 operations, traversed a total of 14 km, took 25 Lidar panoramas, and traversed at
 127
speeds of 5, 10, 20, and 40 cm/s. It was assessed that traveling at speeds of 20
 cm/s and 40 cm/s were too fast for the Lidar to take quality images. The black K10
 performed 10 days of operations, traversed a total of 32.2 km, and found evidence
 of wet clay 20 cm below the dry sand.
 Schreckenghost [82] described similar  eld trials conducted in 2008 at Moses
 Lake Sand Dunes, WA. The red K10 performed 28 total hours of operations and the
 black K10 performed 9.5 hours. The red K10 was o cially powered-up (run time)
 for 9.35 hours but only drove for 4.2 hours. The black K10 had a 5.67 hour run time
 and a 0.64 hour drive time. It was stated that the di erence between run time and
 drive time was due to a substantial wait time attributable to both poor lighting and
 a bad communication link. The red K10 had been planned to traverse 2235 meters
 but actually covered 2375 meters. The black K10 had been planned to traverse 776
 meters but drove a total of 838 meters.
 5.3 Using Tunstel?s Method to Compare the Four Rovers
 Both the K10 rovers and the MERs were tasked with robotic reconnaissance
 work and site survey operations. It would be illustrative to have a simple means of
 comparing the performance of the rovers. There was enough information from the
 combined K10  eld trial data to enable application of Tunstel?s performance metrics
 to the K10 rovers. For example, Tunstel had de ned approachability as the number
 of days an approach took divided by the number of sols, summed over each of the
 approaches, divided by the total number of targets reached during that interval.
 128
For the red K10, its 14 km traverse at Haughton Crater was performed in 9 days
 and facilitated 25 Lidar panoramas (or 25 targets being reached). According to the
 formula, approachability = (1/25)*(14km/9days) = 62.2 m/day. The black K10 was
 not equipped with an instrument that was conducive to using the approachability
 performance metric.
 Tunstel?s metrics and analysis system were applied to the K10 rovers to enable
 comparison between them and the MERs. Neither of the K10s had a robotic arm
 that was conducive to using the metrics for instrument placement, so the perfor-
 mance metrics that depended on instrument placement were neglected from this
  rst analysis run. As speci ed in [30], Lidar imaging was only feasible at naviga-
 tion speeds of less than 20 cm/s. An average navigation speed was assumed for
 the K10s to be 10 cm/s, or 360 m/hr. Fong had also speci ed that 10% of the
 traverse was performed autonomously. For the purposes of this analysis, therefore,
 the K10 red rover traveled 1400 m autonomously, and the K10 black traveled 3220
 m autonomously.
 Figure 5.3 shows the resulting performance ratios for each of the four rovers
 under consideration. With the three performance metrics applied to them, the com-
 posite task scores yielded that the K10 red rover performed 8.7 times better than the
 required performance for the MERs, and the K10 black performed 4.9 times better.
 According to these three metrics, all four rovers met their performance require-
 ments. Spirit and Opportunity performed nearly two times better than required. It
 would be misleading, however, to view these numeric results as conclusive. It will
 be demonstrated that the composite task scores are highly sensitive to the metrics
 129
Figure 5.3: Robotic Reconnaissance Rover Performance Comparison. Using Tun-
 stel?s Performance Metrics and Analysis Method to Compare the Operational Per-
 formance of the K10 rovers and the MERs.
 used in their computation.
 It is illustrative to consider a wider range of performance metrics in the com-
 posite task score analysis. Three additional performance metrics were added for this
 analysis. All three of these metrics were introduced and discussed in Tunstel?s paper
 [96], but he did not use them in his quantitative analysis. The  rst additional per-
 formance metric used for this next phase of analysis was longest single sol traverse.
 This metric is self-explanatory.
 The data available for the K10 rovers did not facilitate use of Tunstel?s in-
 strument position accuracy and instrument repeatability performance metrics, but
 Tunstel had described an additional metric to assess how often an instrument was
 used (with the assumption that a higher instrument use rate would be bene cial).
 130
Instrument placement rate was de ned to be the number of instrument placements
 divided by the total number of sols it took to complete them.
 This was the second performance metric added to this analysis. Performance
 values for all four of the rovers needed to be calculated for this metric. K10 red
 performed 25 instrument placements in 9 days, resulting in an instrument place-
 ment rate of 2.778/sol. K10 black did not have an instrument conducive to this
 analysis. Spirit completed 1200 instrument placements in 944 sols while Opportu-
 nity completed the 1200 instrument placements in 893 sols, resulting in instrument
 placement rates of 1.27/sol and 1.34/sol, respectively. For the metrics that did not
 apply to the K10 black rover, zeros were used for its metric values for the following
 calculations. The MER requirement for instrument placement rate was derived from
 the MER requirements for 1200 instrument placements in 900 days [96], resulting
 in a performance requirement of 1.33/sol.
 A third performance metric included in this analysis was autonomous traverse
 speed ratio, de ned as the ratio of the average to the maximum autonomous traverse
 rate. This metric relates the performance of each rover against its own capabilities.
 As stated in [30], the K10 rovers were capable of 90 cm/s maximum traverse rate,
 or 3240 m/hr. Using an average autonomous traverse rate of 10 cm/s or 360 m/hr,
 this resulted in an autonomous traverse speed ratio of 0.111 for both K10 rovers.
 According to Tunstel, Spirit?s and Opportunity?s average autonomous traverse rates
 were 15.06 m/hr and 22.09 m/hr, respectively, and their maximum autonomous tra-
 verse rates were 34.35 m/hr and 36 m/hr, respectively. This yielded an autonomous
 traverse speed ratio for Spirit and Opportunity of 0.438 and 0.614, respectively. The
 131
MER requirement for an average autonomous traverse rate of 35 m/hr and a max-
 imum autonomous traverse rate of 37.45 m/hr resulted in a required autonomous
 traverse speed ratio of 0.935.
 The other metrics suggested by Tunstel were not used in this analysis because
 there was insu cient data to well-represent the four candidate rovers. Navigation
 step time and mean self-localizations per sol required information about processing
 time and position updates of the autonomous software on-board each rover. Insuf-
  cient data was available to accurately describe the mobile manipulability for the
 MERs and the K10s. Percent autonomous traverse was not included in this analysis
 because it would be comparing apples to oranges - in their  eld trials, the mission
 plans for the K10 rovers did not use them as entirely autonomous rovers. They had
 periods of autonomous traverse, but portions of their mission schedule were intended
 to be teleoperated. Although all portions of the MER traversals were conducted au-
 tonomously, there were large intervals of wait time while a new set of commands
 were being uploaded. Comparing the percentage of the K10 missions conducted
 autonomously to the MERs? would misrepresent their intended missions.
 The results from the expanded performance metric analysis of the four can-
 didate rovers were tabulated in Figure 5.4. Examining the overall composite task
 scores for the four rovers, the K10 rovers continued to out-perform the MERs.
 According to this analysis, the red K10 rover proved 10.6 times more capable than
 required by the MER requirements. Surprisingly, the black K10 rover proved 3 times
 more capable than required even though it failed to score in two of the performance
 metrics. This suggests a  aw in the analysis method - if a rover was incapable of
 132
Figure 5.4: Robotic Reconnaissance Rover Performance Comparison. Tunstel?s Ex-
 panded Performance Metrics Applied to the K10 Rovers and the MERs.
 meeting a performance requirement, a penalty should have been applied. It should
 not have been mathematically possible for this rover to achieve an overall score that
 was better than required.
 It is interesting to note that neither of the MERs met their overall required
 performance composite score as calculated in Figure 5.4. Although both MERs
 performed well in total autonomous traverse distance and approachability, they did
 not meet the required performance benchmarks for navigation speed, instrument
 placement rate, longest single sol traverse, and autonomous traverse speed ratio.
 The result of not meeting these metrics was evident in their composite task score
 values. The negative score represents the fact that, according to this set of metrics,
 Spirit and Opportunity performed nearly two times worse than required.
 133
Figure 5.5: Robotic Reconnaissance Rover Performance Comparison. Tunstel?s Per-
 formance Metrics Applied to the K10 Rovers and the MERs. Redundant Distance
 Metric Removed From Composite Task Score.
 134
It could be argued, however, that these six metrics contain redundant infor-
 mation and refer to correlated data. Two of the metrics refer to distance traveled,
 and two of the metrics refer to autonomous traverse speed. To analyze the e ect
 that these correlations may have, a traverse metric (longest single sol traverse) was
 removed from consideration and the numbers recalculated. These results were tabu-
 lated in Figure 5.5. According to this set of analysis, the K10 red rover continued to
 outperform all other rovers. Spirit exceeded expectations by 80% and Opportunity
 by 130%. The K10 black rover, however, did not meet the MER requirements. Its
 overall composite task score demonstrates that it performed 120% times worse than
 required.
 For demonstration purposes, a further reduction in performance metrics re-
 moved one of the speed metrics (navigation speed). The results from this analysis
 are tabulated in Figure 5.6. As can be seen, Spirit and Opportunity appeared to
 have had better performance with this set of metrics. Both MERs performed twice
 as well as had been required. K10 red performed 3.5 times better than had been
 required, and K10 black performed 1.3 times worse than required.
 All four rovers had di erent composite task scores in each of the four sets of
 analysis that used Tunstel?s methodology and performance metrics. Not only were
 the quantitative values themselves variable, but the relations between the rovers and
 the requirements were variable. While the K10 red rover outperformed all of the
 other rovers in all of the cases, the ranking of the other three rovers was dependent
 on the set of performance metrics used in each analysis. Even the ability of the
 rovers to meet their design requirements varied between each of the sets of analysis.
 135
Figure 5.6: Robotic Reconnaissance Rover Performance Comparison. Tunstel?s Per-
 formance Metrics Applied to the K10 rovers and the MERs. Redundant Distance
 and Speed Metrics Removed From Composite Task Score.
 136
It is clear from this case study that Rodriguez?s [74] composite task score
 is highly sensitive to the performance metrics used. Adding and removing metrics
 drastically changed the composite task score values. This sensitivity, however, is not
 warranted. It results in a highly subjective ranking of the rovers. The performance
 metric set used in any given analysis could be altered to produce a preconceived
 notion of the intended results.
 It is desirable to  nd a method to gage the relative performance of the four
 rovers that is less dependent on the performance metrics selected. Toward this goal,
 this dissertation?s methodology was applied to the robotic reconnaissance rover case
 study.
 5.4 Application of this Dissertation?s Methodology to the Robotic
 Reconnaissance Rover Comparison
 Application of this dissertation?s methodology to this problem had the poten-
 tial to provide the rigorous performance analysis required to objectively and conclu-
 sively compare the rovers. The methodology uses the  -MOSS objective reduction
 algorithm at its core to determine the importance of each of the objectives (or perfor-
 mance metrics) depending on the amount of information each provides. A pairwise
 comparison of the performance metric values would yield the underlying dominance
 structure for the problem. Using this rigorous structure, selection of performance
 metrics (and identi cation of redundancy) could be done in a non-arbitrary manner.
 The  -MOSS algorithm was applied to the four candidate rovers across the six
 137
Figure 5.7: Robotic Reconnaissance Rover Performance Comparison. E ect of  -
 MOSS Objective Reduction Algorithm and Tolerated  -Error on Performance Metric
 Subset.
 performance metrics described in the previous section. It was determined that while
 performance ratios were illustrative for Rodriguez?s methodology, they would obfus-
 cate rather than aid in the clarity of the algorithm. Instead, the full performance
 score values were used in this analysis, scaled such that the di erence between the
 largest and smallest value would yield a  -error value of 1. The resulting  -non-
 con icting performance metric subsets are tabulated in Figure 5.7.
 It is intriguing to note that even with zero  -error, a large reduction in perfor-
 mance metrics resulted. The six metrics were reduced to two: autonomous traverse
 distance and instrument placement rate. These two metrics contain con icting in-
 138
formation that fully represent the design space. According to the  rst, Spirit outper-
 formed K10 black, which in turn outperformed Opportunity, with K10 red coming in
 last place. According to the instrument placement rate metric, K10 red performed
 best, followed by Opportunity, Spirit, and K10 black. All of the data contained in
 the other four performance metrics contains these pairwise performance relations
 and no new data. These four metrics contain only redundant information, and
 could be removed from consideration without a ecting the underlying dominance
 structure of the problem.
 The four values of  -error applied in the knapsack problem application (and
 by Brockho in [12]) were used in this analysis, supplemented by an analysis of
 which exact values of the  -error caused an alteration to subset composition. The
  = 0% subset was valid for only that error tolerance. When a 1%  -error was
 included in the underlying dominance structure, instrument placement rate was
 replaced by approachability in the reduced performance metric set. When a 20%
  -error was included, approachability was replaced by autonomous navigation speed
 in the reduced set. This subset was unchanged as the  -error value was increased.
 It was not until a  -error of 47% was applied that the reduced set was altered
 again. Navigation speed was removed from consideration, such that the reduced
 metric set contained only autonomous traverse distance. In other words, with a
 47% error in the underlying dominance structure, the rover comparison problem
 devolved into a single performance metric case. Enough problem data had been
 removed that there was no longer con ict between the performance metrics.
 Application of the  -MOSS objective reduction algorithm provided this case
 139
Figure 5.8: Robotic Reconnaissance Rover Performance Comparison. Convergence
 Success of MOGA Runs and MOGA Run Time for Each  Value.
 study with its desired result - an objective, non-arbitrary, rigorous reasoning to
 select some performance metrics and neglect others. This algorithm removed the
 arbitrary selection that had been the basis of Rodriguez?s performance analysis and
 replaced it with a concrete method to ensure that the underlying problem de nition
 remained constant.
 To fully demonstrate the utility of this dissertation?s methodology, the rest of
 it was applied to the robotic reconnaissance rover comparison problem. Using the
 reduced objective sets obtained from the  -MOSS algorithm (the  ve test problem
 depicted in Figure 5.7) were run through the MOGA for 10 stochastic runs of 400
 generations each to generate a Pareto solution set. The results are collated in the
  gures below.
 140
Figure 5.9: Robotic Reconnaissance Rover Performance Comparison. Final Pareto
 Set Composition for Each Test Problem.
 Figure 5.8 shows that the MOGA had 100% convergence success in four of
 the  ve test problems. In the 10% problem instance, however, the MOGA had only
 a 50% success at converging to a Pareto front. Corresponding to the lengthened
 search process of this test case, the run time duration for the MOGA across these
 10 stochastic cases was signi cantly longer than any of the other four test cases.
 The two metrics in the reduced set for this test case presented a strongly con icting
 trade-o between the objective values such that the average tolerance change in the
 Pareto front never dropped to the convergence level.
 Figure 5.9 shows the  nal Pareto solution results for the robotic reconnais-
 sance rover comparison. For each of the  ve test cases, the bar plot demonstrates
 the composition of the Pareto front. In other words, the Figure demonstrates the
 141
percentage of the Pareto set for which each of the four candidate rovers represented
 the best overall solution. It should be remembered that due to the con ict in the
 performance metrics, a single candidate solution was not the anticipated result.
 Instead, a Pareto front of equally-scoring candidate solutions was expected.
 For the  rst three test problems, Spirit was selected as the best candidate
 rover 72%, 56%, and 68% of the time, respectively. In the control case (with all 6
 performance metrics), Opportunity ranked second-best (28% of the Pareto set) with
 the K10 Red rover following in third (16%). With the  = 0% test case pruning
 down to two performance metrics, the K10 red rover and Opportunity switched their
 overall rank (28% and 14%, respectively).
 The K10 black rover represented a small fraction of the Pareto population (less
 than 1% of the total population) for both the control and  = 0% test cases, and was
 not present in the solutions thereafter. Recall that the K10 black rover did not have
 an instrument-arm that enabled approachability and other instrument-driven metric
 analysis. For these types of performance metrics, a zero value had been applied to
 the K10 Black rover to distinguish that it was incapable of performing some of the
 tasks. It is reassuring to see its lack of representation in the  nal Pareto sets of each
 of the test cases because, as an ill-equipped rover, it did not represent a good solution
 option. That it is present in two of the  ve Pareto sets, however, demonstrates that
 the tasks the K10 black rover is capable of performing are performed very well.
 With the reduced objective function set of the  = 10% test case containing
 metrics for autonomous traverse distance and approachability, the MOGA narrowed
 the design space down to an option of the two best performing rovers: Spirit (68%)
 142
Figure 5.10: Robotic Reconnaissance Rover Performance Comparison. Hypervolume
 Indicator Pareto Set Quality Metric For Each Test Case.
 and the K10 red rover (32%). When the  -error was increased to 20%, however, the
 K10 red rover became the best overall performing rover, representing 100% of the
 Pareto population.
 A  nal examination of the Pareto solution sets was conducted by using the
 hypervolume indicator quality metric (for a detailed discussion of the hypervolume
 metric, refer to section 3.2.4). Figure 5.10 contains the calculations from this quality
 metric. There are several important details to note in this  gure. Recall that the S
 quality metric measures the absolute volume coverage of a Pareto set as an indication
 of the diversity and spread of solutions. Comparing this metric for each of the  
test cases against its corresponding control, it can be seen that in the  = 0%, 20%,
 and 40% test cases, the S metric calculations for the test cases were equally as
 143
good as the control cases. In these same test cases, the D metric (which provides
 a relative coverage between two Pareto sets) calculated to exactly zero for both
 D(control,delta) and D(delta,control).
 This last  nding means that neither the control Pareto fronts nor the  Pareto
 fronts dominated each other at any point. With identical S metric calculations and
 zero values for the D quality metrics, there is su cient information to state that the
  Pareto sets for the rover reconnaissance problem were neither better nor worse in
 any aspect than their control Pareto sets.
 It was only in the  = 10% test case that the control Pareto set yielded slightly
 better solutions than the  test case. As seen from the D(control,delta) value for
 this test case, the control Pareto set dominated just under 1% of the  Pareto front.
 In other words, by using the  Pareto set for the 10%-error test case instead of
 the control Pareto set, a 1% decrease in the quality of the solution set resulted.
 This decrease in quality, however, a orded a decrease in the number of performance
 metrics from 6 down to 2 metrics. Additionally, this slight decrease in solution
 quality allowed the reduction from four candidate rovers down to two candidate
 rovers: the K10 red and Spirit.
 Not only did application of this dissertation?s methodology quantitatively and
 objectively reduce the number of performance metrics required for the overall per-
 formance analysis, but including varying levels of error in the underlying dominance
 structure allowed the visualization of the best candidate solutions for the overall
 design problem to be considered.
 144
5.5 Summary
 This case study sought to rigorously compare the operational performance
 of the Mars Exploration Rovers (MERs) Spirit and Opportunity against the op-
 erational performance of two NASA Ames Earth-analogue robotic reconnaissance
 rovers, the K10 red and black. It was demonstrated that Rodriguez?s [74] composite
 task score is highly dependent on the performance metrics selected and was insuf-
  cient for this rigorous heterogeneous performance analysis. To remedy this, the
 methodology proposed in this dissertation was utilized to reduce the performance
 metrics to a subset that maintained the underlying dominance structure of the prob-
 lem. In other words, a reduced set of objectives resulted in no change to the problem
 details and the  nal solution set.
 With zero error in the dominance structure, the six performance metrics used
 in this simpli ed analysis were reduced to two con icting performance metrics. Four
 of the six performance metrics contained only redundant information and could be
 removed from analysis without a ecting the problem and its solutions. If a high
 enough  -error were tolerable in the problem, the dominance structure of the robotic
 reconnaissance problem was demonstrated to be reducible to a single performance
 metric.
 The success of this dissertation?s methodology to objectively and quantita-
 tively reduce the complexity of a multi-objective optimization problem has been
 demonstrated on both the knapsack application, and on the comparison of rovers
 for a robotic reconnaissance mission. The  -MOSS algorithm has proved itself to be
 145
a powerful tool to reduce problem complexity by focusing the analysis on relative
 performance relations rather than on relative numbers. This algorithm, synthesized
 with the other components described in Chapter 3, have created a generic, rigorous,
 objective, quantitative methodology for the HRT domain to both select the perfor-
 mance metrics to be used in an overall team performance analysis and to reduce the
 complexity of a mission designer?s  nal decision space. This type of methodology is
 novel for the domain and a valuable contribution to further the use of human and
 robot teams. It is anticipated that this dissertation?s proposed methodology will
 prove to be a valuable tool in human-robot team con guration selection problems,
 the intended application domain of this methodology.
 146
Chapter 6
 Large HRT Application Demonstration
 The heterogeneous HRT con guration selection problem was the motivating
 application domain for the methodology developed in this research. In essence,
 the question is how to compare the operational task performance of heterogeneous
 teams across many di erent performance metrics, without necessitating preference
 information, and without dictating an equal weighting (or other arbitrary method
 of weighting) the performance metrics. The overall goal is to input a mission or
 set of tasks, several dozen possible team combinations (each with pre-de ned task
 capabilities), and output the overall team performance in a manner that facilitates
 a mission designer?s  nal con guration selection.
 In this  nal experiment, a large-scale HRT con guration selection problem
 was developed. This dissertation methodology was applied to the design space and
 greatly reduced the complexity. The Pareto sets that were generated across the  ve
 test problems were compared. A  nal set of analysis was conducted to determine
 the repeatability of the solutions.
 6.1 Initial Setup: Hubble Space Telescope Servicing Mission 3A
 For this experiment, the mission pro les from the Hubble Space Telescope
 Servicing Mission 3A were selected to form the background of this performance
 147
analysis. HST Servicing Mission 3A (HST SM-3A) provides a useful platform to
 analyze role de nition of each of the mission agents, in uences of task allocation
 schemes, the resulting cooperative schedules, and the overall performance of the
 combined human and robot servicing team.
 The  ight plans from past Hubble Space Telescope servicing missions (HST
 SMs) (as represented in the HST SM-3A EVA Checklists [100]) provide a detailed
 data set from which to examine the e ect of various activities and crew performances
 on space operations. HST was designed with access panels to allow repair of the
 components of the telescope. Some of the servicing tasks required a  ne level of
 dexterity, including manipulation of tethers and electrical connectors.
 Two astronauts were involved in each EVA excursion - one positioned on the
 end of the Space Shuttle?s robotic arm, the remote manipulator system (RMS), and
 one free  oater who either tethered to various parts of HST during task performance
 or used the portable foot restraint (PFR) to anchor to a worksite.
 Both of the humans were obligated during all of the tasks in varying capacities.
 Generally, the astronaut positioned by the RMS (called EVA2) performed hardware
 maneuvering and use of the power tools, while the other astronaut (EVA1) discon-
 nected securing straps and connectors from hardware and provided supporting aid
 when needed. Replacement units were transferred from one astronaut to another.
 Both astronauts were used to carefully position instruments like the  ne guidance
 sensor (FGS) along guide rails for insertion and installation into HST. Only a few
 of the subtasks involved with HST SM-3A required manipulation of electrical con-
 nectors or harnesses.
 148
Four days of EVA were planned for this mission. The 6-hour nominal EVA day
 (including daily setup and closeout of HST worksites) was the primary constraint
 on the number of tasks that could be performed during the servicing mission. This
 constraint was a NASA rule based on environmental exposure and consumables. In
 addition, there were sequential constraints between some of the subtasks (e.g., a
 team member cannot open a door before its securing bolts have been unscrewed).
 Most of the tasks can be broken down into primitives that require single-
 degree-of-freedom motion and utilization of a single tool [68]. Although repetitive
 single-tool primitives require a signi cant amount of the human crew?s time to per-
 form, the skills needed to perform these types of tasks are well within the capabilities
 demonstrated by Robonaut or the Mars rovers. It would be anticipated that if robots
 were used as cooperative team members tasked with performing this type of prim-
 itive that the human crew could be freed to work on other, more specialized tasks.
 This suggests the potential for improvements in crew utilization by including robots
 in operations as cooperative team members.
 The HST SM-3A mission is used in this research as an example mission for
 which a wealth of primary source data is available. Although the mission took place
 in the zero-gravity space environment, a terrestrial servicing mission could require
 similar tasks. The mobility di erences between zero-gravity and other gravity spec-
 trum would be anticipated to a ect the time intervals required to complete each
 subtask. To avoid adding additional sources of error and uncertainty, the subtask
 performance completion times from the EVA checklist were used in this research.
 149
6.1.1 Task Primitive Data Preparation
 The initial steps to create a human and robot team?s schedule follow the
 process of nominal schedule development. The target set of tasks to be accomplished
 must be identi ed and decomposed into task primitives. Following the procedure
 for the three-tier hierarchical task analysis from the NASA mission timelines [100],
 each task (eg. instrument replacement) is decomposed into subtasks (eg. retrieval
 of an instrument), which are further segmented into the primitives (eg. unlatching
 a door) that de ne the speci c steps needed to complete a given task.
 Pilotte [68] analyzed each of the task primitives from HST SM-3B, the follow-
 on mission to HST SM-3A, with the goal of assessing what types of robotic end
 e ectors would be needed to complete the mission. In her analysis, Pilotte catego-
 rized each of the task primitives according to which end e ector would be required
 to perform it. Following her methodology example, each task from HST SM-3A was
 analyzed for this research at the primitive level to determine which of nine di er-
 ent task primitive categories the item belonged (multiple categories was an option):
 visual inspection, mixed-initiative verbal communication, translation between work-
 sites, carry or transport of instruments and hardware between locations, use of a
 handrail-gripper tool, use of a pinch-grasp type of tool, a force-push (requires the
 ability to counterbalance the applied force), a force-pull, and use of an additional
 specialized tool.
 A database was created that contained the human subtask performance times
 from the HST SM-3A data, with the addition of a numeric representation of the
 150
types of primitives that were required for the completion of each subtask. Each of
 these subtasks were then assessed to determine whether they were constrained to
 follow other subtasks. If so, precedent constraints were added to the database for
 each subtask.
 The task scenarios used in this research involved the robotic agents partici-
 pating in the motor activities of the servicing mission but not in the cognitive or
 sensory parts of the mission. The NASA HST SM-3A mission was highly-scripted.
 The tasks that were analyzed as part of this mission were primarily reliant on mo-
 tor skills with a few subtasks involving visual inspection of the worksite. For this
 research, this meant that the robots in each of the scenarios were able to contribute
 signi cantly to the entire mission task list. If the task list had included the per-
 ceptual and cognitive tasks performed by both the mission planners and enacted by
 the EVA crew (and a teleoperator), the ratio of robot subtask involvement would
 have been reduced, but the contribution to the overall task performance and team
 e ciency would remain unchanged.
 The actual rate of task performance for a speci ed robotic agent will vary
 for di erent categories of tasks, and between iterations of the same task. As with
 humans, a robot might not perform a task at the exact same rate on two di erent at-
 tempts. There will be competing priorities in task performance including power and
 energy resource utilization, measurement errors and overshoot correction, di erences
 in environmental factors including lighting, situational awareness, and maintaining
 safe operating conditions in a crowded workspace. All of these factors will a ect the
 robot?s actual task performance speed and will vary between tasks.
 151
The human subtask completion time data used in this analysis was identical to
 that originally anticipated from the HST SM-3A mission. However, some variability
 was added to the robot time data to observe the a ect on the overall team schedules
 and performance if the robots performed tasks at the same speed as their human
 counterparts, at half the speed of the humans, and if their performance required
 three times as much time as the humans.
 The robot time data values used in this analysis provide an initial guide to
 enable scheduling analysis for a cooperative human and robotic team. These values
 could be updated to re ect a speci c robotic system design if desired by future users.
 6.1.2 Team Characterization and Scheduling
 Eighty-two unique teams were generated for this research. Each team had at
 most four agents, two human and two robotic. Each of the agents in these teams
 were de ned based on which of the nine task primitive types they were capable of
 performing, and the rate of robot task performance (equal-to, two-times, or three-
 times as slow as a human). The teams were speci ed with a matrix for each, with
 the rows corresponding to the four possible agents, and each column representing
 a task primitive type. A value of one in the matrix meant that the speci ed agent
 could perform the task primitive type. A value of zero meant that the agent did not
 have the skill set or capacity to perform the task primitive type.
 The humans and robots on each of these teams were assumed to be collocated,
 sharing the same work environment. Team pairings that required the humans and
 152
robots to function in the same workspace were assumed to have the necessary safety
 systems implemented. Amongst the teams, di erent kinds of task specialization
 were selected. In some of the teams, the robotic agents were only capable of carrying
 hardware and translating between locations. This type of task allocation placed the
 robots in the astronaut-assistant role in the mission design. Alternatively, in some
 of the teams at least one of the robotic agents was fully capable of all of the task
 primitive types. This type of robot, similar in capabilities to a mobile version of
 NASA?s Robonaut, contributed to completing mission tasks on a level that a third
 human would have done.
 These team task capability matrices were input into a simple task allocation
 function written for this research. Preference was given to the robot agents when
 allocating tasks. In other words, if a robot was capable of completing the subtask
 (could complete each of the required primitive types for the subtask), that subtask
 was allocated to the robot. Subtasks that required two agents to complete it in
 SM-3A also required two agents to complete it in this research. The task allocation
 scheme took this into consideration, and allocated two agents to a speci ed subtask
 when needed.
 As the task allocation was completed a priori and not altered thereafter, no
 preemption was allowed within the schedules. Once an agent was assigned a task,
 that agent had to complete that task before moving on to another. Adding pre-
 emption into a real-time mission, or allowing agents to simultaneously work on
 two di erent tasks at the same time would facilitate more elaborate and interlaced
 schedules. Neither of these paths were pursued in this research e ort. A simple,
 153
straightforward implementation of team schedules would lend itself to better demon-
 strating the impact of this dissertation?s methodology as described in Chapter 3.
 With similar reasoning, a more advanced task allocation scheme and schedul-
 ing paradigm could be utilized in future work. There has been substantial research
 into both of these areas in the literature. For example, the lunar mission planning
 software package (HURON (Human-Robot Task Network Optimization) [26]) de-
 veloped at JPL could be used to facilitate task allocation, planning, and scheduling
 for combined human and robotic activities. It provides the architecture to develop
 optimal task allocation and scheduling for a scripted multi-agent lunar mission.
 The task allocation and scheduling methods used in this research do not seek
 optimality. Instead, they are focused on simplicity of implementation. The methods
 used in this research were deemed su cient for a methodology demonstration.
 Four parallel and interrelated timelines were generated by the task allocation
 function, one for each of the four possible agents on a team. These schedules sought
 to  ll each agents? free time, and complete their allocated tasks as quickly as possible.
 Subtasks were shu ed in each agent?s schedule to minimize agent downtime if the
 subtasks were not sequentially constrained. Precedent constraints were observed
 during the initial scheduling process - a subsequent subtask was constrained to
 begin only after its preceding subtask had been completed, regardless of which
 agents performed each subtask.
 It is intuitive that adding a third and fourth agent to the servicing team would
 create schedule bene ts by reducing the workload of the original two human crew
 through parallel execution of tasks. The level of contribution of these additional
 154
agents depended on the proportion of mission tasks that the agents were equipped
 to perform, the interdependence of this subset of tasks to those performed by the
 other crew members, and the rate of task performance and completion of the subset.
 The scheduling tool rearranged subtasks within the schedule to minimize hu-
 man involvement time in each mission day (freeing the astronauts to work on other
 tasks) and minimize the time one agent has to wait for another while maintaining
 the precedence constraints between subtasks.
 As had been identi ed in [89], the primary constraint on the cooperative human
 and robot schedules was the nominal 6-hour EVA time window. All nominal NASA
 operations that involve the human crew outside of the vehicle must occur within
 this 6-hour window. As implemented in this research, the length of the human
 involvement was limited to 6 hours a day. If a team schedule required human
 involvement for longer than this time interval, a penalty was applied to the objective
 function values (please refer to the discussion of algorithms in section 3.2).
 6.1.3 Performance Metrics
 The performance metrics selected for this analysis were intended to measure
 the overall team performance of each of the heterogeneous teams, across multiple
 domains of interaction. To achieve this goal, several di erent types of performance
 metrics were selected. A total of  fteen performance metrics were utilized in this
 analysis. Each performance metric will be discussed in this section.
 Limiting the analysis to  fteen performance metrics was done for several rea-
 155
sons. As seen in the  rst set of experiments with the knapsack problem in Chapter
 4, all of the algorithms ran smoothly and the MOGA performed reasonably well
 with 15 objectives. When the number of objectives were increased to 25, however,
 the MOGA struggled to  nd an optimal Pareto front. This had been identi ed as
 a weakness of Matlab?s MOGA itself. A di erent MOGA could be used in further
 research that was speci cally equipped to deal with the many-objective nature of a
 larger HRT con guration selection problem. However, for this analysis and for the
 purpose of demonstrating how a large HRT con guration selection problem could
 be framed, using  fteen objectives was deemed su cient.
 The metrics were altered such that each sought to minimize its respective
 objective function. For example, to maximize the ratio of the amount of time
 spent actively contributing to the mission to the total mission involvement time, the
 objective function used would seek to minimize the amount of time spent inactive
 to the total mission involvement time.
 A combination of task load model and expected value model was used to frame
 the large HRT con guration selection problem. As detailed in section 2.2.5, task
 load models facilitate workload considerations and inter-agent comparisons. Ex-
 pected value statistics are structured to compare di erent candidate systems with-
 out requiring a high level of certainty in the data. These two model types can be
 best seen in the types of performance metrics selected for this analysis.
 156
6.1.3.1 Task Load Model Performance Metrics
 A few overall objectives permeate the entire mission planning spectrum. The
 mission task list total completion time should be minimized. This should not be
 interpreted to mean that this objective was paramount. On the contrary, it should
 be considered along with the other objectives as a goal, but  exible to enable mini-
 mization of the other objectives.
 In addition, a primary criterion used to assess the value of a cooperative HRT
 over the standard two-human crew from a task load model perspective is the reduc-
 tion in time that the human crew must wait for the robotic agents to perform their
 subtasks. This reduction represents a specialization of the human activities and a
 check on the e ciency of the developed schedule (how e ciently the humans are
 used as a resource).
 The a priori nature of the designer?s perspective taken for this analysis facil-
 itated selection of a work e ciency metric. Schreckenghost?s work e ciency index
 [83] was de ned to be the ratio of productive time to overhead time that occurs
 in an agents? task schedule. This can be viewed as the e ciency of resource use -
 whether an agent actively contributes to task completion or stands idle waiting for
 a precedent-constrained task to be  nished.
 Since it is desirable to maximize a work e ciency index, the inverse was used
 in this analysis to enable a minimization objective function. For the human agents,
 the ratio of human inactive time to total human involvement time would represent
 the e ciency with which the human agents were utilized as a resource. This metric
 157
was applied to each of the two types of crew agents in each team pairing. For
 the objective functions used in the MOGA, these two metrics are represented by
 minimizing the fraction of human inactive time to total human involvement time,
 and minimizing the fraction of robot inactive time to total robot involvement time.
 The scenario perspective adopted for this analysis was both human-centered
 and robot-centered. Although preference was given in the task allocation scheme
 to humans (assigning tasks to robots if they were equipped), the performance met-
 ric analysis did not have a preferred resource. Both the human agents and robot
 agents were treated equally. Both the total time that human agents were active
 and the total time that robotic agents were active were objectives to be minimized.
 By minimizing each agents? active time, the analysis sought to make each a more
 specialized resource.
 Autonomy levels were set for each team pairing. No sliding autonomy was
 considered because this applies to real-time mission scenarios rather than a priori
 mission scenarios. Supervisory control of the robots was assumed, although the
 team pairings would be equally valid for autonomous robotic systems. The task
 performance time would be expected to change for fully autonomous systems, but
 as this data was merely representative for this research, this e ect was neglected. If
 the autonomy of each of the teams and their component agents had been speci ed, it
 would be bene cial to utilize comparative autonomy metrics like those discussed in
 section 2.1.1. Similarly, a communication architecture could be speci ed in a more
 comprehensive performance analysis. For this research, some robots were given
 capacity to perform the verbal task primitive, enabling them to take part in a wider
 158
range of subtasks. Other consequences of a communication architecture were not
 considered in this experiment.
 To represent human physical workload, a modi ed Cooper-Harper scale was
 utilized (as suggested in [70]). According to the modi ed Cooper-Harper scale,
 workload ratings of 3 and below correspond to small and inconsequential errors
 made during an acceptable level of e ort. A rating of 1 is deemed very easy, when an
 operator?s mental e ort is minimal and the desired performance is easily obtainable.
 A workload rating of 2 is deemed easy, when an operator? mental e ort is low and
 desired performance is attainable. A rating of 3 corresponds to a fair to mid di culty
 level, when an operator?s required mental e ort is acceptable to reach adequate
 system performance.
 To develop performance metric data for each of the 82 unique team combina-
 tions and their corresponding schedules, values were assigned based on the number
 of physical subtasks the human agents were required to perform. Corresponding to
 the number of task primitive types that each agent performed, if an agent performed
 fewer than 3 di erent types of tasks, they were rated with a physical workload value
 of 1. Less than 5 types of primitives corresponded to a workload value of 2, and
 more than 5 types of primitives corresponded to a workload value of 3. To add more
 variability to the performance metric scores for the unique team combinations, a
 value between zero and 0.3 was randomly added to the scores such that, for exam-
 ple, the workload values of 3 were randomly selected on the range between 3.0-3.3,
 inclusive.
 Mental workload for an intra-vehicular astronaut controlling or supervising a
 159
robotic agent was assessed depending on whether the robotic agent had the capa-
 bility to provide visuals to the controller (one of the task primitive types used to
 characterize each of the agents, as discussed in section 6.1.2). In this usage, an
 operator?s situational awareness of the robot?s work environment directly in uenced
 mental workload. Using the modi ed Cooper-Harper scale for unmanned vehicles
 (as suggested in [70] to gage mental workload during o -site supervisory control), a
 workload rating value of 2 corresponds to an operator having an acceptable display
 of the vehicle?s environment. The display may have minor issues, but they do not
 interfere with the operator?s performance. A workload rating value of 4 corresponds
 to an operator having insu cient information about the vehicle?s environment to
 enable decision making.
 In this research?s performance metric analysis, if the robot supplied a visual of
 its workspace, the mental workload was rated at a 2. If no visual of the workspace
 was provided by the robot, the workload rating was a 4. If no robots were used in
 a speci c team pairing, then the IVA astronauts mental workload was rated at 0
 (no supervisory control was required). To add more variability to the performance
 metric scores, the workload values had an additional randomized value up to 0.3
 added to the  nal score such that a rating of 4 was assigned on the interval from
 4.0 to 4.3, inclusive.
 160
6.1.3.2 Expected Value Statistics Performance Metrics
 Performance metrics in this category sought to compare the relative perfor-
 mance of each candidate system to a reference. This facilitated use of metrics
 for which exact performance data was unavailable. Should the robotic agents of
 candidate system be more thoroughly speci ed in future analysis, their speci c per-
 formance data would allow selection of di erent performance metrics. For this re-
 search, however, these metrics allowed consideration of a wider selection of team
 performance characteristics.
 It would be intuitive to maximize the autonomous navigation speed of each of
 the agents (metric described and used in Tunstel [96]). Higher traverse speeds would
 enable a larger range of territory to be covered in a shorter interval of time. Since
 the MOGA for this experiment was being built as a minimization function, this
 performance metric was represented as an expected value statistic, gaging each of
 the candidate robotic systems against a fully functional reference system. If both of
 the potential robotic agents in a team pair were capable of traversal, the autonomous
 navigation speed rating of the team pair was given a value of 10. If either of the two
 potential robotic agents in a team pair were capable of traversal, the autonomous
 navigation speed rating of the team pair was given a value of 50. If either of the two
 potential robotic agents in a team were present and active (if either was capable of
 at least 1 task primitive type) but both were incapable of independent movement,
 the autonomous navigation speed was given a rating value of 150. If there were
 no robots in a given team combination, then the human pair were given a rating
 161
value of 10, equivalent to the fully functional robotic reference system. To add more
 variability to the performance metric scores, the navigation speed rating values had
 an additional randomized value up to 20% added to the  nal score.
 Similarly, it would be intuitive to seek to maximize the autonomous traverse
 distance of each agent (a measure of longevity of the agent, with use demonstrated in
 Tunstel [96]). This task would require not only the ability to translate (as measured
 by the autonomous navigation speed metric), but would also require the capacity
 to visually inspect the terrain during this travel. The reference system for this per-
 formance metric was two fully functioning robotic agents, capable of both traversal
 and visual inspection. A rating value of 1 was given to this type of system.
 If only one robot was capable of traversal and visual inspection, this robot
 earned a rating of 10, equivalent to the fully functioning candidate system. If only
 one robot was capable of traversal and visual inspection but both were capable of
 traversal, a rating value of 50 was earned to denote the added workload required
 by the one robot to provide visualization for both robotic agents. If only one robot
 was capable of traversal, but this robot was not capable of visual inspection, the
 team earned a rating value of 50. If either robot was present and active (if either
 was capable of at least one task primitive type) but both were incapable of traversal
 and visual inspection, the team earned a rating value of 150. As with the previous
 metric, if there were no robots active for a candidate team (i.e. a human-only
 team), an autonomous traverse distance rating value of 10 was used. To add more
 variability to the autonomous traverse distance score, the values had an additional
 randomized value up to 20% added to the  nal score.
 162
Tunstel?s approachability performance metric [96] was instrument-dependent.
 It related the number of instrument targets a given robotic system could reach,
 relative to the total distance traveled to reach those targets and the total amount
 of time required. For this analysis, the reference system was capable of not only
 translating and visual inspection, but was also required to be carrying an on-board
 tool for data acquisition of its environment.
 If a robotic system was capable of each of these three task primitive types,
 it earned an approachability rating value of 10. If a robotic system was capable
 of traversal and visual inspection but did not carry an on board tool, it earned a
 rating value of 30. While this might seem a generous rating, it was anticipated
 that the robotic agent would  nd and locate targets. A human agent would then
 arrive at the speci ed target and use the instrument. If neither robot was capable
 of traversal, visual inspection, or carrying a tool, an approachability rating value of
 100 was used to mark the team as incapable according to the approachability metric.
 As with the previous metric, if there were no robots active for a candidate team,
 an approachability rating value of 10 was used indicating equivalent performance to
 the robotic reference system. To add more variability to the approachability rating,
 the values had an additional randomized value up to 20% added to the  nal score.
 The error-rate for each agent on a candidate team was determined as a function
 of the number of task primitive types each agent was required to perform during each
 subtask. If an agent performed three or fewer task primitives in a given subtask,
 its error rating for that subtask was 0.001 times the number of task primitive types
 required. If an agent performed six or fewer task primitives, the error rate was 0.01
 163
times the number of task primitives required. If an agent performed more than six
 types of task primitives in a given subtask, the error rate was 0.1. The increase
 in error rate corresponds to the notion that if an agent performs a wider selection
 of tasks, this increases the chance that the agent will make a mistake. The  nal
 error rating for each agent was the summation over all of the subtasks that the
 agent performed for each team. This error rating metric was a stand-in for more
 substantial technical data that would be available if candidate robotic systems were
 speci ed for this method. The  nal error value then corresponds to the number of
 potential errors that each agent would be anticipated to make.
 The  nal performance metric used for this large HRT con guration selection
 problem analysis was designed to be a cost function to penalize over-quali ed teams.
 For example, if two teams are equally capable of performing the entire mission
 scenario, then the least capable of these teams would be the preferred team. Each
 added primitive capability of the more quali ed team represents an unused and
 redundant skill.
 Each agent on each team was assessed to determine the number of primitives
 it was capable of performing, and a cost was applied for each. The total number of
 primitives for the two human agents were multiplied by a factor of 10 for numerical
 relevance in comparison to the other objective function values. Each of the robotic
 agents? cost was computed in the same way with the addition of up to 20% added
 cost to denote that, in general, designing for robotic systems has a higher fabrication
 cost than designing for humans. Each overall team?s cost value was the summation
 of each of the four agents? individual costs.
 164
6.2 Large HRT Con guration Selection Experiment
 There were several steps to setting up the experiment. All of the large HRT
 problem-speci c information was calculated and stored in a database. This database
 would be referenced several times during the MOGA simulation. The  tness function
 used by the MOGA translated the design variables into potential team combinations,
 checked them against the database, and returned their corresponding information.
 The MOGA parameters were speci ed for each run. Each of these areas will be
 discussed in more detail, and the results of this experiment are discussed in the
 following section.
 6.2.1 Inputs: Database Development
 Several databases were created prior to running the simulations of this experi-
 ment. The 82 unique team combinations (de ned solely by the task primitives each
 of the four possible agents were capable of performing) were sent through the task
 allocation scheme. The result was a task performance schedule for each of the 82
 team combinations that speci ed the order of task performance and which agents
 were required for each of the subtasks. Each of these team combinations and their
 corresponding schedules were evaluated across all of the performance metrics spec-
 i ed in section 6.1.3. All of this information (agent task primitive capabilities for
 each team, team schedules, and team performance data) was stored in a Matlab
 data  le, to be used as a reference database for the experiment simulation.
 It was noted that although there were 82 unique team combinations, there
 165
were not 82 unique schedules generated for the database. Due to the simpli ed task
 allocation scheme used in this research, several of the team combinations resulted
 in the same task allocation and performance schedule. This can be seen in multiple
 ways. First, if two distinct teams perform the schedule in the same way, then any
 additional capabilities or skill sets of one of the teams over the other represent unused
 capabilities. The more equipped and advanced robotic agents from these scenarios
 did not result in schedule or overall team performance improvements. These robotic
 agents were under-utilized in the schedules.
 Several of the performance metrics implemented were designed speci cally to
 catch this type of under-utilization and penalize the team score (approachability,
 workload, autonomous traverse distance, cost, etc.). Only those metrics that relied
 solely on data from the team performance schedule (total mission completion time,
 agent active time, etc.) would show no numerical di erence between the di erent
 team combinations.
 6.2.2 De ning the MOGA Fitness Function
 Most of the operations required for this simulation were grouped into the
  tness function of the multi-objective genetic algorithm (implemented in Matlab).
 A detailed description of the data  ow and functions written for the HRT  tness
 function can be found in Appendix B. Fundamentally, the  tness function discretized
 the design variables from each MOGA generation into potential team combinations.
 These combinations were either present in the database, or the team pairing was
 166
identi ed as not one of the feasible combinations.
 The teams represented by each of the design variables were assessed for their
 objective function performance (according to the performance metrics), and the
 mission constraints were applied to each design point. Each of these modi ed  tness
 values (re ecting both objective function values and feasibility of the design point)
 were fed back into the MOGA to be used in generating the next population of design
 points.
 6.2.3 Specifying MOGA Parameters
 When running the MOGA for this set of simulations, the built-in MOGA
 control options were altered. The  rst instantiation of the MOGA in this simulation
 had been intended to run through 100 generations and then stop the search process
 (with an elite count of 15 individuals, average spread in the Pareto front of 1e-02,
 200 individual population size, crossover fraction of 0.8, and the adaptive mutation
 algorithm set to 0.2). This would provide a large population of potential data points
 to initialize the objective reduction algorithm.
 Each of the reduced objective sets (with a  -error of 0%, 10%, 20%, and 40%)
 were run through the MOGA. The options were selected for this set of experiments
 to enable the MOGA run through its entire search process and come up with its
 best Pareto front for each of the given problems. The MOGA was set to run for
 400 generations in each of the problem instances, maintaining a population of 200
 individuals, until either the maximum generation counter was reached or the average
 167
spread in the Pareto front was on the order of 1e-06 (tolerance to determine that the
 Pareto front has converged), with an elite count of 15 individuals, average spread in
 the Pareto front of 1e-02, 200 individual population size, crossover fraction of 0.8,
 and the adaptive mutation algorithm was set to 0.2.
 6.3 Results and Discussion
 6.3.1 Objective Reduction for the HRT Con guration Selection Prob-
 lem
 Brockho ?s  -MOSS algorithm for objective reduction was utilized in this ex-
 periment to reduce the original 15 objective function set. The input to the objective
 reduction algorithm was the MOGA results from an initial 100 generation run. The
  -MOSS algorithm was then run four di erent times: with a  -error of 0%, 10%,
 20%, and 40%. The results of this algorithm are tabulated below, in Figure 6.1, and
 each of the objectives listed are ordered in terms of importance as speci ed by the
  -MOSS algorithm.
 As anticipated, the  -MOSS algorithm proved capable of reducing the size of
 the objective set for the large HRT demonstration problem. The algorithm reduced
 the required number of objectives from 15 down to 9 in the 0%-error case - a sub-
 stantial improvement in the complexity of the problem without introducing an error
 into the underlying dominance structure of the problem. In other words, the 15 per-
 formance metrics contained redundant information about the underlying relations
 between the objectives. Even though each of the performance metrics measured a
 168
Figure 6.1: Large HRT Con guration Selection Problem: Reduced Objective Sets
 Resulting from the  -MOSS Algorithm
 di erent quantity, 6 of them could be removed from consideration without altering
 the underlying structure of the problem.
 Figure 6.1 contains the results for each of the four  -error test cases. Recall
 that the ordering of the objective functions in this  gure corresponds to the ranking
 assigned by the  -MOSS algorithm according to the quantity of pairwise relations
 represented in the data. Allowing a 10%  -error in the underlying dominance struc-
 ture of the problem allowed the objective set to be further reduced down to  ve
 objectives. Increasing the error to 20%, the objective set was shrunk to include only
 two objectives. Increasing the error tolerance beyond 20% did not further reduce
 the size of the objective function set. The same two objectives were required to
 preserve the underlying dominance structure in the last two test cases.
 The third and  fth objectives (EVA day length and robot day length, respec-
 tively) were deemed important for preserving the dominance structure in all of the
 test cases, though their importance in contributing problem information di ered (as
 169
represented in Figure 6.1 by the objective function numbering). It is interesting to
 note that these two performance metrics were selected - they represent the crux of
 the decision-making within a HRT team: how many tasks and how much time to
 allocate to the human portion of the crew and to the robot portion of the crew to
 best optimize agent usage.
 In both the  = 0% and the  = 10% test cases, two other objectives were
 prominently listed in the reduced objective set: the fourth and second objectives
 (overall mission time and the cost of each team, respectively). Put together, these
 four objectives identify most of the design trade-o s within this instance of the
 large-scale HRT con guration selection problem. These objectives balance which
 agents are utilized within the mission scenario with preference for the least capable
 of the agents (to reduce the cost of the overall team). The most equipped team
 (2 human crew with 2 Robonaut-like robotic agents) would have the potential to
 perform the overall mission fastest with an equal split in workload levels between
 the human and robotic agents, but this crew would also be the most costly in terms
 of built-in capabilities.
 The  fth objective listed in the  = 10% test case?s reduced set (the ninth ob-
 jective - autonomous navigation speed) was only selected in that test case. Combined
 with the four objectives just discussed, this additional objective included consider-
 ation of how well the robotic agents were able to perform their tasks, rather than
 merely considering the completion time required. This performance metric allowed
 di erentiation between the di erent kinds of robotic agents - not only would the
 cost of a robotic agent?s capabilities be considered, but the ability of those skills to
 170
contribute to performance e ciency would be included in judging the design space.
 Autonomous navigation speed was not one of the metrics selected in the  
= 0% test case. Instead, other measures of performance e ciency were selected.
 Objectives 10 and 11 (approachability and autonomous traverse distance) helped
 characterize how well robotic agents performed the survey-task portions of their
 scripted missions. Objectives 7 and 8 (human physical and mental workload) de-
 picted how hard the humans had to work to complete their assigned portion of
 the task list. Objective 6 relates the work e ciency index for the robotic agents,
 indicating the fraction of the entire mission that the robotic agents were involved.
 Considering the performance metrics selected in the  = 0% test case, the
 metrics represented mission and agent time, and the e ciency of each agent?s task
 performance. All of these metrics combined represented a non-redundant set of
 problem-speci c information that wholly characterized the objective space.
 It is interesting to note that  ve of the performance metrics included for con-
 sideration in the original problem description (objectives 1, 12, 13, 14, and 15 -
 human work e ciency index, and the error rates for each of the four agents (based
 on variety of task primitives performed in each team scenario), respectively) were
 deemed wholly redundant by the  -MOSS algorithm and were never selected for
 placement in the reduced basis.
 From an initial analysis of the 15 original performance metrics, it would not
 have been possible to determine which (if any) of the metrics contained redundant
 information. These  ve metrics had been included because they contained additional
 information about how well each of the agents performed their respective missions
 171
on each of the 82 teams. None of these metrics correlate to the same primary
 data as the other metrics - they all appear to calculate independent values. The
 error rate performance metrics sum not only each agent?s skill set, but the usage
 across the task allocation and developed schedules. The human work e ciency index
 di erentiates between human involvement in the mission scenario and the human
 actively working on a task (it records human inactive time as an ine cient use of
 humans as a resource).
 It should be noted, however, that the selection or rejection of performance met-
 rics for this demonstration problem directly correlates to problem-speci c data. For
 other large scale HRT con guration selection problems, it would be faulty to select
 performance metrics based on their selection in this problem. Di erent candidate
 HRTs will have di erent pairwise relations across all of the performance metrics,
 and a complete picture should be sought for each new problem instantiation.
 6.3.2 Pareto Results for the HRT Demonstration Problem
 This experiment followed the analysis steps laid out in the methodology dis-
 cussed in Chapter 3. The reduced objective sets were used to run the MOGA 5 times
 for statistical relevance. Each run sought to reach convergence of the Pareto set.
 The output from all of the runs were collated and sent a  nal iteration through the
 continuous update algorithm to verify that they represented a set of nondominated
 solutions. The results from this  nal analysis can be seen in Figure 6.2.
 Figure 6.2 displays each of the 82 unique teams that were feasible solutions for
 172
Figure 6.2: Teams Selected for Large HRT Demonstration Problem in Each of Five
 Test Cases
 173
the large HRT demonstration problem on the x-axis. On the y-axis are the  ve test
 cases that were run (vertically o set for visibility only - there is no signi cance to
 height on the y-axis). There are several interesting trends to notice about the Pareto
 sets from this Figure. First, the Pareto sets resulting from the Control test case and
 the  = 0% test case are identical even though there was a substantial di erence in
 the performance metrics used to generate them (6 additional performance metrics
 were used in the Control test case that were deemed redundant in the  = 0% test
 case by the  -MOSS algorithm). This result justi es the use of the reduced objective
 function set - with a  -error of 0%, zero change was incurred in the underlying
 dominance structure of the problem. Both of these Pareto fronts demonstrate that
 the MOGA found 41 unique candidate teams that performed no better than each
 other. The mission designer?s decision space was reduced from 82 teams measured
 across 15 objectives in the Control test case down to 41 teams measured across 9
 objectives.
 Recall that allowing a 10% error in the underlying dominance structure of the
 problem reduced the objective set from the 9 non-redundant objectives from the
 0% test case down to  ve objectives. This smaller set of criteria from which the
 solutions were judged helped winnow down the number of solutions in the Pareto
 set for the 10% test case to 32 teams. Including a 20% error brought that solution
 set size down to 9 teams in the  nal Pareto set. The  nal Pareto set for both the
 20% test case and the 40% test case were identical. With the two performance
 metrics in the reduced objective set, these nine solutions were identi ed as better
 solutions than the other 73 candidate teams. Expanding the allowed  -error in the
 174
Figure 6.3: Termination Reason and Run Time for Large HRT Demonstration Prob-
 lem
 dominance structure of the underlying problem allowed further down-selection of
 the teams represented in the Pareto front, easing a mission designer?s  nal selection
 of a team con guration.
 Figure 6.3 demonstrates the convergence success for each of the test problems
 across each of the 5 stochastic runs through the MOGA. The control case,  = 0%,
 and  = 10% test cases all converged 100% of the time. The MOGA was able to
 locate the Pareto front even with the larger number of competing objectives. In the
  = 20% test case, this convergence rate dropped to 20% and in the  = 40% test
 case this fell to 0%. In these last two test cases, the MOGA searched along two
 strongly con icting objectives for the Pareto front. However, the  tness function
 175
had been written to be insensitive to small decimal changes between generations of
 design variables (a low level of rounding was employed). This meant that the MOGA
 did not reach the small average tolerance of Pareto spread that it was searching for
 as the criteria for convergence.
 As had been the case with the knapsack test problem instances from Chapter
 4, the computational time required for the  -MOSS algorithm was the primary
 disadvantage of this method. As seen in Figure 6.3, the  -MOSS algorithm required
 more time than the control test case?s MOGA, and the MOGA run time for each
 of the  test problems increased in time after a  -error of 20% was included. In
 comparison, the control test problem?s MOGA required less than 10 minutes to
 complete. As had been concluded from the knapsack test problems, if computational
 time is a limiting factor for future performance analysis, this methodology would
 not be advantageous.
 However, if computational time is not an issue for a priori analysis, a signi -
 cant reduction in problem complexity can be achieved by utilizing the methodology
 proposed in this dissertation. Taking the  = 20% test case as example, a run time
 equivalent to four times that of the control case reduced the problem from a consid-
 eration of 41 teams in the  nal control Pareto set compared across 15 objectives to
 a consideration of 9 teams compared across 2 objectives - a substantial reduction.
 A  nal analysis on the solution quality of the Pareto sets resulting from each
 of these  ve test cases was performed, utilizing the hypervolume quality metric (see
 section 3.2.4). These results are tabulated in Figure 6.4.
 There are several important trends to note from the data represented in Figure
 176
Figure 6.4: Hypervolume Indicator Quality Metric for Each Test Case of Large HRT
 Demonstration Problem
 177
6.4. Recalling that the S quality metric re ects absolute volume coverage of the
 Pareto front, it can be seen for each of the four test problems that the reduced
 objective set resulted in a Pareto front with nearly identical volume coverage. Each
 Pareto front generated from this analysis represented remarkably good coverage
 across all of the objectives.
 Recall that the D quality metric was the relative volume coverage used to
 compare two di erent Pareto sets. In this large HRT problem, the control was
 compared across each of the four test cases with di erent  -error values. In the  
= 10% test case, the Pareto volume coverage of the control set was approximately
 5% better than the Pareto volume coverage resulting from the reduced objective
 set. This result is consistent with those seen in section 3.2.4 for the knapsack test
 problem.
 However, the results from the other three test problems in the large HRT
 problem are impressive. A fraction of 1% was calculated for the relative volume
 coverage both comparing control to the  -Pareto set, and  -Pareto compared to the
 control. There were no portions of the Pareto front that were covered by one set
 and not by the other. This provides the analytical support for the assertion that
 the Pareto sets resulting from the reduced objective sets not only provide a good
 approximation of the control Pareto set (in general), but (in this case) provided
 the exact same coverage of the Pareto front. A 1% error in solution quality would
 be introduced if a mission designer chose to use the Pareto set generated by the
 2-objective  = 20% and 40% test cases instead of the 15-objective control case.
 This result is the  nal justi cation that was sought from running the large
 178
HRT demonstration problem - application of the methodology proposed in this dis-
 sertation provided the rigorous performance analysis that had been missing from
 previous research in the literature. The  -MOSS algorithm provided objective, ana-
 lytical reasoning for selecting some performance metrics and deeming others redun-
 dant. Rather than running a comprehensive performance analysis with 15 objectives,
 a researcher could objectively and conclusively run the performance analysis with
 fewer objectives at any of the given error-tolerances and achieve very similar results.
 6.3.3 Team Con guration Analysis
 A closer examination of the best performing teams from the 20% and 40%
 (those depicted in the Pareto sets in Figure 6.2) test cases will be illuminating.
 Team 1 represented the standard two-human crew from the original NASA mission,
 without any additional robotic agents.
 The speed of the Robonaut-like robots on the Pareto teams for these test cases
 represented an interesting trade-o in design parameters. Team 41 contained the
 standard two human crew and two Robonaut-like robots that performed tasks at
 1/2 the speed of the human crew. Team 62 contained the standard two human crew
 supplemented by one Robonaut-like robot (equivalently capable to a third human
 crew member) that required three times the amount of time to perform a task as
 its corresponding human crew. The extended time required for robot performance
 was traded with cost for the extra skill capabilities for the team such that the 1/2
 speed robots were more worth the cost than the 1/3 speed robot in the overall team
 179
picture.
 Team 21 has the standard two human crew supplemented by one Robonaut-
 like robot (equivalently capable to a third human crew member) and a fourth agent
 with the skill set of an astronaut-assistant rover with no specialized tools or skills
 (capable of independent translation, load carry, site survey, force-push, and force-
 pull). Team 59 represents the same team agents as team 21 with the restriction
 that the robotic agents required twice as long to perform tasks as their human
 counterparts. It is interesting to note that both of these teams were included in the
  nal Pareto front, and also that the same team con guration that performed tasks
 at 1/3 the rate of the human crew was not present in this  nal set. It is clear that
 the trade-o between slower robot (less costly) and mission completion time could
 allow the two di erent robot speeds for the teams, but that the 1/3 rate robot was
 too slow and brought down overall team mission performance.
 Teams 29 and 57 involved a variation on the Robonaut-like robot with all skills
 except the ability to independently initiate communication. Team 29 supplemented
 this robot with the two human crew and an independently mobile robot that could
 provide an additional camera and visual perspective (verbal, visual, and translation
 primitives). Team 57?s Robonaut-like robot required twice as long to perform its
 tasks, and the team varied the second robot agent by removing the translation prim-
 itive. The fourth agent on this team was capable of providing a visual perspective
 and communicating its observations to the other agents, but required an external
 agent to place it at a desired location. Both of the teams? robotic agents contained
 between them all of the primitive skills needed to complete the full set of tasks. The
 180
two teams vary in the details of how the two robotic agents would split the tasks.
 Team 80 contained the standard two human crew, one Robonaut-like robot
 (equipped as a third human), and one robotic agent that could provide an additional
 visual perspective but required an external agent to place it at a desired location.
 Both of the robotic agents on this team moved at 1/3 the speed of their human
 counterparts.
 Team 25 was the only con guration in the  nal Pareto set to involve a di erent
 skill set in the Robonaut-like robotic agent. The primary robotic agent on this team
 was fully capable except that it was not able to use a specialized tool. This would
 be a version of Robonaut limited by its end e ectors (or robotic hands) available for
 a mission. It was capable of gripping handrails (EVA interfaces) and using a pinch
 grasp for delicate holds, but would be incapable of using other specialized tools like
 drills. This team was supplemented by a fourth agent that provided an additional
 visual perspective and was capable of independent mobility.
 Stepping back from the details of the candidate teams selected for the  nal
 Pareto sets of the  = 20% and 40% test cases, there are several overarching trends
 to be noted. In the representative best-performing teams, there were fundamentally
 only three di erent types of robotic agents: a Robonaut-like agent with the skill
 set of a human (or only one primitive o from this robot), an astronaut-assist
 robot capable of site survey, independent mobility, and carrying a load between
 desired locations, and an intelligent camera (in some instances it had independent
 mobility, in other cases it required another agent to place it). The teams in the  nal
 Pareto set contained combinations of these robots performing at di erent speeds
 181
to supplement the standard two human crew. It was the robotic agents that could
 substantially o set the human crew?s workload that were most useful to the overall
 team performance. However, it is intriguing to note that fully equipped robots were
 not necessary for this balance to be obtained.
 Why did other variations of robotic agents not make the cut into the  nal
 best-performing set? This can be traced back to the task allocation and scheduling
 schema utilized for this demonstration HRT con guration selection problem. During
 the task allocation process, if a robotic agent was capable of performing all of the
 primitives within a subtask, then that subtask was allocated to the robot. Otherwise,
 it was allocated to the human agents. Robotic agents with sparser skill sets were
 not capable of performing a su cient percentage of a subtask to be tasked with it.
 In other words, candidate teams that had these sparser skilled robotic agents did
 not use them su ciently to o set the cost of including the agent in the team.
 It is possible to assess the nine di erent types of task primitives and use this
 information to determine the con gurations of robotic agents that provide su cient
 utility (based on the assumptions built into this demonstration problem) to o set
 the human crew?s workload enough to justify the added cost of the robot. The
 unique skill combination of visual inspection and communication initiative was the
 least-skilled robotic agent in the Pareto set. The next combination of task primitives
 included translation for independent mobility. These two di erent combinations rep-
 resent a category of survey robots that aided the human crew by exploring additional
 territory without the human crew needing to traverse.
 From this base skill level, the added capability of load carrying to transport
 182
objects was greatly useful for o oading the workload of the human crew. From this
 level, however, the robotic agents were not su ciently skilled enough without several
 more primitive type-capabilities including force-push, force-pull, and handrail grip,
 which combined to allow a survey robot to independently pick up and put down the
 objects it had been carrying.
 In the  nal Pareto set, there is a very clear trade-o between robot skill level
 and robot speed. The more equipped robotic agents frequently were designated to
 operate at a slower speed than their human counterparts. This is the reason that
 some team combinations only existed in the  nal basis with slower versions of the
 robots and why their faster versions did not make the basis.
 This correlation between robotic agent skill level, task performance speed, and
 overall cost represents an intriguing guiding principle for future design of robotic
 agents and of optimized human and robot teams.
 6.3.4 Discussion of Reproducibility
 A  nal question that should be asked in this analysis is what the variation
 in the composition of the reduced objective sets say about the reproducibility of
 these sets for selecting performance metrics. In the case where two performance
 metrics contain equivalent problem data and represent the same pairwise dominance
 relations, the  rst encountered in the  -MOSS algorithm?s iterations will be selected
 for placement into the reduced objective set. In the next iteration, therefore, all
 of the relations contained in the second equivalent performance metric will already
 183
be represented in the basis. This second performance metric will not be selected to
 be included in the basis. While the initial input ordering of performance metrics
 will obviously have an a ect on those selected for the reduced set, the pairwise
 dominance relations represented by the  nal reduced set will remain the same. In
 other words, a variation in the performance metrics used in an overall analysis, if
 varied by the  -MOSS algorithm, will not have an e ect on the resulting Pareto
 solution sets.
 In a sense, all of the problem detail can be represented by these candidate
 solutions. The algorithm determines which objective functions provide the most in-
 formation for the given candidate solutions. When a set of poor candidate solutions
 were input into the  -MOSS algorithm, a di erent set of objective functions was
 deemed necessary to re ect the underlying dominance structure than when a set of
 mixed good and poor candidate solutions were input into the algorithm.
 From this perspective, it would be expected that the quality of the solutions
 would cause variation in the reduced objective set. When applying the  -MOSS
 algorithm, the reduced set of objectives will be selected based on non-arbitrary,
 problem-dependent information. No preference information will be needed. A large
 number of candidate objectives can be used to canvas the decision space, or a smaller
 set could be used. Either way, the algorithm will isolate the underlying dominance
 structure of the problem domain and identify those performance metrics that are
 most needed to characterize the decision space. The objective reduction algorithm
 has proved itself an immensely valuable tool for identifying the critical information
 for various problem types and across di erent domains.
 184
6.3.5 Discussion of Complexity Reduction in Pareto Decision Space
 It has been demonstrated that the Pareto solution sets generated by utilizing
 this dissertation?s proposed methodology are approximately (if not exactly) equiva-
 lent to using a large, unreduced objective set. It has been analytically demonstrated
 that large, complex, multi-objective optimization problems can be generically, ob-
 jectively, and conclusively reduced by application of this methodology.
 How much does this methodology aid a mission designer?s decision making?
 There has been signi cant reduction in the complexity of the mission designer?s
 decision space. Where previously the mission designer had a design space detailed by
 82 unique teams compared across 15 performance metrics, this methodology reduces
 the design space to 41 unique teams compared across 9 performance metrics without
 introducing an error in the underlying problem. This is a substantial improvement
 for the mission designer. Including a  -error further reduces the designer?s decision
 space to consideration of 9 unique teams compared across 2 performance metrics.
 A further iteration on this experiment could apply a more complex and se-
 lective task allocation schema that would further di erentiate the schedules and,
 therefore, the 82 team options. It would be anticipated that this would further
 reduce the number of candidate team options represented by the Pareto fronts in
 all of the test cases, and further reduce the options that a mission designer must
 choose from.
 This experiment was able to signi cantly reduced the designer?s decision space.
 Additionally, the designer was analytically assured that the results from this reduced
 185
decision space were at least as good as a solution selected from the design space
 represented by the control?s solution set.
 6.4 Summary
 This experiment sought to demonstrate the methodology proposed by this
 dissertation on a large-scale HRT con guration selection problem. The other exper-
 iments from this dissertation had suggested that the application of the methodology
 to this design space would facilitate signi cant reduction in the complexity of the
 over-constrained, multi-objective optimization problem, and that it would provide
 the rigorous objective reasoning to down-select from a large set of performance met-
 rics to the few signi cant ones for analysis.
 82 unique teams were proposed as candidate solutions and were compared
 across 15 performance metrics. For a mission designer, this is a very large decision
 space to consider. In the research described in Chapter 2, previous mission designers
 would have arbitrarily selected a small set of performance metrics to use in an overall
 team performance analysis. The lack of rigor in this type of analysis meant that
 comparison of results between di erent researchers, platforms, and team pairings
 was virtually impossible.
 Applying the  -MOSS objective reduction algorithm to this problem imme-
 diately (and rigorously) reduced the performance metric set from 15 down to 2
 performance metrics. Although there was some variability in the reduced set de-
 pending on the tolerable error in the underlying problem structure, this signi cantly
 186
reduced the decision maker?s design space. This by itself was a signi cant result.
 This research?s methodology, however, took the analysis one step further to assess
 the Pareto solution sets that resulted from the reduced objective sets. It was this
  nal analysis step that reduced the 82 unique teams down to the 9 best teams.
 The methodology proposed in this dissertation research has been applied
 to three di erent application realms and has proved bene cial in all three. This
 methodology has wide utility in reducing the complexity of large-scale over-constrained,
 multi-objective optimization problems. It would be anticipated that the methodol-
 ogy will provide rigorous analysis on many future applications.
 187
Chapter 7
 Conclusions and Future Work
 The diversity of robotic technology available creates a multitude of new oppor-
 tunities in task performance. Utilizing the new capabilities for hardware, software,
 sensors and system integration, and communication architecture could lead to a
 greater level of mission diversity, and facilitate cooperative human and robotic team
 scenarios that had previously been impossible.
 Future mission designers will have a large heterogeneous group of distinct
 agents (both human and robotic) from which to select the most productive or e -
 cient team members. Team members can be used in various combinations to better
 utilize their capabilities and skills to create more e cient and diversi ed operational
 teams. This involves allocating tasks to provide the most bene t from the partner-
 ship, and creating the planning, scheduling, and software interfaces to support these
 e orts. An overall, objective team performance analysis would be bene cial to facili-
 tate decisions in this design space. It would enable quantitative comparison between
 disparate teams and allow a designer to select the most e ective agents to complete
 a series of tasks.
 It is this  nal question that this research sought to answer. A methodology
 was developed to facilitate performance comparison amongst heterogeneous human
 and robot teams. This methodology made no assumptions about mission priorities,
 188
preferences, nor importance of performance criteria. Instead, it provided an objec-
 tive, generic, quantitative method to reduce the complexity of the mission designer?s
 decision space.
 This type of overall mission analysis to di erentiate team con gurations could
 be an inordinately valuable tool for mission designers. The challenge then becomes
 creating a quantitative, overall model to measure a team?s performance for a generic
 mission, to enable broad use of the analysis tool (and remove the need for mission-
 speci c models). To facilitate comparison between di erent analyses, being able to
 specify an objective set of criteria would be immensely valuable.
 It is this last point that was the motivation for this dissertation research.
 A generic, objective methodology was developed to aid in determining a pseudo-
 optimal HRT con guration for a mission scenario. The HRT con guration selection
 problem was utilized as the application that motivated the problem and was used to
 assess the quality of solutions after the problem had been solved. The methodology,
 however, provided a much broader search of the design space between performance
 metrics and optimization models, facilitating evaluation of the design options.
 7.1 Discussion of Important Results
 This dissertation?s proposed methodology was applied to three very di erent
 applications to demonstrate its utility and diversity, and to investigate how the ad-
 ditional dimensions of the large-scale HRT con guration selection problem alter the
 anticipated results. Using the conceptually simpler knapsack problem domain, the
 189
utility of this dissertation?s methodology was demonstrated. The reduced objective
 set for the  = 40% test case from the 15 objective knapsack problem instance (which
 had 11 fewer objective functions than the control case) resulted in a Pareto solution
 set that was only 3% worse than the control according to the hypervolume quality
 metrics. That result is substantial. A drastic decrease in problem complexity was
 a orded at only a 3% decrease in solution quality. Considering the ease with which
 a mission designer could examine the resulting 4-objective reduced set for this test
 case, this research?s methodology proved to be very bene cial in this application.
 In the robotic reconnaissance case study, it was demonstrated that Rodriguez?s
 [74] composite task score method for performance analysis was highly dependent
 on the performance metrics selected, and did not provide the objective analytical
 results that had been desired in the research. It proved insu cient for the rigorous
 heterogeneous performance analysis conducted as the  rst part of this case study.
 To remedy this, the  -MOSS objective reduction algorithm was utilized to
 reduce the performance metrics proposed by Tunstel [96] for the MERs to a subset
 that maintained the underlying dominance structure of the problem. In other words,
 the reduced set of objectives resulted in no change to the problem details and the
  nal solution set. With zero error in the dominance structure, the six performance
 metrics used in this simpli ed analysis were reduced to two con icting performance
 metrics. Four of the six performance metrics contained only redundant information
 and could be removed from analysis without a ecting the problem and its solutions.
 Furthermore, in the worst case from the test problems, the control Pareto
 set dominated just under 1% of the  Pareto front. In other words, by using the
 190
 Pareto set for the 10%-error test case instead of the control Pareto set, a 1%
 decrease in the quality of the solution set resulted. This decrease in quality, however,
 a orded a decrease in the number of performance metrics from 6 down to 2 metrics.
 Additionally, this slight decrease in solution quality allowed the reduction from four
 candidate rovers down to two candidate rovers: the K10 red and Spirit.
 The large HRT con guration selection demonstration problem experienced
 signi cant reduction in the complexity of the mission designer?s decision space by
 application of this research?s methodology. Where previously the mission designer
 had a design space detailed by 82 unique teams compared across 15 performance
 metrics, the methodology proposed in this dissertation reduced the design space
 to 9 unique teams compared across 2 performance metrics. This is a substantial
 improvement for the mission designer.
 A further extension of this experiment could apply a more complex and se-
 lective task allocation schema that would further di erentiate the schedules and,
 therefore, the 40 team options. It would be anticipated that this would further
 reduce the number of candidate team options represented by the Pareto front, and
 further reduce the options that a mission designer must choose from.
 The Pareto solution quality results from three of the test problems in the large
 HRT problem were impressive. An exact value of zero was calculated for the relative
 volume coverage both comparing control to the  -Pareto set, and  -Pareto compared
 to the control. There were no portions of the Pareto front that were covered by one
 set and not by the other. This provides the analytical support for the assertion that
 the Pareto sets resulting from the reduced objective sets not only provide a good
 191
approximation of the control Pareto set (in general), but (in this case) provided the
 exact same coverage of the Pareto front. Zero error would be introduced if a mission
 designer chose to use the Pareto set generated by the 2-objective test cases instead
 of the 15 objective control case.
 In the representative best-performing teams, there were fundamentally only
 three di erent types of robotic agents: a Robonaut-like agent with the skill set of a
 human (or only one primitive o from this robot), an astronaut-assist robot capable
 of site survey, independent mobility, and carrying a load between desired locations,
 and an intelligent camera (in some instances it had independent mobility, in other
 cases it required another agent to place it). The teams in the  nal Pareto set con-
 tained combinations of these robots performing at di erent speeds to supplement
 the standard two human crew. It was the robotic agents that could substantially
 o set the human crew?s workload that were most useful to the overall team perfor-
 mance. Candidate teams that had these sparser skilled robotic agents did not use
 them su ciently to o set the cost of including the agent in the team. However, it
 is intriguing to note that fully equipped robots were not necessary for this balance
 to be obtained.
 In the  nal Pareto set, there is a very clear trade-o between robot skill level
 and robot speed. The more equipped robotic agents frequently were designated
 to operate at a slower speed than their human counterparts. This is the reason
 that some team combinations only existed in the  nal basis with slower versions of
 the robots and why their faster versions did not make the basis. This correlation
 between robotic agent skill level, task performance speed, and overall cost represents
 192
an intriguing guiding principle for future design of robotic agents and of optimized
 human and robot teams.
 The methodology proposed by this dissertation does not prove its worth across
 the board. One very clear disadvantage to the methodology is the increase in com-
 putation time required to reach a Pareto set for a given design problem (in the worst
 case large HRT problem, the methodology required four times as much computa-
 tional time as the control experiment). If computation time is a limiting factor for a
 set of analysis, using this dissertation?s proposed methodology would be a hindrance
 to the analysis e ort.
 However, if computation time is not an issue, the merit of this research?s pro-
 posed methodology has been clearly demonstrated. If a mission designer is not
 concerned with producing results as fast as possible, the Pareto solution sets result-
 ing from this methodology have been demonstrated to have been at least equivalent,
 if not better, than the control case for each of the three application domains. The ob-
 jective reduction algorithm has proved itself to be a powerful tool to reduce problem
 complexity by focusing the analysis on relative performance relations rather than on
 relative numbers. It is anticipated to be a valuable tool in larger human-robot team
 con guration selection problems. It has been analytically demonstrated that large,
 complex, multi-objective optimization problems can be generically, objectively, and
 conclusively reduced by application of this methodology.
 193
7.2 Conclusions
 This dissertation?s research has provided a valuable tool for performance anal-
 ysis of large, heterogeneous, human and robotic teams. A quantitative, generic, and
 objective methodology has been developed and demonstrated that will facilitate the
 type of analysis that must be conducted to enable comparison and selection of teams
 and individuals, and to  nd the most productive and e cient team members. This
 overall team performance analysis would be bene cial and would enable quantitative
 comparison between disparate teams for future mission designers.
 This research developed a novel methodology to facilitate performance com-
 parison amongst heterogeneous human and robot teams. This methodology made
 no assumptions about mission priorities, preferences, nor importance of performance
 criteria. Instead, it provided an objective, generic, quantitative method to reduce
 the complexity of the mission designer?s decision space.
 Given a set of agents and given a diverse set of potential metrics, this method-
 ology allows a designer to rigorously choose a team con guration and task allocation
 within the team to satisfy one or more metrics in an optimal manner. A designer
 will not have to make a priori decisions about which are the critical metrics, nor
 which agents provide more utility to overall team performance. The underlying
 dominance structure of the complex problem will be assessed by the  -MOSS objec-
 tive reduction algorithm, and decisions about redundancy and similarity would be
 made based on these pairwise relations.
 This type of methodology has not been done in the research  eld, and repre-
 194
sents a valuable, unique contribution. The signi cant, unique components of this
 dissertation have been demonstrated:
  Decomposition of the HRT con guration selection problem into three distinct
 realms, decoupling their analysis: classi cation of agent and mission details,
 planning and scheduling, and selection of metrics and teams.
  Select performance metrics to preserve underlying dominance structure of a
 problem - this is a new approach for how to choose performance metrics.
  Demonstration of a method to compute overall performance evaluation that
 does not rely on aggregating multiple performance metrics into a single per-
 formance function.
  Create a generic, rigorous, objective, quantitative methodology for the HRT
 domain to both select the performance metrics to be used in an overall team
 performance analysis and to reduce the complexity of a mission designer?s
  nal decision space. This type of methodology is novel for the domain and a
 valuable contribution to further the use of human and robot teams.
 It should be reiterated that the synthesis of the four state-of-the-art algorithms
 does not represent the contribution of this dissertation. This dissertation?s contribu-
 tion was facilitated by the synthesis of these algorithms. The unique contribution of
 this research was to create a generic, rigorous, quantitative, objective methodology
 to aid in the selection of performance metrics for overall team performance analysis
 and to reduce the complexity of a mission designer?s  nal decision space.
 195
Using the three components of the HRT con guration selection problem anal-
 ogy described in section 1.2, this research lies on the plane de ned by the perfor-
 mance metrics and optimization techniques. This allowed the substitution of the
 knapsack problem for the HRT con guration problem because the application do-
 main was used to feed data sets into metrics.
 In other words, the solution technique was not limited to a speci c application
 domain. This methodology can be used for a much wider range of applications. Hav-
 ing a generic, objective methodology like this will be a necessary step to advancing
 the goal of using cooperative human and robot teams in future space activities.
 7.3 Future Work
 There are several elements of this research that could be improved with fu-
 ture work. Most notable has been the signi cant computation time required for the
  -MOSS algorithm. If the algorithm could be made more e cient by algorithmic
 improvement of code implementation to reduce the time required to run the algo-
 rithm on a large-scale problem, the method would be more useful and practical for
 a wider range of applications.
 It would be interesting in future work to implement Brockho ?s exact  -MOSS
 algorithm rather than the heuristic greedy approach used in this research. The
 implemented greedy approximation algorithm does not guarantee that the mini-
 mum objective set will be found - only a minimal set. The reason that the greedy
 algorithm had been applied was because Brockho had described that the exact
 196
algorithm failed with large complex problems [12]. If the large-scale HRT demon-
 stration problem could be run with the exact algorithm, an analysis of the di erent
 reduced objective sets could be very illuminating for the underlying nature and re-
 lations between the performance metrics. In addition, an exact algorithm would
 be guaranteed to  nd the  -minimum set. This would increase the uniqueness and
 reproducibility of the reduced objective sets.
 As evidenced by several of the stochastic runs on each of the experiments,
 Matlab?s multi-objective genetic algorithms toolbox struggled to reach convergence
 on some of the problems. The Matlab tool was utilized in this research both for
 its wide distribution (enabling other researchers to use it), and also for its ease
 of implementation. The interplay between the user de ned  tness function and
 the MOGA itself could be made more e cient, or a real-time objective reduction
 method could be implemented to wheedle down the number of objectives during the
 MOGA runs themselves. Both of these options could improve the convergence of
 the MOGA.
 Alternatively, an entirely di erent implementation of a MOGA could be uti-
 lized in future work. There are many described in the literature that were intention-
 ally designed for MaOO design problems and would likely have a better convergence
 success for these types of problems. Integrating one of those into this research?s
 methodology would improve the results even more.
 As explained in the literature review portion of this dissertation, the primary
 limiting reason for not applying other researchers? performance metric categorization
 methods has been their lack of quantitative support. The research methodology out-
 197
lined in this dissertation could provide the platform from which these categorization
 methods could be rigorously tested. This could provide the quantitative evidence
 needed to justify the use of the categorization methods in performance analysis
 research.
 To facilitate use in other HRT problems and scenarios, this methodology could
 be better compartmentalized in future work to facilitate distribution and use. Es-
 sentially, this would involve turning it into a black box tool with very obvious input
 variables and structure. This could facilitate a wider range of performance metrics
 testing, and a wider range of tool application.
 A future iteration on the  nal HRT demonstration problem could experiment
 with including a larger number of robotic agents in the team con gurations. The
 analysis presented in Chapter 6 limited these teams to two robotic agents. It would
 be intriguing to measure how teams with multiple independent robotic agents of
 varying skill capabilities a ected the overall performance of the teams.
 Future mission designers will have a large heterogeneous group of distinct
 agents (both human and robotic) from which to select the most productive or e -
 cient team members. The proposed methodology represents a novel solution tech-
 nique with a wide range of applications. Having a generic, objective methodology
 like this will be a necessary step to advancing the goal of using cooperative human
 and robot teams in future space activities.
 198
Appendix A
 Previous Cooperative HRT Research
 A.1 Preliminary Dissertation Research and Results
 Over the last few years, I have done substantial research into characterizing
 HRT interaction that addressed several of the challenges described in the introduc-
 tion. That research and its results will be summarized in this section, emphasizing
 which of the challenges have been researched and which are yet to be addressed.
 A.1.1 Methodology to Assess HRT Task Performance
 My previous research has sought to characterize how agents in HRTs a ect
 each others? task performance, and how di erent skill sets in uence not only the
 schedules, but wait time and involvement time for all of the agents. This work
 led to the development of a methodology [89, 88] to facilitate scheduling a human-
 robot team according to di erent mission preferences. The methodology takes into
 consideration real world constraints and precedent constraints to produce pseudo-
 optimal schedules for a cooperative human and robotic crew.
 The journal paper [90] extended this task allocation and scheduling methodol-
 ogy to assess the impact of the robotic agent?s task performance on the cooperative
 team?s schedule. To e ectively allocate and schedule mission tasks for a cooperative
 human and robotic team relies on an accurate characterization of the performance
 199
abilities and speeds of the disparate crew.
 In terms of addressing the challenges to comparing HRT con gurations, this
 methodology provides a guideline for comparing HRTs before the mission design
 phase and facilitates quantitative comparison based on crew time as the primary
 criterion.
 A.1.1.1 Case Study: Hubble Space Telescope Servicing Mission 3A
 The Hubble Space Telescope Servicing Mission 3A (HST SM-3A) was used as a
 case study in [89, 88, 90] to demonstrate the scheduling advantages of a cooperative
 human and robotic team in a space mission application. The  ight plans from past
 Hubble Space Telescope servicing missions (HST SMs) provide a detailed test bed
 from which to examine the e ect of various activities and crew performances in
 space operations. HST was designed with access panels to allow repair of some of
 the components of the telescope. Some of the servicing tasks require a  ne level
 of dexterity, including manipulation of tethers and electrical connectors. Most of
 the tasks can be broken down into primitives that require single degree of freedom
 motion and utilization of a single tool [68]. The latter set of tasks suggest the
 improvements and extensions to human performance in space by utilizing robots as
 cooperative team members who could perform these more repetitive tasks, freeing
 the humans to work on other tasks.
 The six hour nominal EVA day (including daily setup and closeout of HST
 worksites) is the primary constraint in the number of tasks that can be performed
 200
during a servicing mission. Any attempt to reschedule task order is complicated by
 the need to maintain the sequential constraints between some of the subtasks (e.g.,
 a team member cannot open a door before its securing bolts have been unscrewed)
 while both minimizing overall human involvement time and minimizing the time
 one agent has to wait for another.
 A.1.1.2 Methodology Assumptions: Relevant Attributes and Criteria
 The primary criteria used to assess the value of a cooperative team over the
 standard two-human crew was the reduction in active human time (representing a
 specialization of the human crew?s activities) and the time that each human crew
 member needed to wait for the robotic agent to perform its subtasks (a check on
 the e ciency of the schedule).
 The time that the human crew wait for the robot to complete a precedence
 constrained subtask is the di erence between the human involvement time and hu-
 man active time within a task. This di erence represents the time that the human
 crew was present on-site but not involved in a subtask. It is the human crew?s wait
 time that proved to be the critical factor in determining when it was acceptable to
 intersperse robot tasks with those of the human crew.
 To reduce the dependence of the scheduling model on a speci c set of robotic
 technologies, a  ight rule was established for the analysis in [89] that if the robot was
 active in the same workspace as a human, it would only be doing tasks in support of
 and that relate directly to the tasks the human was performing (including passing the
 201
human hardware), rather than independently performing tasks. A HST  ight rule
 had also been extended to enable the robotic agent to continue servicing activities
 outside of the human EVA day, when the human crew was no longer on-site. Both
 of these  ight rules continued their relevance to the task performance research and
 were incorporated into these analyses.
 A.1.1.3 Characterizing a Robot?s Role in a Servicing Team
 Similar to the design of Robonaut, the generic robot used in these analyses
 was assumed to be capable of con guring its own tools, to have all end e ectors
 necessary to complete the jobs assigned to it, and to be capable of using EVA
 hardware interfaces (like handholds) and tools. The robot was assumed capable
 of all force/torque application primitives (including use of EVA tools), hardware
 hando and translation, and visual inspection primitives (through either feedback
 to a controller or autonomous recognition capabilities). It was further assumed to
 have at least two independently controllable dexterous arms with approximately the
 same capabilities and reach as a human for sizing around the tasks.
 Although the characterization of the robot?s mobility and positioning system
 will be dependent on the speci c robot con guration selected, for the purposes
 of these studies it was assumed that the robot?s positioning mechanism would be
 independent of the human crew?s positioning systems. The robot was assumed to
 be designed with a positioning arm in addition to its assumed dexterous arms, as
 proposed in [73]. This independence would allow the robot to be added to the
 202
human crew?s task performance rather than occupying one of the required human
 positioning aids.
 A.1.1.4 Relative Robot Task Performance
 The task performance research sought to analyze the role of the robot perfor-
 mance parameter (RPP) in the development of a cooperative team schedule. The
 RPP represents the relative amount of time that it takes a robot to complete a task
 with respect to its human analogue. Using the relative performance ratio concept
 from Rodriguez [74], a value greater than one indicates the robot?s performance is
 slower in magnitude than a human?s by the RPP?s value. This information can be
 used to determine a performance bound for the selected robotic agent within its
 operational capabilities that would re ect the e ect that each individual task speed
 has on the overall team schedule. A robot does not perform every task as quickly
 as possible, so this level of analysis would seek to match the selected robot?s capa-
 bilities with the optimal performance required to have it perform and contribute to
 the development of optimal task performance for the collaborative mission.
 Additionally, analysis of the RPP could be used before design selection to in u-
 ence the rate of task execution, and skill requirements. This would feed preferences
 derived from an initial scheduling analysis about the role and added bene t of the
 robotic agent as a crew member on the cooperative team into the design decision.
 The RPP analysis examined the extent that precedent constraints between
 subtasks and primitives create intervals of wait time for the human crew of a coop-
 203
erative team. These wait time intervals can only be in uenced by agent performance
 speed or reallocation of tasks to di erent agents. In general, if the wait time is longer
 than the human performance time of the given subtask, the subtask should be real-
 located to the human crew.
 A.1.1.5 Role of the Third Agent
 It is intuitive that adding a third agent to the servicing team would create
 schedule bene ts by reducing the workload of the original two-human crew through
 parallel execution of servicing tasks. The role and level of contribution of this third
 agent depends on the proportion of mission tasks that the third agent has the capa-
 bilities to perform, the interdependence of this subset of tasks to those performed
 by the other crew members, and the rate of task performance and completion of the
 subset.
 In the speci c case of an RPP equal to 1, the third agent can be considered
 as a third human on the crew with restricted task capabilities. To isolate the e ect
 of the RPP on the three agent schedules developed, the  ight rules that limited
 simultaneous task diversity within a cooperative workspace still applied. In partic-
 ular, the third human was restricted to working on the same task as the other two
 humans if all three were within the same workspace. A new baseline was established
 to compare the RPP-varying schedules with a timeline that had been scheduled for
 a restricted three-human crew.
 The RPP research facilitates the ability to more accurately predict how much
 204
the human crew?s wait time could be reduced by various robotic con gurations.
 Analysis of the e ect of the RPP on the cooperative schedule provides framework
 for using human time window constraints to back out the required robot performance
 time to maintain an e cient cooperative schedule. This in turn can be used to set
 robot design requirements.
 A.1.2 Results from the HRT Scheduling Research
 The e ciency of the human and robot team in the cooperative activities,
 as de ned by Steinfeld et.al. [93], can be measured by the time required for the
 interaction to be completed. This quantitative metric can be used to compare the
 contribution of each type of robotic agent to the overall team.
 After the subtask-level task allocation analysis, it was noted in [90] that 22%
 of the HST SM3A mission subtasks needed to be performed by the human crew.
 For the task allocation strategy speci ed, this was the minimum solution for EVA
 time for the mission task pro le. The robotic agent de ned for this paper was able
 to aid the human crew in 78% of the mission subtasks. 31% of the mission tasks
 were reallocated entirely to the robot, freeing the human from involvement in the
 given task.
 One of the most signi cant results from [89] was that all of the tasks that had
 been performed during HST SM-3A could be completed in half the active EVA time
 than during the NASA mission by including the generic robot in the crew. Had a
 cooperative human and robotic team been employed on the mission as de ned in
 205
this paper, twice the volume of tasks could have been accomplished during the EVA
 time of HST SM-3A.
 In [88], the impact of three categories of robotic agents, based on their de ned
 role on an operational team, was examined. A result of the analysis was that involv-
 ing a robotic agent (any of the three categories described) as a third team member
 to aid in performing the EVA tasks reduced the required human participation in
 the mission from between 40-60%, with the position in this range depending on the
 functional capabilities of the robot. With the goal of minimizing human involve-
 ment time in the mission while maintaining all constraints, this demonstrates that
 employing a robotic agent will improve the e ciency of the team over and above
 the expected contributions of an additional human.
 This provides motivation for comparing di erent HRT con gurations and their
 mission performance possibilities in the early stages of mission design. Had the
 three-agent crew as de ned in [90] been utilized, a much larger quantity of tasks
 could have been performed during the mission.
 A.1.3 Analysis of Previous Scheduling Work
 In my previous work, I treated the HRT scheduling problem as an over-
 constrained, multi-objective optimization problem. The analyses generated signif-
 icant proof-of-concept results. As a contribution to the  eld, [89], [88], and [90]
 present a persuasive argument and supporting evidence that future space opera-
 tions would bene t from involving cooperative robotic team agents in addition to
 206
the human crew. The three papers are based on the same assumptions and the
 same framing of the HRT problem, allowing direct comparisons between the results
 in each paper.
 These papers de ne and assess how robots and humans can work together
 cooperatively to complete tasks, and address several critical issues about their com-
 bined performance. A small subset of time-based task attributes were selected as
 the primary criterion for comparing di erent team performance. This selection was
 guided by consideration of team e ciency from a task load model perspective. The
 methodology guides the application of these metrics through developing a task al-
 location scheme and a corresponding schedule.
 A.1.3.1 Methdology?s Quantitative Model
 The model described by this methodology incorporated aspects of a task load
 model. Following a standard task load model, individual agent task completion
 time was used to gage the e ort required for each agent to complete each task. Task
 times were summed over each EVA day to facilitate using a pairwise comparison of
 relative time involvement as a common metric between team con gurations.
 The task allocation scheme used in this methodology was intended to reduce
 human active time by as much as possible. This metric represented freeing the
 humans of mundane tasks easily performed by a robot. The new free time would
 allow humans to specialize their e orts for more complex tasks.
 The methodology developed in my previous work, though generic to HRT
 207
scheduling problems, represents another scenario speci c HRT con guration prob-
 lem. A human-centered approach was used, such that the primary objective was to
 reduce active human time during each EVA and a secondary objective was to reduce
 overall human involvement in each EVA by reordering tasks to group the active in-
 tervals. Several other objective functions could have been identi ed for this analysis
 - minimize traversal time needed to retrieve tools or visit worksites (application of
 the traveling salesman problem to increase e ciency of movement), minimize human
 mental workload, etc. Incorporating all of the objectives that could possibly apply
 to the HRT scheduling of HST SM-3A would have created an intractable problem.
 The objectives used in the analyses were subjectively selected from the larger
 set of applicable objectives to assess one small portion of the overall HRT problem.
 This set of reduced objectives does not necessarily fully (if at all) describe other
 applications or mission scenarios. As discussed by Parasuraman (section 3.7 in
 [65]), the price of creating quantitative models of the parameters and properties of
 a HRT is a loss in generality of the resulting model.
 The same thing can be said for the constraints used in the analyses. Sequen-
 tial constraints between tasks were maintained and constraints that restricted the
 robot?s movement in the human workspace were included to preserve a realistic mis-
 sion scenario. These are also only a small subset of the constraints that could be
 applied to the larger problem. There are very speci c constraints about the amount
 of time each instrument in the HST can be exposed to space for environmental ex-
 posure (limits task time). Power as a resource is not an unlimited quantity - any
 robot will run o the power of its supporting structure (in this case, the Space Shut-
 208
tle), and its power draw for the tasks assigned in these analyses could be entirely
 infeasible.
 A.1.4 Challenges Yet to be Addressed
 Fully describing every aspect of HRT servicing would yield an intractable prob-
 lem. To facilitate analysis, assumptions and simpli cations must be made to the
 model to reduce the number of variables. Importance ratings could be applied be-
 tween the multitude of objective functions. Each of these subtly changes the design
 space and shapes the resulting solution set, potentially into widely di erent regions
 of the design space. Other equally valid selection choices for the important design
 parameters, objectives, and constraints could have been made. Without using a pri-
 ori information about mission priorities, there are an in nite number of subjective
 combinations of reduced-complexity problem descriptions.
 The question that remains about my previous research is whether there is an
 objective quantitative method to determine if a neglected parameter would have
 produced signi cantly di erent results. Is it possible to objectively reduce the prob-
 lem design space to a core of important parameters? With an in nite number of
 ways to reduce the complexity of a problem, how could a designer objectively deter-
 mine which is an optimal HRT con guration for any given mission? These are the
 questions that this dissertation research seeks to answer.
 209
Appendix B
 Large HRT MOGA Implementation Details
 B.1 Fitness Function Inputs
 Several databases were created prior to running the simulations of this experi-
 ment. The 40 unique team combinations (de ned solely by the task primitives each
 of the four possible agents were capable of performing) were sent through the task
 allocation scheme. The result was a task performance schedule for each of the 82
 team combinations that speci ed the order of task performance and which agents
 were required for each of the subtasks. Each of these team combinations and their
 corresponding schedules were evaluated across all of the performance metrics spec-
 i ed in section 6.1.3. All of this information (agent task primitive capabilities for
 each team, team schedules, and team performance data) was stored in a MATLAB
 data  le, to be used as a reference database for the experiment simulation.
 It was noted that although there were 82 unique team combinations, there
 were not 82 unique schedules generated for the database. Due to the simpli ed task
 allocation scheme used in this research, several of the teams combinations resulted
 in the same task allocation and performance schedule. This can be seen in multiple
 ways. First, if two distinct teams perform the schedule in the same way, then any
 additional capabilities or skill sets of one of the teams over the other are unused
 capabilities. The more equipped and advanced robotic agents from these scenarios
 210
did not result in schedule or overall team performance improvements. These robotic
 agents were under-utilized in the schedules. In a further iteration of this research,
 a function could be written which would add this overlap of schedules from dis-
 parate teams into an additional performance metric to gage resource utilization and
 specialization.
 B.2 Fitness Function Outline
 The following is a discussion of the functions that were written to evaluate
 each population generation and return a  tness value for the large HRT application.
 Each step in the  tness function can be seen in Figure B.1, with the primary details
 of each step identi ed. When the MOGA begins, Matlab passes an initial population
 (the user can specify the size of the population during the MOGA call) of design
 solutions to the  tness function for evaluation. There were four variables, each
 bounded to be decimals between zero and one.
 The user-de ned  tness function initializes by loading the database of input
 problem information. This includes the 40 unique team combinations, their corre-
 sponding task allocation and schedules, and the performance metric data for each
 of the teams. This information must be provided up front.
 The  tness function takes the population generated by Matlab?s MOGA and
 passes it to the function dv decimal2int.m, which converts the design variables to
 integers. This discretizes the design variables such that the bin size is equal to the
 inverse of the number of possible agents. There were ten di erent bins for the large
 211
HRT con guration selection problem: the four agents working independently (agent
 1, agent 2, agent 3, and agent 4), and each of the possible pairings of agents required
 for two-agent tasks (agents 1 and 2, agents 1 and 3, agents 1 and 4, agents 2 and 3,
 agents 2 and 4, and agents 3 and 4). Each of the design variables was discretized
 into one of these 10 bins. The output of this function was the list of agents (or pairs
 of agents) required and speci ed in each of the design variable combinations of the
 generation.
 It should be noted that the design variables generated for testing by the MOGA
 were not guaranteed to be feasible solutions. In other words, the MOGA had the
 ability to generate lists of agents (or empty lists) that were unable to perform the
 entire task list. Subsequent functions in the  tness function were designed to test
 for this case and assess the feasibility of each team combination. At this point in
 the code, however, there was no di erentiation.
 The function sch agentcombos.m performed the same analysis on the schedules
 in the database to identify which agents were required for each possible schedule.
 Combomatch kta.m sought to match the list of agents from the design variables
 to those required in each of the database?s schedules. This function identi ed the
 corresponding schedule options for each list of agents in the design variables (i.e.
 this function takes the list of agents speci ed by a design point and searches for a
 task schedule that only requires these agents). A schedule was a match if the exact
 same list of agents was required in both. A schedule that only required a portion of
 the design point?s list of agents was not a match.
 The signi cance of this schedule match should not be under-estimated. For the
 212
Figure B.1: Detailed Diagram of the MOGA Fitness Function Call
 given list of participating agents, the schedules represented which agents performed
 each task. If there were multiple schedule options for a given agent list, then there
 were multiple task allocation schemes (or multiple options of which agent performs
 each task) available. At this point in the analysis, it was not possible to select
 one task allocation schedule from multiple options. Instead, this function and each
 subsequent function (hence the \kTA" su x in the function names) kept track of the
 multiple schedule options for a given design variable. It was only at the end of the
  tness function that a  nal selection was made to determine which task allocation
 schedule was best for the indicated agent list.
 Once the schedules for each design variable were identi ed, objeval kta.m eval-
 uated each design point for all objective functions. If there was no corresponding
 213
schedule for an agent list (no schedules in the database required the speci ed list of
 agents), then a  ller-value was placed in the performance data for each of the design
 variables for ease of reference in a later function. This gave the initial evaluation
 amongst the design points.
 The constraints, however, had not been considered, so the feasibility of the
 solutions was still unknown. A solution that violated constraints was infeasible, and
 while some of these solutions yielded rewarding objective function values, they were
 not viable solutions to the problem. The  tness value for each design point must
 incorporate the feasibility of the point, or the constraints have not been properly
 applied.
 The function constrainthandling Woldesenbet kta.m performed the feasibility
 analysis on each of the design points. The multi-objective, adaptive constraint han-
 dling technique described by Woldesenbet [101] was implemented in this function
 (described in more detail in section 3.2. All constraints were coded into this func-
 tion. The function evaluated the constraints at each of the design points, and then
 used Woldesenbet?s algorithm for MOGA constraint integration to create a modi ed
  tness value for each design point. This modi ed  tness value was calculated from
 the objective function values, with a penalty applied for each violated constraint.
 The weighting of the importance of constraint violation in the  tness evaluation
 depended on the number of feasible solutions in each generation. In generations
 with few feasible solutions, violated constraints represented a smaller penalty to
 enable information from these solutions to be fed into the diversity of a subsequent
 generation.
 214
The last function called by the  tness function determined which schedule to
 use in evaluating a design point if there were multiple available (multiple task al-
 location options available). It called the continuous update domination algorithm
 (discussed in section 3.2) to select the dominating point between the schedule op-
 tions. The best schedule?s  tness value was then selected to be the modi ed  tness
 for the design point. All of the modi ed  tness values from the constraint handling
 algorithm were returned to the MOGA to be used in selecting the next generation
 of design points.
 B.3 Fitness Function Output
 In practice, it was discovered that several of the schedule options for a given
 agent list represented non-dominated solutions. Although they corresponded to dif-
 ferent objective function values for each of the performance metrics, these solutions
 did not dominate each other (as determined by the continuous update domination
 algorithm). To re ect this, a modi cation was made to the evaluate kta.m function
 to return the list of teams that represent equally valid and equivalently performing
 solutions. This information was not utilized within the MOGA itself, but was used
 in post-simulation processing. For the MOGA itself, one of the feasible, equally-
 valid solutions was selected for the modi ed  tness values to enable the MOGA to
 continue churning a new generation.
 215
B.4 Stochastic MOGA Usage
 When running the MOGA for this set of simulations, the built-in MOGA
 control options were altered. The MOGA options used previously had been intended
 to run through 100 generations and then stop the search process. The changes made
 for this experiment was to have the MOGA run through its entire search process and
 come up with its best Pareto front for the given problem. The MOGA was set to
 run for 400 generations in each of the problem instances, maintaining a population
 of 200 individuals, until either the maximum generation counter was reached or the
 average spread in the Pareto front was on the order of 1e-06 (tolerance to determine
 that the Pareto front has converged).
 A single run through a genetic algorithm might not have searched the entire
 design space. It is possible that the algorithm could become stuck in a local minima,
 resulting in non-reproducible results and sub-optimal solutions. To avoid this type
 of narrow-search complication, a stochastic run wrapper was designed to run the
 genetic algorithm multiple times, keeping track of the non-dominated solutions from
 all of the runs (see Figure 3.5 and section 3.2.2). The wrapper function has the
 ability to call the multi-objective algorithm any number of user-speci ed times (10
 is recommended as a general rule-of-thumb for statistical relevance). This stochastic
 solution set better describes the design space. Over the iterative runs, a larger
 swath of the design space will be searched, and the algorithm has a better chance
 of converging on a steady state solution set.
 As discussed in section 3.2.2, an algorithm is called from the stochastic run
 216
wrapper to assess if any of the solutions from each run of the MOGA dominate each
 other. The non-domination algorithm implemented is the continuous update method
 from Deb [22], but has been expanded in this research to accommodate multiple
 objectives during the domination evaluation. The continuous update method is
 a faster computational method that does not check all solutions before deciding
 domination. Instead, it keeps a running list of non-dominated solutions. When an
 item in this list is dominated, it is removed from the list. When a new design point
 is demonstrated to be non-dominated by any other solution in the data set, then
 that point is added to the non-dominated list. After each stochastic run, the new
 Pareto set is compared with the existing non-dominated set, and the overall set of
 non-dominated points results.
 The  nal set of non-dominated points across all of the stochastic runs rep-
 resented the  nal Pareto front solution to the large HRT con guration selection
 problem. To identify which of the 82 unique teams were best suited to the mission,
 a  nal run through the  tness function was conducted with the  nal set of non-
 dominated points. The design variables were discretized to re ect the agent list.
 The uniqueness of these lists was then sought. Unsurprisingly, there was substantial
 overlap amongst the discretized agent lists. A unique subset was down-selected, and
 it was these design points that were funneled through the rest of the analysis. Each
 was assessed for schedules and performance metric values.
 Rather than using the  nal modi ed  tness value for each of the design points,
 however, the evaluate kta.m function was used to return the list of each equivalently
 performing team for each of the design points. The  nal return from this function
 217
was a list of the best equivalently performing teams from the 40 unique possible
 teams.
 218
Appendix C
 Large HRT Con guration Selection Problem - Initial Details and
 Results
 An initial instantiation of the large-scale HRT con guration selection problem
 was developed and run prior to the experiment described in Chapter 6. The Chapter
 6 experiment was designed to yield more meaningful results than the  rst run of the
 experiment had. The details of that experiment and its results are contained in this
 Appendix chapter. Only those experimental details that were di erent from those
 described in Chapter 6 are herein related for brevity.
 C.1 Initial Setup: Hubble Space Telescope Servicing Mission 3A
 For this experiment (as with the experiment discussed in Chapter 6), the mis-
 sion pro les from the Hubble Space Telescope Servicing Mission 3A were selected to
 form the background of this performance analysis. HST Servicing Mission 3A (HST
 SM-3A) provides a useful platform to analyze role de nition of each of the mission
 agents, in uences of task allocation schemes, the resulting cooperative schedules,
 and the overall performance of the combined human and robot servicing team.
 The  ight plans from past Hubble Space Telescope servicing missions (HST
 SMs) (as represented in the HST SM-3A EVA Checklists [100]) provide a detailed
 data set from which to examine the e ect of various activities and crew performances
 219
on space operations. HST was designed with access panels to allow repair of the
 components of the telescope. Some of the servicing tasks required a  ne level of
 dexterity, including manipulation of tethers and electrical connectors.
 C.1.1 Task Primitive Data Preparation
 The actual rate of task performance for a speci ed robotic agent will vary
 for di erent categories of tasks, and between iterations of the same task. As with
 humans, a robot might not perform a task at the exact same rate on two di erent at-
 tempts. There will be competing priorities in task performance including power and
 energy resource utilization, measurement errors and overshoot correction, di erences
 in environmental factors including lighting, situational awareness, and maintaining
 safe operating conditions in a crowded workspace. All of these factors will a ect the
 robot?s actual task performance speed and will vary between tasks.
 The subtask completion time data used in this analysis is identical to that
 originally anticipated from the HST SM-3A mission. Rather than add uncertainty
 by modifying this time data to de ne whether robotic agents perform subtasks
 faster or slower than a human, this same time data was used for the robot subtask
 completion times in this experiment.
 The robot time data values used in this analysis provide an initial guide to
 enable scheduling analysis for a cooperative human and robotic team. These values
 could be updated to re ect a speci c robotic system design if desired by future users.
 220
C.1.2 Team Characterization and Scheduling
 Forty unique teams were generated for this experiment. Each team had at
 most four agents, two human and two robotic. Each of the agents in these teams
 were de ned based on which of the nine task primitive types they were capable
 of performing. The teams were speci ed with a matrix for each, with the rows
 corresponding to the four possible agents, and each column representing a task
 primitive type. A value of one in the matrix meant that the speci ed agent could
 perform the task primitive type. A value of zero meant that the agent did not have
 the skill set or capacity to perform the task primitive type.
 C.1.3 Performance Metrics
 The same performance metrics were used for this experiment and for the ex-
 periment described in Chapter 6. However, the rating values used in this version of
 the experiment were solid integers and the 20% random variable was not included.
 These alterations a ected the performance metric data for the following metrics:
  Human inactive time
  Total human involvement time
  Total mission duration time
  Total robot involvement time
  Robot inactive time
  Human physical workload
 221
 Human mental workload
  Autonomous navigation speed
  Approachability
  Autonomous traverse distance
 In addition to the alterations listed above, the expected value rating scale on
 the three survey performance metrics was increased for the experiment described
 in Chapter 6. For this  rst version of the experiment, the autonomous navigation
 speed rating scale was as follows:
  If both of the potential robotic agents in a team pair were capable of traversal,
 the autonomous navigation speed rating of the team pair was given a value of
 1.
  If either of the two potential robotic agents in a team pair were capable of
 traversal, the autonomous navigation speed rating of the team pair was given
 a value of 5.
  If either of the two potential robotic agents in a team were present and active (if
 either was capable of at least 1 task primitive type) but both were incapable of
 independent movement, the autonomous navigation speed was given a rating
 value of 15.
  If there were no robots in a given team combination, then the human pair were
 given a rating value of 1, equivalent to the fully functional robotic reference
 system.
 222
Similarly, the autonomous traverse distance performance metric had a similar
 valued rating scale, such that:
  The reference system for this performance metric was two fully functioning
 robotic agents, capable of both traversal and visual inspection. A rating value
 of 1 was given to this type of system.
  If only one robot was capable of traversal and visual inspection, this robot
 earned a rating of 1, equivalent to the fully functioning candidate system.
  If only one robot was capable of traversal and visual inspection but both
 were capable of traversal, a rating value of 5 was earned to denote the added
 workload required by the one robot to provide visualization for both robotic
 agents.
  If only one robot was capable of traversal, but this robot was not capable of
 visual inspection, the team earned a rating value of 5.
  If either robot was present and active (if either was capable of at least one task
 primitive type) but both were incapable of traversal and visual inspection, the
 team earned a rating value of 15.
  As with the previous metric, if there were no robots active for a candidate
 team (i.e. a human-only team), an autonomous traverse distance rating value
 of 1 was used.
 The approachability performance metric was scaled in the same fashion:
 223
 If a robotic system was capable of each of these three task primitive types, it
 earned an approachability rating value of 1.
  If a robotic system was capable of traversal and visual inspection but did not
 carry an on board tool, it earned a rating value of 3. While this might seem
 a generous rating, it was anticipated that the robotic agent would  nd and
 locate targets. A human agent would then arrive at the speci ed target and
 use the instrument.
  If neither robot was capable of traversal, visual inspection, or carrying a tool,
 an approachability rating value of 10 was used.
  As with the previous metric, if there were no robots active for a candidate
 team, an approachability rating value of 1 was used indicating equivalent per-
 formance to the robotic reference system.
 C.2 Results
 This experiment followed the analysis steps laid out in the methodology dis-
 cussed in Chapter 3. The reduced objective sets were used to run the MOGA 10
 times for statistical relevance. Each run sought to reach convergence of the Pareto
 set. The output from all of the runs were collated and sent a  nal iteration through
 the continuous update algorithm to verify that they represented a set of nondomi-
 nated solutions. The results from this  nal analysis can be seen in Figure C.2.
 Figure C.2 displays each of the 40 unique teams that were feasible solutions for
 224
Figure C.1: Reduced Objective Sets for the First Instantiation of the Large HRT
 Demonstration Problem in Each of Five Test Cases
 the large HRT demonstration problem on the x-axis. On the y-axis are the  ve test
 cases that were run (vertically o set for visibility only - there is no signi cance to
 height on the y-axis). From Figure C.2 it is apparent that the Pareto sets of solutions
 resulting from the  -MOSS analysis not only resulted in good approximations of the
 control Pareto set (resulting from all 15 objectives), but in several of the test cases,
 they were the exact same results.
 This result is the  nal justi cation that was sought from running the large HRT
 demonstration problem - application of the methodology proposed in this disserta-
 tion provided the rigorous performance analysis that had been missing from previ-
 ous research in the literature. The  -MOSS algorithm provided objective, analytical
 reasoning for selecting some performance metrics and deeming others redundant.
 Rather than running a comprehensive performance analysis with 15 objectives, a
 researcher could objectively and conclusively run the performance analysis with 3
 objectives at any of the given error-tolerances and achieve the same results.
 225
Figure C.2: Teams Selected for Large HRT Demonstration Problem in Each of Five
 Test Cases
 Figure C.3 demonstrates the convergence success for each of the test problems
 across each of the 10 stochastic runs through the MOGA. The control case (with 15
 objectives) converged in only 7 out of 10 of the runs. This means that 30% of the
 runs failed to  nd the Pareto front - the MOGA had too many competing objectives.
 This was approximately the convergence rate of the MOGA during the 15 objective
 knapsack test problem instance.
 The convergence results from the  = 0% test case were more concerning. Only
 30% of the runs reached convergence. This suggests that the team analysis of this
 test case should be taken with a grain of salt - the results from this test case do
 not represent a  nal Pareto front. Even with only 3 objectives, the MOGA failed
 to converge an unacceptable percentage of the time. However, the opposite results
 226
Figure C.3: Termination Reason and Run Time for Large HRT Demonstration
 Problem
 227
were seen in the  = 10%, 20%, and 40% test cases. Each represents three objective
 functions run to complete convergence within the MOGA. All three test cases had
 100% convergence rate.
 A combination of the results from Figure C.3 and the results of Figure C.1 yield
 the result that the reduced performance metric set for this instantiation of the large
 HRT demonstration problem should contain either total mission performance time,
 human inactive time, and human physical workload, or human physical workload,
 total EVA involvement time (length of the EVA day), and total robot involvement
 time. Either of these two performance metric sets resulted in the same Pareto set
 of solutions and both resulted in 100% convergence of the MOGA.
 As had been the case with the knapsack test problem instances from Chapter
 4, the computational time required for the  -MOSS algorithm was the only disad-
 vantage of this method. As seen in Figure C.3, the  -MOSS algorithm required more
 than 7 hours to run for each of the test problems. In comparison, the control test
 problem?s MOGA required less than an hour to complete. As had been concluded
 from the knapsack test problems, if computational time is a limiting factor for future
 performance analysis, this methodology would not be advantageous.
 However, if computational time is not an issue for a priori analysis, a signi -
 cant reduction in problem complexity can be achieved by utilizing the methodology
 proposed in this dissertation. A  nal analysis on how good the Pareto sets resulting
 from each of these  ve test cases was performed, utilizing the hypervolume quality
 metric (see section 3.2.4). These results are tabulated in Figure C.4.
 There are several important trends to note from the data represented in Figure
 228
Figure C.4: Hypervolume Indicator Quality Metric for Each Test Case of Large
 HRT Demonstration Problem
 C.4. Recalling that the S quality metric re ects absolute volume coverage of the
 Pareto front, it can be seen for each of the four test problems that the reduced
 objective set resulted in a Pareto front with nearly identical volume coverage. Each
 Pareto front generated from this analysis represented remarkably good coverage
 across all of the objectives.
 Recall that the D quality metric was the relative volume coverage used to
 compare two di erent Pareto sets. In this large HRT problem, the control was
 compared across each of the four test cases with di erent  -error values. In the  
= 0% test case, the Pareto volume coverage of the control set was approximately
 5% better than the Pareto volume coverage resulting from the reduced objective
 set. This result is consistent with those seen in section 3.2.4 for the knapsack test
 229
problem.
 However, the results from the other three test problems in the large HRT
 problem are impressive. An exact value of zero was calculated for the relative volume
 coverage both comparing control to the  -Pareto set, and  -Pareto compared to the
 control. There were no portions of the Pareto front that were covered by one set
 and not by the other. This provides the analytical support for the assertion that
 the Pareto sets resulting from the reduced objective sets not only provide a good
 approximation of the control Pareto set (in general), but (in this case) provided the
 exact same coverage of the Pareto front. Zero error would be introduced if a mission
 designer chose to use the Pareto set generated by the 3-objective test cases instead
 of the 15-objective control case.
 C.2.1 Analysis of Bias in Solution Selection
 Application of this research?s methodology greatly reduced the large HRT
 con guration selection problem from a 15-objective across 40 teams decision down
 to a 3-objective across 22 teams decision. Several questions arose from the previous
 analysis. Were the 22 solution teams (winners) that made the  nal cut uniquely
 quali ed and higher performing than the other 18 teams (losers) or was there some
 level of biasing in the  nal solution set due to a majority vote type of situation?
 To further examine the relations between the solutions, an additional run
 through the methodology was arranged for the 18 teams that were pruned out of
 the  nal solution set. One team from the winning set was added to be representative
 230
of the winning solutions. The large HRT con guration selection problem was run
 again with only these 19 candidate teams as solution options across the full 15
 objective functions.
 The  rst step in the methodology process, however, yielded unexpected re-
 sults. An initial run through the MOGA for 100 generations had been expected to
 provide an initial population of good-ranking teams to input into the  -MOSS ob-
 jective reduction algorithm. However, after 100 generations (iterating through the
 15 objective functions and 19 candidate solutions) the MOGA returned a singular
 best team result. The MOGA identi ed the 1 winner candidate solution as the best
 overall from the 19 candidate solutions.
 Several di erent winner candidate solutions were used in the 19 candidate
 solutions (only 1 per iteration) to determine if the MOGA would be enticed to
 select one of the other 18 solutions. On every iteration, however, the one winner
 team con guration was selected, regardless of which of the winner teams was used.
 This result reinforces the reproducibility of the  nal con guration selection
 results. When all chances of majority vote team biasing had been removed from the
 candidate teams, the one winner candidate solution stood out as the best overall
 con guration for the given HRT con guration selection problem. The MOGA anal-
 ysis was run 10 times to verify this result. Every stochastic iteration returned that
 one candidate winner solution as the best solution.
 Even though a best team had been found for this additional analysis, it was
 still desirable to assess the objective reduction results from this smaller loser-team
 population. The initial 100 generation MOGA population contained only versions of
 231
that one winner candidate solution, so the objective function values themselves for
 the 19 candidate teams were used as input to the  -MOSS algorithm. The objective
 values were scaled (as had been done in previous iterations of the  -MOSS algorithm)
 such that the di erence between the largest and smallest objective function value
 would represent a  -error equal to 1.
 The results from the objective reduction algorithm were intriguing. With zero
  -error allowed in the underlying dominance structure, the 15 performance metrics
 were reduced down to 5 (F 0 = fF2, F13, F1, F12, F15g), ordered by importance to
 the underlying dominance structure) to fully represent these 19 candidate teams.
 When a 5%  -error was tolerated, the reduced performance metric set was reduced
 to only 2 objectives (F 0 = fF1, F3g). As the  -error value was increased, this reduced
 objective set held constant and was unchanged both in terms of objective ordering
 and makeup of the reduced set. A  -error value of 35.5% reduced the objective
 set to a single performance metric (F 0 = fF1g). All higher error tolerance levels
 maintained this one performance metric in the reduced set.
 Without the winner candidate teams in the set, the objective reduction algo-
 rithm was only able to reduce the objective set size to 5 objectives rather than the
 3 it had found in the full analysis. Referring back to Figure C.1, it is interesting to
 note that three of the objectives chosen in this last  -MOSS algorithm iteration were
 never selected by the algorithm in the full set. The objectives that di erentiated the
 loser candidate solutions from each other were irrelevant for the winner candidate
 teams. Part of the reason for this was that the loser candidate teams had similar
 performance across many of the objectives, to an extent that those objectives were
 232
no longer valuable in di erentiating the solutions.
 Including a  -error of 5%, however, reduced the objective set to F 0 = fF1,
 F3g. Both of these objectives were deemed highly valuable to de ning the underlying
 dominance structure of the original large HRT con guration selection problem. Both
 of these objectives turned up in several of the  -error cases depicted in Figure C.1.
 Metric fF1g re ects a preference for minimizing total mission duration time, and
 fF3g sought to minimize overall human physical workload. It can be inferred that
 the primary di erences between the loser candidate teams re ected the usage of the
 human crew as a resource on the team. Each of these teams had at least one robotic
 crew member, but the robotic crew were all tasked in the same way.
 A  nal question that should be asked in this analysis is what this variation in
 the reduced objective set components say about the reproducibility of the reduced
 objective set for selecting performance metrics. In the case where two performance
 metrics contain equivalent problem data and represent the same pairwise domi-
 nance relations, the  rst encountered in the  -MOSS iterations will be selected for
 placement into the reduced objective set. In the next iteration, therefore, all of
 the relations contained in the second equivalent performance metric will already be
 represented in the basis. This second performance metric will not be selected to be
 included in the basis. While the initial input ordering of performance metrics will
 obviously have an a ect on those selected for the reduced set, the pairwise domi-
 nance relations represented by the  nal reduced set will remain the same. In other
 words, a variation in the performance metrics used in an overall analysis, if varied
 by the  -MOSS algorithm, will not have an a ect on the resulting Pareto solution
 233
sets.
 In a sense, all of the problem detail can be represented by these candidate
 solutions. The algorithm determines which objective functions provide the most in-
 formation for the given candidate solutions. When a set of poor candidate solutions
 were input into the  -MOSS algorithm, a di erent set of objective functions was
 deemed necessary to re ect the underlying dominance structure than when a set of
 mixed good and poor candidate solutions were input into the algorithm.
 From this perspective, it would be expected that the quality of the solutions
 would cause variation in the reduced objective set. When applying the  -MOSS
 algorithm, the reduced set of objectives will be selected based on non-arbitrary,
 problem-dependent information. No preference information will be needed. A large
 number of candidate objectives can be used to canvas the decision space, or a smaller
 set could be used. Either way, the algorithm will isolate the underlying dominance
 structure of the problem domain and identify those performance metrics that are
 most needed to characterize the decision space. The objective reduction algorithm
 has proved itself an immensely valuable tool for identifying the critical information
 for various problem types and across di erent domains.
 C.2.2 Pareto Decision Space Analysis
 It has been demonstrated that the Pareto solution sets generated by utilizing
 this dissertation?s proposed methodology are approximately (if not exactly) equiva-
 lent to using a large, unreduced objective set. It has been analytically demonstrated
 234
that large, complex, multi-objective optimization problems can be generically, ob-
 jectively, and conclusively reduced by application of this methodology.
 How much does this methodology aid a mission designer?s decision making?
 There has been signi cant reduction in the complexity of the mission designer?s
 decision space. Where previously the mission designer had a design space detailed by
 40 unique teams compared across 15 performance metrics, this methodology reduces
 the design space to 22 unique teams compared across 3 performance metrics. This
 is a substantial improvement for the mission designer.
 It could be argued that 22 unique teams is still too many for a mission designer
 to need to compare simultaneously. This number, however, re ects shortcomings of
 the input data rather than the method itself. Although there were 40 unique teams
 input into the beginning of this analysis, it was noted that this did not result in 40
 unique team schedules. The task allocation schema used in this analysis to generate
 the schedules was very simplistic, resulting in several of the unique teams generating
 identical schedules.
 A further iteration on this experiment could apply a more complex and se-
 lective task allocation schema that would further di erentiate the schedules and,
 therefore, the 40 team options. It would be anticipated that this would further
 reduce the number of candidate team options represented by the Pareto front, and
 further reduce the options that a mission designer must choose from.
 This experiment was able to signi cantly reduced the designer?s decision space.
 Additionally, the designer was analytically assured that the results from this reduced
 decision space were at least as good as a solution selected from the design space
 235
represented by the control?s solution set.
 C.3 Summary
 This experiment sought to demonstrate the methodology proposed by this
 dissertation on a large-scale HRT con guration selection problem. The other exper-
 iments from this dissertation had suggested that the application of the methodology
 to this design space would facilitate signi cant reduction in the complexity of the
 over-constrained, multi-objective optimization problem, and that it would provide
 the rigorous objective reasoning to down-select from a large set of performance met-
 rics to the few signi cant ones for analysis.
 40 unique teams were proposed as candidate solutions and were compared
 across 15 performance metrics. For a mission designer, this is a very large decision
 space to consider. In the research described in Chapter 2, previous mission designers
 would have arbitrarily selected a small set of performance metrics to use in an overall
 team performance analysis. The lack of rigor in this type of analysis meant that
 comparison of results between di erent researchers, platforms, and team pairings
 was virtually impossible.
 Applying the  -MOSS objective reduction algorithm to this problem imme-
 diately (and rigorously) reduced the performance metric set from 15 down to 3
 performance metrics. Although there was some variability in the reduced set de-
 pending on the tolerable error in the underlying problem structure, this signi cantly
 reduced the decision maker?s design space. This by itself was a signi cant result.
 236
This research?s methodology, however, took the analysis one step further to assess
 the Pareto solution sets that resulted from the reduced objective sets. It was this
  nal analysis step that reduced the 40 unique teams down to the 22 best teams.
 The methodology proposed in this dissertation research has been applied
 to three di erent application realms and has proved bene cial in all three. This
 methodology has wide utility in reducing the complexity of large-scale over-constrained,
 multi-objective optimization problems. It would be anticipated that the methodol-
 ogy will provide rigorous analysis on many future applications.
 237
Bibliography
 [1] H. Aguirre and K. Tanaka. Adaptive epsilon ranking on many objective prob-
 lems. Evol. Intel., 2(4):183{206, 2009.
 [2] D.L. Akin, S. Jacobs, and D. Gruntz. Investigations into several approaches
 to eva-robot integration. In 37th International Conference on Environmental
 Systems, number SAE 2007-01-3232. SAE International, 2007.
 [3] D.L. Akin, B. Roberts, K. Pilotte, and M. Baker. Robotic augmentation of eva
 for hubble space telescope servicing. In AIAA Space Conference Exposition,
 number AIAA-2003-6274. AIAA, 2003.
 [4] J.A. Arnold. Towards a framework for architecting heterogeneous teams of
 humans and robots for space exploration. Master?s thesis, Massachusetts In-
 stitute of Technology, 2006.
 [5] J. Bader and E. Zitzler. A hypervolume-based optimizer for high-dimensional
 objective spaces. New Developments in Multiple Objective and Goal Program-
 ming, pages 35{54, 2010.
 [6] A. Bechar, Y. Edan, and J. Meyer. Optimal collaboration in human-robot
 target recognition systems. In Systems, Man and Cybernetics, 2006. SMC?06.
 IEEE International Conference on, volume 5, pages 4243{4248. IEEE, 2007.
 [7] A. Bechar, J. Meyer, and Y. Edan. An objective function to evaluate per-
 formance of human-robot collaboration in target recognition tasks. Systems,
 Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions
 on, 39(6):611{620, 2009.
 [8] L. Billman and M. Steinberg. Human system performance metrics for evalu-
 ation of mixed-initiative heterogeneous autonomous systems. In Proceedings
 of the 2007 Workshop on Performance Metrics for Intelligent Systems, pages
 120{126. ACM, August 2007.
 [9] P. Bobko, P. Roth, and M. Buster. the usefulness of unit weights in creat-
 ing composite scores: a literature review, application to content validity, and
 meta-analysis. Organizational Research Methods, 10(4):689{709, 2007.
 [10] D. Brockho , T. Friedrich, N. Hebbinghaus, C. Klein, F. Neumann, and E. Zit-
 zler. Do additional objectives make a problem harder? In Proceedings of the
 9th Annual Conference on Genetic and Evolutionary Computation, pages 765{
 772. ACM, 2007.
 [11] D. Brockho , D.K. Saxena, K. Deb, and E. Zitzler. On handling a large
 number of objectives a posteriori and during optimization. Multiobjective
 Problem Solving from Nature, pages 377{403, 2008.
 238
[12] D. Brockho and E. Zitzler. Objective reduction in evolutionary multiob-
 jective optimization: Theory and applications. Evolutionary Computation,
 17(2):135{166, 2009.
 [13] D.J. Bruemmer, D.A. Few, R.L. Boring, J.L. Marble, M.C. Walton, and C.W.
 Nielsen. Shared understanding for collaborative control. IEEE Transactions
 on Systems, Man, and Cybernetics - Part A: Systems and Humans, 35(4):494{
 504, July 2005.
 [14] J.L. Burke, R.R. Murphy, D.R. Riddle, and T. Fincannon. Task performance
 metrics in human-robot interaction: Taking a systems approach. Performance
 Metrics for Intelligent Systems, 2004.
 [15] J.W. Crandall and M.L. Cummings. Developing performance metrics for the
 supervisory control of multiple robots. In Proceedings of the ACM/IEEE Inter-
 national Conference on Human-Robot Interaction, pages 33{40. ACM, 2007.
 [16] J.W. Crandall and ML Cummings. A predictive model for human-unmanned
 vehicles systems. Technical Report HAL2008-05, Massachusetts Institute of
 Technology, 2008.
 [17] J.W. Crandall, M.A. Goodrich, D.R. Olsen, and C.W. Nielsen. Validating
 human-robot interaction schemes in multitasking environments. IEEE Trans-
 actions on Systems, Man, and Cybernetics - Part A: Systems and Humans,
 35(4):438{449, July 2005.
 [18] J.W. Crandall, C.W. Nielsen, and M.A. Goodrich. Towards predicting robot
 team performance. In Systems, Man and Cybernetics, 2003 IEEE Interna-
 tional Conference on, volume 1, pages 906{911. IEEE, 2003.
 [19] ML Cummings, P. Pina, and J.W. Crandall. A metric taxonomy for supervi-
 sory control of unmanned vehicles. In AUVSI, 2008.
 [20] K. Dautenhahn and I. Werry. A quantitative technique for analysing robot-
 human interactions. In Proceedings of the 2002 IEEE/RSJ International Con-
 ference on Intelligent Robots and Systems, 2002.
 [21] E. De Visser, R. Parasuraman, A. Freedy, E. Freedy, and G. Weltman. A
 comprehensive methodology for assessing human-robot team performance for
 use in training and simulation. In Human Factors and Ergonomics Society
 Annual Meeting Proceedings, volume 50, pages 2639{2643. Human Factors
 and Ergonomics Society, 2006.
 [22] K. Deb. Multi-Objective Optimization Using Evolutionary Algorithms. John
 Wiley and Sons, 2001.
 [23] K. Deb. Multiobjective Optimization using Evolutionary Algorithms. Wiley,
 2008.
 239
[24] B. Donmez, P. Pina, and M.L. Cummings. Evaluation criteria for human-
 automation performance metrics. In Proceedings of the Performance Metrics
 for Intelligent Systems Workshop, 2008.
 [25] M.R. Elara, C.A.A. Calderon, C. Zhou, and W.S. Wijesoma. False alarm
 demand: A new metric for measuring robot performance in human robot
 teams. In Autonomous Robots and Agents, 2009. ICARA 2009. 4th Interna-
 tional Conference on, pages 436{441. IEEE, 2009.
 [26] A. Elfes, C.R. Weisbin, H. Hua, J.H. Smith, J. Mrozinski, and K. Shelton. The
 huron task allocation and scheduling system: Planning human and robot ac-
 tivities for lunar missions. In Automation Congress, 2008. WAC 2008. World,
 pages 1{8. IEEE, 2008.
 [27] C.A. Ellis, S.J. Gibbs, and G. Rein. Groupware: some issues and experiences.
 Communications of the ACM, 34(1):39{58, 1991.
 [28] M. Farina and P. Amato. A fuzzy de nition of optimality for many-criteria
 optimization problems. Systems, Man and Cybernetics, Part A: Systems and
 Humans, IEEE Transactions on, 34(3):315{326, 2004.
 [29] T. Fong, A. Abercromby, M.G. Bualat, M.C. Deans, K.V. Hodges, J.M. Hur-
 tado, R. Landis, P. Lee, and D. Schreckenghost. Assessment of robotic recon
 for human exploration on the moon. Acta Astronautica, 67(9-10):1176{1188,
 November{December 2010.
 [30] T. Fong, M. Bualat, M. Deans, M. Allan, X. Bouyssounouse, M. Broxton,
 L. Edwards, R. Elphic, L. Fl uckiger, J. Frank, et al. Field testing of utility
 robots for lunar surface operations. In AIAA Space. Citeseer, 2008.
 [31] T. Fong, I. Nourbakhsh, C. Kunz, L. Fluckiger, J. Schreiner, R. Ambrose,
 R. Burridge, R. Simmons, L.M. Hiatt, A. Schultz, et al. The peer-to-peer
 human-robot interaction project. Space, 6750(AIAA 2005-6750), September
 2005.
 [32] T. Fong, J. Scholtz, J.A. Shah, L. Fluckiger, C. Kunz, D. Lees, J. Schreiner,
 M. Siegel, L.M. Hiatt, and I. Nourbakhsh. A preliminary study of peer-to-peer
 human-robot interaction. In Systems, Man and Cybernetics, 2006. SMC?06.
 IEEE International Conference on, volume 4, pages 3198{3203. IEEE, 2007.
 [33] A. Freedy, E. DeVisser, G. Weltman, and N. Coeyman. Measurement of trust
 in human-robot collaboration. In Collaborative Technologies and Systems,
 2007. CTS 2007. International Symposium on, pages 106{114. IEEE, 2008.
 [34] B.P. Gerkey and M.J. Mataric. A formal analysis and taxonomy of task al-
 location in multi-robot systems. International Journal of Robotics Research,
 23(9):939{954, 2004.
 240
[35] D.F. Glas, T. Kanda, H. Ishiguro, and N. Hagita. Simultaneous teleoperation
 of multiple social robots. In Proceedings of the 3rd ACM/IEEE International
 Conference on Human Robot Interaction, pages 311{318. ACM, 2008.
 [36] J. Glassmire, M. O?Malley, W. Bluethmann, and R. Ambrose. Cooperative
 manipulation between humans and teleoperated agents. In Proceedings of the
 IEEE International Symposium on Haptic Interfaces for Virtual Environment
 and Teleoperator Systems, 2004.
 [37] M. Goodrich and D. Olsen. Seven principles of e cient human robot interac-
 tion. In IEEE International Conference on Systems, Man, and Cybernetics,
 volume 4, pages 3943{3948, 2003.
 [38] M.A. Goodrich, T.W. McLain, J.D. Anderson, J. Sun, and J.W. Crandall.
 Managing autonomy in robot teams: observations from four experiments. In
 Proceedings of the ACM/IEEE International Conference on Human-Robot In-
 teraction, pages 25{32. ACM, March 2007.
 [39] S.G. Hart and L.E. Staveland. Development of nasa-tlx (task load index):
 Results of empirical and theoretical research. Human Mental Workload, 1:139{
 183, 1988.
 [40] F. Heger and S. Singh. Sliding autonomy for complex coordinated multi-robot
 tasks: analysis and experiments. In Proceedings of Robotics: Science and
 Systems, August 2006.
 [41] A. Howard. A synergistic approach for maximizing human-automation sys-
 tem performance. Draper  scal year 2006  nal report, Georgia Institute of
 Technology, July 2006.
 [42] A. Howard. A systematic approach to predict performance of human-
 automation systems. IEEE Transactions on Systems, Man, and Cybernetics -
 Part C: Applications and Reviews, 37(4):594{601, July 2007.
 [43] A. Howard and G. Cruz. Adapting human leadership approaches for role allo-
 cation in human-robot navigation scenarios. In Automation Congress, 2006.
 WAC?06. World, pages 1{8. IEEE, 2007.
 [44] E.J. Hughes. Fitness assignment methods for many-objective problems. Mul-
 tiobjective Optimization, pages 307{329, 2008.
 [45] J.H. Hwang, K.W. Lee, and D.S. Kwon. A formal method of measuring in-
 teractivity in hri. In Robot and Human interactive Communication, 2007.
 RO-MAN 2007. The 16th IEEE International Symposium on, pages 738{743.
 IEEE, 2007.
 [46] H. Ishibuchi, N. Tsukamoto, and Y. Nojima. Evolutionary many-objective
 optimization: A short review. In Evolutionary Computation, 2008. IEEE
 World Congress on Computational Intelligence, pages 2419{2426. IEEE, 2008.
 241
[47] A.L. Jaimes, C.A.C. Coello, H. Aguirre, and K. Tanaka. Adaptive objective
 space partitioning using con ict information for many-objective optimization.
 Evolutionary Multi-Criterion Optimization, 6576:151{165, 2011.
 [48] B. Kannan and L.E. Parker. Fault-tolerance based metrics for evaluating
 system performance in multi-robot teams. In Proceedings of Performance
 Metrics for Intelligent Systems Workshop, 2006.
 [49] T. Kaupp and A. Makarenko. Measuring human-robot team e ectiveness to
 determine an appropriate autonomy level. In Robotics and Automation, 2008.
 ICRA 2008. IEEE International Conference on, pages 2146{2151. IEEE, 2008.
 [50] T. Kaupp, A. Makarenko, and H. Durrant-Whyte. Human-robot communi-
 cation for collaborative decision making - a probabilistic approach. Robotics
 and Autonomous Systems, 58:444{456, 2010.
 [51] J. Keller. Human performance modeling for discrete-event simulation: human
 performance modeling for discrete-event simulation: workload. In Proceedings
 of the 34th Conference on Winter Simulation: Exploring New Frontiers, pages
 157{162. Winter Simulation Conference, 2002.
 [52] C. R. Kurtzman. Time and resource constrained scheduling, with applications
 to space station planning. PhD thesis, Massachusetts Institute of Technology,
 1988.
 [53] A. Lampe and R. Chatila. Performance measure for the evaluation of mobile
 robot autonomy. Proceedings of the 2006 IEEE International Conference on
 Robotics and Automation, May 2006.
 [54] A. L opez Jaimes, C. Coello, and J. Ur  as Barrientos. Online objective reduc-
 tion to deal with many-objective problems. In Evolutionary Multi-Criterion
 Optimization, pages 423{437. Springer, 2009.
 [55] I.S. MacKenzie. A note on the information-theoretic basis for  tts? law. Jour-
 nal of Motor Behavior, 1989.
 [56] G.A. Mann. Quantitative evaluation of human-robot options for maintenance
 tasks during analogue surface operations. In C. Pain, editor, Proceedings of
 the 8th Australian Mars Exploration Conference, pages 26{34, 2008.
 [57] J.L. Marble, D.J. Bruemmer, D.A. Few, and D.D. Dudenhoe er. Evaluation of
 supervisory vs. peer-peer interaction with human-robot teams. In Proceedings
 of the 37th Annual Hawaii International Conference on Systems Sciences,
 pages 4{8, 2004.
 [58] C.A. Miller and R. Parasuraman. Who?s in charge?: Intermediate levels of
 control for robots we can live with. Systems, Man, and Cybernetics, 2003.
 IEEE International Conference on, 1:462{467, 2003.
 242
[59] C. E. Nehme. Modeling Human Supervisory Control in Heterogeneous Un-
 manned Vehicle Systems. PhD thesis, Massachusetts Institute of Technology,
 February 2009.
 [60] U. Nehmzow. Quantitative analysis of robot-environment interaction{towards
 scienti c mobile robotics. Robotics and Autonomous Systems, 44:55{68, 2003.
 [61] D.R. Olsen and M.A. Goodrich. Metrics for evaluating human-robot interac-
 tions. In Proceedings of PERMIS, 2003.
 [62] D.R. Olsen, B. Wood, and J. Turner. Metrics for human driving of multiple
 robots. In International Conference on Robotics and Automation. IEEE, 2004.
 [63] D.R. Olsen and S.B. Wood. Fan-out: measuring human control of multiple
 robots. In Proceedings of the Special Interest Group on Computer Human
 Interaction Conference on Human Factors in Computing Systems, pages 231{
 238. ACM, 2004.
 [64] Y. Oren. Performance analysis of human-robot cooperation in target recogni-
 tion tasks. Master?s thesis, Ben-Gurion University of the Negev, Aug. 2008.
 [65] R. Parasuraman. Designing automation for human use: empirical studies and
 quantitative models. Ergonomics, 43(7):931{951, July 2000.
 [66] R. Parasuraman, T.B. Sheridan, and C.D. Wickens. A model for types and
 levels of human interaction with automation. IEEE Transactions on Systems,
 Man, and Cybernetics - Part A: Systems and Humans, 30(3):286{297, May
 2000.
 [67] J.C. Parrish, D.L. Akin, and G.G. Gefke. The ranger telerobotic shuttle ex-
 periment: Implications for operational eva/robotic cooperation. In 30th Inter-
 national Conference on Environmental Systems, number 2000-01-2329. SAE
 International, 2000.
 [68] K. Pilotte. Analysis of grasp requirements for telerobotic satellite servicing.
 Master?s thesis, University of Maryland, 2004.
 [69] P. Pina, ML Cummings, J.W. Crandall, and M.D. Penna. Identifying gener-
 alizable metric classes to evaluate human-robot teams. In 3rd Annual Con-
 ference on Human-Robot Interaction: Metrics for Human-Robot Interaction
 Workshop, 2008.
 [70] P. Pina, B. Donmez, and M.L. Cummings. Selecting metrics to evaluate hu-
 man supervisory control applications. HAL Report HAL2008-04, Humans and
 Automation Laboratory, Massachusetts Institute of Technology, 2008.
 [71] S.S. Ponda, H.L. Choi, and J.P. How. Predictive planning for heterogeneous
 human-robot teams. AIAA Infotech@ Aerospace, 2010.
 243
[72] M. Prewett, R. Johnson, K. Saboe, L. Elliott, and M. Coovert. Managing
 workload in human-robot interaction: a review of empirical studies. Comput-
 ers in Human Behavior, 26(5):840{856, September 2010.
 [73] F. Rehnmark, R. Ambrose, and M. Goza. The challenges of extra-vehicular
 robotic locomotion aboard orbiting spacecraft. In Proceedings of the IEEE
 International Conference on Robotics and Automation. IEEE, 2004.
 [74] G. Rodriguez and C.R. Weisbin. A new method to evaluate human-robot
 system performance. Autonomous Robots, 14(2-3):165{178, 2003.
 [75] F. Rohrmuller, O. Kourakos, M. Rambow, D. Brscic, D. Wollherr, S. Hirche,
 and M. Buss. Interconnected performance optimization in complex robotic
 systems. In IEEE/RSJ International Conference on Intelligent Robots and
 Systems, pages 4113{4118. IEEE, 2010.
 [76] J.A. Saleh and F. Karray. Towards generalized performance metrics for
 human-robot interaction. In Autonomous and Intelligent Systems, 2010 In-
 ternational Conference on, pages 1{6. IEEE, 2010.
 [77] H. Sato, H. Aguirre, and K. Tanaka. Controlling dominance area of solutions
 and its impact on the performance of moeas. In Evolutionary Multi-Criterion
 Optimization, pages 5{20. Springer, 2007.
 [78] D.K. Saxena and K. Deb. Dimensionality reduction of objectives and con-
 straints in multi-objective optimization problems: A system design perspec-
 tive. In Evolutionary Computation, 2008. CEC 2008.(IEEE World Congress
 on Computational Intelligence). IEEE Congress on, pages 3204{3211. IEEE,
 2008.
 [79] D.K. Saxena, T. Ray, K. Deb, and A. Tiwari. Constrained many-objective
 optimization: a way forward. In Evolutionary Computation, 2009. IEEE
 Congress on, pages 545{552. IEEE, 2009.
 [80] J. Scholtz. Theory and evaluation of human robot interactions. In Proceedings
 of the 36th Hawaii International Conference on System Sciences, 2002.
 [81] J. Scholtz, B. Antonishek, and J.D. Young. Implementation of a situation
 awareness assessment tool for evaluation of human-robot interfaces. IEEE
 Transactions on Systems, Man, and Cybernetics - Part A: Systems and Hu-
 mans, 35(4):450{459, July 2005.
 [82] D. Schreckenghost, T. Fong, T. Milam, E. Pacis, and H. Utz. Real-time
 assessment of robot performance during remote exploration operations. In
 Aerospace conference, 2009 IEEE, pages 1{13. IEEE, 2009.
 [83] D. Schreckenghost, T. Fong, H. Utz, and T. Milam. Measuring robot perfor-
 mance in real-time for nasa robotic reconnaissance operations. In Proceedings
 of NIST PERMIS Workshop, 2009.
 244
[84] D. Schreckenghost, T. Milam, and T. Fong. Ai space odyssey: Measuring per-
 formance in real time during remote human-robot operations with adjustable
 autonomy. IEEE Intelligent Systems, 25(5):36{44, 2010.
 [85] D. Schreckenghost, T. Milam, and T. Fong. Measuring performance in real
 time during remote human-robot operations with adjustable autonomy. IEEE
 Intelligent Systems, 25(5):36{45, 2010.
 [86] J.A. Shah, J.H. Saleh, and J.A. Ho man. Review and synthesis of consider-
 ations in architecting heterogeneous teams of humans and robots for optimal
 space exploration. IEEE Transactions on Systems, Man, and Cybernetics -
 Part C: Applications and Reviews, 37(5):779{, September 2007.
 [87] J.A. Shah, J.H. Saleh, and J.A. Ho man. Analytical basis for evaluating the
 e ect of unplanned interventions on the e ectiveness of a human-robot system.
 Reliability Engineering and System Safety, 93(8):1280{1286, 2008.
 [88] S.M. Singer and D. Akin. Role de nition and task allocation for a cooperative
 EVA and robotic team. Number 2009-01-2529. SAE International, 2009.
 [89] S.M. Singer and D.L. Akin. Task scheduling for cooperative human/robotic
 space operations. In International Conference on Environmental Systems,
 number 2008-01-1985. SAE International, 2008.
 [90] S.M. Singer and D.L. Akin. Analyzing the e ect of robot task performance
 on scheduling a cooperative human and robotic team. Acta Astronautica,
 66(1):102{116, 2010.
 [91] S.M. Singer and D.L. Akin. A survey of quantitative team performance metrics
 for human-robot collaboration. In International Conference on Environmental
 Systems, number AIAA 2011-5248. AIAA, AIAA, 2011.
 [92] B. Stanton, B. Antonishek, and J. Scholtz. Development of an evaluation
 method for acceptable usability. In Proceedings of PERMIS, 2006.
 [93] A. Steinfeld, T. Fong, D. Kaber, M. Lewis, J. Scholtz, A. Schultz, and
 M. Goodrich. Common metrics for human-robot interaction. In Proceedings
 of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction,
 pages 33{40. Association for Computing Machinery, 2006.
 [94] I. Tikach, A. Bechar, and Y. Edan. Switching between collaboration levels in a
 human-robot target recognition system. IEEE Transactions on Systems, Man,
 and Cybernetics, Part C: Applications and Reviews, 41(6):955{967, 2011.
 [95] J.G. Trafton, N.L. Cassimastis, M.D. Bugajska, D.P. Brock, F.E. Mintz, and
 A. Schultz. Enabling e ective human-robot interaction using perspective-
 taking in robots. IEEE Transactions on Systems, Man, and Cybernetics -
 Part A: Systems and Humans, 35(4):460{470, July 2005.
 245
[96] E. Tunstel. Operational performance metrics for mars exploration rovers.
 Journal of Field Robotics, 24(8-9):651{670, 2007.
 [97] A. van Wissen, Y. Gal, BA Kamphorst, and MV Dignum. Humanagent team-
 work in dynamic environments. Computers in Human Behavior, 28(1):23 {
 33, 2012.
 [98] H. Wang, M. Lewis, P. Velagapudi, P. Scerri, and K. Sycara. How search
 and its subtasks scale in n robots. In Proceedings of the 4th ACM/IEEE
 International conference on human robot interaction, 2009.
 [99] J. Wang and M. Lewis. Assessing cooperation in human control of heteroge-
 neous robots. In Proceedings of the 3rd ACM/IEEE International Conference
 on Human Robot Interaction, March 2008.
 [100] D.J. Weigel, R.S. McDaniel, and J.V. Thornton. Eva checklist: Sts-103  ight
 supplement. Technical Report JSC-48024-103, NASA Johnson Space Center,
 1999.
 [101] Y.G. Woldesenbet, G.G. Yen, and B.G. Tessema. Constraint handling in
 multiobjective evolutionary optimization. IEEE Transactions on Evolutionary
 Computation, 13(3):514{525, June 2009.
 [102] R.E. Yagoda and M.D. Coovert. How to work and play with robots: an ap-
 proach to modeling human-robot interaction. Computers in Human Behavior,
 28(1):60{68, 2012.
 [103] H.A. Yanco and J.L. Drury. A taxonomy for human-robot interaction. In
 Proceedings of the AAAI Fall Symposium on Human-Robot Interaction, 2002.
 [104] H.A. Yanco, J.L. Drury, and J. Scholtz. Beyond usability evaluation: Anal-
 ysis of human-robot interaction at a major robotics competition. Human-
 Computer Interaction, 19(1):117{149, 2009.
 [105] E. Zitzler. Evolutionary Algorithms for Multiobjective Optimization: Methods
 and Applications. PhD thesis, Swiss Federal Institute of Technology (ETH),
 1999.
 [106] E. Zitzler. Hypervolume metric calculation, 2001.
 [107] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algo-
 rithms - a comparative case study. In Parallel Problem Solving from Nature -
 PPSN-V, pages 292{301, 1998.
 [108] E. Zitzler and L. Thiele. multiobjective evolutionary algorithms: a compar-
 ative case study and the strength pareto approach. IEEE Transactions on
 Evolutionary Computation, 3(4):257{271, 1999.
 246
[109] X. Zou, Y. Chen, M. Liu, and L. Kang. A new evolutionary algorithm for solv-
 ing many-objective optimization problems. Systems, Man, and Cybernetics,
 Part B: Cybernetics, IEEE Transactions on, 38(5):1402{1412, 2008.
 247