ABSTRACT

Title of Dissertation: FAST FEASIBLE MOTION PLANNING WITHOUT TWO-POINT BOUNDARY VALUE SOLUTION

Sharan Nayak, Doctor of Philosophy, 2023

Dissertation Directed by: Professor Michael W. Otte, Department of Aerospace Engineering

Autonomous robotic systems have seen extensive deployment across domains such as manufacturing, industrial inspection, transportation, and planetary surface exploration. A crucial requirement for these systems is navigating from an initial to a final position while avoiding collisions with obstacles en route. This challenging task of devising collision-free trajectories, formally termed motion planning, is of prime importance in robotics. Traditional motion planning approaches encounter scalability challenges when planning in higher-dimensional state spaces. Moreover, they rarely consider robot dynamics during the planning process. To address these limitations, a class of probabilistic planning methods called Sampling-Based Motion Planning (SBMP) has gained prominence. SBMP strategies exploit probabilistic techniques to construct motion planning solutions. In this dissertation, our focus turns toward feasible SBMP algorithms that prioritize rapidly discovering solutions while considering robot kinematics and dynamics. These algorithms are useful for quickly solving complex problems (e.g., the Alpha puzzle) where obtaining any feasible solution is considered an achievement. Furthermore, they find practical use in computationally constrained systems and in seeding time-consuming optimal solutions. However, many existing feasible SBMP approaches assume the ability to find precise trajectories that exactly connect two states in a robot's state space. This challenge is framed as the Two-Point Boundary Value Problem (TPBVP). Finding closed-form solutions for the TPBVP is difficult, and numerical approaches are computationally expensive and prone to precision and stability issues.
Given these complexities, this dissertation's primary focus resides in the development of SBMP algorithms for scenarios where solving the TPBVP is challenging. Our work addresses four distinct scenarios: two for single-agent systems and two for multi-agent systems. The first single-agent scenario involves quickly finding a feasible path from the start to the goal state, using bidirectional search strategies for fast solution discovery. The second scenario focuses on prompt motion replanning when a vehicle's dynamical constraints change mid-mission. We leverage heuristic information from the original search tree, constructed using the vehicle's nominal dynamics, to speed up the replanning process. Both scenarios unfold in static environments with known obstacles. Transitioning to multi-agent systems, we address the feasible multi-robot motion planning problem where a robot team is guided toward predefined targets in a partially known environment. We employ a dynamic roadmap, updated from the currently known environment, to accelerate agent planning. Lastly, we explore the problem of multi-robot exploration in a completely unknown environment, applied to the CADRE mission. We demonstrate how our proposed bidirectional search strategies can facilitate efficient exploration for robots with significant dynamics. The effectiveness of our algorithms is validated through extensive simulation and real-world experiments.
FAST FEASIBLE MOTION PLANNING WITHOUT TWO-POINT BOUNDARY VALUE SOLUTION

by

Sharan Nayak

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2023

Advisory Committee:
Professor Michael Otte, Chair/Advisor
Professor Pratap Tokekar, Dean's Representative
Professor Derek Paley
Professor Huan Xu
Professor Jeffrey Herrmann

© Copyright by Sharan Nayak 2023

Acknowledgments

Firstly, I would like to express my sincerest gratitude to my advisor, Dr. Michael Otte, for providing me with the invaluable opportunity to work in the Motion and Teaming (MO-T) Laboratory. Over the past five years, it has been a fascinating experience learning from him how to conduct original scientific research, write research papers, and prepare and present scientific work. Additionally, Dr. Otte gave me the freedom to determine the direction of my research while offering guidance when needed. I am thankful to him for extending discussions beyond meeting hours and engaging in stimulating research conversations, as well as for addressing concerns raised by journal reviewers. Lastly, I am appreciative of his efforts in facilitating my internship at the NASA Jet Propulsion Lab (JPL), California Institute of Technology. I extend my gratitude to Dr. Derek Paley, Dr. Jeffrey Herrmann, Dr. Huan Xu, and Dr. Pratap Tokekar for agreeing to be part of my dissertation committee and for providing guidance and feedback during the final phase of my Ph.D. journey. I am particularly thankful to Dr. Paley for his instruction in Advanced Dynamics (ENAE646) and Non-linear Control (ENAE743), as the knowledge gained from these courses greatly contributed to my Ph.D. research. My appreciation also goes to Dr. Herrmann and Dr. Xu, as well as to Dr.
Azarm for their constructive feedback and support during the multi-robot task allocation project and the preparation of the Robotics and Automation Letters (RAL) paper. I thank Dr. Tokekar for his leadership and support during my voluntary participation at the WAFR-2022 conference. Lastly, a special thanks goes out to Professor Mount for teaching the Computational Geometry (CMSC756) class, as the insights gained from this class became useful in implementing efficient collision-check algorithms and the multi-robot exploration algorithms for the CADRE mission. I am grateful to my collaborators at UMD, including Khalid, Estefany, Suyash, Elliot, and Ruchir. It was a pleasure working together on the multi-robot task allocation project. I also extend my thanks to my lab-mates (Khalid, George, Dalan, Jack, Alkesh, Yi-Hsuan, Alex, Ollie, Rachel) for their friendship, motivating suggestions, and insightful questions during presentations. Special thanks to Nitin Sanket and Chahat Deep Singh from the Perception and Robotics (PRG) group for their generosity in lending the Bebop 2 quadrotor for my various hardware experiments. Lastly, I appreciate the efforts of the numerous anonymous reviewers who contributed to the significant improvement of my manuscripts from their preliminary versions. I would like to express my gratitude to the members of JPL, notably Federico Rossi, Michael Paton, JP La-Croix, Roland Brockers, Grace Lim, Dustin Aguilar, Libby Boroson, Amir Rahmani, and Hari Nayar, among many others, for their guidance, support, and the opportunities I had while collaborating with them during two internships spanning nearly two years. I am particularly grateful to Michael Paton for assisting me with initial multi-robot experiments at the Mars Mini Yard, as well as for generously providing transportation to and from JPL. Furthermore, I wish to express my appreciation to Federico for arranging my second internship at JPL.
His support not only provided me an opportunity to run multi-robot experiments using the Mercury-7 rovers but also to develop actual flight code for the CADRE rovers, which are destined for Lunar exploration. I would like to express my appreciation to my parents, brother, uncles, aunts, cousins, and friends (especially Abe Rajan) for their continuous inspiration and unwavering motivation during my Ph.D. Lastly, I want to acknowledge the financial support provided by the Clark Doctoral Fellowship, the Minta Martin Research Fellowship Fund, and the U.S. Office of Naval Research (ONR) grant N00014-20-1-2712, without which this dissertation would not have been possible. The research for the algorithms proposed in Chapters 6 and 7 was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and was sponsored by the National Aeronautics and Space Administration (NASA) under Grant 80NM0018D0004.

Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

I Introduction

Chapter 1: Introduction
1.1 Motivation for feasible algorithms
1.2 Feasible planning without Two-Point Boundary Value Solution (TPBVP)
1.3 Contributions to the state-of-the-art
1.3.1 Fast feasible motion planning
1.3.2 Motion replanning under dynamics variation
1.3.3 Multi-agent motion planning
1.3.4 Multi-robot exploration
1.4 Outline of this document

Chapter 2: Literature review
2.0.1 Solving TPBVP in sampling-based motion planning
2.0.2 Use of heuristics in sampling-based motion planning
2.1 Single-agent motion planning
2.1.1 Bidirectional strategies for fast motion planning
2.1.2 Motion replanning
2.2 Multi-robot motion planning
2.2.1 Roadmap-based approaches for multi-robot motion planning
2.2.2 Multi-robot motion planning in dynamic environments
2.2.3 Multi-robot exploration algorithms
2.2.4 Map-partition based frontier-based exploration
2.2.5 Communication in multi-robot exploration

Chapter 3: Technical background
3.1 Definitions
3.2 Primitive operations of single-query SBMP
3.2.1 Sampling nodes
3.2.2 Nearest neighbor (NN)
3.2.3 Range search (RS)
3.2.4 Collision check
3.2.5 TPBVP for motion planning
3.2.6 Forward propagation
3.2.7 Best-input propagation

II Single-agent motion planning without two-point boundary value solution

Chapter 4: Bidirectional sampling based motion planning without TPBVP solution
4.1 Introduction
4.2 Preliminaries
4.3 Problem definition
4.4 Algorithm description
4.4.1 Proposed Generalized Bidirectional RRT (GBRRT)
4.4.2 General outline of exploration and exploitation searches
4.4.3 Generalized Asymmetric Bidirectional RRT (GABRRT)
4.5 GBRRT-specific implementation of subroutines used in our experiments
4.5.1 Specific edge-generation types
4.5.2 Parameters of GBRRT/GABRRT
4.6 Analysis
4.6.1 Probabilistic completeness
4.6.2 Computational complexity and shrinking d-ball rate
4.7 Experiment setup
4.7.1 Simulation experiments
4.7.2 Statistics chosen for GBRRT insights
4.7.3 Hardware experiments
4.8 Experiment results
4.8.1 Simulation experiment results
4.8.2 NBRRT comparison results
4.8.3 Interpretation of GBRRT experimental statistics
4.8.4 Hardware experiment results
4.8.5 Limitations of our proposed algorithms
4.9 Conclusion

Chapter 5: Motion replanning under changed dynamical constraints without TPBVP solution
5.1 Introduction
5.2 Problem definition
5.3 Algorithm description
5.3.1 Selection of heuristic radius δhr
5.4 Experiment setup
5.4.1 Simulation experiments
5.4.2 Hardware experiments
5.5 Experiment results
5.6 Conclusion

III Multi-agent motion planning without two-point boundary value solution

Chapter 6: Dynamic multi-robot motion planning without TPBVP solution
6.1 Introduction
6.2 Preliminaries
6.3 Problem definition
6.4 Framework architecture
6.5 Algorithm description
6.5.1 Algorithm component that runs on the base station
6.5.2 Algorithm component that runs on an agent
6.5.3 Heuristic guided feasible sampling-based motion planning
6.5.4 Handling target-reached condition
6.5.5 Robot-obstacle and robot-robot collision avoidance
6.6 Experiment setup
6.6.1 Simulation experiments
6.6.2 Hardware experiments
6.7 Experiment results
6.8 Conclusion

Chapter 7: Multi-robot exploration without TPBVP solution
7.1 Introduction
7.2 Problem definition
7.3 Algorithm description
7.3.1 Map-partition and assignment
7.3.2 Robot-partition assignment
7.3.3 Boundary computation
7.3.4 Agent-based exploration
7.4 Experiment setup
7.4.1 Simulation experiments
7.4.2 Hardware experiments
7.5 Experiment results
7.6 Multi-robot exploration without TPBVP solution
7.7 Conclusion

IV Conclusion

Chapter 8: Conclusion and future work
8.1 Conclusion
8.2 Future work

Bibliography

List of Tables

4.1 Experimental dynamical systems and input parameters for GBRRT and GABRRT
4.2 Experimental statistics of GBRRT for different systems
4.3 NBRRT results
5.1 Experiment parameters for RAPID
5.2 Hardware experiment results for RAPID vs. RRT
6.1 Hardware experiment results for MR-HIRRT
7.1 Hardware experiment results for CADRE multi-robot exploration tests

List of Figures

1.1 Finding a feasible path for a unicycle robot using SBMP
1.2 Comparison of Feasible vs. AO algorithm
1.3 Presence and absence of TPBVP
3.1 Blossom propagation for different values of N
4.1 Different types of searches
4.2 Comparison of unidirectional vs. bidirectional search in higher dimensions
4.3 Availability and absence of TPBVP in bidirectional search
4.4 Difference between GBRRT and GABRRT
4.5 GBRRT algorithm applied to a maze environment
4.6 Insert and update operations of Q
4.7 Search types used in forward expansion of GBRRT
4.8 Specific edge-generation types used in our version of GBRRT
4.9 Pop operation from Q for heuristic edge-generation
4.10 Variation of initial solution time vs. parameters of GABRRT
4.11 Covering ball sequence used to prove probabilistic completeness of GBRRT
4.12 Runtime flowchart for different variations of GBRRT
4.13 Maneuver libraries for unicycle and quadrotor used in simulation experiments
4.14 Initial feasible solution found by our algorithms in simulation experiments
4.15 Hardware experiment preliminaries
4.16 Simulation experiment results comparing GBRRT and GABRRT against state-of-the-art algorithms
4.17 Initial solution cost results of simulation and hardware experiments
4.18 Hardware experiment results comparing GABRRT and RRT
5.1 Quadrotor encounters dynamics change and replans with RAPID algorithm
5.2 Edge addition in RAPID algorithm
5.3 Quadrotor used to test the RAPID algorithm
5.4 Obstacle course and trajectory libraries used in hardware experiments to test the RAPID algorithm
5.5 Comparison of RAPID vs. RRT for two trials of hardware experiments
5.6 Simulation experiment results for RAPID algorithm
5.7 Replanned paths using RAPID, RRT-Extend, and SST-Random algorithms for unicycle and quadrotor vehicles
6.1 Heuristic-guided dynamical multi-rover sampling-based motion planning without TPBVP
6.2 Multi-robot motion planning architecture for MR-HIRRT
6.3 Push and pop operations for exploitation process in MR-HIRRT
6.4 Different environments used in MR-HIRRT simulation experiments
6.5 Safety trajectories and trajectory library used in MR-HIRRT experiments
6.6 Results of simulation experiments comparing MR-HIRRT against state-of-the-art algorithms
6.7 Analysis of PRM node count vs. planning time for MR-HIRRT
6.8 Scalability experiment results for MR-HIRRT
6.9 Testing MR-HIRRT at the Lunar mini yard at JPL
7.1 Map-partition based multi-robot exploration
7.2 Testing our proposed map-partition multi-robot exploration algorithm in the Lunar mini yard at JPL
7.3 Proposed centralized-decentralized approach for multi-robot exploration
7.4 Map-partition and assignment in leader agent
7.5 Frontier-based exploration in agent
7.6 Even-odd rule algorithm used to test existence of a point in polygon
7.7 Different layouts of unexplored space used in simulation experiments
7.8 Map partitions for different input layouts of unexplored space using the K-means algorithm
7.9 Progression of an exploration mission using our proposed multi-robot exploration
7.10 Simulation experiment results from testing our proposed multi-robot exploration algorithm
7.11 CADRE Mercury-7 rovers
7.12 Visual-guided navigation pipeline used in the CADRE rovers
7.13 Original image showing a crater in front of the rover and the corresponding stereo pipeline output
7.14 Two different scenarios used to test our exploration algorithm in real-world experiments
7.15 Progression of a real-world experiment having two robots exploring a 6 m x 6 m region
7.16 Progression of a real-world experiment testing resiliency of our proposed algorithm
7.17 Results for multi-robot exploration using a feasible planner without TPBVP solution

List of Abbreviations

AIT Adaptively Informed Trees
AO Asymptotically Optimal
AO-RRT Asymptotically Optimal Rapidly Exploring Random Tree
A.S. Almost Sure
BVP Boundary Value Problem
CADRE Cooperative Autonomous Distributed Robotic Exploration
CSMA-CA Carrier Sense Multiple Access - Collision Avoidance
DI Dynamic Insertion
DIRT Dominance-Informed Region Trees
DOF Degrees of Freedom
GBDIRT Generalized Bidirectional Dominance-Informed Region Trees
GBRRT Generalized Bidirectional Rapidly Exploring Random Tree
GABDIRT Generalized Asymmetric Bidirectional Dominance-Informed Region Trees
GABRRT Generalized Asymmetric Bidirectional Rapidly Exploring Random Tree
iSST Informed Stably Sparse Rapidly Exploring Random Tree
IVP Initial Value Problem
JPL Jet Propulsion Laboratory
KS Kolmogorov-Smirnov
LPA* Lifelong Planning A*
MR-HIRRT Multi-Robot Heuristic Informed Rapidly Exploring Random Tree
NASA National Aeronautics and Space Administration
NBRRT Numerical Bidirectional Rapidly Exploring Random Tree
NN Nearest Neighbor
ONR Office of Naval Research
PD Proportional-Derivative
PRM Probabilistic Roadmap
RAPID Rapid Planning for Interchanged Dynamics
RRT Rapidly Exploring Random Tree
RS Range Search
ROS Robot Operating System
SBMP Sampling-Based Motion Planning
SST Stably-Sparse Rapidly Exploring Random Tree
TPBVP Two-Point Boundary Value Problem

Part I

Introduction

Chapter 1: Introduction

Autonomous robotic systems have found widespread applications in various domains, including manufacturing [1], industrial inspection [2], transportation [3], and planetary surface exploration [4].
These systems necessitate the ability to plan and execute movements from an initial to a final position while ensuring collision-free navigation around obstacles, structures, and people. This crucial capability hinges upon the concept of motion planning [5], which involves charting a safe, feasible path from start to goal. Informally, a feasible path is one that a robot can execute while accounting for its kinematics and dynamics. With many robotic systems having high-dimensional state spaces due to their numerous degrees of freedom and higher-order derivatives, conventional motion-planning methods such as visibility graphs [6] or cell decomposition [7] become impractical as their computational demands increase exponentially with dimensionality [8]. Furthermore, these methods ignore vehicle dynamics while planning. This dilemma leads us to a relatively new paradigm of probabilistic methods known as Sampling-Based Motion Planning (SBMP) [5]. SBMP strategies involve the construction of probabilistic trees (Fig. 1.1) or roadmap graphs, where each edge represents a feasible trajectory for the robot. Through intelligently selected random samples in the robot's state space, SBMP methods employ exploration and exploitation techniques to iteratively expand and refine these trees or graphs. The inherent stochastic nature allows for efficient exploration of the state space, leading to the generation of feasible paths subject to the robot's kinematic and dynamic constraints. The balance of exploration and exploitation enables SBMP methods to adapt to diverse environments where traditional methods struggle due to the curse of dimensionality [8] and complex dynamics. SBMP approaches have found applications [9] in protein folding, digital animation, navigation of self-driving vehicles, and industrial manipulator robots.
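To make the tree-growing idea above concrete, the following is a minimal, illustrative sketch of RRT-style tree growth for a holonomic point robot in a 2D workspace. Every name and parameter here (`rrt_feasible`, the step size, the 5% goal bias) is an assumption for illustration, not taken from this dissertation; note also that the straight-line steering step below is exactly the kind of exact two-state connection that becomes problematic once differential constraints are introduced.

```python
import math
import random

def rrt_feasible(start, goal, is_collision_free, max_iters=5000,
                 step=0.5, goal_tol=0.5, bounds=((0.0, 10.0), (0.0, 10.0))):
    """Grow a tree of collision-free edges from start; stop once near the goal."""
    nodes = [start]
    parent = {start: None}
    for _ in range(max_iters):
        # Exploration: sample a random state (with a small goal bias).
        target = goal if random.random() < 0.05 else \
            tuple(random.uniform(lo, hi) for lo, hi in bounds)
        # Nearest-neighbor query (linear scan, for simplicity).
        near = min(nodes, key=lambda n: math.dist(n, target))
        d = math.dist(near, target)
        if d == 0.0:
            continue
        # Steer: take a bounded straight-line step toward the sample.
        new = tuple(a + (b - a) * min(step / d, 1.0) for a, b in zip(near, target))
        if not is_collision_free(near, new):
            continue
        nodes.append(new)
        parent[new] = near
        if math.dist(new, goal) <= goal_tol:
            # Reconstruct the feasible path by walking parent pointers.
            path, n = [], new
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
    return None  # no feasible path found within the iteration budget
```

In an obstacle-free 10x10 workspace, `rrt_feasible((1.0, 1.0), (9.0, 9.0), lambda a, b: True)` returns a path whose first node is the start and whose last node lies within the goal tolerance.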
Figure 1.1: Finding a feasible path for a unicycle robot using SBMP.

In the realm of SBMP, two categories of algorithms have emerged: Feasible [5] and Asymptotically Optimal (AO) [10]. Feasible algorithms promptly yield initial solutions to motion planning challenges, prioritizing speed over solution quality. We apply them to complex problems (e.g., the Piano mover's problem [11], the Alpha puzzle [12]) where finding any feasible solution is considered an achievement. Conversely, AO algorithms progressively refine their solutions towards optimality with available planning time. While both categories hold significance (Fig. 1.2), our focus within this thesis centers on the development of feasible motion planning algorithms.

Figure 1.2: Feasible algorithms are faster at generating initial solutions than AO algorithms, i.e., ti1 < ti2, where ti1 is the initial solution time of the feasible algorithm and ti2 is the initial solution time of the AO algorithm. AO algorithms excel in optimizing solution costs over time.

1.1 Motivation for feasible algorithms

There are several reasons why we choose to focus on feasible sampling-based motion algorithms in this thesis, which are outlined below:

1. Fast solution discovery: Feasible motion planning algorithms prioritize the rapid generation of valid solutions, rendering them particularly well-suited for tackling complex challenges such as the Alpha puzzle [12] or protein folding [13], where any solution is deemed an accomplishment. Moreover, their efficacy extends to time-critical applications like robotics, autonomous vehicles, and electronic games.

2. Computational efficiency: Feasible algorithms often require less computational power than AO algorithms.
This makes them suitable for resource-constrained systems where computational time is limited.

3. Problem simplification: Feasible solutions can be used as a starting step in solving complex problems. Once a feasible solution is obtained, it can serve as a starting point for further optimization or refinement.

4. Dynamic environments: In dynamic environments with moving obstacles and other robots, generating fast feasible solutions [14] can be more valuable than optimizing them.

Figure 1.3: E1 exactly connects x0 and x1, thus solving the TPBVP. E2 does not connect x0 and x2, thus providing no TPBVP solution.

1.2 Feasible planning without Two-Point Boundary Value Solution (TPBVP)

Many existing feasible SBMP algorithms [15] operate under the assumption that a trajectory precisely connecting two points in a multi-dimensional state space can be found – a conundrum known as the Two-Point Boundary Value Problem (TPBVP) [5, 16, 17]. Addressing this problem entails solving a multi-dimensional first-order differential equation subject to specified initial and final conditions. Finding solutions to this problem, especially in high-dimensional state spaces, is inherently complex, with analytical solutions only available for specific canonical systems [18, 19]. Numerical approaches [16], on the other hand, come with computational overheads and susceptibility to precision and stability issues. Consequently, this dissertation concentrates on devising feasible SBMP algorithms tailored for scenarios where solving the challenging TPBVP is computationally hard and/or impossible (Fig. 1.3).
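When no TPBVP solver exists, tree-based planners commonly replace exact connection with forward propagation: sample several candidate control inputs, integrate the dynamics forward under each, and keep the propagated state that lands closest to the target. The sketch below illustrates this idea for unicycle dynamics; the function names, control bounds, and integration parameters are illustrative assumptions, not the dissertation's actual implementation.

```python
import math
import random

def forward_propagate(x, u, dt=0.1, steps=5):
    """Euler-integrate unicycle dynamics: state (x, y, theta), control (v, omega)."""
    px, py, th = x
    v, om = u
    for _ in range(steps):
        px += v * math.cos(th) * dt
        py += v * math.sin(th) * dt
        th += om * dt
    return (px, py, th)

def best_input_extend(x_near, x_target, num_samples=10):
    """Sample controls, propagate each, and return the state closest to the target.

    No control is guaranteed (or required) to reach x_target exactly -- this is
    what lets the planner avoid solving the TPBVP.
    """
    best_state, best_dist = None, float("inf")
    for _ in range(num_samples):
        u = (random.uniform(0.0, 1.0), random.uniform(-1.0, 1.0))  # (v, omega)
        x_new = forward_propagate(x_near, u)
        d = math.dist(x_new[:2], x_target[:2])  # compare positions only
        if d < best_dist:
            best_state, best_dist = x_new, d
    return best_state
```

Every state returned this way is reachable by construction, since it was produced by integrating the dynamics; the resulting tree contains only executable edges, at the cost of never connecting two prescribed states exactly.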
These algorithms find valuable applications, particularly in motion planning for modern, complex, high-dimensional state space systems, including autonomous vehicles, industrial manipulators, humanoid robots, multi-agent systems, and bio-inspired robots.

1.3 Contributions to the state-of-the-art

This dissertation contributes to the field of feasible sampling-based motion planning by addressing the challenge of planning feasible motions that consider a robot's kinematics and dynamics when TPBVP solutions are unavailable. These contributions have either been published or submitted (in process) to archival journals [20–22] and presented at a major conference [23]. Our main theme is explored through a range of scenarios, including two for single-agent systems and two for multi-agent systems, as outlined below. The integration of various heuristics accelerates the discovery of solutions across these diverse scenarios.

1. Single-agent systems

(a) Fast feasible motion planning: Quickly finding a feasible motion path from a start to a goal configuration in a static, known environment.

(b) Motion replanning under dynamics variation: Efficiently replanning a feasible path back to home base, accounting for a vehicle's dynamics change, in a static, known environment.

2. Multi-agent systems

(a) Multi-robot motion planning: Expediently solving the multi-robot feasible motion planning problem where robots visit pre-selected unlabeled targets in a partially known dynamic environment.

(b) Multi-robot exploration: Quickly solving the multi-robot feasible exploration problem where robots explore a completely unknown environment using a divide-and-conquer approach.

In the subsequent sections, we delve into these four scenarios, elaborating on our contributions to each.
1.3.1 Fast feasible motion planning

In this scenario, we tackle the challenge of rapidly solving feasible motion planning problems in static known environments, which is crucial for addressing high-dimensional motion planning challenges. We introduce novel bidirectional search strategies that expedite the discovery of solutions. Our approach grows a forward search tree that combines exploration and exploitation to quickly find a path to the goal. The exploitation process leverages knowledge gained from the growing reverse search tree to focus the forward search on the goal region. Notably, this work represents the first application of bidirectional search strategies in situations where TPBVP solutions are unavailable. The primary contribution of this work is the proposal of two feasible algorithms: Generalized Bidirectional Rapidly Exploring Random Tree (GBRRT) and Generalized Asymmetric Bidirectional Rapidly Exploring Random Tree (GABRRT). Additional contributions include:

• Provision of proofs that GBRRT and GABRRT are probabilistically complete.

• Derivation of computational complexities for GBRRT and GABRRT.

• Empirical analysis of input parameters of GABRRT vs. initial solution time.

• Experimental verification employing simulation experiments for six different non-linear dynamical systems, demonstrating that GBRRT and GABRRT perform comparably to or better than existing approaches.

• Experimental verification using real-world quadrotor experiments on a 3D obstacle course.

1.3.2 Motion replanning under dynamics variation

Our second scenario involves motion replanning under changed dynamical constraints, with practical applications in swiftly re-routing systems back to their home base when faced with sudden dynamics changes (e.g., vehicle damage). We propose a novel approach that leverages the topological information stored in the original search tree to expedite the replanning process.
This work represents the first approach that considers motion planning under dynamics changes without solving the TPBVP. The primary contribution of this work is the introduction of the RAPID algorithm (Rapid Planning for Interchanged Dynamics), which performs quick replanning. Additional contributions include:

• Substantiation of the algorithm's efficacy through simulation experiments involving two distinct dynamical systems.

• Demonstration of practicality using real-world quadrotor experiments on a 2D obstacle course.

1.3.3 Multi-agent motion planning

Transitioning to multi-agent scenarios, we first tackle a multi-agent feasible motion planning problem within a partially-known dynamic environment. This scenario is particularly relevant in planetary surface missions, where robots need to visit predetermined points of interest based on lower-resolution maps chosen by scientists. Compared to previous works, this is the first work that considers the multi-robot motion planning problem in partially-known environments when TPBVP solutions are unavailable. In this context, we maintain a centralized roadmap using the initial environmental map, dynamically updating it as robots encounter new obstacles during their operations. This dynamic roadmap serves as a heuristic to expedite motion planning and facilitate robot-target assignment. Our primary contribution is the development of a multi-robot motion planning algorithm known as MR-HIRRT (Multi-Robot Heuristic Informed Rapidly Exploring Tree). Additional contributions include:

• Experimental verification using simulation experiments involving two dynamical systems and three different scenarios.

• Implementation on real-world prototype lunar rovers to test the feasibility of the proposed algorithm.

• An analysis of the overall planning time variation vs. the number of initial PRM nodes for two high-dimensional dynamical systems.
• A scalability experiment analysis showing the variation of total planning time and mission time with an increase in the number of robots and targets.

1.3.4 Multi-robot exploration

Our final scenario addresses the feasible multi-robot exploration problem, where teams of robots collaborate to explore uncharted territories. This challenge has practical applications in rapidly and efficiently exploring completely unknown environments, such as extra-terrestrial surface missions or search and rescue operations. Our main contribution includes two aspects: firstly, a comprehensive study of multi-robot exploration within the context of the upcoming CADRE mission, employing a divide-and-conquer approach adhering to CADRE's mission constraints. Secondly, an extension of this algorithm to accommodate robots with substantial dynamics using feasible sampling-based motion planners when TPBVP solutions are unavailable. Additional contributions include:

• Experimental verification using simulation experiments encompassing environments with different initial shapes.

• Experimental verification using real-world prototype lunar rovers.

• Experimental verification of the advantage of using heuristic-based planners for motion planning when TPBVP solutions are not available.

1.4 Outline of this document

The outline for the rest of the document is as follows. Chapter 2 offers a comprehensive literature review contextualizing the work presented in this dissertation. Chapter 3 provides an introductory technical foundation for feasible sampling-based motion planning. Chapters 4–7 present the technical details, discussions, and experiments related to the four algorithmic contributions highlighted in the preceding section. In particular, Chapter 4 outlines the GBRRT and GABRRT algorithms, proposed for executing kinodynamic bidirectional searches in scenarios where TPBVP solutions are unavailable.
Chapter 5 delves into the RAPID algorithm, formulated to enable swift replanning when a vehicle's dynamic constraints undergo alteration mid-mission. Chapter 6 introduces MR-HIRRT, a framework to efficiently enable navigation of ‘n’ agents visiting ‘m’ unlabeled targets in partially-known environments. Chapter 7 introduces the map-partition-based multi-robot exploration algorithm for the CADRE mission. The culmination of the dissertation is found in Chapter 8, summarizing the primary contributions made and engaging in discussions regarding potential avenues for future research.

Chapter 2: Literature review

This dissertation proposes several feasible sampling-based motion planning algorithms, tailored to address general single- and multi-agent scenarios where solutions to the TPBVP remain elusive. Within the context of single-agent scenarios, this work encompasses fast feasible motion planning and motion replanning under the influence of changing dynamical constraints. For multi-agent scenarios, the focus extends to encompass multi-robot motion planning and multi-robot exploration, the former within partially known and the latter in entirely unknown environments, respectively. Throughout these scenarios, the strategic utilization of pre-existing or on-the-fly calculated heuristics serves as a fundamental approach to expedite the motion planning process. In the subsequent sections, we first discuss background on finding TPBVP solutions for motion planning and the use of heuristics for accelerated motion planning, which are common themes in all our proposed algorithms. We then discuss related work on motion planning for single- and multi-agent systems.

2.0.1 Solving the TPBVP in sampling-based motion planning

Solving a TPBVP in sampling-based motion planning entails finding a solution to an n-dimensional first-order differential equation (where n is the number of states) constrained to the given start and end boundary conditions.
This is an extension of the well-studied Initial Value Problem (IVP) [24], where only the initial condition is specified. Finding the TPBVP solution is non-trivial for general non-linear systems, and closed-form expressions for the TPBVP have only been attained for some specific systems [16, 25]. Consequently, researchers have explored various approaches to tackle the TPBVP motion planning problem. These include generating approximations [26] and utilizing numerical methods such as shooting approaches [27], finite-difference methods [16], and collocation methods [28]. Additionally, strategies involving the simplification of dynamics via linearization [29] and subsequently solving the resulting linear TPBVP have been employed. It should be noted, however, that numerous numerical methods tend to be computationally intensive and susceptible to stability and precision challenges [16].

More recently, a variety of algorithms based on random forward-propagation [30, 31] have emerged. These algorithms circumvent the need to solve the TPBVP, instead constructing trees by exclusively addressing the IVP. Notable examples of these forward-propagation algorithms encompass the Stable Sparse-RRT (SST) [31], Informed SST (iSST) [32], Dominance-Informed Region Trees (DIRT) [33], Asymptotically Optimal RRT (AO-RRT) [34, 35], and the algorithm employing bundles of edges (BOE) [36]. It is important to note, however, that these algorithms fall under the AO category, prioritizing the enhancement of solution quality over expedited solution discovery. Consequently, a gap exists in the domain of algorithms that swiftly identify feasible initial solutions – a gap this dissertation tries to address.

2.0.2 Use of heuristics in sampling-based motion planning

Numerous heuristics have been previously employed within the context of sampling-based motion planning, as a well-selected heuristic frequently leads to improved search efficiency.
A common approach involves probabilistically sampling the goal node, a technique known as goal biasing [37]. Akgun et al. [38] utilize sampling heuristics to increase the number of nodes sampled near a discovered path, thereby increasing the rate of finding a more optimal one. Urmson et al. [39] introduce the heuristically guided RRT (hRRT), which leverages a quality measure to strike a balance between exploration and exploitation when selecting the next node to expand, consequently resulting in lower-cost solutions.

Anytime-RRT [40] presents a general heuristic for rejecting samples in the context of anytime planning whenever the heuristic cost from the start to the new sample plus the heuristic cost from the sample to the goal is greater than the length of the best known path. C-FOREST [41] extends this idea to AO sampling-based motion planning in both single-process and multi-process implementations. In D-dimensional Euclidean spaces (R^D), the sample region from [40, 41] takes the special form of a hyperellipse; Informed RRT* extends the idea from [41] by showing how to generate new samples from the hyperellipse embedded in R^D directly (without using rejection sampling). DIRT [33] and iSST [32] use user-defined heuristics to provide high rates of convergence towards the optimal solution. However, it is sometimes difficult to come up with a heuristic that provides a benefit in a particular problem instance, even though a heuristic for the entire problem domain may be easy to formulate [42]. To alleviate this issue, Strub et al. [42, 43] propose a geometric planner that uses the problem-specific cost-to-go heuristic provided by the ever-improving reverse search tree obtained from LPA* to converge towards the optimum. Our methods use a similar approach to [42] in using the reverse search tree to get the cost-to-goal heuristic but differ in that our algorithms do not require solving the TPBVP.
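The goal-biasing heuristic mentioned at the start of this section is the simplest to implement. The sketch below is a generic illustration under assumed inputs (planar bounds, bias probability, and function name are ours, not taken from any of the cited planners): with probability p_goal the sampler returns the goal state, steering tree growth toward it, and otherwise draws a uniform sample.

```python
import random

def goal_biased_sample(bounds, goal, p_goal=0.05):
    """With probability p_goal, return the goal state to bias tree
    growth toward it; otherwise draw each coordinate uniformly at
    random from its (low, high) interval."""
    if random.random() < p_goal:
        return tuple(goal)
    return tuple(random.uniform(lo, hi) for lo, hi in bounds)
```

A planner calls this in place of plain uniform sampling; p_goal is typically kept small (a few percent) so that exploration of the rest of the space is not starved.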
In the context of multi-robot motion planning, various heuristics have been used in both unlabeled [44] and labeled [45] goal scenarios. In the labeled goal scenario, each robot is given a specific target to visit, whereas in the unlabeled goal scenario, it does not matter which robot visits which target as long as all targets are visited. Le and Plaku introduced various centralized approaches in both labeled [46–48] and unlabeled goal [49] scenarios using a pre-built lower-dimensional PRM roadmap [50] as a heuristic for each robot to guide the search. Thus, for both single- and multi-agent systems, a diverse array of heuristics has been harnessed to expedite solution discovery and enhance solution quality. In this thesis, we also incorporate different types of heuristics into our proposed algorithms to accelerate initial solution identification.

2.1 Single-agent motion planning

We first discuss previous literature regarding bidirectional SBMP, which forms the basis for our work on bidirectional search without the TPBVP.

2.1.1 Bidirectional strategies for fast motion planning

A variety of bidirectional SBMP algorithms have been proposed in the literature. A seminal work is bidirectional RRT-Connect [37], which grows two trees toward each other by having each tree grow toward the nearest vertex of the opposite tree. Jordan et al. [51] develop a bidirectional variant of RRT* [52] that provides asymptotic optimality, utilizing various heuristics and procedures to improve the convergence rate. Tahir et al. [53] and Xinyu et al. [54] use artificial potential fields [55] to generate intelligent samples using gradient descent methods to improve the speed of convergence of bidirectional RRT*. Devaurs et al. [56] update the unidirectional Transition-RRT (T-RRT) algorithm to create a bidirectional variant that achieves fast convergence and better quality paths over configuration spaces where cost functions are defined. Starek et al.
[57] propose a bidirectional approach to Fast Marching Trees (FMT*), which they show has better convergence rates to high-quality paths than many existing unidirectional planners. Strub et al. [42] use an asymmetric bidirectional search in Adaptively Informed Trees (AIT*), with the reverse tree created using Lifelong Planning A* (LPA*). Although the algorithms mentioned above provide fast convergence rates, they cannot be used (in their current form) when the TPBVP cannot be solved, or used in practice when doing so is computationally expensive. Bidirectional algorithms [37, 58] that do not require solving the TPBVP instead require extra computation to connect the discontinuity that exists between the two trees, e.g., using methods like point perturbation [37] or Bézier curves [58]. In contrast, our approach avoids the tree-tree connection problem entirely by using the heuristic provided by the reverse search tree to grow the forward tree towards the goal.

In the previous section, we discussed bidirectional strategies proposed for fast solution discovery and improving solution quality. However, these approaches assume that the vehicle dynamics do not change during the mission. While this assumption holds true for certain missions, numerous situations arise where changes in vehicle dynamics necessitate replanning. We address this problem in Chapter 5 of this thesis and provide the related work on it in the next section.

2.1.2 Motion replanning

A survey of the sampling-based motion planning literature reveals limited work on replanning under changed dynamical constraints but numerous existing works on replanning in changing environments. For example, Bekris et al. [59] present a tree-based planner that deals with evolving configuration spaces by incrementally updating a tree-based data structure retained from previous steps and then biasing the search with a greedy and probabilistically complete exploration approach. Ferguson et al.
[60] make repairs to an already existing RRT tree, removing invalid parts and then growing the changed tree until a new solution is found to account for the changed configuration space. Yoshida et al. [61] present a reactive method that uses a combination of path replanning and path deformation to account for changed environments. Otte et al. [62] present RRT-X, the first AO algorithm for dynamic environments, which uses ideas from the D* algorithm for quick replanning. While such algorithms work for replanning in evolving configuration spaces, they are not directly applicable to replanning in response to changing vehicle dynamics. Nonetheless, our method is inspired by the same high-level idea of using an existing but outdated solution to speed up the replanning process.

2.2 Multi-robot motion planning

In this section, we focus our attention on multi-robot motion planning and exploration, laying the foundation for our forthcoming work in Chapters 6 and 7. We first review related work on roadmap-based approaches for multi-robot motion planning, and motion planning in dynamic environments, ideas of which are used to expedite motion planning in Chapter 6. Following this, we provide an overview of multi-robot exploration algorithms. We then review map-partition-based multi-robot exploration, which forms the basis for our proposed algorithm used for the CADRE mission (Chapter 7). Concluding this section, we delve into the realm of communication strategies for multi-robot exploration, clarifying the rationale behind our selected communication strategy.

2.2.1 Roadmap-based approaches for multi-robot motion planning

Švestka et al. [63] use a centralized approach for roadmap-based multi-robot path planning. They plan on an explicit higher-dimensional composite roadmap created by taking the Cartesian product of individual robot roadmaps. Wagner et al.
[64] improve this work by showing it is sufficient to use an efficient implicit composite roadmap representation and apply their M* algorithm [65] to calculate paths. dRRT [66] and dRRT* [67] further use this implicit representation to perform an RRT-like search in discrete space to find robot paths efficiently. However, these approaches do not work for complex robot dynamics when the TPBVP cannot be solved.

Van den Berg and Overmars [68] use a local planner that encodes dynamic constraints while employing A* to calculate global paths on a pre-computed roadmap. They use a prioritized planning scheme where each robot is assigned a priority, with motion planned for robots in order of decreasing priority. Le and Plaku introduced various centralized approaches for multi-robot motion planning for labeled [46–48] and unlabeled goals [48] while considering robot dynamics. They build a motion tree over the composite state space of all robots while using a lower-dimensional roadmap for each robot to guide the search. The completed motion tree's trajectories correspond to each robot's feasible paths. However, these approaches assume a fully known environment at the start of the mission. In contrast, our work considers partially known settings where new obstacles are detected as robots move in the environment.

2.2.2 Multi-robot motion planning in dynamic environments

Prior work has employed disparate approaches to handle dynamic environments. The definition of a "dynamic" environment varies in the literature but is usually characterized by the encounter of another robot or unexpected obstacle appearance, disappearance, or movement unknown during initial planning. Shwail et al. [69] construct a roadmap in the preprocessing step and use A* to find paths for robots in an offline phase.
In the online phase, after each delta time step, a collision check with other robots is performed, and the robot executes a bang-bang control along with a wait option to avoid other robots and reach the goal position. Tordesillas and How [70] present MADER, where the trajectories of robots are characterized by an outer polyhedron surrounding them. MADER introduces linear separability between polyhedra using planes in an optimization problem for collision avoidance with other robots and obstacles.

A whole body of work has focused on using machine learning strategies. Zhu et al. [71] use a recurrent neural network to learn multi-robot motion behaviors from demonstrated trajectories. The learned trajectory model is used in conjunction with a model predictive controller (MPC) to perform robot-robot and robot-obstacle collision avoidance. PRIMAL [72] uses reinforcement and imitation learning to learn decentralized robot policies for robot and obstacle collision avoidance that can extend to any number of robots. PRIMAL2 [73] extends this work to lifelong planning, where robots visit other goals upon reaching their assigned goal. However, these machine learning strategies do not encode robot dynamics while planning.

Our multi-robot motion planning work differs from previous work in that we consider both robot dynamics and partially known environments. Although moving obstacles (besides robots) can also be incorporated, we stick to only using static obstacles to emulate the planetary exploration environment in our experiments.

2.2.3 Multi-robot exploration algorithms

A variety of multi-robot exploration algorithms have been proposed in the literature. Notable approaches include frontier-based [74, 75], information theory-based [76, 77], and potential field-based [78, 79] methods. Frontier-based algorithms drive the robot to "frontiers", which are points on the boundary between explored and unexplored space.
Information theory-based approaches move the robot to areas so as to reduce the total entropy, or maximize the mutual information [80], in the map. Finally, potential-field methods use a potential field [81] function to steer the robots to unexplored areas and away from explored areas of the map. Frontier-based exploration is the most commonly used among the proposed approaches due to its effectiveness and simplicity. Various frontier-based methods have been presented [82] to minimize time [83], avoid redundant coverage [84, 85], and reduce communication [86].

Frontier-based approaches can be divided into centralized [85, 87] and decentralized schemes [88] depending on which entity generates the frontier points. Centralized schemes have a central entity like a base station or a leader agent that assigns frontier points to the individual agents. For example, Banfi et al. [87] use Integer Linear Programming (ILP) on a central base station to assign frontier points to each agent to achieve efficient exploration while enforcing recurrent connectivity constraints. In contrast, decentralized schemes have individual agents themselves assign their next frontier point. For example, Bautin et al. [88] have individual agents choose their next frontier point by considering the distance of other agents to the frontier and how many agents are closer to a frontier. In our scheme, we use a map-partition-based approach that combines elements of centralized and decentralized approaches, which we discuss next.

2.2.4 Map-partition-based frontier exploration

An important topic in multi-robot frontier-based exploration is assigning frontiers to different robots so as to reduce the overall mission time for exploration [75]. Previous approaches [75, 83, 89] propose achieving this objective by dividing the environment into disjoint regions and assigning frontier points within these regions for the individual robots to explore. For example, the works by Wurm et al.
[75], Lopez et al. [89], and Wu et al. [90] employ the Voronoi partition to segment environments (rooms, corridors), whereas Solanas et al. [83] use K-means-based partitioning for environment segmentation. These approaches then use the robot's location to determine the robot-region assignment. Similar to these methods, we propose using a map-partition approach in our work due to its exploration efficiency. However, our work differs from these methods in the following ways. First, Wurm et al. [75] and Solanas et al. [83] divide the map into regions and then assign frontiers to robots (within a region) in a centralized fashion; Wurm et al. [75] also calculate new partitions at every iteration. In contrast, our algorithm has agents individually determine frontier points in their assigned static regions in a decentralized fashion. Second, Lopez et al. [89] share maps between different robots, and the agents individually calculate their Voronoi partitions, whereas we use a centralized entity to partition the map. Our approach offers the advantages of both decentralized and centralized methods in that it facilitates autonomous exploration of individual regions by agents while resulting in efficient coordination (less communication) between them. Although the use of a central entity introduces a single point of failure, we mitigate this problem by planning to use a leader election scheme [91, 92] to select another leader agent if the current one gets disabled or stuck.

An additional difference from previous works is that they focus on testing the algorithm either in simulation [83, 89] or in an indoor office-structured environment [75]. In contrast, we test our exploration algorithm in both simulation and an outdoor setting with obstacles and craters that resembles the Lunar surface (Fig. 7.2) to cater to our mission.
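The robot-to-region assignment step shared by the Voronoi-based methods above can be sketched as follows. This is a simplified illustration under assumed inputs (a 2D grid indexed by (row, col) and known robot cells; the function name is ours), not the partitioning used in our algorithm or the cited works: every cell is assigned to the nearest robot, yielding disjoint regions whose frontiers each robot can then pursue independently.

```python
def voronoi_partition(grid_shape, robot_positions):
    """Assign every cell of a 2D grid to the closest robot (by squared
    Euclidean distance), producing one disjoint region per robot.
    Ties go to the lowest robot index."""
    rows, cols = grid_shape
    regions = {i: [] for i in range(len(robot_positions))}
    for r in range(rows):
        for c in range(cols):
            owner = min(
                range(len(robot_positions)),
                key=lambda i: (r - robot_positions[i][0]) ** 2
                + (c - robot_positions[i][1]) ** 2,
            )
            regions[owner].append((r, c))
    return regions
```

In a scheme like ours, a central leader could compute such a partition once at mission start and communicate each region to its robot, after which frontier selection proceeds locally.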
2.2.5 Communication in multi-robot exploration

Communication is essential for multi-robot exploration as it affects the coordination between robots [93]. Inefficient coordination may lead to poor exploration efficiency, with the same areas being visited multiple times by different robots. Prior research has delved into diverse communication strategies [87] to facilitate multi-robot exploration coordination. While providing an exhaustive treatment of these strategies lies beyond the scope of this work, we highlight three common scenarios frequently employed: 1) full communication, where communication is always available; 2) periodic communication [94], where communication is available periodically; and 3) recurrent communication [87], where communication is available only at the deployment locations. Ideally, in our work, we only need communication at the mission start to communicate the regions and at the mission end to communicate back the locally explored maps. This contrasts with algorithms like [87, 95], where the base station communicates the frontier points to agents at every iteration.

Chapter 3: Technical background

Sampling-based motion planning approaches [9, 96] have gained significant popularity in the field of robotics due to their simplicity and their efficiency in exploring high-dimensional state spaces. These approaches involve the random sampling of collision-free feasible states for a robot within its state space and moving between them in a way that respects the robot's kinematics and dynamics. Broadly, the literature categorizes these approaches into two primary categories, namely roadmap approaches [97] and incremental planning [98] methods. We employ incremental methods for single-query motion planning [99] and roadmap approaches for multi-query motion planning [100]. In single-query motion planning, the central objective revolves around constructing a data structure, often represented as a tree, to effectively solve a single query.
This query entails finding a feasible path between a provided start state and a designated goal state. In contrast, multi-query motion planning pertains to the creation of a data structure, frequently represented as a graph, capable of handling multiple queries. These queries involve finding paths between various start and end states.

A prominent example of a roadmap approach is the Probabilistic Roadmap (PRM) pioneered by Kavraki et al. [50]. In its simplest form, PRM involves the sampling of collision-free states within the state space. These sampled states are then connected to a subset of their nearest neighbors. During the online phase, the shortest path between a given start and goal state can be determined through the graph. The versatility of PRM enables its application in multi-query motion planning scenarios. However, it is important to note that the construction of a roadmap assumes the solvability of the TPBVP for the given dynamical system.

Feasible motion planning has traditionally been employed to find solutions to intricate problems, such as the Piano Mover's problem [11] and the Alpha puzzle [12], where the attainment of any valid solution is deemed a significant achievement. In such contexts, the focus is on discovering any valid solution, with all solutions considered equally valuable. Recently, feasible planning has also found application as the initial step in anytime AO motion planning. The AO approach aims to provide a workable solution while simultaneously refining its quality as additional planning time becomes available, particularly towards achieving optimality. Throughout this dissertation, our primary emphasis remains on expeditiously solving the feasible motion planning problem across various scenarios. In the following section, we present definitions for the various primitives employed in feasible single-query motion planning algorithms, which will be consistently utilized throughout this thesis.

3.1 Definitions

Definition 1.
The configuration of a robot is defined as the minimum set of coordinates required to completely specify the degrees of freedom (DOF) of the robot.

Typical examples of configurations of various robots include:

1. Unicycle robot: Its configuration is completely specified by its 2D position (x, y) and orientation θ, i.e., (x, y, θ).

2. Manipulator robot: The configuration of a manipulator robot with three revolute joints with joint angles θ1, θ2, θ3 is (θ1, θ2, θ3).

Definition 2. The configuration space C (also called C-space) is the space of all allowable configurations of a robot.

For the unicycle robot, the configuration space corresponds to the Special Euclidean Group SE(2), while for the manipulator robot with three revolute joints, it is represented as a three-dimensional torus (T^3).

Definition 3. The state of a robot includes variables that fully describe its current condition, including its configuration and some of its higher-order derivatives (e.g., velocities and accelerations).

Unlike a robot's configuration, the robot's state is not unique and can include higher-order derivatives such as accelerations, jerk, snap, etc. The decision of which higher-order derivatives to incorporate depends on the specific application and striking a fine balance between employing higher-fidelity models and the availability of computational power and sensor data. In the context of a unicycle robot, a possible representation of its state could comprise (x, y, θ, v, ω), where v is the linear velocity and ω is the angular velocity. Similarly, for the manipulator robot featuring three revolute joints, a suitable state space representation might involve (θ1, θ2, θ3, ω1, ω2, ω3), where ω1, ω2, and ω3 are the angular velocities of the revolute joints.

Definition 4. The state space S of the robot is the space of all states of a robot.

Definition 5. The workspace W of a robot is the space of possible physical locations a robot can reach.
The workspace for the unicycle robot is the set of 2D locations the robot can reach. For the manipulator robot with n revolute joints, the workspace is the set of 3D locations its end-effector can reach.

For the rest of our work, we will assume that X is a smooth n-dimensional (manifold) state space. Let the obstacle space X_obs be an open subset of X. X_obs is the set of all states where the system is in collision with obstacles in the workspace. The free space X_free is defined as X_free = X \ X_obs. X_free is a closed subset of X and the set of all states that are not defined by obstacle collisions. Let the start state be denoted by x_start, the goal state by x_goal, and the goal region by X_goal, with x_start ∈ X_free, x_goal ∈ X_goal, and X_goal ⊂ X_free. Let U be the m-dimensional input control set. The system dynamics satisfies a differential equation of the form:

ẋ(t) = f(x(t), u(t)), x(t) ∈ X, u(t) ∈ U. (3.1)

Definition 6. A valid trajectory (also called an edge) E of duration t_E is a continuous function E : [0, t_E] → X_free, where E is generated by forward integrating system (3.1) using a control function Υ : [0, t_E] → U.

Definition 7. A distance metric d_M is a function d_M : X × X → R+, where × denotes the Cartesian product and R+ is the set of non-negative real numbers. Assuming x1, x2, x3 ∈ X, d_M satisfies the following properties.

• Non-negativity: d_M(x1, x2) ≥ 0

• Identity of indiscernibles: d_M(x1, x2) = 0 ⇐⇒ x1 = x2

• Symmetry: d_M(x1, x2) = d_M(x2, x1)

• Triangle inequality: d_M(x1, x2) + d_M(x2, x3) ≥ d_M(x1, x3)

Definition 8. A distance function d_X is a function d_X : X × X → R, where R is the set of real numbers.

Unlike d_M, d_X must only satisfy the triangle inequality property. Thus d_X has less restrictive requirements than d_M, and this makes d_X more general than d_M.

Definition 9. The cost C of a trajectory is a function C : E → R, where E is the space of all trajectories.
The cost can be defined as the distance traveled along a trajectory, or the energy or time required to execute the trajectory. In our experiments, we define C as the distance along a trajectory, C(E) = sup ∑_{i=1}^{k} dX(E(t_{i−1}), E(t_i)), where the supremum is taken over all k ∈ N and all sequences t_0 ≤ t_1 ≤ ... ≤ t_k in I, where I ⊂ R is a connected set. By construction, dX satisfies the triangle inequality, and can thus be used to define the "distance" between two points for the purposes of our algorithm. The trajectory cost has previously been used in the context of asymptotically optimal (AO) planners, but in our work, we use it to define the cost-from-start (Def. 12) and cost-to-goal (Def. 13) heuristics.

Definition 10. The clearance of a trajectory E is the maximal value δ such that B^n_δ(E(t)) ⊂ Xfree ∀t ∈ [0, tE], where B^n_δ is the n-dimensional hyper-ball of radius δ centered at a point of E(t).

Definition 11. Let G ≜ (V, E) be a graph data structure embedded in X representing a motion search tree. Let V and E represent the node set and edge set respectively. Let xa and xb represent any two nodes ∈ V. Then a feasible path Γa,b between xa and xb is defined as Γa,b ≜ {E1, E2, ..., Em}, where {E1, E2, ..., Em} is an ordered set of m valid trajectories with Ei ∈ E, E1(0) = xa, Em(tEm) = xb, and Ei(tEi) = Ei+1(0) for i = 1, ..., m − 1.

Definition 12. Let Gfor ≜ (Vfor, Efor) be a graph data structure embedded in X representing a forward search tree rooted at xstart. Vfor and Efor represent the node set and edge set respectively. Let Γstart,a be a feasible path connecting xstart and xa ∈ Vfor. The cost-from-start heuristic g of the state xa is defined as g(xa) = ∑_{i=1}^{m} C(Ei), where Ei ∈ Γstart,a, E1(0) = xstart, Em(tEm) = xa, and m is the number of edges connecting xstart to xa.

Definition 13. Let Grev ≜ (Vrev, Erev) be a graph data structure embedded in X representing a reverse search tree rooted at xgoal. Vrev and Erev represent the node set and edge set respectively.
Let Γa,goal be a feasible path connecting xa ∈ Vrev and xgoal. The cost-to-goal heuristic h of a state xa ∈ Vrev is defined as h(xa) = ∑_{i=1}^{n} C(Ei), where Ei ∈ Γa,goal, E1(0) = xa, En(tEn) = xgoal, and n is the number of edges connecting xa to xgoal.

3.2 Primitive operations of single-query SBMP

In this section, we delve into the fundamental primitive operations that underlie SBMP. We commence by examining the operation of sampling nodes, the most basic operation in SBMP. Nodes establish the connections between feasible trajectories in trees and roadmaps.

3.2.1 Sampling nodes

We sample nodes from the state space of the robot. Our aim is to create a densely sampled sequence of nodes [5], effectively covering every point in the state space. The simplest approach to achieve this is uniform random sampling of the state space. For spaces obtained by taking the Cartesian product of individual spaces, the complete composite sample is obtained by directly selecting random samples from each individual space. However, when dealing with spaces incorporating rotations (e.g., SO(3)) [5], random unit quaternions need to be sampled, as opposed to Euler angles, to ensure a uniformly distributed sample sequence. An alternative to random sampling is dispersion sampling, which involves strategically placing samples in the state space to minimize the largest uncovered area. Notable examples of dispersion sampling methods include the Halton [101] and Hammersley [101] methods. A more comprehensive exploration of sampling nodes can be found in [5]. In our work, we employ uniform random sampling in all of our proposed algorithms.

In the following section, we will discuss the Nearest Neighbor (NN) operation, a crucial operation within most SBMP algorithms, responsible for determining the next node to expand in a growing tree.
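The uniform sampling strategies above can be sketched as follows. The function and parameter names are ours for illustration; the quaternion sampler draws uniformly from SO(3) by the standard three-uniform-variates construction rather than sampling Euler angles:

```python
import math
import random

def sample_se2(x_range, y_range):
    """Uniform sample from SE(2): a Cartesian product of two position
    intervals and one angle interval, each sampled independently."""
    return (random.uniform(*x_range),
            random.uniform(*y_range),
            random.uniform(-math.pi, math.pi))

def sample_unit_quaternion():
    """Uniform random rotation in SO(3) as a unit quaternion, built from
    three independent uniform variates (instead of Euler angles)."""
    u1, u2, u3 = random.random(), random.random(), random.random()
    return (math.sqrt(1 - u1) * math.sin(2 * math.pi * u2),
            math.sqrt(1 - u1) * math.cos(2 * math.pi * u2),
            math.sqrt(u1) * math.sin(2 * math.pi * u3),
            math.sqrt(u1) * math.cos(2 * math.pi * u3))

q_sample = sample_unit_quaternion()
se2_sample = sample_se2((0.0, 10.0), (0.0, 5.0))
```

Dispersion sequences such as Halton or Hammersley would replace the calls to `random` with deterministic low-discrepancy values; the composite-product structure stays the same.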
3.2.2 Nearest neighbor (NN)

The nearest neighbor (NN) problem is that of finding the nearest point to a query point in a given set according to some similarity metric. If V is the input set, xquery is the query point, and dM is the similarity metric, then NN(xquery) is defined as

NN(xquery) ≜ argmin_{x ∈ V} dM(x, xquery) (3.2)

This problem has been extensively studied in the computer science literature owing to its relevance in various applications such as computer graphics and data mining [102, 103]. The most straightforward approach is a linear search (O(n)), which iterates over all the elements in the given set to find the nearest one. However, for motion planning algorithms that require swift operations, employing specialized spatial data structures like k-dimensional (k-d) trees [102] improves the computational complexity from O(n) to O(log(n)). As is the case with other proposed SBMP algorithms [31, 52], the motion planning algorithms we propose do not require an exact nearest-neighbor search. They can also work with 'approximate' solutions to these operations [104–106]. Using the definition given in [52], a node y is an ϵ-approximate neighbor of x if ||y − x|| ≤ (1 + ϵ)||z − x||, where z is the true nearest neighbor of x.

Next, we will delve into the Range-search (RS) operation. This operation is commonly employed to locate nodes within a hyper-ball whose radius is either user-chosen or determined by the algorithm.

3.2.3 Range search (RS)

Range search refers to the problem of finding all points within a ball of a given radius centered at a query point. If xquery is the query point, r is the ball radius, and V is the input point set, then RS(xquery, r) ⊆ V is defined as

RS(xquery, r) ≜ {x ∈ V | dM(xquery, x) ≤ r} (3.3)

Using k-d trees, this query can be answered in O(n^{1−1/d} + m) time [52], where m is the number of points returned.
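The definitions in (3.2) and (3.3) correspond directly to the O(n) linear-search baselines below (our own sketch; a k-d tree or Skip Quadtree would replace the scans for the improved complexities cited above):

```python
import math

def d_M(a, b):
    """Similarity metric: Euclidean distance between points in R^d."""
    return math.dist(a, b)

def nearest_neighbor(V, x_query):
    """Exact NN by linear search: argmin over all points in V, O(n)."""
    return min(V, key=lambda x: d_M(x, x_query))

def range_search(V, x_query, r):
    """All points of V inside the closed ball of radius r at x_query."""
    return [x for x in V if d_M(x, x_query) <= r]

V = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
```

An ϵ-approximate variant would be allowed to return any point within a factor (1 + ϵ) of the true nearest distance, which is what the approximate structures cited in [104–106] exploit.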
Like NN, our motion planning algorithms do not require an exact RS query and will work with approximate solutions.

A data structure that supports fast dynamic insertion, NN, and approximate RS operations is the Skip Quadtree [106]. This data structure supports both new point insertions and look-ups in expected time O(log NT). Its approximate range query returns all points within range rk, plus possibly some additional points that are farther than rk but within rk(1 + ε) for a user-defined ε; it returns ν points in expected time O(ε^{1−d} log(NT) + ν).

Next, we will discuss the collision-check operation, one of the most time-consuming operations in SBMP.

3.2.4 Collision check

A collision-check operation checks whether a robot configuration collides with an obstacle or not. This operation can be represented as a logical predicate [5], denoted as ϕ : X → {TRUE, FALSE}. When a state x ∈ Xobs, the predicate evaluates to TRUE, indicating a collision, whereas for states outside obstacle regions, the predicate evaluates to FALSE.

Collision-checking is a major bottleneck in motion planning algorithms [107, 108] because it typically involves checking for intersections between non-uniform geometric shapes. Adding to this challenge, collision checks must be executed in every iteration, so their contribution to the computational cost grows with the number of iterations. Finally, sampling-based motion planning typically involves high-dimensional state spaces, with collision-checking increasing in complexity with each added dimension. Consequently, collision checking becomes particularly computationally demanding in high-dimensional spaces.

Typically, in sampling-based motion planning, we also require a check of whether a trajectory collides with an obstacle. A naive way to implement a trajectory's collision check is to sample the trajectory at a pre-defined resolution ∆x and subsequently subject the resulting configurations to collision checks.
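The naive resolution-based trajectory check can be sketched as follows. For simplicity this sketch samples at a fixed time resolution rather than the spatial resolution ∆x, and the point-robot obstacle predicate is our own toy example:

```python
def in_collision(state):
    """Toy predicate phi: a circular obstacle of radius 1 centered at (5, 5)."""
    x, y = state
    return (x - 5.0) ** 2 + (y - 5.0) ** 2 <= 1.0

def trajectory_collides(trajectory, t_E, delta_t):
    """Naive check: sample trajectory(t) at resolution delta_t over [0, t_E]
    and collision-check each resulting configuration."""
    t = 0.0
    while t <= t_E:
        if in_collision(trajectory(t)):
            return True
        t += delta_t
    return in_collision(trajectory(t_E))  # always check the endpoint

straight = lambda t: (t, t)    # toy trajectory passing through the obstacle
safe = lambda t: (t, 0.0)      # toy trajectory staying clear of it
```

The resolution trade-off discussed next is visible here: a very small `delta_t` wastes checks on nearly identical configurations, while a very large one can step over the obstacle entirely.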
However, this method comes with drawbacks. If ∆x is excessively small, it leads to redundant configuration checks, squandering valuable time. Conversely, with an overly large ∆x, the risk emerges that collisions with intricate or narrow obstacles may go undetected. More efficient algorithmic ways to perform collision checks than the naive approach can be found in [5, 109].

3.2.5 TPBVP for motion planning

Consider the n-dimensional, first-order differential equation (3.1), which we restate for convenience:

ẋ(t) = f(x(t), u(t)), x(t) ∈ X, u(t) ∈ U, (3.4)

with constraints

x(t) ∈ Xfree ∀t, ||x^(n)(t)|| ≤ βn. (3.5)

The first constraint implies collision-free planning. The second constraint imposes bounds on the higher-order derivatives (e.g., velocities, accelerations, etc.) owing to the robot's physical limitations. If the initial time is ti and the final time is tf, the objective of solving the TPBVP is to find a control input u(t) such that the resulting trajectory E exactly connects x(ti) and x(tf), i.e., E(0) = x(ti) and E(tE) = x(tf) with tE = tf − ti, subject to the constraints.

Continuing, we will delve into forward propagation, a crucial operation utilized to bypass directly solving the TPBVP in SBMP.

3.2.6 Forward propagation

A common operation among all our proposed algorithms is forward propagation (also known as Monte-Carlo propagation or random propagation), as outlined by Kleinbort et al. [30] and Li et al. [31]. This technique serves as the foundation for generating viable trajectories in our algorithms. In this approach, we essentially solve the Initial Value Problem (IVP) [24], assuming a random input u, a time resolution parameter ∆t, and a maximum propagation time tmax. The IVP is solved using numerical integration methods. While various numerical integration approaches exist, our implementations primarily leverage the Euler and fourth-order Runge-Kutta (RK-4) [110] integration methods.
A single step of Euler integration is given as

xnew ← xinit + f(xinit, u)∆t (3.6)

where xinit is the state at the start of the step and xnew is the state at the end of the step. Although Euler integration offers the advantages of simplicity and computational efficiency, it suffers from inaccuracy, stability issues, and cumulative errors in many problems. To overcome these issues, we employ RK-4. A single step of RK-4 integration is

f1 ← f(xinit, u)
f2 ← f(xinit + ∆t f1/2, u)
f3 ← f(xinit + ∆t f2/2, u)
f4 ← f(xinit + ∆t f3, u)
xnew ← xinit + ∆t(f1 + 2f2 + 2f3 + f4)/6 (3.7)

where f1, f2, f3, and f4 are intermediate derivative evaluations. RK-4 balances precision and computational cost. If a random control is used, then we perform random propagation, which endows a feasible algorithm with the probabilistic completeness property.

Finally, we will discuss the best-input propagation operation, commonly used for performing the exploitation search in SBMP.

3.2.7 Best-input propagation

All our algorithms incorporate heuristics to expedite motion planning. Employing a heuristic typically entails generating a trajectory whose final configuration closely approaches a 'heuristic' target configuration. Given that we operate under the assumption that solving the TPBVP is not feasible, exact coincidence of the trajectory's final configuration with the target configuration cannot be assumed.

In cases where closed-loop control is available, the controller plays a pivotal role in devising controls that guide the trajectory toward proximity with the target configuration. However, in the context of our SBMP approach, open-loop control predominates. Consequently, under open-loop control, the most viable strategy is to generate a set of NB Monte-Carlo trajectories (Section 3.2.6) and subsequently select the trajectory whose endpoint best approximates the target configuration.
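The propagation machinery above (RK-4 integration under a constant control, followed by best-input selection among NB random rollouts) can be sketched as follows. The unicycle dynamics and the control ranges are our own assumptions for illustration, not fixed by the text:

```python
import math
import random

def f(x, u):
    """Unicycle dynamics: x = (px, py, theta), u = (v, omega)."""
    px, py, theta = x
    v, omega = u
    return (v * math.cos(theta), v * math.sin(theta), omega)

def rk4_step(x, u, dt):
    """One RK-4 step of x' = f(x, u) under a constant control u, as in (3.7)."""
    f1 = f(x, u)
    f2 = f(tuple(xi + dt * fi / 2.0 for xi, fi in zip(x, f1)), u)
    f3 = f(tuple(xi + dt * fi / 2.0 for xi, fi in zip(x, f2)), u)
    f4 = f(tuple(xi + dt * fi for xi, fi in zip(x, f3)), u)
    return tuple(xi + dt * (a + 2 * b + 2 * c + d) / 6.0
                 for xi, a, b, c, d in zip(x, f1, f2, f3, f4))

def propagate(x, u, t_max, dt):
    """Forward propagation: integrate under constant u for duration t_max."""
    for _ in range(int(t_max / dt)):
        x = rk4_step(x, u, dt)
    return x

def best_input_propagate(x_init, x_target, n_b, t_max, dt, d_X):
    """Best-input propagation: roll out N_B random constant controls and
    keep the endpoint closest to x_target under d_X."""
    rollouts = [propagate(x_init,
                          (random.uniform(0.0, 1.0), random.uniform(-1.0, 1.0)),
                          t_max, dt)
                for _ in range(n_b)]
    return min(rollouts, key=lambda x: d_X(x, x_target))

d_pos = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
x_end = propagate((0.0, 0.0, 0.0), (1.0, 0.0), 1.0, 0.1)
best = best_input_propagate((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), 10, 1.0, 0.1, d_pos)
```

Replacing the random controls in `best_input_propagate` with a single random control recovers the plain random propagation of Section 3.2.6.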
The degree of 'closeness' to the target configuration is evaluated based on the distance between the terminal configuration of the trajectory and the target configuration, using the designated distance function dX. Let E = {E1, E2, ..., ENB} be a list of NB randomly propagated trajectories and xtarget be a target configuration. The best-input trajectory Ebest-input is calculated as

Ebest-input ← argmin_{E ∈ E} dX(E.finalNode(), xtarget) (3.8)

The value of NB determines the time taken to execute the blossom propagation operation [38, 111] (Fig. 3.1). Opting for a larger value of NB facilitates the generation of a larger number of trajectories, thereby increasing the likelihood of generating a trajectory that closely approaches the target configuration. Conversely, a smaller value of NB could potentially result in the generation of trajectories that have not been propagated in alignment with the intended direction of the target configuration. Therefore, determining an appropriate value of NB involves finding a balance between accuracy and computational efficiency.

Figure 3.1: Blossom propagation for different values of NB (NB = 5, 7, 10); NB = 7 balances accuracy and computation cost.

Part II: Single-agent motion planning without two-point boundary value solution

Chapter 4: Bidirectional sampling-based motion planning without TPBVP solution

4.1 Introduction

Feasible algorithms find initial solutions quickly without considering solution quality. They are important because they can be used to quickly solve problems such as the Alpha Puzzle [12], which are high-dimensional and complex, and for which finding any solution quickly is considered an achievement. Furthermore, feasible planning methods are computationally less expensive than AO algorithms and can be used in computationally constrained systems.
Feasible algorithms differ from AO planning algorithms, in which priority is given to solution quality (efficient paths) rather than to the time needed to find an initial solution.

Single-query sampling-based feasible motion planning algorithms often utilize unidirectional or bidirectional search strategies [5] for planning. Unidirectional search involves the construction of a single search tree that expands from the start state towards the goal state or vice versa. In contrast, bidirectional search employs two search trees: one expanding forward from the start state and the other expanding backward from the goal state (Fig. 4.1). Bidirectional searches are typically preferred over unidirectional searches (Fig. 4.2) for fast feasible motion planning due to their capability to explore the state space much more rapidly than unidirectional searches [5]. The expedited solution discovery can be attributed to several factors, including the utilization of parallel processing, the elimination of unpromising possibilities through pruning, and the exploration of the state space from two distinct directions.

Figure 4.1: Comparison of unidirectional and bidirectional searches commonly used in single-query feasible planning, alongside our bidirectional search approach, where the forward search uses a heuristic from the reverse search to reach the goal.

However, in bidirectional search, it is common to connect the forward and reverse trees when they approach each other closely. This connection implies a solution to the TPBVP for the considered dynamical system, which is not universally achievable. To overcome this limitation, this chapter introduces bidirectional search strategies that remain effective even when TPBVP solutions are not available.
We make the assumption in this chapter that the workspace remains static, with known obstacles, and that the initial and goal states, along with the robot's designated goal region, are provided. However, it is crucial to emphasize that our motion planning is conducted within the robot's state space, where the specific arrangement of the obstacles is not known a priori. In the scope of this work, we do not tackle scenarios involving dynamic obstacles for which TPBVP solutions are unavailable; this remains an ongoing research domain.

The main contribution of this chapter is a novel bidirectional single-query feasible sampling-based motion planning algorithm called Generalized Bidirectional Rapidly-exploring Random Tree (GBRRT) that does not require solving the TPBVP (Fig. 4.3). The term 'Generalized' implies that this algorithm can be applied to many systems, including systems that are holonomic, non-holonomic, and/or kinodynamic. This is the first approach that considers bidirectional search when TPBVP solutions are not available.

Figure 4.2: Comparative performance of unidirectional vs. bidirectional search in higher dimensions. The left graph was generated without any obstacles, whereas the right graph was generated with a hypersphere of radius one serving as the obstacle, positioned at the center of the workspace.

Figure 4.3: Instances where the TPBVP can and cannot be solved for connecting the forward and reverse trees in a bidirectional search. When a two-point BVP solution exists, tree-tree connection is easy (not what we study); when no two-point BVP solution exists, tree-tree connection is hard or impossible (the topic of this work).

Our method (Fig.
4.5) differs from the bidirectional RRT-Connect [37] approach in that GBRRT does not try to connect the two trees when they are close; instead, it uses the heuristic information provided by the reverse search tree to help the forward tree quickly grow towards the goal configuration.

Figure 4.4: The figures illustrate our proposed algorithms: GBRRT (with dynamics, i.e., curved trajectories, in the reverse tree) and GABRRT (with no dynamics, i.e., straight trajectories, in the reverse tree).

Another contribution of this chapter is an asymmetric variant of GBRRT called Generalized Asymmetric Bidirectional Rapidly-exploring Random Tree (GABRRT) that uses a naive reverse search with no dynamical trajectories in the reverse tree (Fig. 4.4) to provide heuristic information to the forward tree; GABRRT performs better than GBRRT in certain conditions (Section 4.4.3). We prove that GBRRT and GABRRT are probabilistically complete [5]. We run multiple experiments in simulation to evaluate the performance of GBRRT and GABRRT versus seven other algorithms across six different dynamical systems. We also run hardware experiments in a quadrotor test-bed to evaluate GABRRT in a more realistic physical setting.
Figure 4.5: Proposed GBRRT algorithm applied to a maze environment (shown at iterations 1, 400, 750, 800, and 880), employing the forward search tree to utilize a combination of exploration and exploitation (heuristic provided by the reverse search tree) search to quickly find a feasible path in a maze.

4.2 Preliminaries

We use the trajectory definition of Def. 6 in our work. Note that this definition is valid for trajectories generated during forward search; we generate reverse tree trajectories by performing backward integration of system (3.1).

In the specific implementations used in our experiments, tE ∈ [0, Tmax], where Tmax is a user-defined maximum propagation time. We randomly vary tE but apply a constant control input for the entire duration of tE, as done in [30, 31]. Although constant control is useful in practice due to the discretization inherent in digital controllers [31], the application of constant control is not a general requirement of our proposed algorithms.

GBRRT and GABRRT only require a distance function that satisfies the triangle inequality property as defined in Def. 8. We also define dNDX as the distance function on a lower-dimensional subspace of X where the higher-order dynamics (velocities or accelerations) are not included. The superscript ND refers to "No Dynamics". GABRRT (Algorithm 12) uses this distance function to generate the cost-to-goal heuristic from a non-dynamical reverse tree (GBRRT, Algorithm 1, does not). We use the cost definition of Def. 9. By construction, dX satisfies the triangle inequality, and can thus be used to define the "distance" between two points for the purposes of our algorithm.
4.3 Problem definition

We state the assumptions, constraints, and the problem to be solved for feasible motion planning.

Assumptions: We assume system dynamics of the form (3.1), which we restate for convenience:

ẋ(t) = f(x(t), u(t)), x(t) ∈ X, u(t) ∈ U. (4.1)

System (4.1) is assumed to be Lipschitz continuous in both its state and input arguments, i.e., there exist Kx, Ku > 0 such that for all x1, x2 ∈ X and u1, u2 ∈ U:

∥f(x1, u1) − f(x2, u1)∥ ≤ Kx∥x1 − x2∥,
∥f(x1, u1) − f(x1, u2)∥ ≤ Ku∥u1 − u2∥. (4.2)

We assume a static environment with pre-known static obstacles, i.e., Xfree does not change over time.

Constraints:

x(t) ∈ Xfree ∀t
||x^(n)(t)|| ≤ βn
Γstart,g(0) = xstart
Γstart,g(1) ∈ Xgoal (4.3)

We consider a state valid only if it lies in free space. We constrain the magnitude of the higher-order dynamics to be less than a constant βn due to the physical limitations of the system. We constrain the feasible path to start at xstart and to end in Xgoal.

Solve: Single-query feasible motion planning. Given (xstart, Xfree, Xgoal), find a feasible path Γstart,g if one exists.

4.4 Algorithm description

The main idea behind our proposed algorithms is to use heuristic information from the reverse search tree to guide the forward search tree's advancement towards the goal. The search begins by using the forward and reverse search trees to explore different parts of the search space starting from both the start and goal states. Once the two trees encounter each other, the forward tree combines ongoing exploration with an exploitation strategy that leverages the cost-to-goal values stored in the reverse tree to direct the search. This causes the forward search tree to maneuver around obstacles and grow towards the goal quickly.
A design choice made in our algorithms is to maintain a feasible but not a (near-)optimal reverse tree, even though refining the reverse tree toward optimality could arguably provide a more focused guiding heuristic after the two trees become intertwined. Instead, because our algorithms are designed to solve the feasible motion planning problem, computational effort is focused on expediting the trees' mutual encounter by exploring the free space, so that the guiding heuristic can be used as soon as possible.

In Section 4.4.1, we present the proposed GBRRT algorithm. In Section 4.4.2, we provide the general outline for the exploration and exploitation searches utilized in GBRRT. Finally, we present the GABRRT algorithm in Section 4.4.3.

4.4.1 Proposed Generalized Bidirectional RRT (GBRRT)

4.4.1.1 Inputs

The inputs to GBRRT (Algorithm 1) include the scenario-specific start node xstart, goal node xgoal, and goal region Xgoal; the system-specific control input set U; a maximum propagation time Tmax (the maximum duration of a trajectory); and two user-defined parameters δhr and P. The parameter δhr defines the maximum distance within which nodes from the reverse tree have a focusing effect on the growth of the forward tree, where 0 < δhr < ∞ (hr stands for heuristic radius).
The parameter P defines the algorithm's balance between performing exploration and exploitation during forward search, where 0 ≤ P ≤ 1 and, with a slight abuse of notation, 0 and 1 refer to the zero and one function, respectively. δhr and P are discussed more in Section 4.5.2.

Algorithm 1: GBRRT(xstart, xgoal, Xgoal, U, Tmax, δhr, P)
1  Gfor ← {Vfor ← {xstart}, Efor ← {}}
2  Grev ← {Vrev ← {xgoal}, Erev ← {}}
3  Q ← {}
4  for k ← 1 to Miter do
5      rk ← min(γ(log |Vrev| / |Vrev|)^(1/(d+1)), δhr)
       // Reverse tree expansion (lines 6-11)
6      Erev ← RevSrchFastExplore(Grev, U, Tmax)
7      if not CollisionCheck(Erev) then
8          xrev ← Erev.initialNode()
9          Erev ← Erev ∪ {Erev}
10         Vrev ← Vrev ∪ {xrev}
11         updatePriorityQueue(Gfor, Q, xrev, rk)
       // Forward tree expansion (lines 12-30)
12     q ← P(k)
13     crand ∼ U([0, 1])
14     if crand < q then
15         Efor ← NULL
           // Pop node from Q
16         xpop ← Pop(Q)
17         if xpop ≠ NULL then
18             Efor ← ForSrchExploit(Grev, U, Tmax, xpop, rk)
19         if Efor = NULL then
20             Efor ← ForSrchFastExplore(Gfor, U, Tmax)
21     if crand ≥ q or Efor = NULL then
           // For probabilistic completeness
22         Efor ← ForSrchRandomExplore(Gfor, U, Tmax)
23     if Efor ≠ NULL then
24         if not CollisionCheck(Efor) then
25             xfor ← Efor.finalNode()
26             Efor ← Efor ∪ {Efor}
27             Vfor ← Vfor ∪ {xfor}
28             if goalRegionReached(Xgoal, xfor) then
29                 return Path(Gfor, xstart, xfor)
30             insertToPriorityQueue(Grev, Q, xfor, rk)
31 return NULL

4.4.1.2 Initialization

The forward search tree Gfor is initialized on line 1 using the node set Vfor (initialized to {xstart}) and edge set Efor. The reverse tree Grev is initialized on line 2 using the node set Vrev (initialized to {xgoal}) and edge set Erev. We use a priority queue Q to maintain a list of potential forward tree nodes for quick expansion (line 3).
4.4.1.3 Shrinking neighborhood radius rk

Shrinking the radius rk of the neighborhood in which the two trees influence each other is necessary to balance the competing desires for a quick expected per-iteration runtime (requiring smaller rk) with almost sure (A.S.) asymptotic graph connectivity between xstart and xg ∈ Xgoal (requiring larger rk) [52, 112]. Functions for rk that have previously appeared in the sampling-based motion planning literature include rk = min(γ(log |Vrev| / |Vrev|)^(1/d), δhr) from [52] and rk = min(γ(log |Vrev| / |Vrev|)^(1/(d+1)), δhr) from [112], where d is the dimension of the system and γ is a scenario-specific percolation-theoretic parameter¹. In Appendix 4.6.2, we analyze the expected per-iteration runtime that results from using either function, as well as from using a constant rk. In our experiments, we calculate rk using the expression from [112]; we discuss why this rate was chosen in Appendix 4.6.2. rk is used in the insert and update operations on Q (line 11 and line 30) and in the exploitation search (line 18).

¹Regarding γ, if rk is defined as in [52], then A.S. graph connectivity requires γ ≥ 2((1 + 1/d))^(1/d) (|Xfree|/ζd)^(1/d), where |Xfree| is the volume of free space and ζd is the volume of the unit d-ball [52]. A new proof in [112] uses γ ≥ (2 + θ)((1 + ϵ/4) c* / ((d + 1)θ(1 − µ)) · |Xfree|/ζd)^(1/(d+1)), where ϵ ∈ (0, 1), θ ∈ (0, 1/4), µ > 0, and c* is the A.S. convergent length of the path as n → ∞. Both bounds on γ are of theoretical interest rather than prescriptive, because we typically do not know |Xfree| or c*, though we may have bounds on them. Practical selection of γ requires making an educated guess and refining if necessary. Anecdotally, we observe that the range of γ producing decent results is large in practice.

4.4.1.4 Reverse tree expansion

We expand the reverse search tree in lines 6-11. The reverse tree uses a pure exploration strategy for expansion using RevSrchFastExplore (line 6). Its purpose is to explore the state space as fast as possible, get close to the forward tree, and provide a focusing heuristic to it. The edge Erev returned by RevSrchFastExplore (line 6) is checked for collision using CollisionCheck (line 7). If no collision exists, then Erev and its corresponding new node xrev are added to Erev (line 9) and Vrev (line 10) respectively. xrev is utilized to perform a heuristic update in Q using the updatePriorityQueue function (line 11). We describe updatePriorityQueue (Algorithm 3) as well as insertToPriorityQueue (Algorithm 2) later in this section.

4.4.1.5 Forward tree expansion

We expand the forward search tree in lines 12-30. The forward search uses a combination of exploitation (line 18) and exploration (line 20, line 22) strategies to expand the tree. The exploitation strategy uses the reverse tree's cost information for focused expansion, while the exploration strategy uses both random exploration (for probabilistic completeness) and fast exploration (Fig. 4.7). These strategies are discussed in detail in Section 4.4.2.

Whether exploitation is performed in the current iteration is determined by P, which outputs the probability q based on the value of k (line 12). q is compared to a value crand generated uniformly at random on [0, 1] (line 13). If crand < q, the edge is generated by an exploitation process: the forward tree node xpop is popped from Q (line 16) and ForSrchExploit is called (line 18). This routine chooses the 'best' reverse tree node and generates a trajectory towards xpop to make focused progress towards the goal.

The condition xpop = NULL implies that Q is empty. This may happen, e.g., near the beginning of the search when the forward tree has not yet encountered the reverse tree.
If xpop = NULL, the algorithm defaults to performing fast exploration using ForSrchFastExplore (line 20). This function is designed to quickly explore the state space (see Fig. 4.7). If crand ≥ q or Efor = NULL (where Efor = NULL implies both exploitation and fast exploration were unsuccessful), the algorithm performs a random exploration using ForSrchRandomExplore (line 22). The use of ForSrchRandomExplore provides probabilistic completeness; using only the exploitation and fast exploration steps may produce an algorithm that is not probabilistically complete. Probabilistic completeness is further discussed in Section 4.6. The edge Efor is checked for collision (line 24), and if no collision exists, Efor and its corresponding final node xfor are added to Efor (line 26) and Vfor (line 27) respectively. If xfor ∈ Xgoal, checked using goalRegionReached (line 28), then the output path is returned using Path (line 29).

Figure 4.6: On the left, a new forward tree node, labeled xfor, is inserted into the priority queue Q with a key value of dX(xfor, xclosest) + h(xclosest) during the forward tree expansion. On the right, the cost update process for the same forward tree node within Q during a reverse search expansion.

Algorithm 2: insertToPriorityQueue(Grev, Q, xfor, rk)
1 xclosest ← Grev.NearestNeighbor(xfor)
2 if dX(xfor, xclosest) ≤ rk then
3     Q.insert(xfor, dX(xfor, xclosest) + h(xclosest))

4.4.1.6 Priority queue Q insert

GBRRT (Algorithm 1, line 30) adds the new forward tree node xfor to Q (Fig. 4.6) using insertToPriorityQueue (Algorithm 2). This algorithm first determines the closest node xclosest ∈ Grev to xfor within a ball of radius rk (lines 1-2).
If such an xclosest exists (xclosest ≠ NULL), we use the key value dX(xfor, xclosest) + h(xclosest) to push xfor to Q (line 3). This key estimates the cost of reaching the goal from xfor: the distance from xfor to the nearest reverse tree node plus that node's cost-to-goal heuristic. The condition xclosest = NULL holds if the forward tree has not yet encountered the reverse tree. Note that xfor is pushed to Q during both the forward tree's exploration and exploitation steps but only popped from Q during the exploitation step. This necessitates the use of a priority queue Q in GBRRT to store the set of potential forward tree nodes obtained from both these steps and pop the best one for the current exploitation step.²

²A design choice worthy of mention is that we do not reinsert a node back into Q once it is popped (line 16 in Algorithm 1). We found through experiments that although reinserting the node helps in some cases, there are cases where it decreases performance; an example is reinserting a node that is close to the goal but blocked by an obstacle. There may be intelligent ways of re-pushing such a node only when certain conditions are satisfied, rather than naively re-pushing it every time. This could be a direction for future research.

Algorithm 3: updatePriorityQueue(Gfor, Q, xrev, rk)
1 xclosest ← Gfor.NearestNeighbor(xrev)
2 if dX(xclosest, xrev) ≤ rk then
3     Q.update(xclosest, dX(xclosest, xrev) + h(xrev))

4.4.1.7 Priority queue Q update

The heuristic update in Q during the reverse search (Fig. 4.6) occurs in updatePriorityQueue (Algorithm 3). The new reverse tree node xrev obtained from expanding the reverse tree is used to select the closest forward tree node xclosest within a ball of radius rk (lines 1-2). If xclosest ≠ NULL, we update the heuristic of xclosest and the corresponding key value dX(xclosest, xrev) + h(xrev) (line 3), but only if the new value is less than the old key value in Q.
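The two queue operations can be sketched with a binary heap. This is our own illustrative implementation, not the dissertation's: it realizes "update only if the new key is smaller" lazily, by pushing a fresh entry and skipping stale ones on pop:

```python
import heapq

class ForwardNodeQueue:
    """Minimal sketch of the priority queue Q used to rank forward tree
    nodes by key dX(xfor, xclosest) + h(xclosest) (names illustrative)."""

    def __init__(self):
        self._heap = []
        self._best_key = {}  # current best key per node

    def insert(self, node, key):
        # Push only if this key improves on the node's current best key.
        if key < self._best_key.get(node, float("inf")):
            self._best_key[node] = key
            heapq.heappush(self._heap, (key, node))

    # An update has the same effect: it only takes hold if the key shrinks.
    update = insert

    def pop(self):
        # Pop the minimum-key node, skipping entries made stale by updates.
        while self._heap:
            key, node = heapq.heappop(self._heap)
            if self._best_key.get(node) == key:
                del self._best_key[node]
                return node
        return None  # queue empty (e.g., trees have not yet met)

q = ForwardNodeQueue()
q.insert("a", 5.0)
q.insert("b", 3.0)
q.update("a", 1.0)  # a's key improves, so a now outranks b
```

The lazy-invalidation pattern avoids an explicit decrease-key operation, which Python's `heapq` does not provide; a pairing or Fibonacci heap would be an alternative design choice.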
While insertToPriorityQueue adds a new (forward tree) node to Q, updatePriorityQueue updates the key value and position of a (forward tree) node already in Q.

4.4.1.8 Naive goal biasing vs. Cost-To-Goal heuristic

Naive goal biasing [37], a common approach where the goal node is s