ABSTRACT

Title of Dissertation: FAST FEASIBLE MOTION PLANNING WITHOUT TWO-POINT BOUNDARY VALUE SOLUTION

Sharan Nayak, Doctor of Philosophy, 2023

Dissertation Directed by: Professor Michael W. Otte, Department of Aerospace Engineering

Autonomous robotic systems have seen extensive deployment across domains such as manufacturing, industrial inspection, transportation, and planetary surface exploration. A crucial requirement for these systems is navigating from an initial to a final position while avoiding collisions with obstacles en route. This challenging task of devising collision-free trajectories, formally termed motion planning, is of prime importance in robotics. Traditional motion planning approaches encounter scalability challenges when planning in higher-dimensional state spaces. Moreover, they rarely consider robot dynamics during the planning process. To address these limitations, a class of probabilistic planning methods called Sampling-Based Motion Planning (SBMP) has gained prominence. SBMP strategies exploit probabilistic techniques to construct motion planning solutions. In this dissertation, our focus turns toward feasible SBMP algorithms that prioritize rapidly discovering solutions while considering robot kinematics and dynamics. These algorithms are useful for quickly solving complex problems (e.g., the Alpha puzzle) where obtaining any feasible solution is considered an achievement. Furthermore, they find practical use in computationally constrained systems and in seeding time-consuming optimal solutions. However, many existing feasible SBMP approaches assume the ability to find precise trajectories that exactly connect two states in a robot's state space. This challenge is framed as the Two-Point Boundary Value Problem (TPBVP). Finding closed-form solutions for the TPBVP is difficult, and numerical approaches are computationally expensive and prone to precision and stability issues.
Given these complexities, this dissertation's primary focus resides in the development of SBMP algorithms for scenarios where solving the TPBVP is challenging. Our work addresses four distinct scenarios: two for single-agent systems and two for multi-agent systems. The first single-agent scenario involves quickly finding a feasible path from the start to the goal state, using bidirectional search strategies for fast solution discovery. The second scenario focuses on prompt motion replanning when a vehicle's dynamical constraints change mid-mission. We leverage heuristic information from the original search tree, constructed using the vehicle's nominal dynamics, to speed up the replanning process. Both scenarios unfold in static environments with known obstacles. Transitioning to multi-agent systems, we address the feasible multi-robot motion planning problem where a robot team is guided toward predefined targets in a partially known environment. We employ a dynamic roadmap, updated from the currently known environment, to accelerate agent planning. Lastly, we explore the problem of multi-robot exploration in a completely unknown environment, applied to the CADRE mission. We demonstrate how our proposed bidirectional search strategies can facilitate efficient exploration for robots with significant dynamics. The effectiveness of our algorithms is validated through extensive simulation and real-world experiments.
FAST FEASIBLE MOTION PLANNING WITHOUT TWO-POINT BOUNDARY VALUE SOLUTION

by

Sharan Nayak

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2023

Advisory Committee:
Professor Michael Otte, Chair/Advisor
Professor Pratap Tokekar, Dean's Representative
Professor Derek Paley
Professor Huan Xu
Professor Jeffrey Herrmann

© Copyright by Sharan Nayak 2023

Acknowledgments

Firstly, I would like to express my sincerest gratitude to my advisor, Dr. Michael Otte, for providing me with the invaluable opportunity to work in the Motion and Teaming (MO-T) Laboratory. Over the past five years, it has been a fascinating experience learning from him how to conduct original scientific research, write research papers, and prepare and present scientific work. Additionally, Dr. Otte gave me the freedom to determine the direction of my research while offering guidance when needed. I am thankful to him for extending discussions beyond meeting hours and engaging in stimulating research conversations, as well as for addressing concerns raised by journal reviewers. Lastly, I am appreciative of his efforts in facilitating my internship at the NASA Jet Propulsion Lab (JPL), California Institute of Technology. I extend my gratitude to Dr. Derek Paley, Dr. Jeffrey Herrmann, Dr. Huan Xu, and Dr. Pratap Tokekar for agreeing to be part of my dissertation committee and for providing guidance and feedback during the final phase of my Ph.D. journey. I am particularly thankful to Dr. Paley for his instruction in Advanced Dynamics (ENAE646) and Non-linear Control (ENAE743), as the knowledge gained from these courses greatly contributed to my Ph.D. research. My appreciation also goes to Dr. Herrmann and Dr. Xu, as well as to Dr.
Azarm for their constructive feedback and support during the multi-robot task allocation project and the preparation of the Robotics and Automation Letters (RAL) paper. I thank Dr. Tokekar for his leadership and support during my voluntary participation at the WAFR-2022 conference. Lastly, a special thanks goes out to Professor Mount for teaching the Computational Geometry (CMSC756) class, as the insights gained from this class became useful in implementing efficient collision-check algorithms and the multi-robot exploration algorithms for the CADRE mission. I am grateful to my collaborators at UMD, including Khalid, Estefany, Suyash, Elliot, and Ruchir. It was a pleasure working together on the multi-robot task allocation project. I also extend my thanks to my lab-mates (Khalid, George, Dalan, Jack, Alkesh, Yi-Hsuan, Alex, Ollie, Rachel) for their friendship, motivating suggestions, and insightful questions during presentations. Special thanks to Nitin Sanket and Chahat Deep Singh from the Perception and Robotics (PRG) group for their generosity in lending the Bebop 2 quadrotor for my various hardware experiments. Lastly, I appreciate the efforts of the numerous anonymous reviewers who contributed to the significant improvement of my manuscripts from their preliminary versions. I would like to express my gratitude to the members of JPL, notably Federico Rossi, Michael Paton, JP La-Croix, Roland Brockers, Grace Lim, Dustin Aguilar, Libby Boroson, Amir Rahmani, and Hari Nayar, among many others, for their guidance, support, and the opportunities I had while collaborating with them during two internships spanning nearly two years. I am particularly grateful to Michael Paton for assisting me with initial multi-robot experiments at the Mars Mini Yard, as well as for generously providing transportation to and from JPL. Furthermore, I wish to express my appreciation to Federico for arranging my second internship at JPL.
His support not only provided me an opportunity to run multi-robot experiments using the Mercury-7 rovers but also to develop actual flight code for the CADRE rovers, which are destined for Lunar exploration. I would like to express my appreciation to my parents, brother, uncles, aunts, cousins, and friends (especially Abe Rajan) for their continuous inspiration and unwavering motivation during my Ph.D. Lastly, I want to acknowledge the financial support provided by the Clark Doctoral Fellowship, the Minta Martin Research Fellowship Fund, and the U.S. Office of Naval Research (ONR) grant N00014-20-1-2712, without which this dissertation would not have been possible. The research for the algorithms proposed in Chapters 6 and 7 was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and was sponsored by the National Aeronautics and Space Administration (NASA) under Grant 80NM0018D0004.

Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

I Introduction

Chapter 1: Introduction
1.1 Motivation for feasible algorithms
1.2 Feasible planning without Two-Point Boundary Value Solution (TPBVP)
1.3 Contributions to the state-of-the-art
1.3.1 Fast feasible motion planning
1.3.2 Motion replanning under dynamics variation
1.3.3 Multi-agent motion planning
1.3.4 Multi-robot exploration
1.4 Outline of this document

Chapter 2: Literature review
2.0.1 Solving TPBVP in sampling-based motion planning
2.0.2 Use of heuristics in sampling-based motion planning
2.1 Single-agent motion planning
2.1.1 Bidirectional strategies for fast motion planning
2.1.2 Motion replanning
2.2 Multi-robot motion planning
2.2.1 Roadmap-based approaches for multi-robot motion planning
2.2.2 Multi-robot motion planning in dynamic environments
2.2.3 Multi-robot exploration algorithms
2.2.4 Map-partition based frontier-based exploration
2.2.5 Communication in multi-robot exploration

Chapter 3: Technical background
3.1 Definitions
3.2 Primitive operations of single-query SBMP
3.2.1 Sampling nodes
3.2.2 Nearest neighbor (NN)
3.2.3 Range search (RS)
3.2.4 Collision check
3.2.5 TPBVP for motion planning
3.2.6 Forward propagation
3.2.7 Best-input propagation

II Single-agent motion planning without two-point boundary value solution

Chapter 4: Bidirectional sampling based motion planning without TPBVP solution
4.1 Introduction
4.2 Preliminaries
4.3 Problem definition
4.4 Algorithm description
4.4.1 Proposed Generalized Bidirectional RRT (GBRRT)
4.4.2 General outline of exploration and exploitation searches
4.4.3 Generalized Asymmetric Bidirectional RRT (GABRRT)
4.5 GBRRT-specific implementation of subroutines used in our experiments
4.5.1 Specific edge-generation types
4.5.2 Parameters of GBRRT/GABRRT
4.6 Analysis
4.6.1 Probabilistic completeness
4.6.2 Computational complexity and shrinking d-ball rate
4.7 Experiment setup
4.7.1 Simulation experiments
4.7.2 Statistics chosen for GBRRT insights
4.7.3 Hardware experiments
4.8 Experiment results
4.8.1 Simulation experiment results
4.8.2 NBRRT comparison results
4.8.3 Interpretation of GBRRT experimental statistics
4.8.4 Hardware experiment results
4.8.5 Limitations of our proposed algorithms
4.9 Conclusion

Chapter 5: Motion replanning under changed dynamical constraints without TPBVP solution
5.1 Introduction
5.2 Problem definition
5.3 Algorithm description
5.3.1 Selection of heuristic radius δhr
5.4 Experiment setup
5.4.1 Simulation experiments
5.4.2 Hardware experiments
5.5 Experiment results
5.6 Conclusion

III Multi-agent motion planning without two-point boundary value solution

Chapter 6: Dynamic multi-robot motion planning without TPBVP solution
6.1 Introduction
6.2 Preliminaries
6.3 Problem definition
6.4 Framework architecture
6.5 Algorithm description
6.5.1 Algorithm component that runs on the base station
6.5.2 Algorithm component that runs on an agent
6.5.3 Heuristic guided feasible sampling-based motion planning
6.5.4 Handling target-reached condition
6.5.5 Robot-obstacle and robot-robot collision avoidance
6.6 Experiment setup
6.6.1 Simulation experiments
6.6.2 Hardware experiments
6.7 Experiment results
6.8 Conclusion

Chapter 7: Multi-robot exploration without TPBVP solution
7.1 Introduction
7.2 Problem definition
7.3 Algorithm description
7.3.1 Map-partition and assignment
7.3.2 Robot-partition assignment
7.3.3 Boundary computation
7.3.4 Agent-based exploration
7.4 Experiment setup
7.4.1 Simulation experiments
7.4.2 Hardware experiments
7.5 Experiment results
7.6 Multi-robot exploration without TPBVP solution
7.7 Conclusion

IV Conclusion

Chapter 8: Conclusion and future work
8.1 Conclusion
8.2 Future work

Bibliography

List of Tables

4.1 Experimental dynamical systems and input parameters for GBRRT and GABRRT
4.2 Experimental statistics of GBRRT for different systems
4.3 NBRRT results
5.1 Experiment parameters for RAPID
5.2 Hardware experiment results for RAPID vs. RRT
6.1 Hardware experiment results for MR-HIRRT
7.1 Hardware experiment results for CADRE multi-robot exploration tests

List of Figures

1.1 Finding a feasible path for a unicycle robot using SBMP
1.2 Comparison of Feasible vs. AO algorithm
1.3 Presence and absence of TPBVP
3.1 Blossom propagation for different values of N
4.1 Different types of searches
4.2 Comparison of unidirectional vs. bidirectional search in higher dimensions
4.3 Availability and absence of TPBVP in bidirectional search
4.4 Difference between GBRRT and GABRRT
4.5 GBRRT algorithm applied to a maze environment
4.6 Insert and update operations of Q
4.7 Search types used in forward expansion of GBRRT
4.8 Specific edge-generation types used in our version of GBRRT
4.9 Pop operation from Q for heuristic edge-generation
4.10 Variation of initial solution time vs. parameters of GABRRT
4.11 Covering ball sequence used to prove probabilistic completeness of GBRRT
4.12 Runtime flowchart for different variations of GBRRT
4.13 Maneuver libraries for unicycle and quadrotor used in simulation experiments
4.14 Initial feasible solution found by our algorithms in simulation experiments
4.15 Hardware experiment preliminaries
4.16 Simulation experiment results comparing GBRRT and GABRRT against state-of-the-art algorithms
4.17 Initial solution cost results of simulation and hardware experiments
4.18 Hardware experiment results comparing GABRRT and RRT
5.1 Quadrotor encounters dynamics change and replans with RAPID algorithm
5.2 Edge addition in RAPID algorithm
5.3 Quadrotor used to test the RAPID algorithm
5.4 Obstacle course and trajectory libraries used in hardware experiments to test the RAPID algorithm
5.5 Comparison of RAPID vs. RRT for two trials of hardware experiments
5.6 Simulation experiment results for RAPID algorithm
5.7 Replanned paths using RAPID, RRT-Extend, and SST-Random algorithms for unicycle and quadrotor vehicles
6.1 Heuristic-guided dynamical multi-rover sampling-based motion planning without TPBVP
6.2 Multi-robot motion planning architecture for MR-HIRRT
6.3 Push and pop operations for exploitation process in MR-HIRRT
6.4 Different environments used in MR-HIRRT simulation experiments
6.5 Safety trajectories and trajectory library used in MR-HIRRT experiments
6.6 Results of simulation experiments comparing MR-HIRRT against state-of-the-art algorithms
6.7 Analysis of PRM node count vs. planning time for MR-HIRRT
6.8 Scalability experiment results for MR-HIRRT
6.9 Testing MR-HIRRT at the Lunar mini yard at JPL
7.1 Map-partition based multi-robot exploration
7.2 Testing our proposed map-partition multi-robot exploration algorithm in the Lunar mini yard at JPL
7.3 Proposed centralized-decentralized approach for multi-robot exploration
7.4 Map-partition and assignment in leader agent
7.5 Frontier-based exploration in agent
7.6 Even-odd rule algorithm used to test existence of a point in polygon
7.7 Different layouts of unexplored space used in simulation experiments
7.8 Map partitions for different input layouts of unexplored space using the K-means algorithm
7.9 Progression of an exploration mission using our proposed multi-robot exploration
7.10 Simulation experiment results from testing our proposed multi-robot exploration algorithm
7.11 CADRE Mercury-7 rovers
7.12 Visual-guided navigation pipeline used in the CADRE rovers
7.13 Original image showing a crater in front of the rover and the corresponding stereo pipeline output
7.14 Two different scenarios used to test our exploration algorithm in real-world experiments
7.15 Progression of a real-world experiment having two robots exploring a 6 m x 6 m region
7.16 Progression of a real-world experiment testing resiliency of our proposed algorithm
7.17 Results for multi-robot exploration using a feasible planner without TPBVP solution

List of Abbreviations

AIT Adaptively Informed Trees
AO Asymptotically Optimal
AO-RRT Asymptotically Optimal Rapidly Exploring Random Tree
A.S. Almost Sure
BVP Boundary Value Problem
CADRE Cooperative Autonomous Distributed Robotic Exploration
CSMA-CA Carrier Sense Multiple Access - Collision Avoidance
DI Dynamic Insertion
DIRT Dominance-Informed Region Trees
DOF Degrees of Freedom
GBDIRT Generalized Bidirectional Dominance-Informed Region Trees
GBRRT Generalized Bidirectional Rapidly Exploring Random Tree
GABDIRT Generalized Asymmetric Bidirectional Dominance-Informed Region Trees
GABRRT Generalized Asymmetric Bidirectional Rapidly Exploring Random Tree
iSST Informed Stably Sparse Rapidly Exploring Random Tree
IVP Initial Value Problem
JPL Jet Propulsion Laboratory
KS Kolmogorov-Smirnov
LPA* Lifelong Planning A*
MR-HIRRT Multi-Robot Heuristic Informed Rapidly Exploring Random Tree
NASA National Aeronautics and Space Administration
NBRRT Numerical Bidirectional Rapidly Exploring Random Tree
NN Nearest Neighbor
ONR Office of Naval Research
PD Proportional-Derivative
PRM Probabilistic Roadmap
RAPID Rapid Planning for Interchanged Dynamics
RRT Rapidly Exploring Random Tree
RS Range Search
ROS Robot Operating System
SBMP Sampling-Based Motion Planning
SST Stably-Sparse Rapidly Exploring Random Tree
TPBVP Two-Point Boundary Value Problem

Part I

Introduction

Chapter 1: Introduction

Autonomous robotic systems have found widespread applications in various domains, including manufacturing [1], industrial inspection [2], transportation [3], and planetary surface exploration [4].
These systems necessitate the ability to plan and execute movements from an initial to a final position while ensuring collision-free navigation around obstacles, structures, and people. This crucial capability hinges upon the concept of motion planning [5], which involves charting a safe, feasible path from start to goal. Informally, a feasible path is one that a robot can execute while accounting for its kinematics and dynamics. With many robotic systems having high-dimensional state spaces due to their numerous degrees of freedom and higher-order derivatives, conventional motion-planning methods such as visibility graphs [6] or cell decomposition [7] become impractical as their computational demands increase exponentially with dimensionality [8]. Furthermore, these methods ignore vehicle dynamics while planning. This dilemma leads us to a relatively new paradigm of probabilistic methods known as Sampling-Based Motion Planning (SBMP) [5]. SBMP strategies involve the construction of probabilistic trees (Fig. 1.1) or roadmap graphs, where each edge represents a feasible trajectory for the robot. Through intelligently selected random samples in the robot's state space, SBMP methods employ exploration and exploitation techniques to iteratively expand and refine these trees or graphs. The inherent stochastic nature allows for efficient exploration of the state space, leading to the generation of feasible paths subject to the robot's kinematic and dynamic constraints. The balance of exploration and exploitation enables SBMP methods to adapt to diverse environments where traditional methods struggle due to the curse of dimensionality [8] and complex dynamics. SBMP approaches have found applications [9] in protein folding, digital animation, navigation of self-driving vehicles, and industrial manipulator robots.
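To make the tree-growing idea above concrete, the following is a minimal, illustrative sketch of RRT-style tree growth for a holonomic point robot in a 2D workspace. Every name and parameter here (`rrt_feasible`, the step size, the 5% goal bias) is an assumption for illustration, not taken from this dissertation; note also that the straight-line steering step below is exactly the kind of exact two-state connection that becomes problematic once differential constraints are introduced.

```python
import math
import random

def rrt_feasible(start, goal, is_collision_free, max_iters=5000,
                 step=0.5, goal_tol=0.5, bounds=((0.0, 10.0), (0.0, 10.0))):
    """Grow a tree of collision-free edges from start; stop once near the goal."""
    nodes = [start]
    parent = {start: None}
    for _ in range(max_iters):
        # Exploration: sample a random state (with a small goal bias).
        target = goal if random.random() < 0.05 else \
            tuple(random.uniform(lo, hi) for lo, hi in bounds)
        # Nearest-neighbor query (linear scan, for simplicity).
        near = min(nodes, key=lambda n: math.dist(n, target))
        d = math.dist(near, target)
        if d == 0.0:
            continue
        # Steer: take a bounded straight-line step toward the sample.
        new = tuple(a + (b - a) * min(step / d, 1.0) for a, b in zip(near, target))
        if not is_collision_free(near, new):
            continue
        nodes.append(new)
        parent[new] = near
        if math.dist(new, goal) <= goal_tol:
            # Reconstruct the feasible path by walking parent pointers.
            path, n = [], new
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
    return None  # no feasible path found within the iteration budget
```

In an obstacle-free 10x10 workspace, `rrt_feasible((1.0, 1.0), (9.0, 9.0), lambda a, b: True)` returns a path whose first node is the start and whose last node lies within the goal tolerance.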
Figure 1.1: Finding a feasible path for a unicycle robot using SBMP.

In the realm of SBMP, two categories of algorithms have emerged: Feasible [5] and Asymptotically Optimal (AO) [10]. Feasible algorithms promptly yield initial solutions to motion planning challenges, prioritizing speed over solution quality. We apply them to complex problems (e.g., the Piano mover's problem [11], the Alpha puzzle [12]) where finding any feasible solution is considered an achievement. Conversely, AO algorithms progressively refine their solutions towards optimality with available planning time. While both categories hold significance (Fig. 1.2), our focus within this thesis centers on the development of feasible motion planning algorithms.

Figure 1.2: Feasible algorithms are faster at generating initial solutions than AO algorithms, i.e., ti1 < ti2, where ti1 is the initial solution time of the feasible algorithm and ti2 is the initial solution time of the AO algorithm. AO algorithms excel in optimizing solution costs over time.

1.1 Motivation for feasible algorithms

There are several reasons why we choose to focus on feasible sampling-based motion algorithms in this thesis, which are outlined below:

1. Fast solution discovery: Feasible motion planning algorithms prioritize the rapid generation of valid solutions, rendering them particularly well-suited for tackling complex challenges such as the Alpha puzzle [12] or protein folding [13], where any solution is deemed an accomplishment. Moreover, their efficacy extends to time-critical applications like robotics, autonomous vehicles, and electronic games.

2. Computational efficiency: Feasible algorithms often require less computational power than AO algorithms.
This makes them suitable for resource-constrained systems where computational time is limited.

3. Problem simplification: Feasible solutions can be used as a starting step in solving complex problems. Once a feasible solution is obtained, it can serve as a starting point for further optimization or refinement.

4. Dynamic environments: In dynamic environments with moving obstacles and other robots, generating fast feasible solutions [14] can be more valuable than optimizing them.

Figure 1.3: E1 exactly connects x0 and x1, thus solving the TPBVP. E2 does not connect x0 and x2, thus providing no TPBVP solution.

1.2 Feasible planning without Two-Point Boundary Value Solution (TPBVP)

Many existing feasible SBMP algorithms [15] operate under the assumption that a trajectory precisely connecting two points in a multi-dimensional state space can be found – a conundrum known as the Two-Point Boundary Value Problem (TPBVP) [5, 16, 17]. Addressing this problem entails solving a multi-dimensional first-order differential equation subject to specified initial and final conditions. Finding solutions to this problem, especially in high-dimensional state spaces, is inherently complex, with analytical solutions only available for specific canonical systems [18, 19]. Numerical approaches [16], on the other hand, come with computational overheads and susceptibility to precision and stability issues. Consequently, this dissertation concentrates on devising feasible SBMP algorithms tailored for scenarios where solving the challenging TPBVP is computationally hard and/or impossible (Fig. 1.3).
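When no TPBVP solver exists, tree-based planners commonly replace exact connection with forward propagation: sample several candidate control inputs, integrate the dynamics forward under each, and keep the propagated state that lands closest to the target. The sketch below illustrates this idea for unicycle dynamics; the function names, control bounds, and integration parameters are illustrative assumptions, not the dissertation's actual implementation.

```python
import math
import random

def forward_propagate(x, u, dt=0.1, steps=5):
    """Euler-integrate unicycle dynamics: state (x, y, theta), control (v, omega)."""
    px, py, th = x
    v, om = u
    for _ in range(steps):
        px += v * math.cos(th) * dt
        py += v * math.sin(th) * dt
        th += om * dt
    return (px, py, th)

def best_input_extend(x_near, x_target, num_samples=10):
    """Sample controls, propagate each, and return the state closest to the target.

    No control is guaranteed (or required) to reach x_target exactly -- this is
    what lets the planner avoid solving the TPBVP.
    """
    best_state, best_dist = None, float("inf")
    for _ in range(num_samples):
        u = (random.uniform(0.0, 1.0), random.uniform(-1.0, 1.0))  # (v, omega)
        x_new = forward_propagate(x_near, u)
        d = math.dist(x_new[:2], x_target[:2])  # compare positions only
        if d < best_dist:
            best_state, best_dist = x_new, d
    return best_state
```

Every state returned this way is reachable by construction, since it was produced by integrating the dynamics; the resulting tree contains only executable edges, at the cost of never connecting two prescribed states exactly.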
These algorithms find valuable applications, particularly in motion planning for modern, complex, high-dimensional state space systems, including autonomous vehicles, industrial manipulators, humanoid robots, multi-agent systems, and bio-inspired robots.

1.3 Contributions to the state-of-the-art

This dissertation contributes to the field of feasible sampling-based motion planning by addressing the challenge of planning feasible motions that consider a robot's kinematics and dynamics when TPBVP solutions are unavailable. These contributions have either been published or submitted (in process) to archival journals [20–22] and presented at a major conference [23]. Our main theme is explored through a range of scenarios, including two for single-agent systems and two for multi-agent systems, as outlined below. The integration of various heuristics accelerates the discovery of solutions across these diverse scenarios.

1. Single-agent systems

(a) Fast feasible motion planning: Quickly finding a feasible motion path from a start to a goal configuration in a static, known environment.

(b) Motion replanning under dynamics variation: Efficiently replanning a feasible path back to home base, accounting for a vehicle's dynamics change, in a static, known environment.

2. Multi-agent systems

(a) Multi-robot motion planning: Expediently solving the multi-robot feasible motion planning problem where robots visit pre-selected unlabeled targets in a partially known dynamic environment.

(b) Multi-robot exploration: Quickly solving the multi-robot feasible exploration problem where robots explore a completely unknown environment using a divide-and-conquer approach.

In the subsequent sections, we delve into these four scenarios, elaborating on our contributions to each.
1.3.1 Fast feasible motion planning

In this scenario, we tackle the challenge of rapidly solving feasible motion planning problems in static known environments, which is crucial for addressing high-dimensional motion planning challenges. We introduce novel bidirectional search strategies that expedite the discovery of solutions. Our approach grows a forward search tree that combines exploration and exploitation to quickly find a path to the goal. The exploitation process leverages knowledge gained from the growing reverse search tree to focus the forward search on the goal region. Notably, this work represents the first application of bidirectional search strategies in situations where TPBVP solutions are unavailable. The primary contribution of this work is the proposal of two feasible algorithms: Generalized Bidirectional Rapidly Exploring Random Tree (GBRRT) and Generalized Asymmetric Bidirectional Rapidly Exploring Random Tree (GABRRT). Additional contributions include:

• Provision of proofs that GBRRT and GABRRT are probabilistically complete.

• Derivation of computational complexities for GBRRT and GABRRT.

• Empirical analysis of input parameters of GABRRT vs. initial solution time.

• Experimental verification employing simulation experiments for six different non-linear dynamical systems, demonstrating that GBRRT and GABRRT perform comparably to or better than existing approaches.

• Experimental verification using real-world quadrotor experiments on a 3D obstacle course.

1.3.2 Motion replanning under dynamics variation

Our second scenario involves motion replanning under changed dynamical constraints, with practical applications in swiftly re-routing systems back to their home base when faced with sudden dynamics changes (e.g., vehicle damage). We propose a novel approach that leverages the topological information stored in the original search tree to expedite the replanning process.
This work represents the first approach that considers motion planning under dynamics changes without solving the TPBVP. The primary contribution of this work is the introduction of the RAPID algorithm (Rapid Planning for Interchanged Dynamics), which performs quick replanning. Additional contributions include:

• Substantiation of the algorithm's efficacy through simulation experiments involving two distinct dynamical systems.

• Demonstration of practicality using real-world quadrotor experiments on a 2D obstacle course.

1.3.3 Multi-agent motion planning

Transitioning to multi-agent scenarios, we first tackle a multi-agent feasible motion planning problem within a partially-known dynamic environment. This scenario is particularly relevant in planetary surface missions, where robots need to visit predetermined points of interest based on lower-resolution maps chosen by scientists. Compared to previous works, this is the first work that considers the multi-robot motion planning problem in partially-known environments when TPBVP solutions are unavailable. In this context, we maintain a centralized roadmap using the initial environmental map, dynamically updating it as robots encounter new obstacles during their operations. This dynamic roadmap serves as a heuristic to expedite motion planning and facilitate robot-target assignment. Our primary contribution is the development of a multi-robot motion planning algorithm known as MR-HIRRT (Multi-Robot Heuristic Informed Rapidly Exploring Tree). Additional contributions include:

• Experimental verification using simulation experiments involving two dynamical systems and three different scenarios.

• Implementation on real-world prototype lunar rovers to test the feasibility of the proposed algorithm.

• An analysis of the overall planning time variation vs. the number of initial PRM nodes for two high-dimensional dynamical systems.
• A scalability experiment analysis showing the variation of total planning time and mission time with an increase in the number of robots and targets.

1.3.4 Multi-robot exploration

Our final scenario addresses the feasible multi-robot exploration problem, where teams of robots collaborate to explore uncharted territories. This challenge has practical applications in rapidly and efficiently exploring completely unknown environments, such as extra-terrestrial surface missions or search and rescue operations. Our main contribution includes two aspects: firstly, a comprehensive study of multi-robot exploration within the context of the upcoming CADRE mission, employing a divide-and-conquer approach adhering to CADRE's mission constraints. Secondly, an extension of this algorithm to accommodate robots with substantial dynamics using feasible sampling-based motion planners when TPBVP solutions are unavailable. Additional contributions include:

• Experimental verification using simulation experiments encompassing environments with different initial shapes.

• Experimental verification using real-world prototype lunar rovers.

• Experimental verification of the advantage of using heuristic-based planners for motion planning when TPBVP solutions are not available.

1.4 Outline of this document

The outline for the rest of the document is as follows. Chapter 2 offers a comprehensive literature review contextualizing the work presented in this dissertation. Chapter 3 provides an introductory technical foundation for feasible sampling-based motion planning. Chapters 4–7 present the technical details, discussions, and experiments related to the four algorithmic contributions highlighted in the preceding section. In particular, Chapter 4 outlines the GBRRT and GABRRT algorithms, proposed for executing kinodynamic bidirectional searches in scenarios where TPBVP solutions are unavailable.
Chapter 5 delves into the RAPID algorithm, formulated to enable swift replanning when a vehicle's dynamic constraints undergo alteration mid-mission. Chapter 6 introduces MR-HIRRT, a framework to efficiently enable navigation of ‘n’ agents visiting ‘m’ unlabeled targets in partially-known environments. Chapter 7 introduces the map-partition-based multi-robot exploration algorithm for the CADRE mission. The culmination of the dissertation is found in Chapter 8, summarizing the primary contributions made and engaging in discussions regarding potential avenues for future research.

Chapter 2: Literature review

This dissertation proposes several feasible sampling-based motion planning algorithms, tailored to address general single- and multi-agent scenarios where solutions to the TPBVP remain elusive. Within the context of single-agent scenarios, this work encompasses fast feasible motion planning and motion replanning under the influence of changing dynamical constraints. For multi-agent scenarios, the focus extends to encompass multi-robot motion planning and multi-robot exploration, the former within partially known and the latter in entirely unknown environments, respectively. Throughout these scenarios, the strategic utilization of pre-existing or on-the-fly calculated heuristics serves as a fundamental approach to expedite the motion planning process. In the subsequent sections, we first discuss background on finding TPBVP solutions for motion planning and the use of heuristics for accelerated motion planning, which are common themes in all our proposed algorithms. We then discuss related work on motion planning for single- and multi-agent systems.

2.0.1 Solving the TPBVP in sampling-based motion planning

Solving a TPBVP in sampling-based motion planning entails finding a solution to an n-dimensional first-order differential equation (where n is the number of states) constrained to the given start and end boundary conditions.
This is an extension of the well-studied Initial Value Problem (IVP) [24], where only the initial condition is specified. Finding the TPBVP solution is non-trivial for general non-linear systems, and closed-form expressions for the TPBVP have only been attained for some specific systems [16, 25]. Consequently, researchers have explored various approaches to tackle the TPBVP motion planning problem. These include generating approximations [26] and utilizing numerical methods such as shooting approaches [27], finite-difference methods [16], and collocation methods [28]. Additionally, strategies involving the simplification of dynamics via linearization [29] and subsequently solving the resulting linear TPBVP have been employed. It should be noted, however, that numerous numerical methods tend to be computationally intensive and susceptible to stability and precision challenges [16].

More recently, a variety of algorithms based on random forward-propagation [30, 31] have emerged. These algorithms circumvent the need to solve the TPBVP, instead constructing trees by exclusively addressing the IVP. Notable examples of these forward-propagation algorithms encompass the Stable Sparse-RRT (SST) [31], Informed SST (iSST) [32], Dominance-Informed Region Trees (DIRT) [33], Asymptotically Optimal RRT (AO-RRT) [34, 35], and the algorithm employing bundles of edges (BOE) [36]. It is important to note, however, that these algorithms fall under the AO category, prioritizing the enhancement of solution quality over expedited solution discovery. Consequently, a gap exists in the domain of algorithms that swiftly identify feasible initial solutions – a gap this dissertation tries to address.

2.0.2 Use of heuristics in sampling-based motion planning

Numerous heuristics have been previously employed within the context of sampling-based motion planning, as a well-selected heuristic frequently leads to improved search efficiency.
A common approach involves probabilistically sampling the goal node, a technique known as goal biasing [37]. Akgun et al. [38] utilize sampling heuristics to increase the number of nodes sampled near a discovered path, thereby increasing the rate of finding a more optimal one. Urmson et al. [39] introduce the heuristically guided RRT (hRRT), which leverages a quality measure to strike a balance between exploration and exploitation when selecting the next node to expand, consequently resulting in lower-cost solutions.

Anytime-RRT [40] presents a general heuristic for rejecting samples in the context of anytime planning whenever the heuristic cost from the start to the new sample plus the heuristic cost from the sample to the goal is greater than the length of the best known path. C-FOREST [41] extends this idea to AO sampling-based motion planning in both single-process and multi-process implementations. In D-dimensional Euclidean spaces (R^D), the sample region from [40, 41] takes the special form of a hyperellipse; Informed RRT* extends the idea from [41] by showing how to generate new samples from the hyperellipse embedded in R^D directly (without using rejection sampling). DIRT [33] and iSST [32] use user-defined heuristics to provide high rates of convergence towards the optimal solution. However, it is sometimes difficult to come up with a heuristic that provides a benefit in a particular problem instance, even though a heuristic for the entire problem domain may be easy to formulate [42]. To alleviate this issue, Strub et al. [42, 43] propose a geometric planner that uses the problem-specific cost-to-go heuristic provided by the ever-improving reverse search tree obtained from LPA* to converge towards the optimum. Our methods use a similar approach to [42] in using the reverse search tree to get the cost-to-goal heuristic but differ in that our algorithms do not require solving the TPBVP.
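The goal-biasing heuristic mentioned at the start of this section is the simplest to implement. The sketch below is a generic illustration under assumed inputs (planar bounds, bias probability, and function name are ours, not taken from any of the cited planners): with probability p_goal the sampler returns the goal state, steering tree growth toward it, and otherwise draws a uniform sample.

```python
import random

def goal_biased_sample(bounds, goal, p_goal=0.05):
    """With probability p_goal, return the goal state to bias tree
    growth toward it; otherwise draw each coordinate uniformly at
    random from its (low, high) interval."""
    if random.random() < p_goal:
        return tuple(goal)
    return tuple(random.uniform(lo, hi) for lo, hi in bounds)
```

A planner calls this in place of plain uniform sampling; p_goal is typically kept small (a few percent) so that exploration of the rest of the space is not starved.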
In the context of multi-robot motion planning, various heuristics have been used in both unlabeled [44] and labeled [45] goal scenarios. In the labeled goal scenario, each robot is given a specific target to visit, whereas in the unlabeled goal scenario, it does not matter which robot visits which target as long as all targets are visited. Le and Plaku introduced various centralized approaches in both labeled [46–48] and unlabeled goal [49] scenarios using a pre-built lower-dimensional PRM roadmap [50] as a heuristic for each robot to guide the search. Thus, for both single- and multi-agent systems, a diverse array of heuristics has been harnessed to expedite solution discovery and enhance solution quality. In this thesis, we also incorporate different types of heuristics into our proposed algorithms to accelerate initial solution identification.

2.1 Single-agent motion planning

We first discuss previous literature regarding bidirectional SBMP, which forms the basis for our work on bidirectional search without the TPBVP.

2.1.1 Bidirectional strategies for fast motion planning

A variety of bidirectional SBMP algorithms have been proposed in the literature. A seminal work is bidirectional RRT-Connect [37], which grows two trees toward each other by having each tree grow toward the nearest vertex of the opposite tree. Jordan et al. [51] develop a bidirectional variant of RRT* [52] that provides asymptotic optimality, utilizing various heuristics and procedures to improve the convergence rate. Tahir et al. [53] and Xinyu et al. [54] use artificial potential fields [55] to generate intelligent samples using gradient descent methods to improve the speed of convergence of bidirectional RRT*. Devaurs et al. [56] update the unidirectional Transition-RRT (T-RRT) algorithm to create a bidirectional variant that achieves fast convergence and better quality paths over configuration spaces where cost functions are defined. Starek et al.
[57] propose a bidirectional approach to Fast Marching Trees (FMT*), which they show has better convergence rates to high-quality paths than many existing unidirectional planners. Strub et al. [42] use an asymmetric bidirectional search in Adaptively Informed Trees (AIT*), with the reverse tree created using Lifelong Planning A* (LPA*). Although the algorithms mentioned above provide fast convergence rates, they cannot be used (in their current form) when the TPBVP cannot be solved, or used in practice when doing so is computationally expensive. Bidirectional algorithms [37, 58] that do not require solving the TPBVP instead require extra computation to connect the discontinuity that exists between the two trees, e.g., using methods like point perturbation [37] or Bézier curves [58]. In contrast, our approach avoids the tree-tree connection problem entirely by using the heuristic provided by the reverse search tree to grow the forward tree towards the goal.

In the previous section, we discussed bidirectional strategies proposed for fast solution discovery and improving solution quality. However, these approaches assume that the vehicle dynamics do not change during the mission. While this assumption holds true for certain missions, numerous situations arise where changes in vehicle dynamics necessitate replanning. We address this problem in Chapter 5 of this thesis and provide the related work on it in the next section.

2.1.2 Motion replanning

A survey of the sampling-based motion planning literature reveals limited work on replanning under changed dynamical constraints but numerous existing works on replanning in changing environments. For example, Bekris et al. [59] present a tree-based planner that deals with evolving configuration spaces by incrementally updating a tree-based data structure retained from previous steps and then biasing the search with a greedy and probabilistically complete exploration approach. Ferguson et al.
[60] make repairs to an already existing RRT tree, removing invalid parts and then growing the changed tree until a new solution is found to account for the changed configuration space. Yoshida et al. [61] present a reactive method that uses a combination of path replanning and path deformation to account for changed environments. Otte et al. [62] present RRT-X, the first AO algorithm for dynamic environments, which uses ideas from the D* algorithm for quick replanning. While such algorithms work for replanning in evolving configuration spaces, they are not directly applicable to replanning in response to changing vehicle dynamics. Nonetheless, our method is inspired by the same high-level idea of using an existing but outdated solution to speed up the replanning process.

2.2 Multi-robot motion planning

In this section, we focus our attention on multi-robot motion planning and exploration, laying the foundation for our forthcoming work in Chapters 6 and 7. We first review related work on roadmap-based approaches for multi-robot motion planning, and motion planning in dynamic environments, ideas of which are used to expedite motion planning in Chapter 6. Following this, we provide an overview of multi-robot exploration algorithms. We then review map-partition-based multi-robot exploration, which forms the basis for our proposed algorithm used for the CADRE mission (Chapter 7). Concluding this section, we delve into the realm of communication strategies for multi-robot exploration, clarifying the rationale behind our selected communication strategy.

2.2.1 Roadmap-based approaches for multi-robot motion planning

Švestka et al. [63] use a centralized approach for roadmap-based multi-robot path planning. They plan on an explicit higher-dimensional composite roadmap created by taking the Cartesian product of individual robot roadmaps. Wagner et al.
[64] improve this work by showing it is sufficient to use an efficient implicit composite roadmap representation and apply their M* algorithm [65] to calculate paths. dRRT [66] and dRRT* [67] further use this implicit representation to perform an RRT-like search in discrete space to find robot paths efficiently. However, these approaches do not work for complex robot dynamics when the TPBVP cannot be solved.

Van den Berg and Overmars [68] use a local planner that encodes dynamic constraints while employing A* to calculate global paths on a pre-computed roadmap. They use a prioritized planning scheme where each robot is assigned a priority, with motion planned for robots in order of decreasing priority. Le and Plaku introduced various centralized approaches for multi-robot motion planning for labeled [46–48] and unlabeled goals [48] while considering robot dynamics. They build a motion tree over the composite state space of all robots while using a lower-dimensional roadmap for each robot to guide the search. The completed motion tree's trajectories correspond to each robot's feasible paths. However, these approaches assume a fully known environment at the start of the mission. In contrast, our work considers partially known settings where new obstacles are detected as robots move in the environment.

2.2.2 Multi-robot motion planning in dynamic environments

Prior work has employed disparate approaches to handle dynamic environments. The definition of a "dynamic" environment varies in the literature but is usually characterized by the encounter of another robot or unexpected obstacle appearance, disappearance, or movement unknown during initial planning. Shwail et al. [69] construct a roadmap in the preprocessing step and use A* to find paths for robots in an offline phase.
In the online phase, after each delta time step, a collision check with other robots is performed, and the robot executes a bang-bang control along with a wait option to avoid other robots and reach the goal position. Tordesillas and How [70] present MADER, where the trajectories of robots are characterized by an outer polyhedron surrounding them. MADER introduces linear separability between polyhedra using planes in an optimization problem for collision avoidance with other robots and obstacles.

A whole body of work has focused on using machine learning strategies. Zhu et al. [71] use a recurrent neural network to learn multi-robot motion behaviors from demonstrated trajectories. The learned trajectory model is used in conjunction with a model predictive controller (MPC) to perform robot-robot and robot-obstacle collision avoidance. PRIMAL [72] uses reinforcement and imitation learning to learn decentralized robot policies for robot and obstacle collision avoidance that can extend to any number of robots. PRIMAL2 [73] extends this work to lifelong planning, where robots visit other goals upon reaching their assigned goal. However, these machine learning strategies do not encode robot dynamics while planning.

Our multi-robot motion planning work differs from previous work in that we consider both robot dynamics and partially known environments. Although moving obstacles (besides robots) can also be incorporated, we stick to only using static obstacles to emulate the planetary exploration environment in our experiments.

2.2.3 Multi-robot exploration algorithms

A variety of multi-robot exploration algorithms have been proposed in the literature. Notable approaches include frontier-based [74, 75], information theory-based [76, 77], and potential field-based [78, 79] methods. Frontier-based algorithms drive the robot to "frontiers", which are points on the boundary between explored and unexplored space.
Information theory-based approaches move the robot to areas so as to reduce the total entropy, or maximize the mutual information [80], in the map. Finally, potential-field methods use a potential field [81] function to steer the robots to unexplored areas and away from explored areas of the map. Frontier-based exploration is the most commonly used among the proposed approaches due to its effectiveness and simplicity. Various frontier-based methods have been presented [82] to minimize time [83], avoid redundant coverage [84, 85], and reduce communication [86].

Frontier-based approaches can be divided into centralized [85, 87] and decentralized schemes [88] depending on which entity generates the frontier points. Centralized schemes have a central entity like a base station or a leader agent that assigns frontier points to the individual agents. For example, Banfi et al. [87] use Integer Linear Programming (ILP) on a central base station to assign frontier points to each agent to achieve efficient exploration while enforcing recurrent connectivity constraints. In contrast, decentralized schemes have individual agents themselves assign their next frontier point. For example, Bautin et al. [88] have individual agents choose their next frontier point by considering the distance of other agents to the frontier and how many agents are closer to a frontier. In our scheme, we use a map-partition-based approach that combines elements of centralized and decentralized approaches, which we discuss next.

2.2.4 Map-partition-based frontier exploration

An important topic in multi-robot frontier-based exploration is assigning frontiers to different robots so as to reduce the overall mission time for exploration [75]. Previous approaches [75, 83, 89] propose achieving this objective by dividing the environment into disjoint regions and assigning frontier points within these regions for the individual robots to explore. For example, the works by Wurm et al.
[75], Lopez et al. [89], and Wu et al. [90] employ the Voronoi partition to segment environments (rooms, corridors), whereas Solanas et al. [83] use K-means-based partitioning for environment segmentation. These approaches then use the robot's location to determine the robot-region assignment. Similar to these methods, we propose using a map-partition approach in our work due to its exploration efficiency. However, our work differs from these methods in the following ways. First, Wurm et al. [75] and Solanas et al. [83] divide the map into regions and then assign frontiers to robots (within a region) in a centralized fashion; Wurm et al. [75] also calculate new partitions at every iteration. In contrast, our algorithm has agents individually determine frontier points in their assigned static regions in a decentralized fashion. Second, Lopez et al. [89] share maps between different robots, and the agents individually calculate their Voronoi partitions, whereas we use a centralized entity to partition the map. Our approach offers the advantages of both decentralized and centralized methods in that it facilitates autonomous exploration of individual regions by agents while resulting in efficient coordination (less communication) between them. Although the use of a central entity introduces a single point of failure, we mitigate this problem by planning to use a leader election scheme [91, 92] to select another leader agent if the current one gets disabled or stuck.

An additional difference from previous works is that they focus on testing the algorithm either in simulation [83, 89] or in an indoor office-structured environment [75]. In contrast, we test our exploration algorithm in both simulation and an outdoor setting with obstacles and craters that resembles the Lunar surface (Fig. 7.2) to cater to our mission.
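The robot-to-region assignment step shared by the Voronoi-based methods above can be sketched as follows. This is a simplified illustration under assumed inputs (a 2D grid indexed by (row, col) and known robot cells; the function name is ours), not the partitioning used in our algorithm or the cited works: every cell is assigned to the nearest robot, yielding disjoint regions whose frontiers each robot can then pursue independently.

```python
def voronoi_partition(grid_shape, robot_positions):
    """Assign every cell of a 2D grid to the closest robot (by squared
    Euclidean distance), producing one disjoint region per robot.
    Ties go to the lowest robot index."""
    rows, cols = grid_shape
    regions = {i: [] for i in range(len(robot_positions))}
    for r in range(rows):
        for c in range(cols):
            owner = min(
                range(len(robot_positions)),
                key=lambda i: (r - robot_positions[i][0]) ** 2
                + (c - robot_positions[i][1]) ** 2,
            )
            regions[owner].append((r, c))
    return regions
```

In a scheme like ours, a central leader could compute such a partition once at mission start and communicate each region to its robot, after which frontier selection proceeds locally.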
2.2.5 Communication in multi-robot exploration

Communication is essential for multi-robot exploration as it affects the coordination between robots [93]. Inefficient coordination may lead to poor exploration efficiency, with the same areas being visited multiple times by different robots. Prior research has delved into diverse communication strategies [87] to facilitate multi-robot exploration coordination. While providing an exhaustive treatment of these strategies lies beyond the scope of this work, we highlight three common scenarios frequently employed: 1) full communication, where communication is always available; 2) periodic communication [94], where communication is available periodically; and 3) recurrent communication [87], where communication is available only at the deployment locations. Ideally, in our work, we only need communication at the mission start to communicate the regions and at the mission end to communicate back the locally explored maps. This contrasts with algorithms like [87, 95], where the base station communicates the frontier points to agents at every iteration.

Chapter 3: Technical background

Sampling-based motion planning approaches [9, 96] have gained significant popularity in the field of robotics due to their simplicity and their efficiency in exploring high-dimensional state spaces. These approaches involve the random sampling of collision-free feasible states for a robot within its state space and moving between them in a way that respects the robot's kinematics and dynamics. Broadly, the literature categorizes these approaches into two primary categories, namely roadmap approaches [97] and incremental planning [98] methods. We employ incremental methods for single-query motion planning [99] and roadmap approaches for multi-query motion planning [100]. In single-query motion planning, the central objective revolves around constructing a data structure, often represented as a tree, to effectively solve a single query.
This query entails finding a feasible path between a provided start state and a designated goal state. In contrast, multi-query motion planning pertains to the creation of a data structure, frequently represented as a graph, capable of handling multiple queries. These queries involve finding paths between various start and end states.

A prominent example of a roadmap approach is the Probabilistic Roadmap (PRM) pioneered by Kavraki et al. [50]. In its simplest form, PRM involves the sampling of collision-free states within the state space. These sampled states are then connected to a subset of their nearest neighbors. During the online phase, the shortest path between a given start and goal state can be determined through the graph. The versatility of PRM enables its application in multi-query motion planning scenarios. However, it is important to note that the construction of a roadmap assumes the solvability of the TPBVP for the given dynamical system.

Feasible motion planning has traditionally been employed to find solutions to intricate problems, such as the Piano Mover's problem [11] and the Alpha puzzle [12], where the attainment of any valid solution is deemed a significant achievement. In such contexts, the focus is on discovering any valid solution, with all solutions considered equally valuable. Recently, feasible planning has also found application as the initial step in anytime AO motion planning. The AO approach aims to provide a workable solution while simultaneously refining its quality as additional planning time becomes available, particularly towards achieving optimality. Throughout this dissertation, our primary emphasis remains on expeditiously solving the feasible motion planning problem across various scenarios. In the following section, we present definitions for the various primitives employed in feasible single-query motion planning algorithms, which will be consistently utilized throughout this thesis.

3.1 Definitions

Definition 1.
The configuration of a robot is defined as the minimum set of coordinates required to completely specify the degrees of freedom (DOF) of the robot.

Typical examples of configurations of various robots include:

1. Unicycle robot: Its configuration is completely specified by its 2D position (x, y) and orientation θ, i.e., (x, y, θ).

2. Manipulator robot: The configuration of a manipulator robot with three revolute joints with joint angles θ1, θ2, θ3 is (θ1, θ2, θ3).

Definition 2. The configuration space C (also called C-space) is the space of all allowable configurations of a robot.

For the unicycle robot, the configuration space corresponds to the Special Euclidean Group SE(2), while for the manipulator robot with three revolute joints, it is represented as a three-dimensional torus (T^3).

Definition 3. The state of a robot includes variables that fully describe its current condition, including its configuration and some of its higher-order derivatives (e.g., velocities and accelerations).

Unlike a robot's configuration, the robot's state is not unique and can include higher-order derivatives such as accelerations, jerk, snap, etc. The decision of which higher-order derivatives to incorporate depends on the specific application and striking a fine balance between employing higher-fidelity models and the availability of computational power and sensor data. In the context of a unicycle robot, a possible representation of its state could comprise (x, y, θ, v, ω), where v is the linear velocity and ω is the angular velocity. Similarly, for the manipulator robot featuring three revolute joints, a suitable state space representation might involve (θ1, θ2, θ3, ω1, ω2, ω3), where ω1, ω2, and ω3 are the angular velocities of the revolute joints.

Definition 4. The state space S of the robot is the space of all states of a robot.

Definition 5. The workspace W of a robot is the space of possible physical locations a robot can reach.
The workspace for the unicycle robot is the set of 2D locations the robot can reach. For the manipulator robot with n revolute joints, the workspace is the set of 3D locations its end-effector can reach.

For the rest of our work, we will assume that X is a smooth n-dimensional (manifold) state space. Let the obstacle space X_obs be an open subset of X. X_obs is the set of all states where the system is in collision with obstacles in the workspace. The free space X_free is defined as X_free = X \ X_obs. X_free is a closed subset of X and the set of all states that are not defined by obstacle collisions. Let the start state be denoted by x_start, the goal state by x_goal, and the goal region by X_goal, with x_start ∈ X_free, x_goal ∈ X_goal, and X_goal ⊂ X_free. Let U be the m-dimensional input control set. The system dynamics satisfies a differential equation of the form:

ẋ(t) = f(x(t), u(t)), x(t) ∈ X, u(t) ∈ U. (3.1)

Definition 6. A valid trajectory (also called an edge) E of duration t_E is a continuous function E : [0, t_E] → X_free, where E is generated by forward integrating system (3.1) using a control function Υ : [0, t_E] → U.

Definition 7. A distance metric d_M is a function d_M : X × X → R+, where × denotes the Cartesian product and R+ is the set of non-negative real numbers. Assuming x1, x2, x3 ∈ X, d_M satisfies the following properties.

• Non-negativity: d_M(x1, x2) ≥ 0

• Identity of indiscernibles: d_M(x1, x2) = 0 ⇐⇒ x1 = x2

• Symmetry: d_M(x1, x2) = d_M(x2, x1)

• Triangle inequality: d_M(x1, x2) + d_M(x2, x3) ≥ d_M(x1, x3)

Definition 8. A distance function d_X is a function d_X : X × X → R, where R is the set of real numbers.

Unlike d_M, d_X must only satisfy the triangle inequality property. Thus d_X has less restrictive requirements than d_M, and this makes d_X more general than d_M.

Definition 9. The cost C of a trajectory is a function C : E → R, where E is the space of all trajectories.
The cost can be defined as the distance traveled along a trajectory, or the energy or time required to execute the trajectory. In our experiments, we define C as the distance along a trajectory, C(E) = sup ∑_{i=1}^{k} dX(E(t_{i−1}), E(t_i)), where the supremum is taken over all k ∈ N and all sequences t_0 ≤ t_1 ≤ ... ≤ t_k in I, where I ⊂ R is a connected set. By construction, dX satisfies the triangle inequality, and can thus be used to define the "distance" between two points for the purposes of our algorithm. The trajectory cost has previously been used in the context of asymptotically optimal (AO) planners, but in our work, we use it to define the cost-from-start (Def. 12) and cost-to-goal (Def. 13) heuristics.

Definition 10. The clearance of a trajectory E is the maximal value δ such that B^n_δ(E(t)) ⊂ Xfree ∀t ∈ [0, tE], where B^n_δ is the n-dimensional hyper-ball of radius δ centered at a point of E(t).

Definition 11. Let G ≜ (V, E) be a graph data structure embedded in X representing a motion search tree. Let V and E represent the node set and edge set respectively. Let xa and xb represent any two nodes ∈ V. Then a feasible path Γa,b between xa and xb is defined as Γa,b ≜ {E1, E2, ..., Em}, where {E1, E2, ..., Em} is an ordered set of m valid trajectories with Ei ∈ E, E1(0) = xa, Em(tEm) = xb, and Ei(tEi) = Ei+1(0) for i = 1, ..., m − 1.

Definition 12. Let Gfor ≜ (Vfor, Efor) be a graph data structure embedded in X representing a forward search tree rooted at xstart. Vfor and Efor represent the node set and edge set respectively. Let Γstart,a be a feasible path connecting xstart and xa ∈ Vfor. The cost-from-start heuristic g of the state xa is defined as g(xa) = ∑_{i=1}^{m} C(Ei), where Ei ∈ Γstart,a, E1(0) = xstart, Em(tEm) = xa, and m is the number of edges connecting xstart to xa.

Definition 13. Let Grev ≜ (Vrev, Erev) be a graph data structure embedded in X representing a reverse search tree rooted at xgoal. Vrev and Erev represent the node set and edge set respectively.
Let Γa,goal be a feasible path connecting xa ∈ Vrev and xgoal. The cost-to-goal heuristic h of a state xa ∈ Vrev is defined as h(xa) = ∑_{i=1}^{n} C(Ei), where Ei ∈ Γa,goal, E1(0) = xa, En(tEn) = xgoal, and n is the number of edges connecting xa to xgoal.

3.2 Primitive operations of single-query SBMP

In this section, we delve into the fundamental primitive operations that underlie SBMP. We commence by examining the operation of sampling nodes, the most basic operation in SBMP. Nodes establish the connections between feasible trajectories in trees and roadmaps.

3.2.1 Sampling nodes

We sample nodes from the state space of the robot. Our aim is to create a densely sampled sequence of nodes [5], effectively covering every point in the state space. The simplest approach to achieve this is uniform random sampling of the state space. For spaces obtained by taking the Cartesian product of individual spaces, the complete composite sample is obtained by directly selecting random samples from each individual space. However, when dealing with spaces incorporating rotations (e.g., SO(3)) [5], random unit quaternions need to be sampled, as opposed to Euler angles, to ensure a uniformly distributed sample sequence. An alternative to random sampling is dispersion sampling, which involves strategically placing samples in the state space to minimize the largest uncovered area. Notable examples of dispersion sampling methods include the Halton [101] and Hammersley [101] methods. A more comprehensive exploration of sampling nodes can be found in [5]. In our work, we employ uniform random sampling in all of our proposed algorithms.

In the following section, we will discuss the Nearest Neighbor (NN) operation, a crucial operation within most SBMP algorithms, responsible for determining the next node to expand in a growing tree.
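The uniform sampling strategies above can be sketched as follows. The function and parameter names are ours for illustration; the quaternion sampler draws uniformly from SO(3) by the standard three-uniform-variates construction rather than sampling Euler angles:

```python
import math
import random

def sample_se2(x_range, y_range):
    """Uniform sample from SE(2): a Cartesian product of two position
    intervals and one angle interval, each sampled independently."""
    return (random.uniform(*x_range),
            random.uniform(*y_range),
            random.uniform(-math.pi, math.pi))

def sample_unit_quaternion():
    """Uniform random rotation in SO(3) as a unit quaternion, built from
    three independent uniform variates (instead of Euler angles)."""
    u1, u2, u3 = random.random(), random.random(), random.random()
    return (math.sqrt(1 - u1) * math.sin(2 * math.pi * u2),
            math.sqrt(1 - u1) * math.cos(2 * math.pi * u2),
            math.sqrt(u1) * math.sin(2 * math.pi * u3),
            math.sqrt(u1) * math.cos(2 * math.pi * u3))

q_sample = sample_unit_quaternion()
se2_sample = sample_se2((0.0, 10.0), (0.0, 5.0))
```

Dispersion sequences such as Halton or Hammersley would replace the calls to `random` with deterministic low-discrepancy values; the composite-product structure stays the same.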
3.2.2 Nearest neighbor (NN)

The nearest neighbor (NN) problem is that of finding the nearest point to a query point in a given set according to some similarity metric. If V is the input set, xquery is the query point, and dM is the similarity metric, then NN(xquery) is defined as

NN(xquery) ≜ argmin_{x ∈ V} dM(x, xquery) (3.2)

This problem has been extensively studied in the computer science literature owing to its relevance in various applications such as computer graphics and data mining [102, 103]. The most straightforward approach is a linear search (O(n)), which iterates over all the elements in the given set to find the nearest one. However, for motion planning algorithms that require swift operations, employing specialized spatial data structures like k-dimensional (k-d) trees [102] improves the computational complexity from O(n) to O(log(n)). As is the case with other proposed SBMP algorithms [31, 52], the motion planning algorithms we propose do not require an exact nearest-neighbor search. They can also work with 'approximate' solutions to these operations [104–106]. Using the definition given in [52], a node y is an ϵ-approximate neighbor of x if ||y − x|| ≤ (1 + ϵ)||z − x||, where z is the true nearest neighbor of x.

Next, we will delve into the Range-search (RS) operation. This operation is commonly employed to locate nodes within a hyper-ball whose radius is either user-chosen or determined by the algorithm.

3.2.3 Range search (RS)

Range search refers to the problem of finding all points within a ball of a given radius centered at a query point. If xquery is the query point, r is the ball radius, and V is the input point set, then RS(xquery, r) ⊆ V is defined as

RS(xquery, r) ≜ {x ∈ V | dM(xquery, x) ≤ r} (3.3)

Using k-d trees, this query can be answered in O(n^{1−1/d} + m) time [52], where m is the number of points returned.
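The definitions in (3.2) and (3.3) correspond directly to the O(n) linear-search baselines below (our own sketch; a k-d tree or Skip Quadtree would replace the scans for the improved complexities cited above):

```python
import math

def d_M(a, b):
    """Similarity metric: Euclidean distance between points in R^d."""
    return math.dist(a, b)

def nearest_neighbor(V, x_query):
    """Exact NN by linear search: argmin over all points in V, O(n)."""
    return min(V, key=lambda x: d_M(x, x_query))

def range_search(V, x_query, r):
    """All points of V inside the closed ball of radius r at x_query."""
    return [x for x in V if d_M(x, x_query) <= r]

V = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
```

An ϵ-approximate variant would be allowed to return any point within a factor (1 + ϵ) of the true nearest distance, which is what the approximate structures cited in [104–106] exploit.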
Like NN, our motion planning algorithms do not require an exact RS query and will work with approximate solutions.

A data structure that supports fast dynamic insertion, NN, and approximate RS operations is the Skip Quadtree [106]. This data structure supports both new point insertions and look-ups in expected time O(log NT). Its approximate range query returns all points within range rk, plus possibly some additional points that are farther than rk but within rk(1 + ε) for a user-defined ε; it returns ν points in expected time O(ε^{1−d} log(NT) + ν).

Next, we will discuss the collision-check operation, one of the most time-consuming operations in SBMP.

3.2.4 Collision check

A collision-check operation checks whether a robot configuration collides with an obstacle or not. This operation can be represented as a logical predicate [5], denoted as ϕ : X → {TRUE, FALSE}. When a state x ∈ Xobs, the predicate evaluates to TRUE, indicating a collision, whereas for states outside obstacle regions, the predicate evaluates to FALSE.

Collision-checking is a major bottleneck in motion planning algorithms [107, 108] because it typically involves checking for intersections between non-uniform geometric shapes. Adding to this challenge, collision checks must be executed in every iteration, so their contribution to the computational cost grows with the number of iterations. Finally, sampling-based motion planning typically involves high-dimensional state spaces, with collision-checking increasing in complexity with each added dimension. Consequently, collision checking becomes particularly computationally demanding in high-dimensional spaces.

Typically, in sampling-based motion planning, we also require a check of whether a trajectory collides with an obstacle. A naive way to implement a trajectory's collision check is to sample the trajectory at a pre-defined resolution ∆x and subsequently subject the resulting configurations to collision checks.
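The naive resolution-based trajectory check can be sketched as follows. For simplicity this sketch samples at a fixed time resolution rather than the spatial resolution ∆x, and the point-robot obstacle predicate is our own toy example:

```python
def in_collision(state):
    """Toy predicate phi: a circular obstacle of radius 1 centered at (5, 5)."""
    x, y = state
    return (x - 5.0) ** 2 + (y - 5.0) ** 2 <= 1.0

def trajectory_collides(trajectory, t_E, delta_t):
    """Naive check: sample trajectory(t) at resolution delta_t over [0, t_E]
    and collision-check each resulting configuration."""
    t = 0.0
    while t <= t_E:
        if in_collision(trajectory(t)):
            return True
        t += delta_t
    return in_collision(trajectory(t_E))  # always check the endpoint

straight = lambda t: (t, t)    # toy trajectory passing through the obstacle
safe = lambda t: (t, 0.0)      # toy trajectory staying clear of it
```

The resolution trade-off discussed next is visible here: a very small `delta_t` wastes checks on nearly identical configurations, while a very large one can step over the obstacle entirely.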
However, this method comes with drawbacks. If ∆x is excessively small, it leads to redundant configuration checks, squandering valuable time. Conversely, with an overly large ∆x, the risk emerges that collisions with intricate or narrow obstacles may go undetected. More efficient algorithmic ways to perform collision checks than the naive approach can be found in [5, 109].

3.2.5 TPBVP for motion planning

Consider the n-dimensional, first-order differential equation (3.1), which we restate for convenience:

ẋ(t) = f(x(t), u(t)), x(t) ∈ X, u(t) ∈ U, (3.4)

with constraints

x(t) ∈ Xfree ∀t, ||x^(n)(t)|| ≤ βn. (3.5)

The first constraint implies collision-free planning. The second constraint imposes bounds on the higher-order derivatives (e.g., velocities, accelerations, etc.) owing to the robot's physical limitations. If the initial time is ti and the final time is tf, the objective of solving the TPBVP is to find a control input u(t) such that the resulting trajectory E exactly connects x(ti) and x(tf), i.e., E(0) = x(ti) and E(tE) = x(tf) with tE = tf − ti, subject to the constraints.

Continuing, we will delve into forward propagation, a crucial operation utilized to bypass directly solving the TPBVP in SBMP.

3.2.6 Forward propagation

A common operation among all our proposed algorithms is forward propagation (also known as Monte-Carlo propagation or random propagation), as outlined by Kleinbort et al. [30] and Li et al. [31]. This technique serves as the foundation for generating viable trajectories in our algorithms. In this approach, we essentially solve the Initial Value Problem (IVP) [24], assuming a random input u, a time resolution parameter ∆t, and a maximum propagation time tmax. The IVP is solved using numerical integration methods. While various numerical integration approaches exist, our implementations primarily leverage the Euler and fourth-order Runge-Kutta (RK-4) [110] integration methods.
A single step of Euler integration is given as

xnew ← xinit + f(xinit, u)∆t (3.6)

where xinit is the state at the start of the step and xnew is the state at the end of the step. Although Euler integration offers the advantages of simplicity and computational efficiency, it suffers from inaccuracy, stability issues, and cumulative errors in many problems. To overcome these issues, we employ RK-4. A single step of RK-4 integration is

f1 ← f(xinit, u)
f2 ← f(xinit + ∆t f1/2, u)
f3 ← f(xinit + ∆t f2/2, u)
f4 ← f(xinit + ∆t f3, u)
xnew ← xinit + ∆t(f1 + 2f2 + 2f3 + f4)/6 (3.7)

where f1, f2, f3, and f4 are intermediate derivative evaluations. RK-4 balances precision and computational cost. If a random control is used, then we perform random propagation, which endows a feasible algorithm with the probabilistic completeness property.

Finally, we will discuss the best-input propagation operation, commonly used for performing the exploitation search in SBMP.

3.2.7 Best-input propagation

All our algorithms incorporate heuristics to expedite motion planning. Employing a heuristic typically entails generating a trajectory whose final configuration closely approaches a 'heuristic' target configuration. Given that we operate under the assumption that solving the TPBVP is not feasible, exact coincidence of the trajectory's final configuration with the target configuration cannot be assumed.

In cases where closed-loop control is available, the controller plays a pivotal role in devising controls that guide the trajectory toward proximity with the target configuration. However, in the context of our SBMP approach, open-loop control predominates. Consequently, under open-loop control, the most viable strategy is to generate a set of NB Monte-Carlo trajectories (Section 3.2.6) and subsequently select the trajectory whose endpoint best approximates the target configuration.
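The propagation machinery above (RK-4 integration under a constant control, followed by best-input selection among NB random rollouts) can be sketched as follows. The unicycle dynamics and the control ranges are our own assumptions for illustration, not fixed by the text:

```python
import math
import random

def f(x, u):
    """Unicycle dynamics: x = (px, py, theta), u = (v, omega)."""
    px, py, theta = x
    v, omega = u
    return (v * math.cos(theta), v * math.sin(theta), omega)

def rk4_step(x, u, dt):
    """One RK-4 step of x' = f(x, u) under a constant control u, as in (3.7)."""
    f1 = f(x, u)
    f2 = f(tuple(xi + dt * fi / 2.0 for xi, fi in zip(x, f1)), u)
    f3 = f(tuple(xi + dt * fi / 2.0 for xi, fi in zip(x, f2)), u)
    f4 = f(tuple(xi + dt * fi for xi, fi in zip(x, f3)), u)
    return tuple(xi + dt * (a + 2 * b + 2 * c + d) / 6.0
                 for xi, a, b, c, d in zip(x, f1, f2, f3, f4))

def propagate(x, u, t_max, dt):
    """Forward propagation: integrate under constant u for duration t_max."""
    for _ in range(int(t_max / dt)):
        x = rk4_step(x, u, dt)
    return x

def best_input_propagate(x_init, x_target, n_b, t_max, dt, d_X):
    """Best-input propagation: roll out N_B random constant controls and
    keep the endpoint closest to x_target under d_X."""
    rollouts = [propagate(x_init,
                          (random.uniform(0.0, 1.0), random.uniform(-1.0, 1.0)),
                          t_max, dt)
                for _ in range(n_b)]
    return min(rollouts, key=lambda x: d_X(x, x_target))

d_pos = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
x_end = propagate((0.0, 0.0, 0.0), (1.0, 0.0), 1.0, 0.1)
best = best_input_propagate((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), 10, 1.0, 0.1, d_pos)
```

Replacing the random controls in `best_input_propagate` with a single random control recovers the plain random propagation of Section 3.2.6.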
The degree of 'closeness' to the target configuration is evaluated based on the distance between the terminal configuration of the trajectory and the target configuration, using the designated distance function dX. Let E = {E1, E2, ..., ENB} be a list of NB randomly propagated trajectories and xtarget be a target configuration. The best-input trajectory Ebest-input is calculated as

Ebest-input ← argmin_{E ∈ E} dX(E.finalNode(), xtarget) (3.8)

The value of NB determines the time taken to execute the blossom propagation operation [38, 111] (Fig. 3.1). Opting for a larger value of NB facilitates the generation of a larger number of trajectories, thereby increasing the likelihood of generating a trajectory that closely approaches the target configuration. Conversely, a smaller value of NB could potentially result in the generation of trajectories that have not been propagated in alignment with the intended direction of the target configuration. Therefore, determining an appropriate value of NB involves finding a balance between accuracy and computational efficiency.

Figure 3.1: Blossom propagation for different values of NB (NB = 5, 7, 10); NB = 7 balances accuracy and computation cost.

Part II: Single-agent motion planning without two-point boundary value solution

Chapter 4: Bidirectional sampling-based motion planning without TPBVP solution

4.1 Introduction

Feasible algorithms find initial solutions quickly without considering solution quality. They are important because they can be used to quickly solve problems such as the Alpha Puzzle [12], which are high-dimensional and complex, and for which finding any solution quickly is considered an achievement. Furthermore, feasible planning methods are computationally less expensive than AO algorithms and can be used in computationally constrained systems.
Feasible algorithms differ from AO planning algorithms, in which priority is given to solution quality (efficient paths) rather than to the time needed to find an initial solution.

Single-query sampling-based feasible motion planning algorithms often utilize unidirectional or bidirectional search strategies [5] for planning. Unidirectional search involves the construction of a single search tree that expands from the start state towards the goal state or vice versa. In contrast, bidirectional search employs two search trees: one expanding forward from the start state and the other expanding backward from the goal state (Fig. 4.1). Bidirectional searches are typically preferred over unidirectional searches (Fig. 4.2) for fast feasible motion planning due to their capability to explore the state space much more rapidly than unidirectional searches [5]. The expedited solution discovery can be attributed to several factors, including the utilization of parallel processing, the elimination of unpromising possibilities through pruning, and the exploration of the state space from two distinct directions.

Figure 4.1: Comparison of unidirectional and bidirectional searches commonly used in single-query feasible planning, alongside our bidirectional search approach, where the forward search uses a heuristic from the reverse search to reach the goal.

However, in bidirectional search, it is common to connect the forward and reverse trees when they approach each other closely. This connection implies a solution to the TPBVP for the considered dynamical system, which is not universally achievable. To overcome this limitation, this chapter introduces bidirectional search strategies that remain effective even when TPBVP solutions are not available.
We make the assumption in this chapter that the workspace remains static, with known obstacles, and that the initial and goal states, along with the robot's designated goal region, are provided. However, it is crucial to emphasize that our motion planning is conducted within the robot's state space, where the specific arrangement of the obstacles is not known a priori. In the scope of this work, we do not tackle scenarios involving dynamic obstacles for which TPBVP solutions are unavailable; this remains an ongoing research domain.

The main contribution of this chapter is a novel bidirectional single-query feasible sampling-based motion planning algorithm called Generalized Bidirectional Rapidly-exploring Random Tree (GBRRT) that does not require solving the TPBVP (Fig. 4.3). The term 'Generalized' implies that this algorithm can be applied to many systems, including systems that are holonomic, non-holonomic, and/or kinodynamic. This is the first approach that considers bidirectional search when TPBVP solutions are not available.

Figure 4.2: Comparative performance of unidirectional vs. bidirectional search in higher dimensions. The left graph was generated without any obstacles, whereas the right graph was generated with a hypersphere of radius one serving as the obstacle, positioned at the center of the workspace.

Figure 4.3: Instances where the TPBVP can and cannot be solved for connecting the forward and reverse trees in a bidirectional search. When a two-point BVP solution exists, tree-tree connection is easy (not what we study); when no two-point BVP solution exists, tree-tree connection is hard or impossible (the topic of this work).

Our method (Fig.
4.5) differs from the bidirectional RRT-Connect [37] approach in that GBRRT does not try to connect the two trees when they are close; instead, it uses the heuristic information provided by the reverse search tree to help the forward tree quickly grow towards the goal configuration.

Figure 4.4: The figures illustrate our proposed algorithms: GBRRT (with dynamics, i.e., curved trajectories, in the reverse tree) and GABRRT (with no dynamics, i.e., straight trajectories, in the reverse tree).

Another contribution of this chapter is an asymmetric variant of GBRRT called Generalized Asymmetric Bidirectional Rapidly-exploring Random Tree (GABRRT) that uses a naive reverse search with no dynamical trajectories in the reverse tree (Fig. 4.4) to provide heuristic information to the forward tree; GABRRT performs better than GBRRT in certain conditions (Section 4.4.3). We prove that GBRRT and GABRRT are probabilistically complete [5]. We run multiple experiments in simulation to evaluate the performance of GBRRT and GABRRT versus seven other algorithms across six different dynamical systems. We also run hardware experiments in a quadrotor test-bed to evaluate GABRRT in a more realistic physical setting.
Figure 4.5: Proposed GBRRT algorithm applied to a maze environment (shown at iterations 1, 400, 750, 800, and 880), employing the forward search tree to utilize a combination of exploration and exploitation (heuristic provided by the reverse search tree) search to quickly find a feasible path in a maze.

4.2 Preliminaries

We use the trajectory definition of Def. 6 in our work. Note that this definition is valid for trajectories generated during forward search; we generate reverse tree trajectories by performing backward integration of system (3.1).

In the specific implementations used in our experiments, tE ∈ [0, Tmax], where Tmax is a user-defined maximum propagation time. We randomly vary tE but apply a constant control input for the entire duration of tE, as done in [30, 31]. Although constant control is useful in practice due to the discretization inherent in digital controllers [31], the application of constant control is not a general requirement of our proposed algorithms.

GBRRT and GABRRT only require a distance function that satisfies the triangle inequality property as defined in Def. 8. We also define dNDX as the distance function on a lower-dimensional subspace of X where the higher-order dynamics (velocities or accelerations) are not included. The superscript ND refers to "No Dynamics". GABRRT (Algorithm 12) uses this distance function to generate the cost-to-goal heuristic from a non-dynamical reverse tree (GBRRT, Algorithm 1, does not). We use the cost definition of Def. 9. By construction, dX satisfies the triangle inequality, and can thus be used to define the "distance" between two points for the purposes of our algorithm.
4.3 Problem definition

We state the assumptions, constraints, and the problem to be solved for feasible motion planning.

Assumptions: We assume system dynamics of the form (3.1), which we restate for convenience:

ẋ(t) = f(x(t), u(t)), x(t) ∈ X, u(t) ∈ U. (4.1)

System (4.1) is assumed to be Lipschitz continuous in both its state and input arguments, i.e., there exist Kx, Ku > 0 such that for all x1, x2 ∈ X and u1, u2 ∈ U:

∥f(x1, u1) − f(x2, u1)∥ ≤ Kx∥x1 − x2∥,
∥f(x1, u1) − f(x1, u2)∥ ≤ Ku∥u1 − u2∥. (4.2)

We assume a static environment with pre-known static obstacles, i.e., Xfree does not change over time.

Constraints:

x(t) ∈ Xfree ∀t
||x^(n)(t)|| ≤ βn
Γstart,g(0) = xstart
Γstart,g(1) ∈ Xgoal (4.3)

We consider a state valid only if it lies in free space. We constrain the magnitude of the higher-order dynamics to be less than a constant βn due to the physical limitations of the system. We constrain the feasible path to start at xstart and to end in Xgoal.

Solve: Single-query feasible motion planning. Given (xstart, Xfree, Xgoal), find a feasible path Γstart,g if one exists.

4.4 Algorithm description

The main idea behind our proposed algorithms is to use heuristic information from the reverse search tree to guide the forward search tree's advancement towards the goal. The search begins by using the forward and reverse search trees to explore different parts of the search space starting from both the start and goal states. Once the two trees encounter each other, the forward tree combines ongoing exploration with an exploitation strategy that leverages the cost-to-goal values stored in the reverse tree to direct the search. This causes the forward search tree to maneuver around obstacles and grow towards the goal quickly.
A design choice made in our algorithms is to maintain a feasible but not a (near-)optimal reverse tree, even though refining the reverse tree toward optimality could arguably provide a more focused guiding heuristic after the two trees become intertwined. Instead, because our algorithms are designed to solve the feasible motion planning problem, computational effort is focused on expediting the trees' mutual encounter by exploring the free space, so that the guiding heuristic can be used as soon as possible.

In Section 4.4.1, we present the proposed GBRRT algorithm. In Section 4.4.2, we provide the general outline for the exploration and exploitation searches utilized in GBRRT. Finally, we present the GABRRT algorithm in Section 4.4.3.

4.4.1 Proposed Generalized Bidirectional RRT (GBRRT)

4.4.1.1 Inputs

The inputs to GBRRT (Algorithm 1) include the scenario-specific start node xstart, goal node xgoal, and goal region Xgoal; the system-specific control input set U; a maximum propagation time Tmax (the maximum duration of a trajectory); and two user-defined parameters δhr and P. The parameter δhr defines the maximum distance within which nodes from the reverse tree have a focusing effect on the growth of the forward tree, where 0 < δhr < ∞ (hr stands for heuristic radius).
The parameter P defines the algorithm's balance between performing exploration and exploitation during forward search, where 0 ≤ P ≤ 1 and, with a slight abuse of notation, 0 and 1 refer to the zero and one function, respectively. δhr and P are discussed more in Section 4.5.2.

Algorithm 1: GBRRT(xstart, xgoal, Xgoal, U, Tmax, δhr, P)
1  Gfor ← {Vfor ← {xstart}, Efor ← {}}
2  Grev ← {Vrev ← {xgoal}, Erev ← {}}
3  Q ← {}
4  for k ← 1 to Miter do
5      rk ← min(γ(log |Vrev| / |Vrev|)^(1/(d+1)), δhr)
       // Reverse tree expansion (lines 6-11)
6      Erev ← RevSrchFastExplore(Grev, U, Tmax)
7      if not CollisionCheck(Erev) then
8          xrev ← Erev.initialNode()
9          Erev ← Erev ∪ {Erev}
10         Vrev ← Vrev ∪ {xrev}
11         updatePriorityQueue(Gfor, Q, xrev, rk)
       // Forward tree expansion (lines 12-30)
12     q ← P(k)
13     crand ∼ U([0, 1])
14     if crand < q then
15         Efor ← NULL
           // Pop node from Q
16         xpop ← Pop(Q)
17         if xpop ≠ NULL then
18             Efor ← ForSrchExploit(Grev, U, Tmax, xpop, rk)
19         if Efor = NULL then
20             Efor ← ForSrchFastExplore(Gfor, U, Tmax)
21     if crand ≥ q or Efor = NULL then
           // For probabilistic completeness
22         Efor ← ForSrchRandomExplore(Gfor, U, Tmax)
23     if Efor ≠ NULL then
24         if not CollisionCheck(Efor) then
25             xfor ← Efor.finalNode()
26             Efor ← Efor ∪ {Efor}
27             Vfor ← Vfor ∪ {xfor}
28             if goalRegionReached(Xgoal, xfor) then
29                 return Path(Gfor, xstart, xfor)
30             insertToPriorityQueue(Grev, Q, xfor, rk)
31 return NULL

4.4.1.2 Initialization

The forward search tree Gfor is initialized on line 1 using the node set Vfor (initialized to {xstart}) and edge set Efor. The reverse tree Grev is initialized on line 2 using the node set Vrev (initialized to {xgoal}) and edge set Erev. We use a priority queue Q to maintain a list of potential forward tree nodes for quick expansion (line 3).
4.4.1.3 Shrinking neighborhood radius rk

Shrinking the radius rk of the neighborhood in which the two trees influence each other is necessary to balance the competing desires for a quick expected per-iteration runtime (requiring smaller rk) with almost sure (A.S.) asymptotic graph connectivity between xstart and xg ∈ Xgoal (requiring larger rk) [52, 112]. Functions for rk that have previously appeared in the sampling-based motion planning literature include rk = min(γ(log |Vrev| / |Vrev|)^(1/d), δhr) from [52] and rk = min(γ(log |Vrev| / |Vrev|)^(1/(d+1)), δhr) from [112], where d is the dimension of the system and γ is a scenario-specific percolation-theoretic parameter¹. In Appendix 4.6.2, we analyze the expected per-iteration runtime that results from using either function, as well as from using a constant rk. In our experiments, we calculate rk using the expression from [112]; we discuss why this rate was chosen in Appendix 4.6.2. rk is used in the insert and update operations on Q (line 11 and line 30) and in the exploitation search (line 18).

¹Regarding γ, if rk is defined as in [52], then A.S. graph connectivity requires γ ≥ 2((1 + 1/d))^(1/d) (|Xfree|/ζd)^(1/d), where |Xfree| is the volume of free space and ζd is the volume of the unit d-ball [52]. A new proof in [112] uses γ ≥ (2 + θ)((1 + ϵ/4) c* / ((d + 1)θ(1 − µ)) · |Xfree|/ζd)^(1/(d+1)), where ϵ ∈ (0, 1), θ ∈ (0, 1/4), µ > 0, and c* is the A.S. convergent length of the path as n → ∞. Both bounds on γ are of theoretical interest rather than prescriptive, because we typically do not know |Xfree| or c*, though we may have bounds on them. Practical selection of γ requires making an educated guess and refining if necessary. Anecdotally, we observe that the range of γ producing decent results is large in practice.

4.4.1.4 Reverse tree expansion

We expand the reverse search tree in lines 6-11. The reverse tree uses a pure exploration strategy for expansion using RevSrchFastExplore (line 6). Its purpose is to explore the state space as fast as possible, get close to the forward tree, and provide a focusing heuristic to it. The edge Erev returned by RevSrchFastExplore (line 6) is checked for collision using CollisionCheck (line 7). If no collision exists, then Erev and its corresponding new node xrev are added to Erev (line 9) and Vrev (line 10) respectively. xrev is utilized to perform a heuristic update in Q using the updatePriorityQueue function (line 11). We describe updatePriorityQueue (Algorithm 3) as well as insertToPriorityQueue (Algorithm 2) later in this section.

4.4.1.5 Forward tree expansion

We expand the forward search tree in lines 12-30. The forward search uses a combination of exploitation (line 18) and exploration (line 20, line 22) strategies to expand the tree. The exploitation strategy uses the reverse tree's cost information for focused expansion, while the exploration strategy uses both random exploration (for probabilistic completeness) and fast exploration (Fig. 4.7). These strategies are discussed in detail in Section 4.4.2.

Whether exploitation is performed in the current iteration is determined by P, which outputs the probability q based on the value of k (line 12). q is compared to a value crand generated uniformly at random on [0, 1] (line 13). If crand < q, the edge is generated by an exploitation process: the forward tree node xpop is popped from Q (line 16) and ForSrchExploit is called (line 18). This routine chooses the 'best' reverse tree node and generates a trajectory towards xpop to make focused progress towards the goal.

The condition xpop = NULL implies that Q is empty. This may happen, e.g., near the beginning of the search when the forward tree has not yet encountered the reverse tree.
If xpop = NULL, the algorithm defaults to performing fast exploration using ForSrchFastExplore (line 20). This function is designed to quickly explore the state space (see Fig. 4.7). If crand ≥ q or Efor = NULL (where Efor = NULL implies both exploitation and fast exploration were unsuccessful), the algorithm performs a random exploration using ForSrchRandomExplore (line 22). The use of ForSrchRandomExplore provides probabilistic completeness; using only the exploitation and fast exploration steps may produce an algorithm that is not probabilistically complete. Probabilistic completeness is further discussed in Section 4.6. The edge Efor is checked for collision (line 24), and if no collision exists, Efor and its corresponding final node xfor are added to Efor (line 26) and Vfor (line 27) respectively. If xfor ∈ Xgoal, checked using goalRegionReached (line 28), then the output path is returned using Path (line 29).

Figure 4.6: On the left, a new forward tree node, labeled xfor, is inserted into the priority queue Q with a key value of dX(xfor, xclosest) + h(xclosest) during the forward tree expansion. On the right, the cost update process for the same forward tree node within Q during a reverse search expansion.

Algorithm 2: insertToPriorityQueue(Grev, Q, xfor, rk)
1 xclosest ← Grev.NearestNeighbor(xfor)
2 if dX(xfor, xclosest) ≤ rk then
3     Q.insert(xfor, dX(xfor, xclosest) + h(xclosest))

4.4.1.6 Priority queue Q insert

GBRRT (Algorithm 1, line 30) adds the new forward tree node xfor to Q (Fig. 4.6) using insertToPriorityQueue (Algorithm 2). This algorithm first determines the closest node xclosest ∈ Grev to xfor within a ball of radius rk (lines 1-2).
If such an xclosest exists (xclosest ≠ NULL), we use the key value dX(xfor, xclosest) + h(xclosest) to push xfor to Q (line 3). This key estimates the cost of reaching the goal from xfor: the distance from xfor to the nearest reverse tree node plus that node's cost-to-goal heuristic. The condition xclosest = NULL holds if the forward tree has not yet encountered the reverse tree. Note that xfor is pushed to Q during both the forward tree's exploration and exploitation steps but only popped from Q during the exploitation step. This necessitates the use of a priority queue Q in GBRRT to store the set of potential forward tree nodes obtained from both these steps and pop the best one for the current exploitation step.²

²A design choice worthy of mention is that we do not reinsert a node back into Q once it is popped (line 16 in Algorithm 1). We found through experiments that although reinserting the node helps in some cases, there are cases where it decreases performance; an example is reinserting a node that is close to the goal but blocked by an obstacle. There may be intelligent ways of re-pushing such a node only when certain conditions are satisfied, rather than naively re-pushing it every time. This could be a direction for future research.

Algorithm 3: updatePriorityQueue(Gfor, Q, xrev, rk)
1 xclosest ← Gfor.NearestNeighbor(xrev)
2 if dX(xclosest, xrev) ≤ rk then
3     Q.update(xclosest, dX(xclosest, xrev) + h(xrev))

4.4.1.7 Priority queue Q update

The heuristic update in Q during the reverse search (Fig. 4.6) occurs in updatePriorityQueue (Algorithm 3). The new reverse tree node xrev obtained from expanding the reverse tree is used to select the closest forward tree node xclosest within a ball of radius rk (lines 1-2). If xclosest ≠ NULL, we update the heuristic of xclosest and the corresponding key value dX(xclosest, xrev) + h(xrev) (line 3), but only if the new value is less than the old key value in Q.
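The two queue operations can be sketched with a binary heap. This is our own illustrative implementation, not the dissertation's: it realizes "update only if the new key is smaller" lazily, by pushing a fresh entry and skipping stale ones on pop:

```python
import heapq

class ForwardNodeQueue:
    """Minimal sketch of the priority queue Q used to rank forward tree
    nodes by key dX(xfor, xclosest) + h(xclosest) (names illustrative)."""

    def __init__(self):
        self._heap = []
        self._best_key = {}  # current best key per node

    def insert(self, node, key):
        # Push only if this key improves on the node's current best key.
        if key < self._best_key.get(node, float("inf")):
            self._best_key[node] = key
            heapq.heappush(self._heap, (key, node))

    # An update has the same effect: it only takes hold if the key shrinks.
    update = insert

    def pop(self):
        # Pop the minimum-key node, skipping entries made stale by updates.
        while self._heap:
            key, node = heapq.heappop(self._heap)
            if self._best_key.get(node) == key:
                del self._best_key[node]
                return node
        return None  # queue empty (e.g., trees have not yet met)

q = ForwardNodeQueue()
q.insert("a", 5.0)
q.insert("b", 3.0)
q.update("a", 1.0)  # a's key improves, so a now outranks b
```

The lazy-invalidation pattern avoids an explicit decrease-key operation, which Python's `heapq` does not provide; a pairing or Fibonacci heap would be an alternative design choice.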
While insertToPriorityQueue adds a new (forward tree) node to Q, updatePriorityQueue updates the key value and position of a (forward tree) node already in Q.

4.4.1.8 Naive goal biasing vs. Cost-To-Goal heuristic

Naive goal biasing [37], a common approach where the goal node is s