ABSTRACT Title of Dissertation: APPLIED AERIAL ROBOTICS FOR LONG RANGE AUTONOMY AND ADVANCED PERCEPTION Wei Cui Doctor of Philosophy, 2024 Dissertation Directed by: Professor Derek A. Paley Department of Aerospace Engineering This dissertation addresses the challenges of conducting autonomous long-distance operations in settings where communication is restricted or unavailable. It involves the development of aerial autonomy software, ground station user interface, and simulation tools. Field experiments are conducted to assess the real-world performance and scalability of the developed autonomous multi-vehicle systems. A search and revisit framework involving multiple UAS engaged in expansive area exploration has been developed. By employing the ARL MAVericks autonomy stack, we have devised three system designs with improving levels of autonomy. This approach is effective in developing autonomous system capabilities for extended-range missions, enhancing effectiveness in reconnaissance, search, and rescue missions. Furthermore, the dissertation introduces an innovative application of enhanced target detection and localization techniques tailored specifically for small UAS deployment. Neural network fine- tuning and AprilTag detector selection are carefully conducted. Augmented by a meticulously designed workflow for performance evaluation and validation, our approach aims to improve the precision of target detection and localization using a single RGB camera module. Additionally, the dissertation presents the implementation of a specialized ground control user interface. Functioning as a centralized command center, the user interface facilitates real- time monitoring and coordination of heterogeneous aerial and ground robotic platforms engaged in collaborative search missions. By streamlining air-ground coordination and human-robot interaction, the custom user interface optimizes the collective capabilities of diverse aerial and ground robotic platforms, enhancing overall mission effectiveness. The experimental results from multi-vehicle autonomous search missions, evaluating centralized and decentralized control in beyond visual line of sight scenarios, are presented, proving the efficacy of the search and revisit framework operating in real-world scenarios. Finally, the dissertation covers the design and implementation of a resilient network link tailored for robotic platforms operating in environments with limited bandwidth. This essential infrastructure enhancement is devised to overcome communication constraints, ensuring reliable data exchange, and strengthening the resilience of autonomous systems in bandwidth-limited environments. APPLIED AERIAL ROBOTICS FOR LONG RANGE AUTONOMY AND ADVANCED PERCEPTION by Wei Cui Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2024 Advisory Committee: Dr. Derek A. Paley, Chair/Advisor Dr. Shuvra S. Bhattacharyya Dr. Joseph K. Conroy Dr. Michael W. Otte Dr. Mumu Xu © Copyright by Wei Cui 2024 Dedication To my beloved Athena and Annastasia, Every day with you has reminded me of the importance of curiosity, persistence, and passion. Your smiles and hugs gave me the strength to continue. I dedicate this work to you, with the hope that you always pursue your dreams and believe in your abilities. Thank you for being my sunshine and my motivation. 
ii Acknowledgments I’d like to express my deepest gratitude to my esteemed adviser, Dr. Paley, for his mentorship throughout my time at the University of Maryland. His unwavering guidance, insightful advice, and steadfast support have been instrumental in my academic and personal growth. Without his dedication and encouragement, I would not have achieved the successes and milestones that I have. His mentorship has profoundly impacted my life, for which I am forever thankful and forever changed. Thank you to my committee members, Dr. Bhattacharyya, Dr. Conroy, Dr. Otte, and Dr. Xu, for taking the time from your busy schedules to support me. I am deeply grateful for the thoughtful feedback, guidance, and advice you have provided. I am honored to have had the opportunity to learn from each of you. I would like to express my gratitude to Dr. Stephen Nogar and Benjamin Linne from the Army Research Lab, as well as Mike Rawding, Isaac Carlson, Nikhil Deshmukh, Michael Smith, and Joel Witman from Survice Engineering. Their collective efforts in the development of both the hardware and software components of this project have been invaluable. Additionally, their provision of essential resources, unwavering support, and insightful guidance has been crucial to the success of this work. My appreciation also extends to Dr. Dinesh Manocha, Dr. Shuvra Bhattacharyya, Dr. Ben Riggan, and their students for their invaluable contributions. Their development of advanced iii perception algorithms played a crucial role in enhancing our autonomy stack. I appreciate my labmates Animesh Shastry, Zach Bortoff, Sydrak Abdi, Atharv Marathe, Qingwen Wei, Srijal Poojari, Rose Gebhardt for their invaluable contributions to this project. I am grateful for the stimulating environment we have created together and for the friendships that have formed as a result. Thank you all for your dedication and for making this journey a rewarding experience. Additionally, I appreciate Josh Gaus, Chris Titus, Grant Williams, McKenzie Turpin, and Darren Robey from the UROC for their collaborative efforts. Our teamwork in conducting numerous field experiments and collecting valuable data was instrumental to the development of this dissertation. Finally, I would like to acknowledge the funding provided by ArtIAMAS, the Army Cooperative Agreement with the University of Maryland, which made this project possible. This support was instrumental in enabling the research and development carried out in this dissertation. I would also like to extend my sincere gratitude to Colonel Luke Sauter for granting me the opportunity to pursue my PhD at UMD. His support and encouragement have been invaluable throughout my academic journey. iv Table of Contents Dedication ii Acknowledgements iii Table of Contents v List of Tables viii List of Figures ix List of Abbreviations xiii Chapter 1: Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Relation to the State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Technical Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.4.1 Novel Collaborative Search Framework with ARL Aerial Autonomy Stack 12 1.4.2 Improved Object Detection and Localization with Aerial Images . . . . . 13 1.4.3 Specialized Ground Control User Interfaces . . . . . . . . . . . . . . . . 
13 1.4.4 Efficient Data Link for Aerial Operations in Bandwidth-limited Conditions 14 1.5 Outline of Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 Chapter 2: Experiment Testbeds 16 2.1 Autonomy Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.1 MAVSDK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.2 Pixhawk Project 4 Autopilot . . . . . . . . . . . . . . . . . . . . . . . . 17 2.1.3 Robot Operating System 2 . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.1.4 MAVericks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2 Simulation and Visualization Software . . . . . . . . . . . . . . . . . . . . . . . 19 2.2.1 Unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2.2 QGroundControl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2.3 Custom User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3 Uncrewed Aerial Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.3.1 Radios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.3.2 Uncrewed Aerial Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4 Experimental Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 v 2.4.1 UMD Fearless Flight Facility . . . . . . . . . . . . . . . . . . . . . . . . 35 2.4.2 UROC Raley Farm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Chapter 3: Autonomy Stack and Software Design 37 3.1 Problem Statement and Solution Designs . . . . . . . . . . . . . . . . . . . . . . 37 3.2 Mission Planning and Adaption . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.2.1 Broad Area Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.2.2 Target Revisiting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.2.3 Behavior Tree Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.3 Target Detection and Localization . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.1 Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.2 Object Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.3 Object Detection Clustering . . . . . . . . . . . . . . . . . . . . . . . . 48 3.4 Advanced Perception Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.4.1 Action Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.4.2 Person Re-identification . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.5 Ground Control Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.5.1 Outdoor Variant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.5.2 3D Mapping Variant . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Chapter 4: Object Detection and Localization using Aerial Images 55 4.1 Training a Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . 55 4.2 Loss Functions and Performance Metrics . . . . . . . . . . . . . . . . . . . . . . 57 4.3 Custom Trained Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.3.1 ArtIAMAS Project 1.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.3.2 2023 NIST First Responder UAS 3D Mapping Challenge . . . . . . . . . 63 4.3.3 2021 NIST First Responder UAS FastFind Challenge . . . . . . . . . . . 67 4.4 Characterizing Detection and Localization Performance . . . . . . . . . . . . . . 
69 4.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.5 AprilTag Detection and Localization using Aerial Images . . . . . . . . . . . . . 75 Chapter 5: Multi-vehicle Autonomous Search Missions 77 5.1 System Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.3.1 Multi-vehicle Coordination with Operator Supervision . . . . . . . . . . 83 5.3.2 Autonomous Search without Operator Supervision . . . . . . . . . . . . 87 5.3.3 Autonomous Search in Beyond Visual Line of Sight . . . . . . . . . . . . 89 5.4 Air-ground Coordination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 Chapter 6: Operating in Bandwidth-limited Environments 94 6.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 6.2 Communication Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 6.2.1 Test Lane Construction and Mapping Quality Validation . . . . . . . . . 99 vi 6.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Chapter 7: Conclusion 110 7.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 7.2 Suggestions for Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Appendix A: Camera Calibration 113 Bibliography 116 vii List of Tables 1.1 Surveys of UAS challenges experimentally addressed. Multiple perception algorithms include YOLO, target localization, person re-identification, and action recognition 9 4.1 The VOXL2 CPU load test at various camera resolutions and frame rates for Grazer A-002 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.2 The CPU load test is performed using various YOLO versions and input image sizes for Grazer A-002. The high-resolution camera is configured to 720p at 1 fps. An ’x’ indicates that the camera server has crashed. . . . . . . . . . . . . . . 63 4.3 Performance for human detection using aerial images is improved incrementally. 64 4.4 Comparing the performance of fine-tuned YOLOv8m and YOLOv8n on a custom dataset for detecting humans and Landolt ’C’ rings . . . . . . . . . . . . . . . . 66 4.5 Process speed per image using YOLOv8m and YOLOv8n variants on the Gigabyte AERO groud stateion laptop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.6 Detection probability and accuracy at various altitudes and a camera angle of 15 degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.7 Detection probability and accuracy at various altitudes and a camera angle of 60 degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.8 Detection probability of Aruco Marker Detector with Grazer. The resolution is 1280 x 768px (720p) and the FOV is 120 degrees . . . . . . . . . . . . . . . . . 76 4.9 Detection probability of AprilTag 3 Detector with Grazer. The resolution is 1280 x 768px (720p) and the FOV is 120 degrees . . . . . . . . . . . . . . . . . . . . 76 5.1 Performance metrics of the end-to-end simulation . . . . . . . . . . . . . . . . . 80 5.2 Performance metrics of the end-to-end experiment . . . . . . . . . . . . . . . . . 85 5.3 Performance metrics of the autonomous end-to-end experiment . . 
. . . . . . . . 89 5.4 Performance metrics of the end-to-end experiment . . . . . . . . . . . . . . . . . 90 5.5 Performance for manikin detection using aerial images is improved with higher resolution images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 6.1 Performance comparison of TCP/IP Socket and ROS . . . . . . . . . . . . . . . 96 6.2 Size of messages per topic for 3D mapping. Raw images undergo M-JPEG compression for size reduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 6.3 Comparison of ten ground truth and measured distances of the fiducials. The ten distances are defined in Figure 6.4 . . . . . . . . . . . . . . . . . . . . . . . . . 102 viii List of Figures 2.1 Simulated manikins and AprilTags are added to the 3D scene to facilitate testing algorithms in Unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.2 Two cars are added to the 3D scene to enable testing algorithms detecting large targets with the pre-trained detection models . . . . . . . . . . . . . . . . . . . . 20 2.3 Some technologies used in custom user interface . . . . . . . . . . . . . . . . . . 22 2.4 Wireshark is employed to validate the QoS in ROS2, monitor network packets, and assess bandwidth utilization within a mesh network consisting of three UAS and one groundstation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.5 Free-flowing network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.6 (left) Doodle embedded radio and (right) Doodle mini radio (Credit: Doodle Labs) 26 2.7 Software configuration page for the Doodle radios. Channel is set to 51 and Bandwidth is set to 15MHz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.8 Sub-GHz RC transmitters used in this project : (a) 915MHz Dragon Link mounted on 2.4 GHz Jeti Transmitter, and (b) 900 MHz Jeti Transmitter . . . . . . . . . . 27 2.9 Modal AI m500 with FTDI adapter . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.10 Wire diagram of Modal AI m500 based quadcopter tailored for running MAVericks. The FTDI adapter enables RTPS communication between VOXL Flight onboard computer and the flight controller. . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.11 Wire diagram of the RB5 with Pixracer flight controller tailored for running MAVericks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.12 Jeti transmitter with Dragon Link and retrofitted RB5 platforms for MAVericks autonomy stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.13 Custom UAS platform by Survice Engineering - ARL Grazer and system diagram developed for MAVericks autonomy stack . . . . . . . . . . . . . . . . . . . . . 32 2.14 Doodle mini is connected to VOXL2 through the USB port open by the Modal AI USB 3.0 UART Add-on . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.15 Wiring diagram of mRo Control Zero H7 and VOXL2 via USB 3.0 Add-on and M0076 Interposer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.16 Wiring diagram for UROC Chimera tailored for running MAVericks . . . . . . . 34 2.17 Fearless Flight Facility (F3) near the University of Maryland College Park Campus 36 2.18 Raley Farm flight faciltiy near the UMD UAS Research and Operations Center (UROC) in Southern Maryland . . . . . . . . . . . . . . . . . . . . . . . . . . . 
36 ix 3.1 Algorithm for the search and revisit: (left) preliminary broad area survey detecting targets on the ground and (right) subsequent target revisits involving multiple autonomous UAS and perception . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.2 Configuration facilitating centralized control and PX4 flight stack . . . . . . . . . 39 3.3 Full-autonomous configuration with PX4 flight stack . . . . . . . . . . . . . . . 39 3.4 Full-MAVericks configuration with MAVericks navigation stack . . . . . . . . . . 40 3.5 K-means is utilized to split a large polygon into an arbitrary number of small polygons. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.6 The VRP for two agents was resolved using the solver within Google OR-tools . 45 3.7 Behavior tree for search and revisit missions developed for MAVericks . . . . . . 46 3.8 Manikins and human actors for Re-ID running on VOXL2 . . . . . . . . . . . . 51 3.9 The outdoor variant of Fiona, crafted for search and revisit missions with an overlay of the Unity window on the left side. . . . . . . . . . . . . . . . . . . . . 52 3.10 Fiona also monitors the location of a smart binocular (black binocular icon), with the search area created by a sequence of waypoints (yellow stars) generated by the binocular’s LiDAR system . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.11 The custom UI displays a 3D mesh model generated by RTAB-Map . . . . . . . 54 3.12 Comparison of mesh model generated with and without AliceVision support. . . 54 4.1 Workflow to train a custom neural network running on VOXL2 . . . . . . . . . . 56 4.2 Sample images of the training dataset. The manikins are either standing or lying down viewed from 20m to 30m. . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.3 Sample validation images where the manikins are successfully detected using the custom-trained YOLOv5 small detection model . . . . . . . . . . . . . . . . . . 61 4.4 Training Progress of YOLO Model: A comprehensive view of training and validation metrics including object loss, class loss, and box loss for both training and validation sets. Additionally, training precision and recall, along with mean average precision (mAP) scores at 0.5 and 0.95 IOU thresholds, provide insights into model performance across various aspects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.5 Sample training images for Landolt ”C” rings . . . . . . . . . . . . . . . . . . . 64 4.6 Training and validation performance metrics for the fine-tuned YOLOv8n capable of detecting persons and Landolt ”C” rings. . . . . . . . . . . . . . . . . . . . . 65 4.7 Detected humans and Landolt ”C” rings in the live test and evaluation phase of NIST First Responder UAS 3D Mapping Challenge in Salina, KS, April 2024 . . 66 4.8 Samples images in the manually developed dataset. Labels are manually added through Roboflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.9 Training and validation performance metrics for the fine-tuned YOLOv5s capable of detecting humans in densely forested areas using thermal images. . . . . . . . 68 4.10 The improved detection model is capable of automatically identifying humans beneath the canopy in thermal images. Extended observation of the thermal images without aids of the model may lead to eye strain. . . . . . . . . . . . . . 
69 4.11 A camera angle of 15◦ positions the Grazer camera almost facing forward, whereas a 60◦ camera angle directs the camera more vertically downward. . . . . . . . . . 70 4.12 The quantification of object localization errors involves a series of postprocessing steps applied to rosbags collected in the field . . . . . . . . . . . . . . . . . . . . 71 x 4.13 Impact of camera angles on object localization errors . . . . . . . . . . . . . . . 72 4.14 Inaccuracies of placing bound boxes occur when the object is large and close to the UAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.15 UAS poses (blue chevrons) and estimated object locations (white circles) . . . . . 74 4.16 Observing AprilTag (36h11) from different distances from Grazer. The camera is configured to 1280 x 768 px (720p) with a 120-deg FOV lens. The high-resolution images are transformed into grayscale to facilitate AprilTag detection. . . . . . . 76 5.1 Digital twin of flight test facility in Southern Maryland built in Unity for high- fidelity simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.2 Initial survey paths. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.3 Revisiting the objects of interest with hexagonal orbits. . . . . . . . . . . . . . . 80 5.4 Simulated object images are available in Fiona’s gallery . . . . . . . . . . . . . 81 5.5 (a) UAS is selected (highlighted in yellow) and then redirected to the red pin, and (b) the UAS path (blue) is adjusted after manually redirecting the UAS. . . . . . 82 5.6 Testing beyond visual line of sight (BVLOS) operations: (top left) a UAV is flying over a lightly wooded area, and (satellite map) UAV flight paths and positions are tracked in real-time using a custom UI . . . . . . . . . . . . . . . . . . . . . . . 83 5.7 Flight paths for the initial broad area survey with six ground targets (black pins). The initial estimated target locations are depicted as yellow pins. . . . . . . . . . 84 5.8 Modified flight paths for revisiting the objects of interest . . . . . . . . . . . . . 85 5.9 True positives - two objects of interest are correctly identified by the detection model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 5.10 False positive - the detection model misclassifies a lamp pole as a person. . . . . 86 5.11 Results of advanced perception are transmitted back to the ground station. . . . . 86 5.12 Flight paths for initial broad survey using the full autonomous configuration. . . 87 5.13 Manikins detected in the autonomous search experiment . . . . . . . . . . . . . 88 5.14 Flight paths for the revisiting phase are generated onboard. . . . . . . . . . . . . 88 5.15 Flight paths for initial broad survey. . . . . . . . . . . . . . . . . . . . . . . . . 90 5.16 Manikins detected in the BVLOS experiment . . . . . . . . . . . . . . . . . . . 91 5.17 Manikins detected in the BVLOS experiment . . . . . . . . . . . . . . . . . . . 91 5.18 Manikin detected by the UGV . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 6.1 Communication configuration for UAS operating in bandwidth-limited environments 99 6.2 (a) Dimensions of the ficual and (b) placement of the five fiducials in a test lane (Credit: NIST) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 6.3 Test lane with five fiducials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 6.4 Fiducial top measurement. 
Ten distances are measured to quantify the distortion of the 3D map (Credit: NIST) . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.5 Real-time 3D point cloud for the test lane, along with RGB and depth images reconstructed on the ground station . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.6 Generated 3D mesh model with the measurement tool enabled in the custom UI. The measured distance reads 1.5 meters or 59 inches . . . . . . . . . . . . . . . 104 6.7 Real-time 3D mapping, along with rectified RGB and depth images in RViz, during a flight test at UMD Engineering Annex . . . . . . . . . . . . . . . . . . 106 xi 6.8 The 3D mesh model of UMD Engineering Annex generated by RTAB-Map in a live flight test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 6.9 Close-up perspective of the mesh model featuring targets: individual (highlighted in red), ”C” ring (displayed in magenta), and manual (depicted in blue). . . . . . 107 6.10 3D mesh map generated by RTAB-Map running on iPhone 14 Pro . . . . . . . . 107 6.11 A gallery of detected humans . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.12 A gallery of a detected Landolt ”C” ring that indicates hazardous locations . . . . 108 A.1 A 8 x 11 checkerboard used for calibration . . . . . . . . . . . . . . . . . . . . . 114 A.2 Graphical user interface (GUI) for camera calibration. The enabled Save button indicates that the calibration is complete. . . . . . . . . . . . . . . . . . . . . . 114 A.3 Object detection is executed on a rectified image . . . . . . . . . . . . . . . . . . 115 xii List of Abbreviations AI Artificial Intelligence API Application Programming Interface AMAV Autonomous Micro Air Vehicle ArtIAMAS AI and Autonomy for Multi-Agent Systems ARL Army Research Laboratory BVLOS Beyond Visual Line of Sight CAS Close Air Support CNN Convolutional Neural Network COCO Common Objects in Context COTS Commercial of the Shelf CPU Central Processing Unit CUDA Compute Unified Device Architecture DARPA Defense Advanced Research Projects Agency DCIST Distributed, Collaborative, Intelligent Systems and Technology DDS Data Distribution Service DoD Departement of Defense F3 UMD Fearless Flight Facility FAA Federal Aviation Administration FOV Field of View FTDI Future Technology Devices International GUI Graphical User Interface HTTP Hypertext Transfer Protocol HTTPS Hypertext Transfer Protocol Secure IoT Internet of Things IoU Intersection over Union IMU Inertial Measurement Unit ISR intelligence, surveillance, and reconnaissance JPA Java Persistence API LiDAR Light Detection and Ranging LTS Long Term Support mAP mean Average Precision MAVLink Micro Air Vehicle Link MAVSDK Micro Air Vehicle Software Development Kit MIPI Mobile Industry Processor Interface xiii MPA Modal Pipe Architecture NASA National Aeronautics and Space Administration NDAA National Defense Authorization Act NIST National Institute of Standards and Technology OR Operations Research PPM Pulse Position Modulation PWM Pulse Width Modulation QoS Quality of Service QML Qt Modeling Language R2C2 Robotics Research Collaborative Campus RC Radio Control Re-ID Re-Identification RGB Red, Green, Blue RoCoG Robot Control Gestures ROS Robotic Operating System RRT Rapidly-exploring Random Trees RTAB-Map Real-Time Appearance-Based Mapping RTPS Real-Time Publish-Subscribe SBUS Serial Bus SDK Software Development Kit SITL Software in the Loop SSD Single Shot Multibox Detector TATRC Telemedicine & Advanced Technology Research Center TSP 
Traveling Salesman Problem UAV Uncrewed Aerial Vehicle UAS Uncrewed Aerial System UAS IPP UAS Integration Pilot Program UGV Uncrewed Ground Vehicle UI User Interface UMD University of Maryland uORBs micro Object Request Brokers UROC UMD UAS Research and Operations Center UTM UAS Traffic Management VIO Visual-Inertial Odometry VRP Vehicle Routing Problem YOLO You Only Look Once xiv

Chapter 1: Introduction

In recent years, the field of uncrewed aerial systems (UAS) has offered numerous opportunities and capabilities, including search and rescue, precision agriculture, and disaster relief efforts [1]. One particular advancement within this field is the concept of a multi-UAS team, where multiple uncrewed aircraft collaborate and coordinate their actions to achieve complex objectives [2]. An illustrative scenario is a long-distance search operation where several UAS are deployed to search a specific area. However, conducting long-distance operations with UAS in communication-limited environments presents notable challenges and considerations within the domain of aerial autonomy. The ability to establish and maintain communication links is paramount for the safe and effective operation of UAS, particularly when operating beyond the visual line of sight [3]. While UAS may be able to maintain short-range links between nearby vehicles, long distances can introduce signal degradation, latency, and vulnerability to interference [4]. In the absence of reliable long-distance communication, UAS may encounter difficulties in real-time decision making by a ground station operator, compromising their overall flight safety and efficiency [5].

1.1 Motivation

The confluence of artificial intelligence (AI) and drone technologies has presented a large number of opportunities and challenges across various domains in recent years [6]. The integration of AI and drone technologies revolutionizes the way small military units operate in intricate and challenging environments [7]. Over the past few years, diligent efforts have guided the creation of a comprehensive vignette that unfolds cutting-edge advancements in military capabilities. In one envisioned scenario, the focal point revolves around leveraging AI and drone technologies to provide real-time situational awareness to small units navigating complex terrains [8]. The cornerstone of this vision involves the deployment of a swarm of UAS designed to execute broad area reconnaissance with a primary objective of detecting targets on the ground. Collective intelligence is then relayed to the soldiers on the ground, offering them an improved level of situational awareness and the ability to make informed decisions promptly. Soldiers on the ground also possess the capability to retask individual UAS assets, directing them to revisit specific objects of interest. This ability to adapt and modify the mission in real time empowers military units with a flexible and responsive approach to emerging situations in communication-limited environments [9].

Furthermore, the confluence of AI and UAS technologies in search and rescue operations is transformative for first responders. UAS equipped with high-resolution cameras and thermal imaging sensors can efficiently survey large terrains that would be challenging and time-consuming for ground-based teams to cover [10]. The ability to access these aerial views enhances the overall efficiency of the search, as potential targets or individuals in distress can be spotted more rapidly [11].
Deep learning algorithms, designed for object detection and object localization, empower UAS to quickly identify and localize humans on the ground, even in challenging environments where visibility may be compromised [12]. The capability to swiftly cover vast search areas, combined with AI's proficiency in swift and precise detection, creates a powerful synergy that improves the effectiveness of search and rescue missions within time constraints [13].

1.2 Relation to the State of the Art

In this section, we delve into the current state of the art in several areas related to UAS technology. We explore the advances and challenges associated with UAS swarm technologies, beyond visual line of sight (BVLOS) operations, communication-limited operations, human-robot interaction, and air-ground coordination.

UAS have played an important role in military reconnaissance, surveillance, air interdiction, and close air support (CAS) [14]. UAS also have a wide range of applications in forest fire management [15], civil safety and security [16], agricultural remote sensing [17], weather assessment [18], urban traffic monitoring [19], network relays [20], and disaster relief following Hurricane Katrina, Typhoon Morakot, the Tohoku earthquake, and the Haiti earthquake [21, 22].

UAS swarm technology is a paradigm shift in UAS operations, where multiple aircraft collaborate autonomously to achieve common objectives [23]. This approach offers advantages such as redundancy, survivability, scalability, low unit cost, and enhanced mission capabilities compared to a single UAS [24]. Swarm behavior is inspired by natural phenomena like bird flocking and ant colonies, facilitating adaptive and cooperative operations in dynamic environments [25]. Research in UAS swarm technology has emphasized the development of algorithms for swarm formation, task allocation, and cooperative decision-making [26].

Centralized and decentralized control mechanisms present contrasting strategies for orchestrating UAS swarms [27]. In centralized control, a central node directs the actions of every UAS in the swarm and coordinates their movements toward a common objective. This approach offers advantages such as optimized path planning, streamlined coordination, and enhanced performance monitoring [28]. In 2015, Intel flew 100 drones simultaneously dancing in unison over Germany, and 500 a year later at the Farnborough Air Show [29]. At the 2018 PyeongChang Olympic Winter Games, viewers around the world witnessed history coming to life as 1,218 Intel Shooting Star drones decorated the night sky [30]. Research and development of swarm technology has been growing rapidly over the past few years [31]. However, centralized control suffers from a single point of failure, particularly in complex environments where communication is limited or denied [32]. Decentralized control, on the other hand, distributes decision-making authority among individual UAS, allowing each unit to autonomously adapt its behavior based on observations and interactions within its local environment [33]. This approach improves scalability, redundancy, and adaptability to changing environments, and avoids a single point of failure when communication is limited or denied, making it well suited for scenarios where fault tolerance and distributed decision-making are critical. Advances in distributed decision-making algorithms have empowered UAS swarms to self-organize without centralized control [34].
BVLOS operations enable UAS to operate beyond the operator's direct line of sight, expanding the range and scope of missions. Recent technological advances have enabled safe and reliable BVLOS operations [35]. Systems designed for sensing and avoidance, incorporating radar, LiDAR, and computer vision, empower UAS to detect and navigate around obstacles autonomously [36]. Additionally, technologies for remote identification have been developed, facilitating aerial situation awareness. Long-distance communication systems, including satellite links and cellular networks, enable continuous communication between ground control stations and UAS over extended distances [37]. Furthermore, advancements in autonomous navigation and control algorithms enable UAS to navigate in complex environments with little human intervention. Safety remains paramount in BVLOS operations, which aim to reduce the risks associated with potential collisions, loss of communication, and loss of property [38]. One notable initiative addressing challenges associated with BVLOS operations is the DARPA OFFSET program, which seeks to enable autonomous swarm BVLOS behaviors through innovative technologies [39].

Despite the technological advancements, BVLOS operations still face regulatory constraints and public acceptance challenges [40]. The Federal Aviation Administration (FAA) has established stringent guidelines for obtaining BVLOS flight permissions, including rigorous risk assessments and safety protocols [41]. Operating BVLOS in U.S. airspace mandates obtaining a waiver from the FAA [42]. The National Aeronautics and Space Administration (NASA) UAS Traffic Management (UTM) program focuses on the integration of small UAS into the national airspace system for applications like package delivery beyond visual line of sight [43]. The FAA UAS IPP program tests and evaluates a wide range of BVLOS applications, such as package delivery, infrastructure inspection, precision agriculture, and emergency management [44].

Efficient collaboration between UAS and ground-based vehicles can be beneficial for conducting missions in complex environments where UAS alone may be limited by airspeed, altitude, network bandwidth, and payload capacity [45]. The ARL Distributed, Collaborative, Intelligent Systems and Technology (DCIST) program primarily focuses on developing distributed and collaborative intelligent systems for complex military tasks, including air-ground coordination and human-robot interaction operations [46]. Common communication protocols and interfaces have been developed to facilitate integration and interoperability between UAS and ground vehicles [47]. Collaborative mission planning enables real-time adaptation to dynamic environments and mission objectives, optimizing resource allocation and task execution [48]. Advances in human-machine interaction focus on designing intuitive user interfaces that enhance situational awareness and decision making for both UAS and ground vehicles [49]. Despite the progress that has been made in improving air-ground coordination capabilities, challenges such as bandwidth constraints and heterogeneous platform integration still persist, requiring continuous research and development efforts [50].

Operating UAS in communication-limited environments such as remote, indoor, or hostile settings presents a unique challenge that requires a reliable and robust communication solution [51].
Ad-hoc networking enables UAS to form a self-contained local area network, enabling data exchange in communication-constrained environments [52]. Edge computing and onboard processing enable UAS to perform real-time data analysis and decision-making onboard, eliminating reliance on continuous communication with the ground station [53]. Optimized network protocols have been developed to support communication where low bandwidth, high latency, and intermittent connectivity are inevitable, ensuring data transmission in communication-limited environments [54]. Communication-limited operations remain challenging due to limited communication range and bandwidth [55]. The Raven small UAS program focuses on using small UAS for Army and Air Force intelligence, surveillance, and reconnaissance (ISR) missions in communication-limited environments [56]. Exploring novel communication paradigms and developing resilient communication architectures remain research directions in the near future.

Searching remains an age-old problem that continues to garner widespread attention. Path planning for search missions generally fits into the following categories:

• Grid-based search: search areas are divided into grids and algorithms like A* or Dijkstra's are used to scan the entire area. This approach ensures systematic coverage [57].

• Random search: random points in search areas are picked and traversed. This approach is useful when an exhaustive grid-based search is not feasible. Random search explores the search area more effectively given limited resources [58].

• Graph-based search: graphs are created to represent the search space, where the edges are lines of sight or feasible routes. This approach is useful for scenarios with limited visibility or few feasible routes [59].

• Potential fields: UAS navigate by following attractive and repulsive forces generated by artificial potential fields, which can guide them towards targets while avoiding obstacles. This approach is effective for dynamic environments and obstacle avoidance [60].

• Sampling-based techniques: techniques like Rapidly-exploring Random Trees (RRT*) are sampling-based algorithms that efficiently explore large and unknown search areas. These methods can quickly generate feasible paths while avoiding obstacles [61].

• Machine learning: deep reinforcement learning and other machine learning techniques can be used to train UAVs to learn optimal navigation policies from existing data, improving adaptability and performance in diverse search missions [62].

In a complete search mission where the entire search space must be covered, a UAS has to visit every location. The worst-case scenario has a runtime of O(n). However, if the UAS has prior knowledge of the targets, it can prioritize high-probability areas first [63]. Additionally, the UAS can employ active search methods, dynamically adjusting the search area based on real-time data [64, 65]. Reward functions can be used to optimize the search area, thereby reducing the runtime if a complete search is not feasible [66].

RGB cameras, infrared (IR) cameras, LiDAR, RF, acoustic sensors, and stereo cameras are among the popular sensors utilized for target detection and localization [67]. LiDAR and radar are known for their precision in long-range detection, although they come with the disadvantage of added weight [68]. Conversely, acoustic sensors offer limited range. RGB, IR, and stereo cameras, on the other hand, are lighter sensors, offering moderate range and accuracy [69].
We employ a single RGB camera for small UAS to detect and localize objects. Running advanced perception algorithms on UAS involves performing complex data processing tasks, such as image processing and object detection, directly on edge devices [70]. This capability facilitates onboard decision-making and autonomous missions, reducing latency and reliance on high-bandwidth networks [71]. Handling intensive algorithms on edge devices in real time presents several challenges. Edge devices typically have less processing power, memory, and storage compared to centralized servers, making it difficult to run complex algorithms efficiently [72]. High computational loads can generate significant heat, which can be challenging to dissipate in small, enclosed edge devices [73]. Advanced perception algorithms often need to be optimized or refactored to fit the resource constraints of edge devices without compromising performance [74]. Many UAS are battery-powered, requiring algorithms to be energy-efficient to avoid rapid battery depletion. Keeping the software and algorithms on edge devices up to date and secure requires robust maintenance strategies and efficient update mechanisms [75].

Since 2018, over 16,400 papers on Google Scholar include the keywords multi-agent, UAV, and search. However, only 663 of these papers feature the key phrase flight test or field experiment. Among these, 63 papers feature the keyword UGV. Table 1.1 compares the papers that have conducted field experiments involving multi-vehicle systems in search missions. Most of these studies involve swarms of 2-3 UAVs or UGVs, utilize YOLO for object detection, and use a 2.4 GHz data link. In addition to previously published work, our research introduces human-robot interaction with smart binoculars and a custom UI. Additionally, we present the experimental results of advanced perception algorithms such as object detection (YOLO), object localization, person re-identification, and action recognition executed onboard. Furthermore, we push the boundaries of BVLOS operations and the implementation of an efficient data link using the sub-GHz band. The sub-GHz band offers superior performance in covering long distances and penetrating obstacles.

Ref. | Year | #UAV | #UGV | Human-robot Interaction | BVLOS Ops | Data Link | Onboard Perception
[76] | 2018 | 1 | 1 | N/A | No | N/A | None
[77] | 2020 | 1 | 1 | Custom UI | No | 2.4 GHz and 5.8 GHz (Ubiquity) | YOLO
[78] | 2021 | 2 | 0 | N/A | No | 2.4 GHz and 5 GHz WiFi | YOLO
[79] | 2021 | 1 | 3 | Smart Phones | No | 5G and 2.4 GHz (DJI) | None
[80] | 2021 | 2 | 2 | Custom UI | No | 4G | YOLO
[81] | 2021 | 3 | 0 | N/A | No | 2.4 GHz WiFi | Color/Blob
[66] | 2023 | 1 | 2 | N/A | No | 2.4 GHz and 5.8 GHz (Little Hexy) | YOLO
[82] | 2023 | 3 | 0 | N/A | No | 2.4 GHz (DJI) | RCNN + VGG16
[83] | 2023 | 3 | 3 | N/A | No | 5G | CNN
[84] | 2023 | 3 | 0 | N/A | No | 2.4 GHz WiFi | None
[85] | 2024 | 1 | 1 | N/A | No | 2.4 GHz (DJI) | None
Our work | 2024 | 2 | 1 | Smart Binocular and Custom UI | Yes | 2.4 GHz (Doodle) and 900 MHz (Microhard) | Multiple

Table 1.1: Surveys of UAS challenges experimentally addressed. Multiple perception algorithms include YOLO, target localization, person re-identification, and action recognition.

1.3 Technical Approach

In this dissertation, we aim to address several common challenges in UAS, such as UAS swarming, human-robot interaction, air-ground coordination, long-range operations with limited bandwidth, and the onboard execution of advanced perception algorithms. This comprehensive approach focuses on advancing aerial autonomy, advanced perception executed on UAS, user interfaces, and communication links.
It builds on existing knowledge while introducing innovative solutions to enhance operational capabilities in complex environments. We conducted flight tests to validate the feasibility of our solutions under real-world conditions.

We developed a novel search and revisit framework for long-range operations using the ARL aerial autonomy stack. This framework includes three distinct system configurations with increasing levels of autonomy. The first configuration employs a centralized control model in a star topology, with UAS under operator supervision. Flight plans are created and uploaded using MAVSDK, a library of APIs for UAS interaction. The advanced perception algorithms were refactored for execution on the UAS, which has an onboard computer running Ubuntu 18 with Python 3.6 and TensorFlow 2.7. The second configuration operates without operator supervision; UAS autonomously generate flight paths for the PX4 flight stack onboard. In this setup, robotic platforms communicate through a mesh network, allowing UAS to maintain short-range links between vehicles even if communication with the ground station is lost during long-range operations. The third configuration further enhances autonomy by using the ARL navigation stack instead of the PX4 flight stack. A search area is defined in a YAML file and uploaded to the vehicles via a custom user interface. The navigation stack uses a custom-built behavior tree to plan flight paths for both search and revisit phases. The UAS remain connected through a mesh network, enabling long-range operations without operator input.

To address the limitations of existing public datasets for training neural networks running on UAS, we created a dataset specifically tailored for high-altitude operations. Additionally, we enhanced the stability of the UAS camera server, quadrupling the camera resolution and preventing silent failures during missions, especially under high-resolution settings. This initiative significantly improved the performance of our object detection model.

While existing object localization techniques using RGB-D cameras or LiDAR are suitable for vehicles with large payloads, we developed an improved localization algorithm for small UAS. This algorithm combines the camera pinhole model with Gaussian Mixture Models (GMM) using a single RGB camera module. A simple camera pinhole model alone results in large localization errors due to factors like fast camera movement, GPS inaccuracies, and timestamp mismatches between the camera and UAS pose. By assuming that the localization estimates of the same target follow a Gaussian distribution, with the center representing the most probable target location, the use of GMM significantly reduces localization errors (a minimal sketch of this two-step estimate appears below).

We also focused on enhancing user interaction with UAS by developing two specialized ground control interfaces. The first interface allows operators to track UAS positions, flight paths, and target locations in real time, facilitating effective air-ground coordination and human-robot interaction. This UI supports path planning and enhances situational awareness by providing access to target images. The second interface is designed for search and rescue missions, allowing first responders to view 3D maps, target locations, and detection images in bandwidth-limited environments. By integrating AliceVision, an open-source photogrammetry framework [86], we improved modeling quality and enhanced situational awareness for first responders.
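For illustration, the following minimal Python sketch shows how a detection's pixel coordinates might be projected onto flat ground with a pinhole model and how repeated, noisy per-frame estimates could then be clustered with a Gaussian mixture. The function names, frame conventions, flat-ground assumption, known number of targets, and use of scikit-learn are simplifying assumptions for this sketch, not the onboard implementation.

```python
# Minimal sketch (not the onboard implementation): project a detection's pixel
# onto flat ground with a pinhole model, then cluster repeated estimates so the
# mixture-component means serve as the refined target locations.
import numpy as np
from sklearn.mixture import GaussianMixture

def pixel_to_ground(u, v, K, R_wc, t_wc, ground_z=0.0):
    """Intersect the camera ray through pixel (u, v) with the plane z = ground_z.

    K    : 3x3 camera intrinsic matrix (from calibration)
    R_wc : 3x3 rotation taking camera-frame vectors into the world frame
    t_wc : camera position in the world frame (e.g., local ENU), 3-vector
    """
    t_wc = np.asarray(t_wc, dtype=float)
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-projected ray, camera frame
    ray_world = R_wc @ ray_cam                          # same ray, world frame
    if abs(ray_world[2]) < 1e-6:
        return None                                     # ray is parallel to the ground plane
    s = (ground_z - t_wc[2]) / ray_world[2]             # scale that reaches the plane
    return (t_wc + s * ray_world)[:2]                   # (x, y) ground estimate

def cluster_estimates(estimates, n_targets):
    """Fit a Gaussian mixture to noisy per-frame (x, y) estimates; the component
    means are taken as the most probable target locations."""
    gmm = GaussianMixture(n_components=n_targets, covariance_type="full",
                          random_state=0).fit(np.asarray(estimates))
    return gmm.means_
```

In use, each detection in each frame yields one `pixel_to_ground` estimate, and `cluster_estimates` collapses the resulting scatter into one refined (x, y) location per assumed target.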
The software is modular, allowing easy migration to different UAS platforms. The development process follows a test-driven and agile approach, ensuring reliability and efficiency. By developing automated tests alongside production code, we can quickly introduce new features to meet operational requirements.

ROS is popular in robotics due to its modularity, open-source nature, and large community support, with one of its key features being its communication infrastructure. However, ROS introduces overhead, impacting performance in real-time or latency-sensitive applications. In bandwidth-limited conditions, we observed that the ground station fails to receive ROS messages from UAS when bandwidth is less than 8 Mbps. To support robotic operations in such conditions, we use TCP/IP sockets for transmitting raw data to the ground station. The ground station then reconstructs ROS messages using this data, enabling their use in ROS-based applications. Sockets provide a fundamental building block in network communication, offering low-level control over the communication process, data encoding, and transmission, and operating with lower overhead than higher-level frameworks like ROS. With this configuration, we can precisely control what to send and how much data to send to the ground station in a network with less than 3 Mbps.

1.4 Contributions

The contributions of this dissertation tackle several common challenges faced by UAS. Many of these findings have been published [87, 88] or demonstrated in the 2023 and 2021 NIST UAS First Responder Challenges [89, 90], where our team achieved an overall 3rd place with three out of five Best-In-Class awards and an overall 1st place with the First Responder's Choice award.

1.4.1 Novel Collaborative Search Framework with ARL Aerial Autonomy Stack

We have developed and experimentally evaluated a novel search and revisit capability using the ARL aerial autonomy stack, designed for long-range operations. This framework encompasses the development of aerial autonomy software, simulation tools, and a ground control user interface. Three system designs with increasing levels of autonomy have been created. In one long-range field experiment, our framework detected and localized 67% of targets 500 meters away from the take-off point in under 10 minutes.

1.4.2 Improved Object Detection and Localization with Aerial Images

We created our own aerial dataset to retrain the object detection model running on the UAS and improved the stability of the camera server. The overall approach improves precision from 0.586 to 0.731 and recall from 0.088 to 0.682. Additionally, we quantified the localization errors and detection probability, developing an improved object localization algorithm for small UAS using a combination of a camera pinhole model and a clustering algorithm. This enhancement reduces localization errors from 4 meters to an average of 2 meters and more than doubles the probability of detection. Finally, we meticulously designed and implemented an AprilTag localization algorithm that allows the UAS to correctly localize AprilTags from 6 meters away using 720p images captured by a single RGB module.

1.4.3 Specialized Ground Control User Interfaces

Two specialized ground control user interfaces have been created to offer features not found in existing open-source and proprietary ground control software. The first interface allows for real-time tracking of UAS positions, flight paths, and target locations.
It also facilitates path planning and tackles the challenges of air-ground coordination and human-robot interaction that other interfaces overlook. The second interface is designed specifically for viewing 3D maps during indoor search missions in bandwidth-limited environments. Both user interfaces display target locations and images along with a satellite map for outdoor missions or a 3D map for indoor missions, greatly enhancing situational awareness for operators.

1.4.4 Efficient Data Link for Aerial Operations in Bandwidth-limited Conditions

We designed and implemented a reliable data link specifically for UAS operating in bandwidth-limited environments within the sub-GHz band. Our approach employs TCP/IP sockets to transmit raw data from the UAS to the ground station, where the ground station reconstructs ROS messages from the socket data for use in ROS-based applications. This solution allows UAS to create 3D maps and localize missing persons and hazards in real time while maintaining bandwidth usage under 3 Mbps in the sub-GHz band. The data link is robust and outperforms the 2.4 GHz RadioMaster radio control link, which typically has a range of 1 to 2 kilometers.

1.5 Outline of Dissertation

In Chapter 1, the motivation, relation to the state of the art, technical approach, and contributions of the dissertation are presented.

In Chapter 2, experiment testbeds are introduced, encompassing open-source autonomy software, simulation and visualization tools, uncrewed aerial systems, and experimental flight facilities.

Chapter 3 delves into our autonomy stack and software architecture concerning the search and revisit framework. It commences with an exploration of the problem statement and solution design, followed by discussions on mission planning and adaptation. Additionally, it covers behavior tree modeling for achieving a fully autonomous configuration, target detection and localization, and advanced perception models, and it introduces the ground station software crafted specifically for our needs.

Chapter 4 elaborates on the utilization of aerial imagery for object detection and localization. It outlines the process of fine-tuning neural networks to suit various mission requirements, discusses pertinent loss functions and performance metrics, delineates detection and localization performance through a structured workflow, and examines choices for AprilTag detectors.

Chapter 5 presents simulation and experimental results from multi-vehicle autonomous search missions, including outcomes from centralized and decentralized control, as well as BVLOS scenarios.

Chapter 6 outlines the design and implementation of an efficient communication link tailored for ROS-based robotics operating within bandwidth-constrained environments. It also showcases the experimental results from a single UAS conducting 3D mapping and search missions in bandwidth-limited environments.

Finally, Chapter 7 summarizes the dissertation's contributions and discusses ongoing and future endeavors.

Chapter 2: Experiment Testbeds

This chapter presents a comprehensive overview of implementing aerial autonomy within robotic platforms and ground stations. It covers autonomy software implemented on both aerial robotic platforms and ground stations. Additionally, it introduces simulation and visualization software essential for evaluating and visually representing the autonomy software's performance.
Furthermore, it explores the uncrewed aerial robotic systems we have developed and utilized since the inception of the project, discussing their key onboard electronics, capabilities, and applications. Lastly, it provides insights into the experimental flight facilities, which serve as crucial venues for empirically testing autonomous technologies under real-world conditions.

2.1 Autonomy Software

This section introduces the aerial autonomy software we leverage to orchestrate search and revisit missions.

2.1.1 MAVSDK

The Micro Air Vehicle Link (MAVLink) Software Development Kit (MAVSDK) is an open-source initiative facilitating the development of MAVLink applications for UAS. MAVLink is a lightweight and efficient protocol designed for communication between robotic systems and ground control stations. MAVSDK, developed by the Dronecode Foundation, offers a comprehensive set of Application Programming Interfaces (APIs) and libraries that enable developers to interact with various UAS platforms using a standardized and user-friendly interface. It supports multiple programming languages, including C++, Python, Java, and Swift, making it accessible to a wide range of developers. MAVSDK aims to simplify the integration of autonomous flight capabilities, payload control, and telemetry data retrieval, enabling innovation in the field of drone applications [91].

2.1.2 Pixhawk Project 4 Autopilot

Pixhawk Project 4 (PX4) Autopilot is open-source flight control software offering a flexible and modular architecture capable of controlling a wide range of UAS, from quadcopters to fixed-wing aircraft. Additionally, PX4 provides a variety of flight modes, including manual, stabilized, altitude hold, position hold, and autonomous offboard modes. Furthermore, PX4 supports various protocols for communication and control, including MAVLink for small UAVs, Real-Time Publish-Subscribe (RTPS) within the Data Distribution Service (DDS) framework for real-time data exchange between distributed systems, MAVROS for communication between PX4 and ROS nodes, as well as Pulse Position Modulation (PPM), Serial Bus (SBUS), and Pulse Width Modulation (PWM) for various Radio Control (RC) systems [92]. We leverage a special version of PX4 that has RTPS enabled. RTPS is another protocol designed for real-time communication in distributed systems. It is not limited to MAVLink-specific applications and can be used for a wider range of applications within a network. While MAVLink is essential for mission-critical communication between the autopilot and ground control station, RTPS complements MAVLink by providing a more flexible and versatile communication framework for interconnecting various software modules and components within PX4.

2.1.3 Robot Operating System 2

Robot Operating System 2 (ROS2) is an open-source framework that offers a modern middleware layer running on top of a conventional operating system. Evolving from ROS, ROS2 provides enhanced features and capabilities to meet the demands of advanced robotics applications. ROS2 also offers redesigned communication middleware built on the Data Distribution Service (DDS), which enables efficient message passing and interoperability between nodes. Moreover, ROS2 incorporates security features, including support for secure communication channels and access control policies, making it suitable for deployment in environments where safety and privacy are paramount.
ROS2 empowers developers to create robust, scalable, and secure robotic systems for various applications, ranging from research and education to industrial automation and beyond [47].

2.1.4 MAVericks

Our aerial autonomy stack is largely based on ARL MAVericks with additional capabilities developed by UMD researchers. MAVericks is a ROS2-based autonomy stack that works across both simulation and small UAS. It largely leverages the functionality of open-source and DoD-owned software. MAVericks can run on Modal AI VOXL and VOXL2, and it is capable of behavior tree navigation, object detection and localization, precision landing, multi-agent teaming, obstacle avoidance, digital elevation maps, and OpenVINS visual inertial odometry [93]. It contains simulation environments in Unity suitable for verifying advanced perception algorithms in a software-in-the-loop (SITL) simulation [87]. MAVericks also contains a UGV bridge for ROS1 systems. It serves as a pipeline for transitioning state-of-the-art algorithms developed in academia to the DoD [93].

2.2 Simulation and Visualization Software

This section presents commercial off-the-shelf (COTS) and custom-built simulation and visualization software we leverage to comprehend and test the aerial autonomy stack.

2.2.1 Unity

Our autonomy algorithms undergo rigorous testing in the Unity simulation environment before any field experiments are conducted. Unity is a powerful tool in the field of robotics simulation, offering a versatile platform for developers to create highly realistic environments. With its extensive library of assets, Unity enables users to accurately model the dynamics of robots and their interactions with the environment. Additionally, it integrates flexibly with other software and hardware components, facilitating the development and testing of algorithms in perception, planning, and navigation [94]. To accurately replicate flight conditions in simulation, our team has developed 3D Unity scenes using photogrammetry. Also, our team was invited to compete in the DARPA Triage Challenge, which focuses on the development of novel physiological features for medical triage in mass casualty incidents (MCIs), where medical resources are insufficient compared to the demand [95]. UAS are tasked with detecting AprilTags on A4 pages from a stand-off distance. Figure 2.1 shows that we have incorporated both manikins and AprilTags into our Unity scene, enabling us to test our detection algorithms and determine the optimal flight altitude in simulation.

Figure 2.1: Simulated manikins and AprilTags are added to the 3D scene to facilitate testing algorithms in Unity

Figure 2.2: Two cars are added to the 3D scene to enable testing algorithms detecting large targets with the pre-trained detection models

Figure 2.2 illustrates the Unity scene that consists of two cars, allowing for the testing of detection models pre-trained on the public COCO dataset within Unity [96].

2.2.2 QGroundControl

QGroundControl is well-known open-source ground control station software, offering comprehensive capabilities for commanding and controlling UAS. QGroundControl provides an interface for mission planning, telemetry monitoring, and vehicle control, empowering both hobbyists and professionals in the UAS community. Users can efficiently plan missions, monitor real-time telemetry data, and execute precise flight operations with its user interface and extensive feature set.
QGroundControl enables collaboration and innovation, allowing developers to contribute improvements, customize new features, and integrate capabilities seamlessly into various UAS platforms. QGroundControl is developed using C++ and the Qt Modeling Language (QML), leveraging the Qt framework to create a responsive GUI across various platforms [97].

2.2.3 Custom User Interface

We developed a custom user interface to monitor, command, and control robotic platforms, providing additional capabilities not offered by QGroundControl. It is implemented in React [98] and Spring Boot [99], bringing together two robust technologies to create a dynamic and responsive web user interface. React, a popular open-source JavaScript library for building client-side components, provides a component-based framework that enables developers to create modular UI components. Spring Boot, a framework for building Java-based enterprise applications, offers simplicity and convention over configuration, allowing developers to quickly set up and deploy new HTTP endpoints. By developing automated tests written in Jest [100], JUnit [101], and Cypress [102] alongside production code, new features can be introduced quickly to meet operational requirements. Figure 2.3 shows other technologies used in the user interface tailored for one of the outdoor missions. The Google Maps Platform offers a comprehensive set of APIs and SDKs that enables developers to integrate various mapping and location-based services into their applications [103]. Google OR-Tools [104] offers a collection of optimization algorithms and tools for solving various combinatorial optimization problems such as Vehicle Routing Problems (VRP) [105]. Node.js is an open-source, server-side JavaScript runtime environment that allows developers to build scalable and efficient web applications [106]. MySQL is an open-source database management system known for its reliability, performance, and ease of use. It is a key component of many web applications and software systems, powering data storage and retrieval for a diverse range of applications [107]. Docker is a popular technology used for containerization, enabling developers to package applications and their dependencies into isolated, lightweight containers that can run consistently across various environments [108]. In this dissertation, both the user interface and MAVericks have been dockerized to enhance consistency, enabling fast deployment and testing.

Figure 2.3: Some technologies used in the custom user interface

In Figure 2.3, Hypertext Transfer Protocol Secure (HTTPS) is an extension of the HTTP protocol used for secure communication between the React frontend and the Spring Boot backend. The backend calls MySQL through the Java Persistence API (JPA), which allows developers to interact with relational databases using Java objects. The user interface is designed with modularity in mind, allowing for easy migration to different UAS platforms. The development process follows a test-driven and agile approach, ensuring the reliability and efficiency of the software.

2.3 Uncrewed Aerial Systems

This section presents the NDAA-compliant uncrewed aerial vehicles (UAVs) as well as the long-range communication systems and radio control (RC) transmitters utilized within this dissertation. These components are the cornerstone of our project, facilitating testing of autonomous behaviors of the UAS with precise control and robust data transmission.
2.3.1 Radios

Operating UAS in environments where communication is limited poses significant challenges and requires careful consideration within the realm of aerial autonomy. The ability to establish and maintain communication links is paramount for the safe and effective operation of UAS, particularly when operating beyond the visual line of sight (BVLOS) [3]. While UAS may be able to maintain short-range links between nearby vehicles, long distances can introduce signal degradation, latency, and vulnerability to interference [4]. In the absence of reliable long-distance communication, UAS may encounter difficulties in real-time decision making by an operator, compromising their overall flight safety and efficiency [5]. Wireshark is employed to validate the Quality of Service (QoS) in ROS2, monitor network packets, and assess bandwidth utilization within a mesh network. This practice prevents unnecessary data exchange that could degrade communication performance between UAS during BVLOS operations.

Figure 2.4: Wireshark is employed to validate the QoS in ROS2, monitor network packets, and assess bandwidth utilization within a mesh network consisting of three UAS and one ground station

Figure 2.4 illustrates a significant exchange of packets between three UAS and the ground station, with ROS2 messages being repeatedly transmitted to the ground station, constraining the available network bandwidth. The QoS setting of the tf topic is set to reliable, a ROS2 QoS policy that guarantees message delivery. Each tf message is about 3.7 KB in size. The publish rate is erroneously set to 70 Hz, which consumes 2.59 MBps. When network bandwidth is limited, undelivered messages are repeatedly resent until successfully delivered, thereby exacerbating the strain on the already overburdened network. Fiona is the custom user interface that subscribes to the ROS2 topics from the UAS for processing on the ground station. Figure 2.5 shows that packets flow freely between the same UAS and the ground station.

Figure 2.5: Free-flowing network

After the initial handshake between the ground station and the UAS when Fiona is started, the UAS only transmits about 300 packets per second, or 50 KBps of data, to the ground station.

2.3.1.1 Doodle Labs Radios

Doodle Labs radios represent a cutting-edge advancement in wireless communication technology, offering a diverse range of solutions for robotics, Internet of Things (IoT), and mission-critical infrastructures [109]. In this thesis, a 2.4 GHz Doodle embedded radio is used for the ground station and 2.4 GHz Doodle mini radios are used on all the UAS. Compared to Doodle minis, embedded radios offer higher network throughput with a larger form factor, as shown in Figure 2.6. All the radios are set to operate at 2.450 GHz with 15 MHz bandwidth, which has the maximum transmission power among all the available frequencies. The data link is encrypted with WPA2-PSK AES, 128 bit for ArtIAMAS and 256 bit for the DARPA Triage Challenge, to protect sensitive data. The maximum throughput with encryption is capped at 60 Mbps and 12 Mbps, respectively. Figure 2.7 shows the configuration page where the frequency, bandwidth, and encryption can be set up. The firmware on the radios has been upgraded to the 2023 Long Term Support (LTS) release, which offers central configuration, frequency hopping, and link recovery [110].

Figure 2.6: (left) Doodle embedded radio and (right) Doodle mini radio (Credit: Doodle Labs)

Figure 2.7: Software configuration page for the Doodle radios.
Channel is set to 51 and bandwidth is set to 15 MHz

2.3.1.2 Remote Control Transmitter

Jeti Remote Control (RC) transmitters (Figure 2.8b) are used to adhere to the provisions and requirements outlined within the National Defense Authorization Act (NDAA). In recent years, NDAA compliance has gained significant attention due to provisions related to cybersecurity, especially concerning the use of products and services from certain countries or companies that may pose risks to national security. The transmitters are made in the Czech Republic, offering high-quality radio control systems operating in the 2.4 GHz and 900 MHz bands for hobbyists and professional radio operators.

(a) 915 MHz Dragon Link on Jeti transmitter (b) 900 MHz Jeti transmitter

Figure 2.8: Sub-GHz RC transmitters used in this project: (a) 915 MHz Dragon Link mounted on a 2.4 GHz Jeti transmitter, and (b) 900 MHz Jeti transmitter

To avoid interference with the 2.4 GHz Doodle radios, either a 900 MHz Jeti transmitter or a 2.4 GHz Jeti transmitter with a 915 MHz Dragon Link add-on module is used. The Dragon Link is also a long-range RC module offering a range over 50 kilometers [111].

2.3.2 Uncrewed Aerial Vehicles

Since the inception of our project, we have been exploring cost-effective and NDAA-compliant UAV platforms. The Modal AI m500 served as our stepping stone, providing us with valuable insights into the autonomy stack and the onboard electronics. As our project matured, we transitioned to the RB5 platforms, which offer enhanced capabilities and performance. Ultimately, we migrated to the Modal AI VOXL2-based platforms such as the ARL Grazer and UMD Chimera, which further enhance the reliability of our fleet. Considering that our autonomy stack relies on both MAVLink and RTPS communications, it becomes essential to adapt existing aerial robotic platforms to support both serial communication protocols.

2.3.2.1 Modal AI m500 Based Quadcopter

The Modal AI m500 only provides MAVLink communication between the onboard computer and the flight controller by default. Modification is needed to incorporate a serial cable for the operation of the autonomy stack on Modal AI m500 platforms. One solution is to connect the J1010 UART on the VOXL flight controller to the USB port on the Modal AI VOXL Add-On board mounted on the VOXL onboard computer via an FTDI adapter. The FTDI adapter converts signals between USB and serial communication protocols, enabling RTPS communication between the VOXL Flight onboard computer and the flight controller.

1. RTPS is used to connect PX4 micro Object Request Brokers (uORBs) to ROS2 topics, with data transmission handled by an external FTDI adapter. uORBs serve as a messaging infrastructure within the PX4 autopilot system.

2. MAVLink is the communication protocol that connects QGroundControl with the flight controller, utilizing the internal UART provided by the m500 for transmission.

Figure 2.9 depicts the modified Modal AI m500 featuring a mounted FTDI adapter. The UART is converted into VOXL USB using this adapter, eliminating the need for specific drivers. Furthermore, a dedicated FTDI adapter has been developed to link J1010 to the USB expansion port of the Microhard Add-On board. Any other USB expander with host capabilities operating at 3.3 V logic should also be compatible. While standard FTDI adapters will work, a custom one is developed by ARL to minimize weight and size.
Figure 2.9: Modal AI m500 with FTDI adapter

2.3.2.2 Modal AI RB5 Based Quadcopter

Even though PX4 has the capability to operate on the Qualcomm QRB5165 premium-tier robotics processor found in the RB5, executing RTPS-enabled PX4 directly on the Qualcomm processor poses challenges. An external Pixracer flight controller is added to the RB5 to bypass the limitation. Additionally, two serial cables are added to facilitate RTPS and MAVLink communications.

Figure 2.10: Wire diagram of the Modal AI m500 based quadcopter tailored for running MAVericks. The FTDI adapter enables RTPS communication between the VOXL Flight onboard computer and the flight controller.

Figure 2.11 shows the wire diagram of the modified RB5 platform tailored for running MAVericks. A USB Ethernet adapter is utilized to connect to the Doodle Labs radio for long-range communication. The theoretical communication range of the radio exceeds 10 km, and it can achieve a maximum throughput of 100 Mbps. Figure 2.12 shows an advanced 915 MHz Dragon Link added to the Jeti transmitters to avoid interference with the Doodle Labs radios [111]. The RB5 platforms, leveraging the integrated Qualcomm QRB5165 premium processor, offer enhanced capabilities and performance.

Figure 2.11: Wire diagram of the RB5 with Pixracer flight controller tailored for running MAVericks

(a) Dragon Link mounted on Jeti transmitter (b) Modified RB5 platform

Figure 2.12: Jeti transmitter with Dragon Link and retrofitted RB5 platforms for the MAVericks autonomy stack

2.3.2.3 ARL Grazer-A

VOXL2 is ModalAI's latest autopilot, also powered by the Qualcomm® Flight RB5 5G platform with a smaller form factor. In addition to the VOXL2, the other major electronic components on the ARL Grazer (Figure 2.13a) include the mRo Control Zero H7 flight controller and the 2.4 GHz Doodle Labs mini radio. Compared to embedded radios, the theoretical communication range of the mini radio exceeds 10 km, and it can achieve a maximum throughput of 80 Mbps. The high-resolution camera is configured to face the ground at an angle of 15 to 60 degrees. The system diagram in Figure 2.13b illustrates the wiring connections between the VOXL2 onboard computer, the flight controller, and the Doodle Labs mini radio, enabling long-range communications. Rather than mounting a 915 MHz Dragon Link module, we directly use 900 MHz Jeti transmitters to avoid interference with the Doodle Labs radios. Additionally, we utilize a Sony Starvis IMX412 12 MP (4K) MIPI camera module for onboard perception algorithms.

(a) ARL Grazer (b) System diagram

Figure 2.13: Custom UAS platform by Survice Engineering - ARL Grazer and system diagram developed for the MAVericks autonomy stack

2.3.2.4 UMD Chimera

We have also worked with UROC to configure MAVericks running on the UMD Chimera, which has the same electronics as the ARL Grazer. Rather than mounting the mRo Control Zero and Doodle mini on a proprietary Survice carrier board, the Chimera uses the Modal AI USB 3.0 UART Add-on board to expose a USB port and a UART port [112]. The Doodle mini connects to the VOXL2 via USB as shown in Figure 2.14.

Figure 2.14: Doodle mini is connected to VOXL2 through the USB port opened by the Modal AI USB 3.0 UART Add-on

Figure 2.15 illustrates the wiring diagram for the mRo Control Zero H7 flight controller and VOXL2. The addition of the USB 3.0 Add-on exposes one UART for MAVLink communication, while the M0076 interposer [113] exposes the other UART on the VOXL2 for RTPS communication.
A bidirectional voltage level shifter is employed because J8 operates at 1.8 V logic levels while the flight controller operates at 3.3 V logic levels [114]. Figure 2.16 shows the wiring diagram of the Doodle mini radio, VOXL2, and mRo Control Zero H7 at a high level. This wiring diagram bypasses the Survice proprietary carrier board, reducing the cost of the Chimera by around $4,000.

Figure 2.15: Wiring diagram of mRo Control Zero H7 and VOXL2 via USB 3.0 Add-on and M0076 Interposer

Figure 2.16: Wiring diagram for UROC Chimera tailored for running MAVericks

2.4 Experimental Facilities

In addition to the prototyping and software integration carried out through the simulation process using software-in-the-loop testing, we conduct comprehensive flight tests to validate the efficacy and robustness of our autonomy software on custom quadcopters built upon the Modal AI VOXL, RB5, and VOXL2. This section provides insight into the infrastructure and resources that constitute our flight facilities. By elucidating the key aspects of these facilities, we aim to ensure the reliability and effectiveness of our autonomy software in real-world applications.

2.4.1 UMD Fearless Flight Facility

The Fearless Flight Facility (F3) is an outdoor flight facility at UMD College Park exclusively designated for testing UAS in the National Capital Region, where restricted airspace is imposed. With dimensions of 100 feet in width, 300 feet in length, and 50 feet in height, this facility serves as a critical UAS testing site near the campus (Figure 2.17). F3 empowers us to conduct quick flight tests, ensuring a fast turnaround time for the evaluation and validation of the software on the robotic platforms.

2.4.2 UROC Raley Farm

Our team is also in close partnership with the UMD UAS Research and Operations Center (UROC). Situated in Southern Maryland, Raley Farm is a 70-acre field serving as the primary flight hub for the UMD UROC (Figure 2.18). UROC is one of the select few institutions across the nation directly collaborating with the FAA to drive forward UAS research and demonstrate operational capacities. Leveraging UROC's expertise in operational protocols through the partnership, we can consistently assess our autonomy and perception algorithms within a high-fidelity operational environment.

Figure 2.17: Fearless Flight Facility (F3) near the University of Maryland College Park campus

Figure 2.18: Raley Farm flight facility near the UMD UAS Research and Operations Center (UROC) in Southern Maryland

Chapter 3: Autonomy Stack and Software Design

This chapter outlines both the problem statement and the corresponding resolutions, encompassing perception, decision-making, and navigation components. Moreover, it discusses navigation mission planning and adaptation specifically tailored for UAS. Additionally, it explores the methodologies behind target detection and localization utilizing computer vision and machine learning algorithms. Furthermore, it introduces two advanced perception algorithms aimed at extracting additional intelligence regarding the identified targets. Finally, it explores ground control software, which facilitates mission planning, air-ground coordination, and human-robot interaction. The ground control software also enhances situational awareness by continuously tracking robot positions, flight paths, target locations, and target images in real time.
3.1 Problem Statement and Solution Designs

Our primary challenge revolves around the necessity for the UAS to search, detect, and localize ground targets with high accuracy while minimizing time. This requires a sophisticated software architecture capable of handling sensor inputs, processing complex data streams, and making intelligent decisions, all while ensuring the safety and reliability of the UAS. Our search and revisit strategy involves two phases. During the initial phase, the search area is divided into smaller regions of similar size. Subsequently, lawnmower patterns are generated for the UAS to sweep the search area at a high altitude, while the onboard object detection algorithm detects and localizes target-like objects on the ground. After the broad area survey, the UAS collectively plan the flight paths to revisit the targets and to extract additional information about the targets with the advanced perception algorithms. Figure 3.1 is a high-level view of the search and revisit algorithm. Our approach maximizes search coverage while minimizing flight time.

Figure 3.1: Algorithm for the search and revisit: (left) preliminary broad area survey detecting targets on the ground and (right) subsequent target revisits involving multiple autonomous UAS and perception

The solution design involves the creation of an autonomy stack comprising perception, decision-making, and navigation. The perception module focuses on sensor fusion, incorporating data from a single RGB camera, IMU, and GPS to detect and localize targets. The decision-making module tracks the status of a mission, analyzing the target data for mission replanning. Our autonomy stack is based on ARL's MAVericks with additional capabilities developed by UMD researchers. Our solution emphasizes modularity and adaptability. The additional capabilities are developed as a set of interchangeable and upgradeable components, allowing for easy integration with MAVericks. This modularity facilitates scalability, ensuring the autonomy stack can evolve alongside advancements in the capabilities. Three configurations for conducting search and revisit missions have been developed during the evolution of our autonomy stack.

Figure 3.2: Configuration facilitating centralized control and the PX4 flight stack

Figure 3.3: Full-autonomous configuration with the PX4 flight stack

Figure 3.4: Full-MAVericks configuration with the MAVericks navigation stack

Figure 3.2 shows the configuration where the robotic platforms are controlled in a centralized manner. Mission plans are developed in JSON on the ground station and then transferred to the UAS via MAVSDK [91]. Most of the AI/ML algorithms, such as object detection, advanced perception algorithms, and object geolocalization, are executed on the UAS. The ground station tracks mission status and performs path planning with a VRP solver. ROS2 microservices collect telemetry data for Fiona, enabling real-time tracking of UAS positions, speed, and object locations. All UAS are linked to the common Doodle network through a star topology, providing both high throughput and scalability. The centralized control configuration is ideal for missions requiring continuous monitoring and an operator in the loop. Figure 3.3 shows the full-autonomous configuration, where the mission plans for the initial broad area survey are developed in JSON on the ground station and then transferred to the UAS via MAVSDK [91]. Each UAS tracks the mission status and updates the missions onboard.
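To make this JSON-to-vehicle handoff concrete, the following MAVSDK-Python sketch reads a simplified waypoint list and commands a vehicle through it. The file name, JSON schema, connection address, and pacing are illustrative placeholders under assumed defaults; the actual system transfers richer mission plans than this and uses MAVSDK's mission interface rather than simple goto commands.

import asyncio
import json
from mavsdk import System

async def fly_plan(plan_path: str):
    # Load a simplified JSON plan: [{"lat": ..., "lon": ..., "alt": ...}, ...] (hypothetical schema)
    with open(plan_path) as f:
        waypoints = json.load(f)

    drone = System()
    await drone.connect(system_address="udp://:14540")  # SITL default; a serial link on the vehicle

    # Wait until the vehicle has a valid global and home position.
    async for health in drone.telemetry.health():
        if health.is_global_position_ok and health.is_home_position_ok:
            break

    # Record the home altitude so relative waypoint altitudes can be converted to absolute ones.
    async for home in drone.telemetry.home():
        home_alt = home.absolute_altitude_m
        break

    await drone.action.arm()
    await drone.action.takeoff()
    await asyncio.sleep(10)

    for wp in waypoints:
        await drone.action.goto_location(wp["lat"], wp["lon"], home_alt + wp["alt"], 0.0)
        await asyncio.sleep(15)  # crude pacing; a real mission monitors progress instead

    await drone.action.return_to_launch()

if __name__ == "__main__":
    asyncio.run(fly_plan("survey_plan.json"))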
Subsequently, the missions are uploaded to the PX4 flight stack via MAVSDK by the UAS. All UAS are connected to the same Doodle mesh network, which ensures that object locations are synchronized across the UAS, guaranteeing that the VRP solver returns the same paths. The UAS may not have continuous communication with the ground station, but UAS in close range can communicate with each other. This setup is well-suited for operations involving some operator control and beyond visual line of sight. Figure 3.4 illustrates the fully autonomous configuration leveraging the MAVericks navigation stack. This configuration involves running a variety of AI/ML algorithms on the UAS. The UAS also tracks mission status using a behavior tree and performs path planning with a VRP solver. All UAS are connected to the same Doodle mesh network. The MAVericks navigation stack offers higher agility than the PX4 flight stack, enabling the UAS to pivot quickly. The full MAVericks mode is suitable for running search and revisit missions beyond visual line of sight and facilitates incorporating other MAVericks capabilities developed by the community. The ROS2 microservices operating on the ground station retain the capability to gather telemetry data for the custom user interface as long as communication remains uninterrupted. This facilitates the real-time monitoring of UAS positions, speed, and object locations.

3.2 Mission Planning and Adaptation

Mission planning is a critical aspect of autonomous systems, involving the generation of a sequence of actions to achieve predefined objectives. Adaptation is a key feature, allowing the autonomous system to adjust its behavior in the environment effectively and navigate complex scenarios. Our search framework spans two distinct phases. In the initial phase, the search expanse undergoes segmentation into smaller, uniformly sized regions. Subsequently, lawnmower patterns are generated for the UAS to systematically sweep the search area at high altitudes, while the onboard object detection algorithm identifies and geolocates targets on the ground. After the initial broad area survey of the expansive area, the UAS collaboratively replan flight paths to revisit the detected targets, facilitating the extraction of close-up imagery and supplementary intelligence for operators.

3.2.1 Broad Area Survey

When creating lawnmower patterns for multi-agent systems, it is crucial to divide a large area into smaller ones of similar size, ensuring each agent covers a comparable area. To achieve this objective, our approach starts with uniformly sampling the large area. Subsequently, we employ K-means to partition the search area into K smaller segments, as illustrated in Figure 3.5. Figure 3.5a demonstrates the division of a polygon into three segments of comparable size, while Figure 3.5b demonstrates the partitioning of a pentagon into five equally sized segments. The distance between two dots is determined by the track spacing. Subsequently, we employ a grid-based algorithm to connect the dots in each segment to generate the lawnmower patterns for the UAS [115]. The approach is also used by QGIS, which is a free and open-source cross-platform desktop geographic information system (GIS) application [116, 117].

(a) Divided into 3 small polygons (b) Divided into 5 small polygons

Figure 3.5: K-means is utilized to split a large polygon into an arbitrary number of small polygons.

3.2.2 Target Revisiting

We use a Vehicle Routing Problem (VRP) solver for path replanning; a minimal sketch of the Google OR-Tools solver adopted in this work follows, and the underlying formulation is developed in the remainder of this section.
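The sketch below shows how revisit routes for two UAS could be computed with the OR-Tools routing solver; the distance matrix, node indices, and time limit are illustrative assumptions rather than the configuration used in our experiments.

from ortools.constraint_solver import pywrapcp, routing_enums_pb2

def plan_revisit_routes(distance_matrix, num_uas, start_nodes, end_nodes):
    """Solve a small VRP: minimize total distance over all UAS routes.

    distance_matrix : square list of integer distances (e.g., meters) between nodes
    start_nodes/end_nodes : per-vehicle start and end node indices
    """
    manager = pywrapcp.RoutingIndexManager(len(distance_matrix), num_uas, start_nodes, end_nodes)
    routing = pywrapcp.RoutingModel(manager)

    def distance_cb(from_index, to_index):
        return distance_matrix[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

    transit_idx = routing.RegisterTransitCallback(distance_cb)
    routing.SetArcCostEvaluatorOfAllVehicles(transit_idx)

    params = pywrapcp.DefaultRoutingSearchParameters()
    params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC
    params.local_search_metaheuristic = routing_enums_pb2.LocalSearchMetaheuristic.GUIDED_LOCAL_SEARCH
    params.time_limit.FromSeconds(2)  # bound the local search so replanning stays fast onboard

    solution = routing.SolveWithParameters(params)
    if solution is None:
        return []

    routes = []
    for v in range(num_uas):
        index, route = routing.Start(v), []
        while not routing.IsEnd(index):
            route.append(manager.IndexToNode(index))
            index = solution.Value(routing.NextVar(index))
        route.append(manager.IndexToNode(index))
        routes.append(route)
    return routes

In our setting, the distance matrix would be built from the geolocated target positions, each UAS's current location would serve as its start node, and a shared recovery point (the 'x' point in Figure 3.6) as the common end node.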
The VRP is an extension of the Traveling Salesman Problem (TSP) to multiple agents. The goal of the TSP is to find the route with the least total distance for one salesman visiting a set of locations. The TSP can be seen as a combinatorial optimization and an integer linear programming problem [118]. One of the formulations is described as follows. Let the nodes be numbered 1, ..., n and define

x_{ij} = \begin{cases} 1 & \text{if the route travels from node } i \text{ to node } j \\ 0 & \text{otherwise.} \end{cases}

For i = 1, ..., n, let w_{ij} be the distance between nodes i and j. To solve a TSP, we need to find

\min \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} w_{ij} x_{ij},

where x_{ij} \in \{0, 1\}. The Vehicle Routing Problem (VRP) extends the Traveling Salesman Problem (TSP) and falls within the realm of NP-hard problems [119]. The VRP aims to determine optimal routes for multiple salesmen (or UAS), minimizing the total distance traveled while visiting a given set of locations. It is difficult to solve the VRP at large scale. Probabilistic techniques are often preferred for applications where computation resources are too limited for deterministic algorithms, as the probabilistic methods are more adaptive in dealing with a large number of constraints such as capacity, waiting time, etc. [104]. Many flavors of probabilistic methods have been proposed, such as particle swarm optimization [120], genetic algorithms [121], Tabu search [122], and simulated annealing [123]. The state-of-the-art probabilistic algorithms reach solutions within less than 1% of the optimum for problems consisting of millions of nodes [124]. Another recent notable trend is the utilization of reinforcement learning (RL) algorithms to train policies for solving combinatorial optimization problems [125]. Although RL has yet to surpass state-of-the-art approaches, it presents a promising alternative for automating search processes autonomously [126, 127]. We use the solver from Google OR-Tools for its ability to solve problems with additional constraints such as different start and stop points. Google OR-Tools often employs local search algorithms like simulated annealing or Tabu search to improve solutions generated by other methods. These algorithms iteratively explore the solution space to find better solutions [104]. The solver also uses Constraint Programming (CP), which can handle complex constraints and dependencies between variables. The specific algorithm, or a combination of the algorithms used in the solver, can often be configured by the user through parameters and options provided by the API to achieve the best results [104].

(a) A 20-node problem (b) A 100-node problem

Figure 3.6: The VRP for two agents was resolved using the solver within Google OR-Tools

Figure 3.6 displays the outcomes of solving a 20-node problem and a 100-node problem with two agents using Google OR-Tools, resulting in satisfactory performance. The two agents start their routes from distinct locations and return to the 'x' point.

3.2.3 Behavior Tree Modeling

Behavior trees are hierarchical structures that represent a set of tasks and their relationships, allowing for modular design of robot behaviors [128]. They are implemented to facilitate the development of intelligent and adaptive robots by providing a flexible framework for decision-making [129]. They enable engineers and developers to design robot behaviors in a modular and reusable manner, making it easier to create, modify, and maintain complex robotic systems [130].
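To make the structure concrete, the following pure-Python sketch mimics the sequence-style composition used by the search and revisit tree described below. It is a minimal illustration, not the MAVericks implementation, and the leaf behaviors are placeholder stubs.

from enum import Enum

class Status(Enum):
    SUCCESS, FAILURE, RUNNING = range(3)

class Sequence:
    """Tick children in order; stop at the first child that is not yet successful."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Action:
    """Leaf node wrapping a callable that returns a Status."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self):
        return self.fn()

# Placeholder leaves standing in for the highlighted subtrees described in the text.
def generate_lawnmower():  return Status.SUCCESS
def takeoff_and_start():   return Status.SUCCESS
def fly_survey_pattern():  return Status.SUCCESS
def wait_for_swarm():      return Status.SUCCESS
def replan_with_vrp():     return Status.SUCCESS
def revisit_targets():     return Status.SUCCESS

search_and_revisit = Sequence([
    Action("generate_pattern", generate_lawnmower),
    Action("takeoff", takeoff_and_start),
    Action("broad_area_survey", fly_survey_pattern),
    Action("hover_until_swarm_done", wait_for_swarm),
    Action("vrp_replanning", replan_with_vrp),
    Action("revisit", revisit_targets),
])

while search_and_revisit.tick() == Status.RUNNING:
    pass  # in a real system this loop runs at a fixed tick rate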
The ROS2 behavior tree plays a crucial role in orchestrating the execution of tasks, handling sensor input, and making decisions based on the robot's environment, contributing to the overall efficiency and reliability of robotic applications [131]. Figure 3.7 illustrates the behavior tree tailored for achieving fully autonomous search and revisit operations using MAVericks. The subtree highlighted in light blue orchestrates the development of lawnmower patterns for all UAS in the swarm; each follows a unique pattern based on its hostname. Furthermore, the orange-highlighted subtree directs the aircraft to initiate takeoff and execute the mission. In addition, the yellow-highlighted subtree guides the aircraft in following the lawnmower pattern for the initial broad area survey. After completion, the purple-highlighted subtree instructs the aircraft to hover and await the completion of the initial survey by other aircraft in the swarm. Subsequently, the green-highlighted subtree invokes the VRP solver for mission replanning, generating new flight paths for the UAS to revisit targets. Since the VRP solver requires a few seconds to generate paths, the magenta-highlighted subtree directs the aircraft to hover until the solver finishes. Finally, the red-highlighted subtree commands the aircraft to commence the revisiting phase.

Figure 3.7: Behavior tree for search and revisit missions developed for MAVericks

3.3 Target Detection and Localization

3.3.1 Object Detection

You Only Look Once (YOLO) is a state-of-the-art neural network for real-time object detection [132]. It has been widely used in applications such as autonomous driving, personnel recovery and rescue, and object retrieval and delivery [133]. The architecture unifies object detection and object classification for real-time performance [134]. A YOLOv5 small variant model is integrated into our aerial autonomy stack running on the VOXL2. On the ground station, both the YOLOv5 medium variant and the YOLOv8 medium variant have been utilized, as they offer superior performance albeit with reduced inference speed.

3.3.2 Object Localization

Object localization is the process of finding the precise geographic location of a specific target. Existing techniques leveraging RGB-D cameras or LiDAR are suitable for vehicles that can carry a large payload. For UAS aiming to geolocate ground targets, one simple method involves employing a single camera to initially gauge the distance between the UAV and the target using a pinhole camera model. Subsequently, the target's position is triangulated based on the camera bearing, angle, and the UAS GPS coordinates [135]. Based on the center of the detection bounding box and the camera intrinsic matrix, the distance of the object from the camera can be inferred from the GPS coordinates and altitude of the UAS and the ground plane. The algorithm is a relatively cost-effective alternative to stereo cameras and LiDAR for estimating the distance to an object of interest.
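The geometry just described can be sketched in a few lines. The following is a simplified illustration that assumes a flat ground plane, ignores lens distortion, and uses a small-angle conversion from meters to latitude/longitude; the variable names are illustrative, and this is not the exact implementation of [135].

import numpy as np

def geolocate_pixel(u, v, K, R_wc, uas_lat, uas_lon, altitude_agl):
    """Project a detection's pixel center onto a flat ground plane.

    K            : 3x3 camera intrinsic matrix
    R_wc         : 3x3 rotation from the camera frame to a local world frame (x east, y north, z up)
    altitude_agl : camera height above the (assumed flat) ground, meters
    """
    # Back-project the pixel into a viewing ray in the camera frame.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Express the ray in the world frame.
    ray_world = R_wc @ ray_cam
    if ray_world[2] >= 0:
        raise ValueError("Ray does not intersect the ground plane")
    # Scale the ray so that it descends exactly altitude_agl meters (hits z = 0).
    scale = altitude_agl / -ray_world[2]
    east, north = scale * ray_world[0], scale * ray_world[1]
    # Convert the metric offset to a latitude/longitude offset (small-angle approximation).
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * np.cos(np.radians(uas_lat)))
    return uas_lat + dlat, uas_lon + dlon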
After the distance from the UAS to an object is found, a series of transformations are used to calculate the object position in the world-fixed frame [135], i.e.,

T^w_o = T^w_b \, T^b_c \, T^c_o,

where T^w_o is the transformation of the object position relative to the world frame, which is used for globally consistent representations of distances; T^c_o is the transformation of the object position relative to the camera frame; T^b_c is the transformation from the camera frame to the vehicle body frame; and T^w_b is the transformation of the vehicle body relative to the world frame.

3.3.3 Object Detection Clustering

A limitation of object localization with a single camera is the potential for significant errors, which may arise from factors such as rapid camera movements, GPS inaccuracies, and aircraft vibrations. We operate under the assumption that the localization estimates of a particular object adhere to a Gaussian distribution. By employing Gaussian Mixture Models (GMM), we identify the centroids of these Gaussians, representing the most probable locations of the objects. Before running the GMM, we must establish the number of centroids, which serves as a hyperparameter for the algorithm. The Silhouette score is a metric used to determine the optimal number of centroids in clustering algorithms [136]. The Silhouette measures the quality of clustering by calculating how similar an object is to its own cluster compared to other clusters [136]:

s = \frac{b - a}{\max(a, b)},

where a represents the average distance between the data point and all other points within the same cluster, and b represents the average distance between the data point and all points in the nearest neighboring cluster. We calculate the average of all individual Silhouette scores to derive the overall Silhouette score for a specified number of centroids. Subsequently, we iterate through all potential numbers of centroids (e.g., up to 10) to identify the optimal number associated with the highest score. The Silhouette score spans from -1 to 1, where a score of 1 signifies well-clustered data points, while a score of -1 indicates misassigned data points. A regularization term can also be used to prevent overfitting:

s_{mod} = s + \gamma \cdot n,     (3.1)

where \gamma represents the regularization term and n signifies the number of clusters. When \gamma > 0, the Silhouette score faces a penalty for opting for a smaller cluster count. Conversely, when \gamma < 0, the Silhouette score experiences a penalty for favoring a larger number of clusters.

3.4 Advanced Perception Models

Our autonomy stack has incorporated advanced perception algorithms, such as action recognition and person re-identification developed by other ArtIAMAS researchers, into MAVericks. The results have been demonstrated in the ArtIAMAS 2023 field experiments.

3.4.1 Action Recognition

Deep learning has been widely used in action recognition. The majority of the work focuses on utilizing high-quality videos taken on the ground. When applying these algorithms directly to videos captured by UAS, a significant decrease in accuracy is observed due to factors such as low image resolution and camera movement. Auto zoom and temporal reasoning (AZTR) is one of the state-of-the-art algorithms for action recognition [137–139]. The algorithm has attained 95% top-1 accuracy on the Drone Action dataset, which captures action recognition data via UAS [140]. Our autonomy stack has been enhanced by integrating AZTR running on the UAS.
To demonstrate its effectiveness, the action recognition model can detect three crucial actions: halt, attention, and follow me, as defined in the Robot Control Gestures (RoCoG-v2) dataset [141, 142].

3.4.2 Person Re-identification

The goal of person re-identification is to match a person captured in one camera view, referred to as a query image or probe, with the same person appearing in other camera views, known as the gallery or reference images. The task involves two main stages. Initially, a gallery of individuals is built using feature embeddings, which refer to the transformation of raw input data (features) into a lower-dimensional vector space. The gallery images are prepared and uploaded to the UAS before a mission is started. Subsequently, a query image is compared against the gallery images. An unsupervised approach or a clustering method is utilized to find the best-matched gallery image. The algorithm integrated into our autonomy stack was initially created using PyTorch and CUDA [143]. Afterwards, the code underwent refactoring to ensure compatibility with our autonomy stack and enable execution on the VOXL2. As a proof of concept, the person re-identification system stores approximately 32 images from six manikins and human actors, as shown in Figure 3.8.

Figure 3.8: Manikins and human actors for Re-ID running on VOXL2

3.5 Ground Control Software

This section introduces two flavors of custom-built ground control software: one designed for outdoor search and revisit missions and the other tailored for 3D mapping missions. Both variants of the user interface are created using React [98] and Spring Boot [99].

3.5.1 Outdoor Variant

Born out of 2021 First Responde