ABSTRACT Title of Dissertation: APPLIED AERIAL ROBOTICS FOR LONG RANGE AUTONOMY AND ADVANCED PERCEPTION Wei Cui Doctor of Philosophy, 2024 Dissertation Directed by: Professor Derek A. Paley Department of Aerospace Engineering This dissertation addresses the challenges of conducting autonomous long-distance operations in settings where communication is restricted or unavailable. It involves the development of aerial autonomy software, ground station user interface, and simulation tools. Field experiments are conducted to assess the real-world performance and scalability of the developed autonomous multi-vehicle systems. A search and revisit framework involving multiple UAS engaged in expansive area exploration has been developed. By employing the ARL MAVericks autonomy stack, we have devised three system designs with improving levels of autonomy. This approach is effective in developing autonomous system capabilities for extended-range missions, enhancing effectiveness in reconnaissance, search, and rescue missions. Furthermore, the dissertation introduces an innovative application of enhanced target detection and localization techniques tailored specifically for small UAS deployment. Neural network fine- tuning and AprilTag detector selection are carefully conducted. Augmented by a meticulously designed workflow for performance evaluation and validation, our approach aims to improve the precision of target detection and localization using a single RGB camera module. Additionally, the dissertation presents the implementation of a specialized ground control user interface. Functioning as a centralized command center, the user interface facilitates real- time monitoring and coordination of heterogeneous aerial and ground robotic platforms engaged in collaborative search missions. By streamlining air-ground coordination and human-robot interaction, the custom user interface optimizes the collective capabilities of diverse aerial and ground robotic platforms, enhancing overall mission effectiveness. The experimental results from multi-vehicle autonomous search missions, evaluating centralized and decentralized control in beyond visual line of sight scenarios, are presented, proving the efficacy of the search and revisit framework operating in real-world scenarios. Finally, the dissertation covers the design and implementation of a resilient network link tailored for robotic platforms operating in environments with limited bandwidth. This essential infrastructure enhancement is devised to overcome communication constraints, ensuring reliable data exchange, and strengthening the resilience of autonomous systems in bandwidth-limited environments. APPLIED AERIAL ROBOTICS FOR LONG RANGE AUTONOMY AND ADVANCED PERCEPTION by Wei Cui Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2024 Advisory Committee: Dr. Derek A. Paley, Chair/Advisor Dr. Shuvra S. Bhattacharyya Dr. Joseph K. Conroy Dr. Michael W. Otte Dr. Mumu Xu © Copyright by Wei Cui 2024 Dedication To my beloved Athena and Annastasia, Every day with you has reminded me of the importance of curiosity, persistence, and passion. Your smiles and hugs gave me the strength to continue. I dedicate this work to you, with the hope that you always pursue your dreams and believe in your abilities. Thank you for being my sunshine and my motivation. 
ii Acknowledgments I’d like to express my deepest gratitude to my esteemed adviser, Dr. Paley, for his mentorship throughout my time at the University of Maryland. His unwavering guidance, insightful advice, and steadfast support have been instrumental in my academic and personal growth. Without his dedication and encouragement, I would not have achieved the successes and milestones that I have. His mentorship has profoundly impacted my life, for which I am forever thankful and forever changed. Thank you to my committee members, Dr. Bhattacharyya, Dr. Conroy, Dr. Otte, and Dr. Xu, for taking the time from your busy schedules to support me. I am deeply grateful for the thoughtful feedback, guidance, and advice you have provided. I am honored to have had the opportunity to learn from each of you. I would like to express my gratitude to Dr. Stephen Nogar and Benjamin Linne from the Army Research Lab, as well as Mike Rawding, Isaac Carlson, Nikhil Deshmukh, Michael Smith, and Joel Witman from Survice Engineering. Their collective efforts in the development of both the hardware and software components of this project have been invaluable. Additionally, their provision of essential resources, unwavering support, and insightful guidance has been crucial to the success of this work. My appreciation also extends to Dr. Dinesh Manocha, Dr. Shuvra Bhattacharyya, Dr. Ben Riggan, and their students for their invaluable contributions. Their development of advanced iii perception algorithms played a crucial role in enhancing our autonomy stack. I appreciate my labmates Animesh Shastry, Zach Bortoff, Sydrak Abdi, Atharv Marathe, Qingwen Wei, Srijal Poojari, Rose Gebhardt for their invaluable contributions to this project. I am grateful for the stimulating environment we have created together and for the friendships that have formed as a result. Thank you all for your dedication and for making this journey a rewarding experience. Additionally, I appreciate Josh Gaus, Chris Titus, Grant Williams, McKenzie Turpin, and Darren Robey from the UROC for their collaborative efforts. Our teamwork in conducting numerous field experiments and collecting valuable data was instrumental to the development of this dissertation. Finally, I would like to acknowledge the funding provided by ArtIAMAS, the Army Cooperative Agreement with the University of Maryland, which made this project possible. This support was instrumental in enabling the research and development carried out in this dissertation. I would also like to extend my sincere gratitude to Colonel Luke Sauter for granting me the opportunity to pursue my PhD at UMD. His support and encouragement have been invaluable throughout my academic journey. iv Table of Contents Dedication ii Acknowledgements iii Table of Contents v List of Tables viii List of Figures ix List of Abbreviations xiii Chapter 1: Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Relation to the State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Technical Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.4.1 Novel Collaborative Search Framework with ARL Aerial Autonomy Stack 12 1.4.2 Improved Object Detection and Localization with Aerial Images . . . . . 13 1.4.3 Specialized Ground Control User Interfaces . . . . . . . . . . . . . . . . 
13 1.4.4 Efficient Data Link for Aerial Operations in Bandwidth-limited Conditions 14 1.5 Outline of Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 Chapter 2: Experiment Testbeds 16 2.1 Autonomy Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.1 MAVSDK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.2 Pixhawk Project 4 Autopilot . . . . . . . . . . . . . . . . . . . . . . . . 17 2.1.3 Robot Operating System 2 . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.1.4 MAVericks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2 Simulation and Visualization Software . . . . . . . . . . . . . . . . . . . . . . . 19 2.2.1 Unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2.2 QGroundControl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2.3 Custom User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3 Uncrewed Aerial Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.3.1 Radios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.3.2 Uncrewed Aerial Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4 Experimental Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 v 2.4.1 UMD Fearless Flight Facility . . . . . . . . . . . . . . . . . . . . . . . . 35 2.4.2 UROC Raley Farm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Chapter 3: Autonomy Stack and Software Design 37 3.1 Problem Statement and Solution Designs . . . . . . . . . . . . . . . . . . . . . . 37 3.2 Mission Planning and Adaption . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.2.1 Broad Area Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.2.2 Target Revisiting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.2.3 Behavior Tree Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.3 Target Detection and Localization . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.1 Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.2 Object Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.3 Object Detection Clustering . . . . . . . . . . . . . . . . . . . . . . . . 48 3.4 Advanced Perception Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.4.1 Action Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.4.2 Person Re-identification . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.5 Ground Control Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.5.1 Outdoor Variant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.5.2 3D Mapping Variant . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Chapter 4: Object Detection and Localization using Aerial Images 55 4.1 Training a Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . 55 4.2 Loss Functions and Performance Metrics . . . . . . . . . . . . . . . . . . . . . . 57 4.3 Custom Trained Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.3.1 ArtIAMAS Project 1.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.3.2 2023 NIST First Responder UAS 3D Mapping Challenge . . . . . . . . . 63 4.3.3 2021 NIST First Responder UAS FastFind Challenge . . . . . . . . . . . 67 4.4 Characterizing Detection and Localization Performance . . . . . . . . . . . . . . 
69 4.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.5 AprilTag Detection and Localization using Aerial Images . . . . . . . . . . . . . 75 Chapter 5: Multi-vehicle Autonomous Search Missions 77 5.1 System Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.3.1 Multi-vehicle Coordination with Operator Supervision . . . . . . . . . . 83 5.3.2 Autonomous Search without Operator Supervision . . . . . . . . . . . . 87 5.3.3 Autonomous Search in Beyond Visual Line of Sight . . . . . . . . . . . . 89 5.4 Air-ground Coordination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 Chapter 6: Operating in Bandwidth-limited Environments 94 6.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 6.2 Communication Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 6.2.1 Test Lane Construction and Mapping Quality Validation . . . . . . . . . 99 vi 6.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Chapter 7: Conclusion 110 7.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 7.2 Suggestions for Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Appendix A: Camera Calibration 113 Bibliography 116 vii List of Tables 1.1 Surveys of UAS challenges experimentally addressed. Multiple perception algorithms include YOLO, target localization, person re-identification, and action recognition 9 4.1 The VOXL2 CPU load test at various camera resolutions and frame rates for Grazer A-002 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.2 The CPU load test is performed using various YOLO versions and input image sizes for Grazer A-002. The high-resolution camera is configured to 720p at 1 fps. An ’x’ indicates that the camera server has crashed. . . . . . . . . . . . . . . 63 4.3 Performance for human detection using aerial images is improved incrementally. 64 4.4 Comparing the performance of fine-tuned YOLOv8m and YOLOv8n on a custom dataset for detecting humans and Landolt ’C’ rings . . . . . . . . . . . . . . . . 66 4.5 Process speed per image using YOLOv8m and YOLOv8n variants on the Gigabyte AERO groud stateion laptop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.6 Detection probability and accuracy at various altitudes and a camera angle of 15 degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.7 Detection probability and accuracy at various altitudes and a camera angle of 60 degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.8 Detection probability of Aruco Marker Detector with Grazer. The resolution is 1280 x 768px (720p) and the FOV is 120 degrees . . . . . . . . . . . . . . . . . 76 4.9 Detection probability of AprilTag 3 Detector with Grazer. The resolution is 1280 x 768px (720p) and the FOV is 120 degrees . . . . . . . . . . . . . . . . . . . . 76 5.1 Performance metrics of the end-to-end simulation . . . . . . . . . . . . . . . . . 80 5.2 Performance metrics of the end-to-end experiment . . . . . . . . . . . . . . . . . 85 5.3 Performance metrics of the autonomous end-to-end experiment . . 
. . . . . . . . 89 5.4 Performance metrics of the end-to-end experiment . . . . . . . . . . . . . . . . . 90 5.5 Performance for manikin detection using aerial images is improved with higher resolution images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 6.1 Performance comparison of TCP/IP Socket and ROS . . . . . . . . . . . . . . . 96 6.2 Size of messages per topic for 3D mapping. Raw images undergo M-JPEG compression for size reduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 6.3 Comparison of ten ground truth and measured distances of the fiducials. The ten distances are defined in Figure 6.4 . . . . . . . . . . . . . . . . . . . . . . . . . 102 viii List of Figures 2.1 Simulated manikins and AprilTags are added to the 3D scene to facilitate testing algorithms in Unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.2 Two cars are added to the 3D scene to enable testing algorithms detecting large targets with the pre-trained detection models . . . . . . . . . . . . . . . . . . . . 20 2.3 Some technologies used in custom user interface . . . . . . . . . . . . . . . . . . 22 2.4 Wireshark is employed to validate the QoS in ROS2, monitor network packets, and assess bandwidth utilization within a mesh network consisting of three UAS and one groundstation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.5 Free-flowing network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.6 (left) Doodle embedded radio and (right) Doodle mini radio (Credit: Doodle Labs) 26 2.7 Software configuration page for the Doodle radios. Channel is set to 51 and Bandwidth is set to 15MHz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.8 Sub-GHz RC transmitters used in this project : (a) 915MHz Dragon Link mounted on 2.4 GHz Jeti Transmitter, and (b) 900 MHz Jeti Transmitter . . . . . . . . . . 27 2.9 Modal AI m500 with FTDI adapter . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.10 Wire diagram of Modal AI m500 based quadcopter tailored for running MAVericks. The FTDI adapter enables RTPS communication between VOXL Flight onboard computer and the flight controller. . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.11 Wire diagram of the RB5 with Pixracer flight controller tailored for running MAVericks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.12 Jeti transmitter with Dragon Link and retrofitted RB5 platforms for MAVericks autonomy stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.13 Custom UAS platform by Survice Engineering - ARL Grazer and system diagram developed for MAVericks autonomy stack . . . . . . . . . . . . . . . . . . . . . 32 2.14 Doodle mini is connected to VOXL2 through the USB port open by the Modal AI USB 3.0 UART Add-on . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.15 Wiring diagram of mRo Control Zero H7 and VOXL2 via USB 3.0 Add-on and M0076 Interposer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.16 Wiring diagram for UROC Chimera tailored for running MAVericks . . . . . . . 34 2.17 Fearless Flight Facility (F3) near the University of Maryland College Park Campus 36 2.18 Raley Farm flight faciltiy near the UMD UAS Research and Operations Center (UROC) in Southern Maryland . . . . . . . . . . . . . . . . . . . . . . . . . . . 
36 ix 3.1 Algorithm for the search and revisit: (left) preliminary broad area survey detecting targets on the ground and (right) subsequent target revisits involving multiple autonomous UAS and perception . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.2 Configuration facilitating centralized control and PX4 flight stack . . . . . . . . . 39 3.3 Full-autonomous configuration with PX4 flight stack . . . . . . . . . . . . . . . 39 3.4 Full-MAVericks configuration with MAVericks navigation stack . . . . . . . . . . 40 3.5 K-means is utilized to split a large polygon into an arbitrary number of small polygons. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.6 The VRP for two agents was resolved using the solver within Google OR-tools . 45 3.7 Behavior tree for search and revisit missions developed for MAVericks . . . . . . 46 3.8 Manikins and human actors for Re-ID running on VOXL2 . . . . . . . . . . . . 51 3.9 The outdoor variant of Fiona, crafted for search and revisit missions with an overlay of the Unity window on the left side. . . . . . . . . . . . . . . . . . . . . 52 3.10 Fiona also monitors the location of a smart binocular (black binocular icon), with the search area created by a sequence of waypoints (yellow stars) generated by the binocular’s LiDAR system . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.11 The custom UI displays a 3D mesh model generated by RTAB-Map . . . . . . . 54 3.12 Comparison of mesh model generated with and without AliceVision support. . . 54 4.1 Workflow to train a custom neural network running on VOXL2 . . . . . . . . . . 56 4.2 Sample images of the training dataset. The manikins are either standing or lying down viewed from 20m to 30m. . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.3 Sample validation images where the manikins are successfully detected using the custom-trained YOLOv5 small detection model . . . . . . . . . . . . . . . . . . 61 4.4 Training Progress of YOLO Model: A comprehensive view of training and validation metrics including object loss, class loss, and box loss for both training and validation sets. Additionally, training precision and recall, along with mean average precision (mAP) scores at 0.5 and 0.95 IOU thresholds, provide insights into model performance across various aspects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.5 Sample training images for Landolt ”C” rings . . . . . . . . . . . . . . . . . . . 64 4.6 Training and validation performance metrics for the fine-tuned YOLOv8n capable of detecting persons and Landolt ”C” rings. . . . . . . . . . . . . . . . . . . . . 65 4.7 Detected humans and Landolt ”C” rings in the live test and evaluation phase of NIST First Responder UAS 3D Mapping Challenge in Salina, KS, April 2024 . . 66 4.8 Samples images in the manually developed dataset. Labels are manually added through Roboflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.9 Training and validation performance metrics for the fine-tuned YOLOv5s capable of detecting humans in densely forested areas using thermal images. . . . . . . . 68 4.10 The improved detection model is capable of automatically identifying humans beneath the canopy in thermal images. Extended observation of the thermal images without aids of the model may lead to eye strain. . . . . . . . . . . . . . 
69 4.11 A camera angle of 15◦ positions the Grazer camera almost facing forward, whereas a 60◦ camera angle directs the camera more vertically downward. . . . . . . . . . 70 4.12 The quantification of object localization errors involves a series of postprocessing steps applied to rosbags collected in the field . . . . . . . . . . . . . . . . . . . . 71 x 4.13 Impact of camera angles on object localization errors . . . . . . . . . . . . . . . 72 4.14 Inaccuracies of placing bound boxes occur when the object is large and close to the UAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.15 UAS poses (blue chevrons) and estimated object locations (white circles) . . . . . 74 4.16 Observing AprilTag (36h11) from different distances from Grazer. The camera is configured to 1280 x 768 px (720p) with a 120-deg FOV lens. The high-resolution images are transformed into grayscale to facilitate AprilTag detection. . . . . . . 76 5.1 Digital twin of flight test facility in Southern Maryland built in Unity for high- fidelity simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.2 Initial survey paths. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.3 Revisiting the objects of interest with hexagonal orbits. . . . . . . . . . . . . . . 80 5.4 Simulated object images are available in Fiona’s gallery . . . . . . . . . . . . . 81 5.5 (a) UAS is selected (highlighted in yellow) and then redirected to the red pin, and (b) the UAS path (blue) is adjusted after manually redirecting the UAS. . . . . . 82 5.6 Testing beyond visual line of sight (BVLOS) operations: (top left) a UAV is flying over a lightly wooded area, and (satellite map) UAV flight paths and positions are tracked in real-time using a custom UI . . . . . . . . . . . . . . . . . . . . . . . 83 5.7 Flight paths for the initial broad area survey with six ground targets (black pins). The initial estimated target locations are depicted as yellow pins. . . . . . . . . . 84 5.8 Modified flight paths for revisiting the objects of interest . . . . . . . . . . . . . 85 5.9 True positives - two objects of interest are correctly identified by the detection model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 5.10 False positive - the detection model misclassifies a lamp pole as a person. . . . . 86 5.11 Results of advanced perception are transmitted back to the ground station. . . . . 86 5.12 Flight paths for initial broad survey using the full autonomous configuration. . . 87 5.13 Manikins detected in the autonomous search experiment . . . . . . . . . . . . . 88 5.14 Flight paths for the revisiting phase are generated onboard. . . . . . . . . . . . . 88 5.15 Flight paths for initial broad survey. . . . . . . . . . . . . . . . . . . . . . . . . 90 5.16 Manikins detected in the BVLOS experiment . . . . . . . . . . . . . . . . . . . 91 5.17 Manikins detected in the BVLOS experiment . . . . . . . . . . . . . . . . . . . 91 5.18 Manikin detected by the UGV . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 6.1 Communication configuration for UAS operating in bandwidth-limited environments 99 6.2 (a) Dimensions of the ficual and (b) placement of the five fiducials in a test lane (Credit: NIST) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 6.3 Test lane with five fiducials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 6.4 Fiducial top measurement. 
Ten distances are measured to quantify the distortion of the 3D map (Credit: NIST) . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.5 Real-time 3D point cloud for the test lane, along with RGB and depth images reconstructed on the ground station . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.6 Generated 3D mesh model with the measurement tool enabled in the custom UI. The measured distance reads 1.5 meters or 59 inches . . . . . . . . . . . . . . . 104 6.7 Real-time 3D mapping, along with rectified RGB and depth images in RViz, during a flight test at UMD Engineering Annex . . . . . . . . . . . . . . . . . . 106 xi 6.8 The 3D mesh model of UMD Engineering Annex generated by RTAB-Map in a live flight test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 6.9 Close-up perspective of the mesh model featuring targets: individual (highlighted in red), ”C” ring (displayed in magenta), and manual (depicted in blue). . . . . . 107 6.10 3D mesh map generated by RTAB-Map running on iPhone 14 Pro . . . . . . . . 107 6.11 A gallery of detected humans . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.12 A gallery of a detected Landolt ”C” ring that indicates hazardous locations . . . . 108 A.1 A 8 x 11 checkerboard used for calibration . . . . . . . . . . . . . . . . . . . . . 114 A.2 Graphical user interface (GUI) for camera calibration. The enabled Save button indicates that the calibration is complete. . . . . . . . . . . . . . . . . . . . . . 114 A.3 Object detection is executed on a rectified image . . . . . . . . . . . . . . . . . . 115 xii List of Abbreviations AI Artificial Intelligence API Application Programming Interface AMAV Autonomous Micro Air Vehicle ArtIAMAS AI and Autonomy for Multi-Agent Systems ARL Army Research Laboratory BVLOS Beyond Visual Line of Sight CAS Close Air Support CNN Convolutional Neural Network COCO Common Objects in Context COTS Commercial of the Shelf CPU Central Processing Unit CUDA Compute Unified Device Architecture DARPA Defense Advanced Research Projects Agency DCIST Distributed, Collaborative, Intelligent Systems and Technology DDS Data Distribution Service DoD Departement of Defense F3 UMD Fearless Flight Facility FAA Federal Aviation Administration FOV Field of View FTDI Future Technology Devices International GUI Graphical User Interface HTTP Hypertext Transfer Protocol HTTPS Hypertext Transfer Protocol Secure IoT Internet of Things IoU Intersection over Union IMU Inertial Measurement Unit ISR intelligence, surveillance, and reconnaissance JPA Java Persistence API LiDAR Light Detection and Ranging LTS Long Term Support mAP mean Average Precision MAVLink Micro Air Vehicle Link MAVSDK Micro Air Vehicle Software Development Kit MIPI Mobile Industry Processor Interface xiii MPA Modal Pipe Architecture NASA National Aeronautics and Space Administration NDAA National Defense Authorization Act NIST National Institute of Standards and Technology OR Operations Research PPM Pulse Position Modulation PWM Pulse Width Modulation QoS Quality of Service QML Qt Modeling Language R2C2 Robotics Research Collaborative Campus RC Radio Control Re-ID Re-Identification RGB Red, Green, Blue RoCoG Robot Control Gestures ROS Robotic Operating System RRT Rapidly-exploring Random Trees RTAB-Map Real-Time Appearance-Based Mapping RTPS Real-Time Publish-Subscribe SBUS Serial Bus SDK Software Development Kit SITL Software in the Loop SSD Single Shot Multibox Detector TATRC Telemedicine & Advanced Technology Research Center TSP 
Traveling Salesman Problem UAV Uncrewed Aerial Vehicle UAS Uncrewed Aerial System UAS IPP UAS Integration Pilot Program UGV Uncrewed Ground Vehicle UI User Interface UMD University of Maryland uORBs micro Object Request Brokers UROC UMD UAS Research and Operations Center UTM UAS Traffic Management VIO Visual-Inertial Odometry VRP Vehicle Routing Problem YOLO You Only Look Once xiv

Chapter 1: Introduction

In recent years, the field of uncrewed aerial systems (UAS) has offered numerous opportunities and capabilities, including search and rescue, precision agriculture, and disaster relief efforts [1]. One particular advancement within this field is the concept of a multi-UAS team, where multiple uncrewed aircraft collaborate and coordinate their actions to achieve complex objectives [2]. An illustrative scenario is a long-distance search operation where several UAS are deployed to search a specific area. However, conducting long-distance operations with UAS in communication-limited environments presents notable challenges and considerations within the domain of aerial autonomy. The ability to establish and maintain communication links is paramount for the safe and effective operation of UAS, particularly when operating beyond the visual line of sight [3]. While UAS may be able to maintain short-range links between nearby vehicles, long distances can introduce signal degradation, latency, and vulnerability to interference [4]. In the absence of reliable long-distance communication, UAS may encounter difficulties in real-time decision making by a ground station operator, compromising their overall flight safety and efficiency [5].

1.1 Motivation

The confluence of artificial intelligence (AI) and drone technologies has presented a large number of opportunities and challenges across various domains in recent years [6]. The integration of AI and drone technologies revolutionizes the way small military units operate in intricate and challenging environments [7]. Over the past few years, diligent efforts have guided the creation of a comprehensive vignette that unfolds cutting-edge advancements in military capabilities. In one envisioned scenario, the focal point revolves around leveraging AI and drone technologies to provide real-time situational awareness to small units navigating complex terrains [8]. The cornerstone of this vision involves the deployment of a swarm of UAS designed to execute broad area reconnaissance with a primary objective of detecting targets on the ground. Collective intelligence is then relayed to the soldiers on the ground, offering them an improved level of situational awareness and the ability to make informed decisions promptly. Soldiers on the ground also possess the capability to retask individual UAS assets, directing them to revisit specific objects of interest. This ability to adapt and modify the mission in real time empowers military units with a flexible and responsive approach to emerging situations in communication-limited environments [9].

Furthermore, the confluence of AI and UAS technologies in search and rescue operations is transformative for first responders. UAS equipped with high-resolution cameras and thermal imaging sensors can efficiently survey large terrains that would be challenging and time-consuming for ground-based teams to cover [10]. The ability to access these aerial views enhances the overall efficiency of the search, as potential targets or individuals in distress can be spotted more rapidly [11].
Deep learning algorithms, designed for object detection and object localization, empower UAS to quickly identify and localize humans on the ground, even in challenging environments where visibility may be compromised [12]. The capability to swiftly cover vast search areas, combined with AI's proficiency in swift and precise detection, creates a powerful synergy that improves the effectiveness of search and rescue missions within time constraints [13].

1.2 Relation to the State of the Art

In this section, we delve into the current state of the art in several areas related to UAS technology. We explore the advances and challenges associated with UAS swarm technologies, beyond visual line of sight (BVLOS) operations, communication-limited operations, human-robot interaction, and air-ground coordination.

UAS have played an important role in military reconnaissance, surveillance, air interdiction, and close air support (CAS) [14]. UAS also have a wide range of applications in forest fire management [15], civil safety and security [16], agricultural remote sensing [17], weather assessment [18], urban traffic monitoring [19], network relays [20], and disaster relief following Hurricane Katrina, Typhoon Morakot, the Tohoku earthquake, and the Haiti earthquake [21, 22].

UAS swarm technology is a paradigm shift in UAS operations, where multiple aircraft collaborate autonomously to achieve common objectives [23]. This approach offers advantages such as redundancy, survivability, scalability, low unit cost, and enhanced mission capabilities compared to a single UAS [24]. Swarm behavior is inspired by natural phenomena like bird flocking and ant colonies, facilitating adaptive and cooperative operations in dynamic environments [25]. Research in UAS swarm technology has emphasized the development of algorithms for swarm formation, task allocation, and cooperative decision-making [26].

Centralized and decentralized control mechanisms present contrasting strategies for orchestrating UAS swarms [27]. In centralized control, a central node directs the actions of every UAS in the swarm and coordinates their movements toward a common objective. This approach offers advantages such as optimized path planning, streamlined coordination, and enhanced performance monitoring [28]. In 2015, Intel flew 100 drones simultaneously dancing in unison over Germany, and 500 a year later at the Farnborough Air Show [29]. At the 2018 PyeongChang Olympic Winter Games, viewers around the world witnessed history coming to life as 1,218 Intel Shooting Star drones decorated the night sky [30]. Research and development of swarm technology has been growing rapidly over the past few years [31]. However, centralized control suffers from a single point of failure, particularly in complex environments where communication is limited or denied [32]. Decentralized control, on the other hand, distributes decision-making authority among individual UAS, allowing each unit to autonomously adapt its behavior based on observations and interactions within its local environment [33]. This approach improves scalability, redundancy, and adaptability to changing environments, and avoids a single point of failure when communication is limited or denied, making it well suited for scenarios where fault tolerance and distributed decision-making are critical. Advances in distributed decision-making algorithms have empowered UAS swarms to self-organize without centralized control [34].
BVLOS operations enable UAS to operate beyond the operator's direct line of sight, expanding the range and scope of missions. Recent technological advances have enabled safe and reliable BVLOS operations [35]. Systems designed for sensing and avoidance, incorporating radar, LiDAR, and computer vision, empower UAS to detect and navigate around obstacles autonomously [36]. Additionally, technologies for remote identification have been developed, facilitating aerial situation awareness. Long-distance communication systems, including satellite links and cellular networks, enable continuous communication between ground control stations and UAS over extended distances [37]. Furthermore, advancements in autonomous navigation and control algorithms enable UAS to navigate in complex environments with little human intervention. Safety remains paramount in BVLOS operations, which aim to reduce the risks associated with potential collisions, loss of communication, and loss of property [38]. One notable initiative addressing challenges associated with BVLOS operations is the DARPA OFFSET program, which seeks to enable autonomous swarm BVLOS behaviors through innovative technologies [39].

Despite the technological advancements, BVLOS operations still face regulatory constraints and public acceptance challenges [40]. The Federal Aviation Administration (FAA) has established stringent guidelines for obtaining BVLOS flight permissions, including rigorous risk assessments and safety protocols [41]. Operating BVLOS in U.S. airspace mandates obtaining a waiver from the FAA [42]. The National Aeronautics and Space Administration (NASA) UAS Traffic Management (UTM) program focuses on the integration of small UAS into the national airspace system for applications like package delivery beyond visual line of sight [43]. The FAA UAS IPP program tests and evaluates a wide range of BVLOS applications, such as package delivery, infrastructure inspection, precision agriculture, and emergency management [44].

Efficient collaboration between UAS and ground-based vehicles can be beneficial for conducting missions in complex environments where UAS alone may be limited by airspeed, altitude, network bandwidth, and payload capacity [45]. The ARL Distributed, Collaborative, Intelligent Systems and Technology (DCIST) program primarily focuses on developing distributed and collaborative intelligent systems for complex military tasks, including air-ground coordination and human-robot interaction operations [46]. Common communication protocols and interfaces have been developed to facilitate integration and interoperability between UAS and ground vehicles [47]. Collaborative mission planning enables real-time adaptation to dynamic environments and mission objectives, optimizing resource allocation and task execution [48]. Advances in human-machine interaction focus on designing intuitive user interfaces that enhance situational awareness and decision making for both UAS and ground vehicles [49]. Despite the progress that has been made in improving air-ground coordination capabilities, challenges such as bandwidth constraints and heterogeneous platform integration still persist, requiring continuous research and development efforts [50].

Operating UAS in communication-limited environments such as remote, indoor, or hostile settings presents a unique challenge that requires a reliable and robust communication solution [51].
Ad-hoc networking enables UAS to form a self-contained local area network, enabling data exchange in communication-constrained environments [52]. Edge computing and onboard processing enable UAS to perform real-time data analysis and decision-making onboard, eliminating reliance on continuous communication with the ground station [53]. Optimized network protocols have been developed to support communication where low bandwidth, high latency, and intermittent connectivity are inevitable, ensuring data transmission in communication-limited environments [54]. Communication-limited operations remain challenging due to limited communication range and bandwidth [55]. The Raven small UAS program focuses on using small UAS for Army and Air Force intelligence, surveillance, and reconnaissance (ISR) missions in communication-limited environments [56]. Exploring novel communication paradigms and developing resilient communication architectures remain research directions in the near future.

Searching remains an age-old problem that continues to garner widespread attention. Path planning for search missions generally fits into the following categories:

• Grid-based search: search areas are divided into grids and algorithms like A* or Dijkstra's are used to scan the entire area. This approach ensures systematic coverage [57].

• Random search: random points in search areas are picked and traversed. This approach is useful when an exhaustive grid-based search is not feasible. Random search explores the search area more effectively given limited resources [58].

• Graph-based search: graphs are created to represent the search space, where the edges are lines of sight or feasible routes. This approach is useful for scenarios with limited visibility or few feasible routes [59].

• Potential fields: UAS navigate by following attractive and repulsive forces generated by artificial potential fields, which can guide them towards targets while avoiding obstacles. This approach is effective for dynamic environments and obstacle avoidance [60].

• Sampling-based techniques: techniques like Rapidly-exploring Random Trees (RRT*) are sampling-based algorithms that efficiently explore large and unknown search areas. These methods can quickly generate feasible paths while avoiding obstacles [61].

• Machine learning: deep reinforcement learning and other machine learning techniques can be used to train UAVs to learn optimal navigation policies from existing data, improving adaptability and performance in diverse search missions [62].

In a complete search mission where the entire search space must be covered, a UAS has to visit every location. The worst-case scenario has a runtime of O(n). However, if the UAS has prior knowledge of the targets, it can prioritize high-probability areas first [63]. Additionally, the UAS can employ active search methods, dynamically adjusting the search area based on real-time data [64, 65]. Reward functions can be used to optimize the search area, thereby reducing the runtime if a complete search is not feasible [66].

RGB cameras, infrared (IR) cameras, LiDAR, RF, acoustic sensors, and stereo cameras are among the popular sensors utilized for target detection and localization [67]. LiDAR and radar are known for their precision in long-range detection, although they come with the disadvantage of added weight [68]. Conversely, acoustic sensors offer limited range. RGB, IR, and stereo cameras, on the other hand, are lighter sensors, offering moderate range and accuracy [69].
We employ a single RGB camera for small UAS to detect and localize objects. Running advanced perception algorithms on UAS involves performing complex data processing tasks, such as image processing and object detection, directly on edge devices [70]. This capability facilitates onboard decision-making and autonomous missions, reducing latency and reliance on high-bandwidth networks [71]. Handling intensive algorithms on edge devices in real time presents several challenges. Edge devices typically have less processing power, memory, and storage compared to centralized servers, making it difficult to run complex algorithms efficiently [72]. High computational loads can generate significant heat, which can be challenging to dissipate in small, enclosed edge devices [73]. Advanced perception algorithms often need to be optimized or refactored to fit the resource constraints of edge devices without compromising performance [74]. Many UAS are battery-powered, requiring algorithms to be energy-efficient to avoid rapid battery depletion. Keeping the software and algorithms on edge devices up to date and secure requires robust maintenance strategies and efficient update mechanisms [75].

Since 2018, over 16,400 papers on Google Scholar include the keywords multi-agent, UAV, and search. However, only 663 of these papers feature the key phrase flight test or field experiment. Among these, 63 papers feature the keyword UGV. Table 1.1 compares the papers that have conducted field experiments involving multi-vehicle systems in search missions. Most of these studies involve swarms of 2-3 UAVs or UGVs, utilize YOLO for object detection, and use a 2.4 GHz data link. In addition to previously published work, our research introduces human-robot interaction with smart binoculars and a custom UI. Additionally, we present the experimental results of advanced perception algorithms such as object detection (YOLO), object localization, person re-identification, and action recognition executed onboard. Furthermore, we push the boundaries of BVLOS operations and the implementation of an efficient data link using the sub-GHz band. The sub-GHz band offers superior performance in covering long distances and penetrating obstacles.

Ref. | Year | #UAV | #UGV | Human-robot Interaction | BVLOS Ops | Data Link | Onboard Perception
[76] | 2018 | 1 | 1 | N/A | No | N/A | None
[77] | 2020 | 1 | 1 | Custom UI | No | 2.4 GHz and 5.8 GHz (Ubiquity) | YOLO
[78] | 2021 | 2 | 0 | N/A | No | 2.4 GHz and 5 GHz WiFi | YOLO
[79] | 2021 | 1 | 3 | Smart Phones | No | 5G and 2.4 GHz (DJI) | None
[80] | 2021 | 2 | 2 | Custom UI | No | 4G | YOLO
[81] | 2021 | 3 | 0 | N/A | No | 2.4 GHz WiFi | Color/Blob
[66] | 2023 | 1 | 2 | N/A | No | 2.4 GHz and 5.8 GHz (Little Hexy) | YOLO
[82] | 2023 | 3 | 0 | N/A | No | 2.4 GHz (DJI) | RCNN + VGG16
[83] | 2023 | 3 | 3 | N/A | No | 5G | CNN
[84] | 2023 | 3 | 0 | N/A | No | 2.4 GHz WiFi | None
[85] | 2024 | 1 | 1 | N/A | No | 2.4 GHz (DJI) | None
Our work | 2024 | 2 | 1 | Smart Binocular and Custom UI | Yes | 2.4 GHz (Doodle) and 900 MHz (Microhard) | Multiple

Table 1.1: Surveys of UAS challenges experimentally addressed. Multiple perception algorithms include YOLO, target localization, person re-identification, and action recognition.

1.3 Technical Approach

In this dissertation, we aim to address several common challenges in UAS, such as UAS swarming, human-robot interaction, air-ground coordination, long-range operations with limited bandwidth, and the onboard execution of advanced perception algorithms. This comprehensive approach focuses on advancing aerial autonomy, advanced perception executed on UAS, user interfaces, and communication links.
It builds on existing knowledge while introducing innovative solutions to enhance operational capabilities in complex environments. We conducted flight tests to validate the feasibility of our solutions under real-world conditions.

We developed a novel search and revisit framework for long-range operations using the ARL aerial autonomy stack. This framework includes three distinct system configurations with increasing levels of autonomy. The first configuration employs a centralized control model in a star topology, with UAS under operator supervision. Flight plans are created and uploaded using MAVSDK, a library of APIs for UAS interaction. The advanced perception algorithms were refactored for execution on the UAS, which has an onboard computer running Ubuntu 18 with Python 3.6 and TensorFlow 2.7. The second configuration operates without operator supervision; UAS autonomously generate flight paths for the PX4 flight stack onboard. In this setup, robotic platforms communicate through a mesh network, allowing UAS to maintain short-range links between vehicles even if communication with the ground station is lost during long-range operations. The third configuration further enhances autonomy by using the ARL navigation stack instead of the PX4 flight stack. A search area is defined in a YAML file and uploaded to the vehicles via a custom user interface. The navigation stack uses a custom-built behavior tree to plan flight paths for both search and revisit phases. The UAS remain connected through a mesh network, enabling long-range operations without operator input.

To address the limitations of existing public datasets for training neural networks running on UAS, we created a dataset specifically tailored for high-altitude operations. Additionally, we enhanced the stability of the UAS camera server, quadrupling the camera resolution and preventing silent failures during missions, especially under high-resolution settings. This initiative significantly improved the performance of our object detection model.

While existing object localization techniques using RGB-D cameras or LiDAR are suitable for vehicles with large payloads, we developed an improved localization algorithm for small UAS. This algorithm combines the camera pinhole model with Gaussian Mixture Models (GMM) using a single RGB camera module. A simple camera pinhole model alone results in large localization errors due to factors like fast camera movement, GPS inaccuracies, and timestamp mismatches between the camera and UAS pose. By assuming that the localization estimates of the same target follow a Gaussian distribution, with the center representing the most probable target location, the use of GMM significantly reduces localization errors (a minimal sketch of this two-step estimate appears below).

We also focused on enhancing user interaction with UAS by developing two specialized ground control interfaces. The first interface allows operators to track UAS positions, flight paths, and target locations in real time, facilitating effective air-ground coordination and human-robot interaction. This UI supports path planning and enhances situational awareness by providing access to target images. The second interface is designed for search and rescue missions, allowing first responders to view 3D maps, target locations, and detection images in bandwidth-limited environments. By integrating AliceVision, an open-source photogrammetry framework [86], we improved modeling quality and enhanced situational awareness for first responders.
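For illustration, the following minimal Python sketch shows how a detection's pixel coordinates might be projected onto flat ground with a pinhole model and how repeated, noisy per-frame estimates could then be clustered with a Gaussian mixture. The function names, frame conventions, flat-ground assumption, known number of targets, and use of scikit-learn are simplifying assumptions for this sketch, not the onboard implementation.

```python
# Minimal sketch (not the onboard implementation): project a detection's pixel
# onto flat ground with a pinhole model, then cluster repeated estimates so the
# mixture-component means serve as the refined target locations.
import numpy as np
from sklearn.mixture import GaussianMixture

def pixel_to_ground(u, v, K, R_wc, t_wc, ground_z=0.0):
    """Intersect the camera ray through pixel (u, v) with the plane z = ground_z.

    K    : 3x3 camera intrinsic matrix (from calibration)
    R_wc : 3x3 rotation taking camera-frame vectors into the world frame
    t_wc : camera position in the world frame (e.g., local ENU), 3-vector
    """
    t_wc = np.asarray(t_wc, dtype=float)
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-projected ray, camera frame
    ray_world = R_wc @ ray_cam                          # same ray, world frame
    if abs(ray_world[2]) < 1e-6:
        return None                                     # ray is parallel to the ground plane
    s = (ground_z - t_wc[2]) / ray_world[2]             # scale that reaches the plane
    return (t_wc + s * ray_world)[:2]                   # (x, y) ground estimate

def cluster_estimates(estimates, n_targets):
    """Fit a Gaussian mixture to noisy per-frame (x, y) estimates; the component
    means are taken as the most probable target locations."""
    gmm = GaussianMixture(n_components=n_targets, covariance_type="full",
                          random_state=0).fit(np.asarray(estimates))
    return gmm.means_
```

In use, each detection in each frame yields one `pixel_to_ground` estimate, and `cluster_estimates` collapses the resulting scatter into one refined (x, y) location per assumed target.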
The software is modular, allowing easy migration to different UAS platforms. The development process follows a test-driven and agile approach, ensuring reliability and efficiency. By developing automated tests alongside production code, we can quickly introduce new features to meet operational requirements.

ROS is popular in robotics due to its modularity, open-source nature, and large community support, with one of its key features being its communication infrastructure. However, ROS introduces overhead, impacting performance in real-time or latency-sensitive applications. In bandwidth-limited conditions, we observed that the ground station fails to receive ROS messages from UAS when bandwidth is less than 8 Mbps. To support robotic operations in such conditions, we use TCP/IP sockets for transmitting raw data to the ground station. The ground station then reconstructs ROS messages using this data, enabling their use in ROS-based applications. Sockets provide a fundamental building block in network communication, offering low-level control over the communication process, data encoding, and transmission, and operating with lower overhead than higher-level frameworks like ROS. With this configuration, we can precisely control what to send and how much data to send to the ground station in a network with less than 3 Mbps.

1.4 Contributions

The contributions of this dissertation tackle several common challenges faced by UAS. Many of these findings have been published [87, 88] or demonstrated in the 2023 and 2021 NIST UAS First Responder Challenges [89, 90], where our team achieved an overall 3rd place with three out of five Best-In-Class awards and an overall 1st place with the First Responder's Choice award.

1.4.1 Novel Collaborative Search Framework with ARL Aerial Autonomy Stack

We have developed and experimentally evaluated a novel search and revisit capability using the ARL aerial autonomy stack, designed for long-range operations. This framework encompasses the development of aerial autonomy software, simulation tools, and a ground control user interface. Three system designs with increasing levels of autonomy have been created. In one long-range field experiment, our framework detected and localized 67% of targets 500 meters away from the take-off point in under 10 minutes.

1.4.2 Improved Object Detection and Localization with Aerial Images

We created our own aerial dataset to retrain the object detection model running on the UAS and improved the stability of the camera server. The overall approach improves precision from 0.586 to 0.731 and recall from 0.088 to 0.682. Additionally, we quantified the localization errors and detection probability, developing an improved object localization algorithm for small UAS using a combination of a camera pinhole model and a clustering algorithm. This enhancement reduces localization errors from 4 meters to an average of 2 meters and more than doubles the probability of detection. Finally, we meticulously designed and implemented an AprilTag localization algorithm that allows the UAS to correctly localize AprilTags from 6 meters away using 720p images captured by a single RGB module.

1.4.3 Specialized Ground Control User Interfaces

Two specialized ground control user interfaces have been created to offer features not found in existing open-source and proprietary ground control software. The first interface allows for real-time tracking of UAS positions, flight paths, and target locations.
It also facilitates path planning and tackles the challenges of air-ground coordination and human-robot interaction that other interfaces overlook. The second interface is designed specifically for viewing 3D maps during indoor search missions in bandwidth-limited environments. Both user interfaces display target locations and images along with a satellite map for outdoor missions or a 3D map for indoor missions, greatly enhancing situational awareness for operators.

1.4.4 Efficient Data Link for Aerial Operations in Bandwidth-limited Conditions

We designed and implemented a reliable data link specifically for UAS operating in bandwidth-limited environments within the sub-GHz band. Our approach employs TCP/IP sockets to transmit raw data from the UAS to the ground station, where the ground station reconstructs ROS messages from the socket data for use in ROS-based applications. This solution allows UAS to create 3D maps and localize missing persons and hazards in real time while maintaining bandwidth usage under 3 Mbps in the sub-GHz band. The data link is robust and outperforms the 2.4 GHz RadioMaster radio control link, which typically has a range of 1 to 2 kilometers.

1.5 Outline of Dissertation

In Chapter 1, the motivation, relation to the state of the art, technical approach, and contributions of the dissertation are presented.

In Chapter 2, experiment testbeds are introduced, encompassing open-source autonomy software, simulation and visualization tools, uncrewed aerial systems, and experimental flight facilities.

Chapter 3 delves into our autonomy stack and software architecture concerning the search and revisit framework. It commences with an exploration of the problem statement and solution design, followed by discussions on mission planning and adaptation. Additionally, it covers behavior tree modeling for achieving a fully autonomous configuration, target detection and localization, and advanced perception models, and it introduces the ground station software crafted specifically for our needs.

Chapter 4 elaborates on the utilization of aerial imagery for object detection and localization. It outlines the process of fine-tuning neural networks to suit various mission requirements, discusses pertinent loss functions and performance metrics, delineates detection and localization performance through a structured workflow, and examines choices for AprilTag detectors.

Chapter 5 presents simulation and experimental results from multi-vehicle autonomous search missions, including outcomes from centralized and decentralized control, as well as BVLOS scenarios.

Chapter 6 outlines the design and implementation of an efficient communication link tailored for ROS-based robotics operating within bandwidth-constrained environments. It also showcases the experimental results from a single UAS conducting 3D mapping and search missions in bandwidth-limited environments.

Finally, Chapter 7 summarizes the dissertation's contributions and discusses ongoing and future endeavors.

Chapter 2: Experiment Testbeds

This chapter presents a comprehensive overview of implementing aerial autonomy within robotic platforms and ground stations. It covers autonomy software implemented on both aerial robotic platforms and ground stations. Additionally, it introduces simulation and visualization software essential for evaluating and visually representing the autonomy software's performance.
Furthermore, it explores the uncrewed aerial robotic systems we have developed and utilized since the inception of the project, discussing their key onboard electronics, capabilities, and applications. Lastly, it provides insights into the experimental flight facilities, which serve as crucial venues for empirically testing autonomous technologies under real-world conditions.

2.1 Autonomy Software

This section introduces the aerial autonomy software we leverage to orchestrate search and revisit missions.

2.1.1 MAVSDK

The Micro Air Vehicle Link (MAVLink) Software Development Kit (MAVSDK) is an open-source initiative facilitating the development of MAVLink applications for UAS. MAVLink is a lightweight and efficient protocol designed for communication between robotic systems and ground control stations. MAVSDK, developed by the Dronecode Foundation, offers a comprehensive set of Application Programming Interfaces (APIs) and libraries that enable developers to interact with various UAS platforms using a standardized and user-friendly interface. It supports multiple programming languages, including C++, Python, Java, and Swift, making it accessible to a wide range of developers. MAVSDK aims to simplify the integration of autonomous flight capabilities, payload control, and telemetry data retrieval, enabling innovation in the field of drone applications [91].

2.1.2 Pixhawk Project 4 Autopilot

Pixhawk Project 4 (PX4) Autopilot is open-source flight control software offering a flexible and modular architecture capable of controlling a wide range of UAS, from quadcopters to fixed-wing aircraft. Additionally, PX4 provides a variety of flight modes, including manual, stabilized, altitude hold, position hold, and autonomous offboard modes. Furthermore, PX4 supports various protocols for communication and control, including MAVLink for small UAVs, Real-Time Publish-Subscribe (RTPS) within the Data Distribution Service (DDS) framework for real-time data exchange between distributed systems, MAVROS for communication between PX4 and ROS nodes, as well as Pulse Position Modulation (PPM), Serial Bus (SBUS), and Pulse Width Modulation (PWM) for various Radio Control (RC) systems [92]. We leverage a special version of PX4 that has RTPS enabled. RTPS is another protocol designed for real-time communication in distributed systems. It is not limited to MAVLink-specific applications and can be used for a wider range of applications within a network. While MAVLink is essential for mission-critical communication between the autopilot and ground control station, RTPS complements MAVLink by providing a more flexible and versatile communication framework for interconnecting various software modules and components within PX4.

2.1.3 Robot Operating System 2

Robot Operating System 2 (ROS2) is an open-source framework that offers a modern middleware layer running on top of a conventional operating system. Evolving from ROS, ROS2 provides enhanced features and capabilities to meet the demands of advanced robotics applications. ROS2 also offers redesigned communication middleware built on the Data Distribution Service (DDS), which enables efficient message passing and interoperability between nodes. Moreover, ROS2 incorporates security features, including support for secure communication channels and access control policies, making it suitable for deployment in environments where safety and privacy are paramount.
ROS2 empowers developers to create robust, scalable, and secure robotic systems for various applications, ranging from research and education to industrial automation and beyond [47].

2.1.4 MAVericks

Our aerial autonomy stack is largely based on ARL MAVericks with additional capabilities developed by UMD researchers. MAVericks is a ROS2-based autonomy stack that works across both simulation and small UAS. It largely leverages the functionality of open-source and DoD-owned software. MAVericks can run on Modal AI VOXL and VOXL2, and it is capable of behavior tree navigation, object detection and localization, precision landing, multi-agent teaming, obstacle avoidance, digital elevation maps, and OpenVINS visual inertial odometry [93]. It contains simulation environments in Unity suitable for verifying advanced perception algorithms in a software-in-the-loop (SITL) simulation [87]. MAVericks also contains a UGV bridge for ROS1 systems. It serves as a pipeline for transitioning state-of-the-art algorithms developed in academia to the DoD [93].

2.2 Simulation and Visualization Software

This section presents commercial off-the-shelf (COTS) and custom-built simulation and visualization software we leverage to comprehend and test the aerial autonomy stack.

2.2.1 Unity

Our autonomy algorithms undergo rigorous testing in the Unity simulation environment before any field experiments are conducted. Unity is a powerful tool in the field of robotics simulation, offering a versatile platform for developers to create highly realistic environments. With its extensive library of assets, Unity enables users to accurately model the dynamics of robots and their interactions with the environment. Additionally, it integrates flexibly with other software and hardware components, facilitating the development and testing of algorithms in perception, planning, and navigation [94]. To accurately replicate flight conditions in simulation, our team has developed 3D Unity scenes using photogrammetry. Also, our team was invited to compete in the DARPA Triage Challenge, which focuses on the development of novel physiological features for medical triage in mass casualty incidents (MCIs), where medical resources are insufficient compared to the demand [95]. UAS are tasked with detecting AprilTags on A4 pages from a stand-off distance. Figure 2.1 shows that we have incorporated both manikins and AprilTags into our Unity scene, enabling us to test our detection algorithms and determine the optimal flight altitude in simulation.

Figure 2.1: Simulated manikins and AprilTags are added to the 3D scene to facilitate testing algorithms in Unity

Figure 2.2: Two cars are added to the 3D scene to enable testing algorithms detecting large targets with the pre-trained detection models

Figure 2.2 illustrates the Unity scene that consists of two cars, allowing for the testing of detection models pre-trained on the public COCO dataset within Unity [96].

2.2.2 QGroundControl

QGroundControl is well-known open-source ground control station software, offering comprehensive capabilities for commanding and controlling UAS. QGroundControl provides an interface for mission planning, telemetry monitoring, and vehicle control, empowering both hobbyists and professionals in the UAS community. Users can efficiently plan missions, monitor real-time telemetry data, and execute precise flight operations with its user interface and extensive feature set.
QGroundControl enables collaboration and innovation, allowing developers to contribute improvements, customize new features, and integrate capabilities seamlessly into various UAS platforms. QGroundControl is developed using C++ and the Qt Modeling Language (QML), leveraging the Qt framework to create a responsive GUI across various platforms [97].

2.2.3 Custom User Interface

We developed a custom user interface to monitor, command, and control robotic platforms, providing additional capabilities not offered by QGroundControl. It is implemented in React [98] and Spring Boot [99], bringing together two robust technologies to create a dynamic and responsive web user interface. React, a popular open-source JavaScript library for building client-side components, provides a component-based framework that enables developers to create modular UI components. Spring Boot, a framework for building Java-based enterprise applications, offers simplicity and convention over configuration, allowing developers to quickly set up and deploy new HTTP endpoints. By developing automated tests written in Jest [100], JUnit [101], and Cypress [102] alongside production code, new features can be introduced quickly to meet operational requirements. Figure 2.3 shows other technologies used in the user interface tailored for one of the outdoor missions. The Google Maps Platform offers a comprehensive set of APIs and SDKs that enables developers to integrate various mapping and location-based services into their applications [103]. Google OR-Tools [104] offers a collection of optimization algorithms and tools for solving various combinatorial optimization problems such as Vehicle Routing Problems (VRP) [105]. Node.js is an open-source, server-side JavaScript runtime environment that allows developers to build scalable and efficient web applications [106]. MySQL is an open-source database management system known for its reliability, performance, and ease of use. It is a key component of many web applications and software systems, powering data storage and retrieval for a diverse range of applications [107]. Docker is a popular technology used for containerization, enabling developers to package applications and their dependencies into isolated, lightweight containers that can run consistently across various environments [108]. In this dissertation, both the user interface and MAVericks have been dockerized to enhance consistency, enabling fast deployment and testing.

Figure 2.3: Some technologies used in the custom user interface

In Figure 2.3, Hypertext Transfer Protocol Secure (HTTPS) is an extension of the HTTP protocol used for secure communication between the React frontend and the Spring Boot backend. The backend calls MySQL through the Java Persistence API (JPA), which allows developers to interact with relational databases using Java objects. The user interface is designed with modularity in mind, allowing for easy migration to different UAS platforms. The development process follows a test-driven and agile approach, ensuring the reliability and efficiency of the software.

2.3 Uncrewed Aerial Systems

This section presents the NDAA-compliant uncrewed aerial vehicles (UAVs) as well as the long-range communication systems and radio control (RC) transmitters utilized within this dissertation. These components are the cornerstone of our project, facilitating testing of autonomous behaviors of the UAS with precise control and robust data transmission.
2.3.1 Radios

Operating UAS in environments where communication is limited poses significant challenges and requires careful consideration within the realm of aerial autonomy. The ability to establish and maintain communication links is paramount for the safe and effective operation of UAS, particularly when operating beyond the visual line of sight (BVLOS) [3]. While UAS may be able to maintain short-range links between nearby vehicles, long distances can introduce signal degradation, latency, and vulnerability to interference [4]. In the absence of reliable long-distance communication, UAS may encounter difficulties in real-time decision making by an operator, compromising their overall flight safety and efficiency [5]. Wireshark is employed to validate the Quality of Service (QoS) in ROS2, monitor network packets, and assess bandwidth utilization within a mesh network. This practice prevents unnecessary data exchange that could degrade communication performance between UAS during BVLOS operations.

Figure 2.4: Wireshark is employed to validate the QoS in ROS2, monitor network packets, and assess bandwidth utilization within a mesh network consisting of three UAS and one ground station

Figure 2.4 illustrates a significant exchange of packets between three UAS and the ground station, with ROS2 messages being repeatedly transmitted to the ground station, constraining the available network bandwidth. The QoS setting of the tf topic is set to reliable, a ROS2 QoS policy that guarantees message delivery. Each tf message is about 3.7 KB in size. The publish rate is erroneously set to 70 Hz, which consumes 2.59 MBps. When network bandwidth is limited, undelivered messages are repeatedly resent until successfully delivered, thereby exacerbating the strain on the already overburdened network. Fiona is the custom user interface that subscribes to the ROS2 topics from the UAS for processing on the ground station. Figure 2.5 shows that packets flow freely between the same UAS and the ground station.

Figure 2.5: Free-flowing network

After the initial handshake between the ground station and the UAS when Fiona is started, the UAS only transmits about 300 packets per second, or 50 KBps of data, to the ground station.

2.3.1.1 Doodle Labs Radios

Doodle Labs radios represent a cutting-edge advancement in wireless communication technology, offering a diverse range of solutions for robotics, Internet of Things (IoT), and mission-critical infrastructures [109]. In this thesis, a 2.4 GHz Doodle embedded radio is used for the ground station and 2.4 GHz Doodle mini radios are used on all the UAS. Compared to Doodle minis, embedded radios offer higher network throughput with a larger form factor, as shown in Figure 2.6. All the radios are set to operate at 2.450 GHz with 15 MHz bandwidth, which has the maximum transmission power among all the available frequencies. The data link is encrypted with WPA2-PSK AES, 128 bit for ArtIAMAS and 256 bit for the DARPA Triage Challenge, to protect sensitive data. The maximum throughput with encryption is capped at 60 Mbps and 12 Mbps, respectively. Figure 2.7 shows the configuration page where the frequency, bandwidth, and encryption can be set up. The firmware on the radios has been upgraded to the 2023 Long Term Support (LTS) release, which offers central configuration, frequency hopping, and link recovery [110].

Figure 2.6: (left) Doodle embedded radio and (right) Doodle mini radio (Credit: Doodle Labs)

Figure 2.7: Software configuration page for the Doodle radios.
Channel is set to 51 and bandwidth is set to 15 MHz

2.3.1.2 Remote Control Transmitter

Jeti Remote Control (RC) transmitters (Figure 2.8b) are used to adhere to the provisions and requirements outlined within the National Defense Authorization Act (NDAA). In recent years, NDAA compliance has gained significant attention due to provisions related to cybersecurity, especially concerning the use of products and services from certain countries or companies that may pose risks to national security. The transmitters are made in the Czech Republic, offering high-quality radio control systems operating in the 2.4 GHz and 900 MHz bands for hobbyists and professional radio operators.

(a) 915 MHz Dragon Link on Jeti transmitter (b) 900 MHz Jeti transmitter

Figure 2.8: Sub-GHz RC transmitters used in this project: (a) 915 MHz Dragon Link mounted on a 2.4 GHz Jeti transmitter, and (b) 900 MHz Jeti transmitter

To avoid interference with the 2.4 GHz Doodle radios, either a 900 MHz Jeti transmitter or a 2.4 GHz Jeti transmitter with a 915 MHz Dragon Link add-on module is used. The Dragon Link is also a long-range RC module offering a range over 50 kilometers [111].

2.3.2 Uncrewed Aerial Vehicles

Since the inception of our project, we have been exploring cost-effective and NDAA-compliant UAV platforms. The Modal AI m500 served as our stepping stone, providing us with valuable insights into the autonomy stack and the onboard electronics. As our project matured, we transitioned to the RB5 platforms, which offer enhanced capabilities and performance. Ultimately, we migrated to the Modal AI VOXL2-based platforms such as the ARL Grazer and UMD Chimera, which further enhance the reliability of our fleet. Considering that our autonomy stack relies on both MAVLink and RTPS communications, it becomes essential to adapt existing aerial robotic platforms to support both serial communication protocols.

2.3.2.1 Modal AI m500 Based Quadcopter

The Modal AI m500 only provides MAVLink communication between the onboard computer and the flight controller by default. Modification is needed to incorporate a serial cable for the operation of the autonomy stack on Modal AI m500 platforms. One solution is to connect the J1010 UART on the VOXL flight controller to the USB port on the Modal AI VOXL Add-On board mounted on the VOXL onboard computer via an FTDI adapter. The FTDI adapter converts signals between USB and serial communication protocols, enabling RTPS communication between the VOXL Flight onboard computer and the flight controller.

1. RTPS is used to connect PX4 micro Object Request Brokers (uORBs) to ROS2 topics, with data transmission handled by an external FTDI adapter. uORBs serve as a messaging infrastructure within the PX4 autopilot system.

2. MAVLink is the communication protocol that connects QGroundControl with the flight controller, utilizing the internal UART provided by the m500 for transmission.

Figure 2.9 depicts the modified Modal AI m500 featuring a mounted FTDI adapter. The UART is converted into VOXL USB using this adapter, eliminating the need for specific drivers. Furthermore, a dedicated FTDI adapter has been developed to link J1010 to the USB expansion port of the Microhard Add-On board. Any other USB expander with host capabilities operating at 3.3 V logic should also be compatible. While standard FTDI adapters will work, a custom one is developed by ARL to minimize weight and size.
Figure 2.9: Modal AI m500 with FTDI adapter

2.3.2.2 Modal AI RB5 Based Quadcopter

Even though PX4 has the capability to operate on the Qualcomm QRB5165 premium-tier robotics processor found in the RB5, executing RTPS-enabled PX4 directly on the Qualcomm processor poses challenges. An external Pixracer flight controller is added to the RB5 to bypass the limitation. Additionally, two serial cables are added to facilitate RTPS and MAVLink communications.

Figure 2.10: Wire diagram of the Modal AI m500 based quadcopter tailored for running MAVericks. The FTDI adapter enables RTPS communication between the VOXL Flight onboard computer and the flight controller.

Figure 2.11 shows the wire diagram of the modified RB5 platform tailored for running MAVericks. A USB Ethernet adapter is utilized to connect to the Doodle Labs radio for long-range communication. The theoretical communication range of the radio exceeds 10 km, and it can achieve a maximum throughput of 100 Mbps. Figure 2.12 shows an advanced 915 MHz Dragon Link added to the Jeti transmitters to avoid interference with the Doodle Labs radios [111]. The RB5 platforms, leveraging the integrated Qualcomm QRB5165 premium processor, offer enhanced capabilities and performance.

Figure 2.11: Wire diagram of the RB5 with Pixracer flight controller tailored for running MAVericks

(a) Dragon Link mounted on Jeti transmitter (b) Modified RB5 platform

Figure 2.12: Jeti transmitter with Dragon Link and retrofitted RB5 platforms for the MAVericks autonomy stack

2.3.2.3 ARL Grazer-A

VOXL2 is ModalAI's latest autopilot, also powered by the Qualcomm® Flight RB5 5G platform with a smaller form factor. In addition to the VOXL2, the other major electronic components on the ARL Grazer (Figure 2.13a) include the mRo Control Zero H7 flight controller and the 2.4 GHz Doodle Labs mini radio. Compared to embedded radios, the theoretical communication range of the mini radio exceeds 10 km, and it can achieve a maximum throughput of 80 Mbps. The high-resolution camera is configured to face the ground at an angle of 15 to 60 degrees. The system diagram in Figure 2.13b illustrates the wiring connections between the VOXL2 onboard computer, the flight controller, and the Doodle Labs mini radio, enabling long-range communications. Rather than mounting a 915 MHz Dragon Link module, we directly use 900 MHz Jeti transmitters to avoid interference with the Doodle Labs radios. Additionally, we utilize a Sony Starvis IMX412 12 MP (4K) MIPI camera module for onboard perception algorithms.

(a) ARL Grazer (b) System diagram

Figure 2.13: Custom UAS platform by Survice Engineering - ARL Grazer and system diagram developed for the MAVericks autonomy stack

2.3.2.4 UMD Chimera

We have also worked with UROC to configure MAVericks running on the UMD Chimera, which has the same electronics as the ARL Grazer. Rather than mounting the mRo Control Zero and Doodle mini on a proprietary Survice carrier board, the Chimera uses the Modal AI USB 3.0 UART Add-on board to expose a USB port and a UART port [112]. The Doodle mini connects to the VOXL2 via USB as shown in Figure 2.14.

Figure 2.14: Doodle mini is connected to VOXL2 through the USB port opened by the Modal AI USB 3.0 UART Add-on

Figure 2.15 illustrates the wiring diagram for the mRo Control Zero H7 flight controller and VOXL2. The addition of the USB 3.0 Add-on exposes one UART for MAVLink communication, while the M0076 interposer [113] exposes the other UART on the VOXL2 for RTPS communication.
A bidirectional voltage level shifter is employed because J8 operates at 1.8 V logic levels while the flight controller operates at 3.3 V logic levels [114]. Figure 2.16 shows the wiring diagram of the Doodle mini radio, VOXL2, and mRo Control Zero H7 at a high level. This wiring diagram bypasses the Survice proprietary carrier board, reducing the cost of the Chimera by around $4,000.

Figure 2.15: Wiring diagram of mRo Control Zero H7 and VOXL2 via USB 3.0 Add-on and M0076 Interposer

Figure 2.16: Wiring diagram for UROC Chimera tailored for running MAVericks

2.4 Experimental Facilities

In addition to the prototyping and software integration carried out through the simulation process using software-in-the-loop testing, we conduct comprehensive flight tests to validate the efficacy and robustness of our autonomy software on custom quadcopters built upon the Modal AI VOXL, RB5, and VOXL2. This section provides insight into the infrastructure and resources that constitute our flight facilities. By elucidating the key aspects of these facilities, we aim to ensure the reliability and effectiveness of our autonomy software in real-world applications.

2.4.1 UMD Fearless Flight Facility

The Fearless Flight Facility (F3) is an outdoor flight facility at UMD College Park exclusively designated for testing UAS in the National Capital Region, where restricted airspace is imposed. With dimensions of 100 feet in width, 300 feet in length, and 50 feet in height, this facility serves as a critical UAS testing site near the campus (Figure 2.17). F3 empowers us to conduct quick flight tests, ensuring a fast turnaround time for the evaluation and validation of the software on the robotic platforms.

2.4.2 UROC Raley Farm

Our team is also in close partnership with the UMD UAS Research and Operations Center (UROC). Situated in Southern Maryland, Raley Farm is a 70-acre field serving as the primary flight hub for the UMD UROC (Figure 2.18). UROC is one of the select few institutions across the nation directly collaborating with the FAA to drive forward UAS research and demonstrate operational capacities. Leveraging UROC's expertise in operational protocols through the partnership, we can consistently assess our autonomy and perception algorithms within a high-fidelity operational environment.

Figure 2.17: Fearless Flight Facility (F3) near the University of Maryland College Park campus

Figure 2.18: Raley Farm flight facility near the UMD UAS Research and Operations Center (UROC) in Southern Maryland

Chapter 3: Autonomy Stack and Software Design

This chapter outlines both the problem statement and the corresponding resolutions, encompassing perception, decision-making, and navigation components. Moreover, it discusses navigation mission planning and adaptation specifically tailored for UAS. Additionally, it explores the methodologies behind target detection and localization utilizing computer vision and machine learning algorithms. Furthermore, it introduces two advanced perception algorithms aimed at extracting additional intelligence regarding the identified targets. Finally, it explores ground control software, which facilitates mission planning, air-ground coordination, and human-robot interaction. The ground control software also enhances situational awareness by continuously tracking robot positions, flight paths, target locations, and target images in real time.
3.1 Problem Statement and Solution Designs

Our primary challenge revolves around the necessity for the UAS to search, detect, and localize ground targets with high accuracy while minimizing time. This requires a sophisticated software architecture capable of handling sensor inputs, processing complex data streams, and making intelligent decisions, all while ensuring the safety and reliability of the UAS. Our search and revisit strategy involves two phases. During the initial phase, the search area is divided into smaller regions of similar size. Subsequently, lawnmower patterns are generated for the UAS to sweep the search area at a high altitude, while the onboard object detection algorithm detects and localizes target-like objects on the ground. After the broad area survey, the UAS collectively plan the flight paths to revisit the targets and to extract additional information about the targets with the advanced perception algorithms. Figure 3.1 is a high-level view of the search and revisit algorithm. Our approach maximizes search coverage while minimizing flight time.

Figure 3.1: Algorithm for the search and revisit: (left) preliminary broad area survey detecting targets on the ground and (right) subsequent target revisits involving multiple autonomous UAS and perception

The solution design involves the creation of an autonomy stack comprising perception, decision-making, and navigation. The perception module focuses on sensor fusion, incorporating data from a single RGB camera, IMU, and GPS to detect and localize targets. The decision-making module tracks the status of a mission, analyzing the target data for mission replanning. Our autonomy stack is based on ARL's MAVericks with additional capabilities developed by UMD researchers. Our solution emphasizes modularity and adaptability. The additional capabilities are developed as a set of interchangeable and upgradeable components, allowing for easy integration with MAVericks. This modularity facilitates scalability, ensuring the autonomy stack can evolve alongside advancements in the capabilities. Three configurations for conducting search and revisit missions have been developed during the evolution of our autonomy stack.

Figure 3.2: Configuration facilitating centralized control and the PX4 flight stack

Figure 3.3: Full-autonomous configuration with the PX4 flight stack

Figure 3.4: Full-MAVericks configuration with the MAVericks navigation stack

Figure 3.2 shows the configuration where the robotic platforms are controlled in a centralized manner. Mission plans are developed in JSON on the ground station and then transferred to the UAS via MAVSDK [91]. Most of the AI/ML algorithms, such as object detection, advanced perception algorithms, and object geolocalization, are executed on the UAS. The ground station tracks mission status and performs path planning with a VRP solver. ROS2 microservices collect telemetry data for Fiona, enabling real-time tracking of UAS positions, speed, and object locations. All UAS are linked to the common Doodle network through a star topology, providing both high throughput and scalability. The centralized control configuration is ideal for missions requiring continuous monitoring and an operator in the loop. Figure 3.3 shows the full-autonomous configuration, where the mission plans for the initial broad area survey are developed in JSON on the ground station and then transferred to the UAS via MAVSDK [91]. Each UAS tracks the mission status and updates the missions onboard.
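To make this JSON-to-vehicle handoff concrete, the following MAVSDK-Python sketch reads a simplified waypoint list and commands a vehicle through it. The file name, JSON schema, connection address, and pacing are illustrative placeholders under assumed defaults; the actual system transfers richer mission plans than this and uses MAVSDK's mission interface rather than simple goto commands.

import asyncio
import json
from mavsdk import System

async def fly_plan(plan_path: str):
    # Load a simplified JSON plan: [{"lat": ..., "lon": ..., "alt": ...}, ...] (hypothetical schema)
    with open(plan_path) as f:
        waypoints = json.load(f)

    drone = System()
    await drone.connect(system_address="udp://:14540")  # SITL default; a serial link on the vehicle

    # Wait until the vehicle has a valid global and home position.
    async for health in drone.telemetry.health():
        if health.is_global_position_ok and health.is_home_position_ok:
            break

    # Record the home altitude so relative waypoint altitudes can be converted to absolute ones.
    async for home in drone.telemetry.home():
        home_alt = home.absolute_altitude_m
        break

    await drone.action.arm()
    await drone.action.takeoff()
    await asyncio.sleep(10)

    for wp in waypoints:
        await drone.action.goto_location(wp["lat"], wp["lon"], home_alt + wp["alt"], 0.0)
        await asyncio.sleep(15)  # crude pacing; a real mission monitors progress instead

    await drone.action.return_to_launch()

if __name__ == "__main__":
    asyncio.run(fly_plan("survey_plan.json"))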
Subsequently, the missions are uploaded to the PX4 flight stack via MAVSDK by the UAS. All UAS are connected to the same Doodle mesh network, which ensures that object locations are synchronized across the UAS, guaranteeing that the VRP solver returns the same paths. The UAS may not have continuous communication with the ground station, but UAS in close range can communicate with each other. This setup is well-suited for operations involving some operator control and beyond visual line of sight. Figure 3.4 illustrates the fully autonomous configuration leveraging the MAVericks navigation stack. This configuration involves running a variety of AI/ML algorithms on the UAS. The UAS also tracks mission status using a behavior tree and performs path planning with a VRP solver. All UAS are connected to the same Doodle mesh network. The MAVericks navigation stack offers higher agility than the PX4 flight stack, enabling the UAS to pivot quickly. The full MAVericks mode is suitable for running search and revisit missions beyond visual line of sight and facilitates incorporating other MAVericks capabilities developed by the community. The ROS2 microservices operating on the ground station retain the capability to gather telemetry data for the custom user interface as long as communication remains uninterrupted. This facilitates the real-time monitoring of UAS positions, speed, and object locations.

3.2 Mission Planning and Adaptation

Mission planning is a critical aspect of autonomous systems, involving the generation of a sequence of actions to achieve predefined objectives. Adaptation is a key feature, allowing the autonomous system to adjust its behavior in the environment effectively and navigate complex scenarios. Our search framework spans two distinct phases. In the initial phase, the search expanse undergoes segmentation into smaller, uniformly sized regions. Subsequently, lawnmower patterns are generated for the UAS to systematically sweep the search area at high altitudes, while the onboard object detection algorithm identifies and geolocates targets on the ground. After the initial broad area survey of the expansive area, the UAS collaboratively replan flight paths to revisit the detected targets, facilitating the extraction of close-up imagery and supplementary intelligence for operators.

3.2.1 Broad Area Survey

When creating lawnmower patterns for multi-agent systems, it is crucial to divide a large area into smaller ones of similar size, ensuring each agent covers a comparable area. To achieve this objective, our approach starts with uniformly sampling the large area. Subsequently, we employ K-means to partition the search area into K smaller segments, as illustrated in Figure 3.5. Figure 3.5a demonstrates the division of a polygon into three segments of comparable size, while Figure 3.5b demonstrates the partitioning of a pentagon into five equally sized segments. The distance between two dots is determined by the track spacing. Subsequently, we employ a grid-based algorithm to connect the dots in each segment to generate the lawnmower patterns for the UAS [115]. The approach is also used by QGIS, which is a free and open-source cross-platform desktop geographic information system (GIS) application [116, 117].

(a) Divided into 3 small polygons (b) Divided into 5 small polygons

Figure 3.5: K-means is utilized to split a large polygon into an arbitrary number of small polygons.

3.2.2 Target Revisiting

We use a Vehicle Routing Problem (VRP) solver for path replanning; a minimal sketch of the Google OR-Tools solver adopted in this work follows, and the underlying formulation is developed in the remainder of this section.
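The sketch below shows how revisit routes for two UAS could be computed with the OR-Tools routing solver; the distance matrix, node indices, and time limit are illustrative assumptions rather than the configuration used in our experiments.

from ortools.constraint_solver import pywrapcp, routing_enums_pb2

def plan_revisit_routes(distance_matrix, num_uas, start_nodes, end_nodes):
    """Solve a small VRP: minimize total distance over all UAS routes.

    distance_matrix : square list of integer distances (e.g., meters) between nodes
    start_nodes/end_nodes : per-vehicle start and end node indices
    """
    manager = pywrapcp.RoutingIndexManager(len(distance_matrix), num_uas, start_nodes, end_nodes)
    routing = pywrapcp.RoutingModel(manager)

    def distance_cb(from_index, to_index):
        return distance_matrix[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

    transit_idx = routing.RegisterTransitCallback(distance_cb)
    routing.SetArcCostEvaluatorOfAllVehicles(transit_idx)

    params = pywrapcp.DefaultRoutingSearchParameters()
    params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC
    params.local_search_metaheuristic = routing_enums_pb2.LocalSearchMetaheuristic.GUIDED_LOCAL_SEARCH
    params.time_limit.FromSeconds(2)  # bound the local search so replanning stays fast onboard

    solution = routing.SolveWithParameters(params)
    if solution is None:
        return []

    routes = []
    for v in range(num_uas):
        index, route = routing.Start(v), []
        while not routing.IsEnd(index):
            route.append(manager.IndexToNode(index))
            index = solution.Value(routing.NextVar(index))
        route.append(manager.IndexToNode(index))
        routes.append(route)
    return routes

In our setting, the distance matrix would be built from the geolocated target positions, each UAS's current location would serve as its start node, and a shared recovery point (the 'x' point in Figure 3.6) as the common end node.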
The VRP is an extension of the Traveling Salesman Problem (TSP) to multiple agents. The goal of the TSP is to find the route with the least total distance for one salesman visiting a set of locations. The TSP can be seen as a combinatorial optimization and an integer linear programming problem [118]. One of the formulations is described as follows. Let the nodes be numbered 1, ..., n and define

x_{ij} = \begin{cases} 1 & \text{if the route travels from node } i \text{ to node } j \\ 0 & \text{otherwise.} \end{cases}

For i = 1, ..., n, let w_{ij} be the distance between nodes i and j. To solve a TSP, we need to find

\min \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} w_{ij} x_{ij},

where x_{ij} \in \{0, 1\}. The Vehicle Routing Problem (VRP) extends the Traveling Salesman Problem (TSP) and falls within the realm of NP-hard problems [119]. The VRP aims to determine optimal routes for multiple salesmen (or UAS), minimizing the total distance traveled while visiting a given set of locations. It is difficult to solve the VRP at large scale. Probabilistic techniques are often preferred for applications where computation resources are too limited for deterministic algorithms, as the probabilistic methods are more adaptive in dealing with a large number of constraints such as capacity, waiting time, etc. [104]. Many flavors of probabilistic methods have been proposed, such as particle swarm optimization [120], genetic algorithms [121], Tabu search [122], and simulated annealing [123]. The state-of-the-art probabilistic algorithms reach solutions within less than 1% of the optimum for problems consisting of millions of nodes [124]. Another recent notable trend is the utilization of reinforcement learning (RL) algorithms to train policies for solving combinatorial optimization problems [125]. Although RL has yet to surpass state-of-the-art approaches, it presents a promising alternative for automating search processes autonomously [126, 127]. We use the solver from Google OR-Tools for its ability to solve problems with additional constraints such as different start and stop points. Google OR-Tools often employs local search algorithms like simulated annealing or Tabu search to improve solutions generated by other methods. These algorithms iteratively explore the solution space to find better solutions [104]. The solver also uses Constraint Programming (CP), which can handle complex constraints and dependencies between variables. The specific algorithm, or a combination of the algorithms used in the solver, can often be configured by the user through parameters and options provided by the API to achieve the best results [104].

(a) A 20-node problem (b) A 100-node problem

Figure 3.6: The VRP for two agents was resolved using the solver within Google OR-Tools

Figure 3.6 displays the outcomes of solving a 20-node problem and a 100-node problem with two agents using Google OR-Tools, resulting in satisfactory performance. The two agents start their routes from distinct locations and return to the 'x' point.

3.2.3 Behavior Tree Modeling

Behavior trees are hierarchical structures that represent a set of tasks and their relationships, allowing for modular design of robot behaviors [128]. They are implemented to facilitate the development of intelligent and adaptive robots by providing a flexible framework for decision-making [129]. They enable engineers and developers to design robot behaviors in a modular and reusable manner, making it easier to create, modify, and maintain complex robotic systems [130].
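To make the structure concrete, the following pure-Python sketch mimics the sequence-style composition used by the search and revisit tree described below. It is a minimal illustration, not the MAVericks implementation, and the leaf behaviors are placeholder stubs.

from enum import Enum

class Status(Enum):
    SUCCESS, FAILURE, RUNNING = range(3)

class Sequence:
    """Tick children in order; stop at the first child that is not yet successful."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Action:
    """Leaf node wrapping a callable that returns a Status."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self):
        return self.fn()

# Placeholder leaves standing in for the highlighted subtrees described in the text.
def generate_lawnmower():  return Status.SUCCESS
def takeoff_and_start():   return Status.SUCCESS
def fly_survey_pattern():  return Status.SUCCESS
def wait_for_swarm():      return Status.SUCCESS
def replan_with_vrp():     return Status.SUCCESS
def revisit_targets():     return Status.SUCCESS

search_and_revisit = Sequence([
    Action("generate_pattern", generate_lawnmower),
    Action("takeoff", takeoff_and_start),
    Action("broad_area_survey", fly_survey_pattern),
    Action("hover_until_swarm_done", wait_for_swarm),
    Action("vrp_replanning", replan_with_vrp),
    Action("revisit", revisit_targets),
])

while search_and_revisit.tick() == Status.RUNNING:
    pass  # in a real system this loop runs at a fixed tick rate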
The ROS2 behavior tree plays a crucial role in orchestrating the execution of tasks, handling sensor input, and making decisions based on the robot's environment, contributing to the overall efficiency and reliability of robotic applications [131]. Figure 3.7 illustrates the behavior tree tailored for achieving fully autonomous search and revisit operations using MAVericks. The subtree highlighted in light blue orchestrates the development of lawnmower patterns for all UAS in the swarm; each follows a unique pattern based on its hostname. Furthermore, the orange-highlighted subtree directs the aircraft to initiate takeoff and execute the mission. In addition, the yellow-highlighted subtree guides the aircraft in following the lawnmower pattern for the initial broad area survey. After completion, the purple-highlighted subtree instructs the aircraft to hover and await the completion of the initial survey by other aircraft in the swarm. Subsequently, the green-highlighted subtree invokes the VRP solver for mission replanning, generating new flight paths for the UAS to revisit targets. Since the VRP solver requires a few seconds to generate paths, the magenta-highlighted subtree directs the aircraft to hover until the solver finishes. Finally, the red-highlighted subtree commands the aircraft to commence the revisiting phase.

Figure 3.7: Behavior tree for search and revisit missions developed for MAVericks

3.3 Target Detection and Localization

3.3.1 Object Detection

You Only Look Once (YOLO) is a state-of-the-art neural network for real-time object detection [132]. It has been widely used in applications such as autonomous driving, personnel recovery and rescue, and object retrieval and delivery [133]. The architecture unifies object detection and object classification for real-time performance [134]. A YOLOv5 small variant model is integrated into our aerial autonomy stack running on the VOXL2. On the ground station, both the YOLOv5 medium variant and the YOLOv8 medium variant have been utilized, as they offer superior performance albeit with reduced inference speed.

3.3.2 Object Localization

Object localization is the process of finding the precise geographic location of a specific target. Existing techniques leveraging RGB-D cameras or LiDAR are suitable for vehicles that can carry a large payload. For UAS aiming to geolocate ground targets, one simple method involves employing a single camera to initially gauge the distance between the UAV and the target using a pinhole camera model. Subsequently, the target's position is triangulated based on the camera bearing, angle, and the UAS GPS coordinates [135]. Based on the center of the detection bounding box and the camera intrinsic matrix, the distance of the object from the camera can be inferred from the GPS coordinates and altitude of the UAS and the ground plane. The algorithm is a relatively cost-effective alternative to stereo cameras and LiDAR for estimating the distance to an object of interest.
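The geometry just described can be sketched in a few lines. The following is a simplified illustration that assumes a flat ground plane, ignores lens distortion, and uses a small-angle conversion from meters to latitude/longitude; the variable names are illustrative, and this is not the exact implementation of [135].

import numpy as np

def geolocate_pixel(u, v, K, R_wc, uas_lat, uas_lon, altitude_agl):
    """Project a detection's pixel center onto a flat ground plane.

    K            : 3x3 camera intrinsic matrix
    R_wc         : 3x3 rotation from the camera frame to a local world frame (x east, y north, z up)
    altitude_agl : camera height above the (assumed flat) ground, meters
    """
    # Back-project the pixel into a viewing ray in the camera frame.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Express the ray in the world frame.
    ray_world = R_wc @ ray_cam
    if ray_world[2] >= 0:
        raise ValueError("Ray does not intersect the ground plane")
    # Scale the ray so that it descends exactly altitude_agl meters (hits z = 0).
    scale = altitude_agl / -ray_world[2]
    east, north = scale * ray_world[0], scale * ray_world[1]
    # Convert the metric offset to a latitude/longitude offset (small-angle approximation).
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * np.cos(np.radians(uas_lat)))
    return uas_lat + dlat, uas_lon + dlon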
After the distance from the UAS to an object is found, a series of transformations are used to calculate the object position in the world-fixed frame [135], i.e.,

T^w_o = T^w_b \, T^b_c \, T^c_o,

where T^w_o is the transformation of the object position relative to the world frame, which is used for globally consistent representations of distances; T^c_o is the transformation of the object position relative to the camera frame; T^b_c is the transformation from the camera frame to the vehicle body frame; and T^w_b is the transformation of the vehicle body relative to the world frame.

3.3.3 Object Detection Clustering

A limitation of object localization with a single camera is the potential for significant errors, which may arise from factors such as rapid camera movements, GPS inaccuracies, and aircraft vibrations. We operate under the assumption that the localization estimates of a particular object adhere to a Gaussian distribution. By employing Gaussian Mixture Models (GMM), we identify the centroids of these Gaussians, representing the most probable locations of the objects. Before running the GMM, we must establish the number of centroids, which serves as a hyperparameter for the algorithm. The Silhouette score is a metric used to determine the optimal number of centroids in clustering algorithms [136]. The Silhouette measures the quality of clustering by calculating how similar an object is to its own cluster compared to other clusters [136]:

s = \frac{b - a}{\max(a, b)},

where a represents the average distance between the data point and all other points within the same cluster, and b represents the average distance between the data point and all points in the nearest neighboring cluster. We calculate the average of all individual Silhouette scores to derive the overall Silhouette score for a specified number of centroids. Subsequently, we iterate through all potential numbers of centroids (e.g., up to 10) to identify the optimal number associated with the highest score. The Silhouette score spans from -1 to 1, where a score of 1 signifies well-clustered data points, while a score of -1 indicates misassigned data points. A regularization term can also be used to prevent overfitting:

s_{mod} = s + \gamma \cdot n,     (3.1)

where \gamma represents the regularization term and n signifies the number of clusters. When \gamma > 0, the Silhouette score faces a penalty for opting for a smaller cluster count. Conversely, when \gamma < 0, the Silhouette score experiences a penalty for favoring a larger number of clusters.

3.4 Advanced Perception Models

Our autonomy stack has incorporated advanced perception algorithms, such as action recognition and person re-identification developed by other ArtIAMAS researchers, into MAVericks. The results have been demonstrated in the ArtIAMAS 2023 field experiments.

3.4.1 Action Recognition

Deep learning has been widely used in action recognition. The majority of the work focuses on utilizing high-quality videos taken on the ground. When applying these algorithms directly to videos captured by UAS, a significant decrease in accuracy is observed due to factors such as low image resolution and camera movement. Auto zoom and temporal reasoning (AZTR) is one of the state-of-the-art algorithms for action recognition [137–139]. The algorithm has attained 95% top-1 accuracy on the Drone Action dataset, which captures action recognition data via UAS [140]. Our autonomy stack has been enhanced by integrating AZTR running on the UAS.
To demonstrate its effectiveness, the action recognition model can detect three crucial actions: halt, attention, and follow me, as defined in the Robot Control Gestures (RoCoG-v2) dataset [141, 142].

3.4.2 Person Re-identification

The goal of person re-identification is to match a person captured in one camera view, referred to as a query image or probe, with the same person appearing in other camera views, known as the gallery or reference images. The task involves two main stages. Initially, a gallery of individuals is built using feature embeddings, which refer to the transformation of raw input data (features) into a lower-dimensional vector space. The gallery images are prepared and uploaded to the UAS before a mission is started. Subsequently, a query image is compared against the gallery images. An unsupervised approach or a clustering method is utilized to find the best-matched gallery image. The algorithm integrated into our autonomy stack was initially created using PyTorch and CUDA [143]. Afterwards, the code underwent refactoring to ensure compatibility with our autonomy stack and enable execution on the VOXL2. As a proof of concept, the person re-identification system stores approximately 32 images from six manikins and human actors, as shown in Figure 3.8.

Figure 3.8: Manikins and human actors for Re-ID running on VOXL2

3.5 Ground Control Software

This section introduces two flavors of custom-built ground control software: one designed for outdoor search and revisit missions and the other tailored for 3D mapping missions. Both variants of the user interface are created using React [98] and Spring Boot [99].

3.5.1 Outdoor Variant

Born out of 2021 First Responde