ABSTRACT

Title of Dissertation: LEVERAGING ADS-B:
A FRAMEWORK FOR DATA COLLECTION
AND AIRSIDE OPERATION METRICS ANALYSIS
AT SMALL AIRPORTS

Zhuoxuan Cao
Doctor of Philosophy, 2025

Dissertation Directed by: Professor David Lovell
Department of Civil and Environmental Engineering

With the growing global demand for air travel, General Aviation (GA) airports are

facing significant challenges. Unlike larger airports, many GA airports operate with limited

infrastructure, leading to issues such as delays and congestion. Managing a mix of flight

types, including training and regular flights, within tight budget constraints and limited runway

capacity further complicates operations. Effective management and reliable capacity estimation

are crucial, especially as these airports often depend on federal funding for future expansions.

However, the lack of effective data collection mechanisms and equipment makes it difficult to

implement data-driven management strategies or accurately estimate capacity, particularly given

the complexities of handling diverse flight operations.

Tasked by the Federal Aviation Administration (FAA), this project addresses the capacity

estimation challenges at GA airports using Automatic Dependent Surveillance–Broadcast (ADS-B)

technology. It proposes a comprehensive data pipeline and analysis system hosted on


Amazon Web Services (AWS) to collect, decode, filter, analyze, and archive flight data. This

system facilitates the extraction of key operational metrics for advanced capacity modeling. To

ensure precise parameter extraction, the framework incorporates a rule-based model for accurate

operation type classification. Additionally, a novel signal enhancement method is introduced to

improve ADS-B data quality, ensuring more reliable and consistent flight trajectory timestamps.

To support the development of the second generation of the Airport Capacity Model

(ACM2) and define the required operational metrics, this work provides specifications for

bounding boxes at target airports and establishes key operational benchmarks. The methodologies

for calculating departure and arrival operational metrics based on the benchmarks are also

detailed.

Leveraging the advantages of the proposed data analysis system, this study demonstrates

various applications of ADS-B data analysis. These include performance comparisons between

flights with different operational purposes, correlations between squared flight speeds at various

phases and density altitude, and time series predictions of air traffic flow at specific airports. By

addressing these challenges, this project has the potential to significantly enhance the accuracy of

capacity estimation across thousands of GA airports while delivering reliable aviation data and

actionable insights to both the aviation research community and GA airport stakeholders.


LEVERAGING ADS-B:
A FRAMEWORK FOR DATA COLLECTION AND

AIRSIDE OPERATION METRICS ANALYSIS
AT SMALL AIRPORTS

by

Zhuoxuan Cao

Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment

of the requirements for the degree of
Doctor of Philosophy

2025

Advisory Committee:
Professor David Lovell, Chair/Advisor
Professor Seth Young
Professor Timothy Horiuchi, Dean’s Representative
Professor Paul Schonfeld
Assistant Professor Alexander Estes
Associate Professor Xianfeng (Terry) Yang


© Copyright by
Zhuoxuan Cao

2025


Dedication

To my family, whose unwavering support and encouragement made this journey possible.


Acknowledgments

I would like to express my deepest gratitude to my advisor, Dr. David Lovell, whose

unwavering support throughout my Ph.D. journey has made it possible for me to come this far

in academia. Although I have not always been perfect in my work, his guidance has taught me

the importance of rigor and creativity in both research and professional life. When I began my

Ph.D., I was not a particularly confident person, but his encouragement helped me realize that I

am capable of achieving much more than I had originally believed.

I am also sincerely thankful to Dr. Seth Young for his guidance during the completion of

my project. His professional advice helped me overcome several technical challenges, and I am

especially grateful for his support during the Christmas holidays, when we worked together to

ensure everything was submitted before the deadline.

My sincere thanks also go to Dr. Paul Schonfeld, Dr. Alexander Estes, and Dr. Terry Yang

for serving as members of my dissertation committee, and to Dr. Timothy Horiuchi for stepping

in as the Dean’s Representative at the last minute.

I am truly grateful to Ms. Anna Damm for her administrative support. From the moment

I was admitted to the program to the final steps of graduation, she helped me handle countless

forms and procedures. Without her help, I would not have made it through.

I also want to thank my fellow Ph.D. colleagues for their companionship and advice during

my studies and internship. In particular, I appreciate Zheyu Li for sharing housing with me during

my internship, and Dr. Yeming Hao for offering invaluable advice throughout both my academic

and personal life.

Finally, I owe my heartfelt thanks to my family, including Mikey and Dave, for their

unwavering support and presence throughout this journey. And most of all, to my fiancée, Mia

Zhang, thank you for your extraordinary patience and love. You’re the one who truly endured the

full chaos of my Ph.D. life: the stress, the scattered sleep schedule, the emotional ups and downs,

the unwashed dishes, and those late-night League of Legends sessions. Honestly, you deserve a

degree for this too.

To all of you: this achievement is as much yours as it is mine.


Table of Contents

Dedication

Acknowledgments

Table of Contents

List of Tables

List of Figures

List of Abbreviations

Chapter 1: Introduction
1.1 Background and Motivation
1.2 Problem Statement
1.3 Research Questions and Approaches
1.3.1 How can high-resolution ADS-B data be efficiently collected at GA airports?
1.3.2 How can operational metrics be extracted from the collected ADS-B data to support airport capacity modeling?
1.3.3 How can ADS-B data contribute to advancements in aviation research or ATM?
1.4 Contributions of this Research
1.5 Organization

Chapter 2: ADS-B Data Collection System
2.1 Literature Review
2.2 System Overview
2.3 Raspberry Pi-Based Data Collection Platform
2.4 Decoding Mechanism
2.4.1 Fundamentals of ADS-B Broadcast
2.4.2 Timestamp Debuffering
2.4.3 Message Amalgamation for In-Decoder Debuffering
2.4.4 Validation Approaches for Debuffering Performances
2.5 Unreliable Flight Filter
2.6 Operation Identification

Chapter 3: Operational Metric Extraction
3.1 Literature Review
3.2 Arrival Operational Metrics
3.2.1 Approach Speed
3.2.2 Arrival Runway Occupancy Time (AROT)
3.2.3 Exit Speed
3.2.4 Landing Speed
3.2.5 Arrival Deceleration & Arrival Buffer
3.3 Departure Operational Metrics
3.3.1 Departure Runway Occupancy Time (DROT)
3.3.2 Departure Cruise Speed
3.3.3 Takeoff Speed & Departure Hold Buffer

Chapter 4: Applications of the Collected ADS-B Data
4.1 Literature Review
4.2 A Comparative Analysis of Flights with Different Aircraft Types and Missions
4.2.1 Background
4.2.2 Methodology
4.2.3 Results and Discussion
4.3 Identifying the Relationship between Air Density Altitude and Speed Metrics
4.3.1 Background
4.3.2 Methodology
4.3.3 Results and Discussion
4.4 Time Series Forecasting of Next-Day Air Traffic Volume with Multi-Scale Machine Learning
4.4.1 Background
4.4.2 Methodology
4.4.3 Results and Discussion

Chapter 5: Summary, Conclusions, and Limitations
5.1 Summary
5.2 Conclusions
5.3 Limitations and Future Work

Appendix A: Operational Metrics Results
A.1 Results of Average Approach Speed
A.2 Results of Arrival Runway Occupancy Time
A.3 Results of Exit Speed
A.4 Results of Landing Speed
A.5 Results of Arrival Deceleration
A.6 Results of Departure Runway Occupancy Time
A.7 Results of Cruise Departure Speed
A.8 Results of Takeoff Speed
A.9 Results of Departure Hold Buffer

Bibliography


List of Tables

2.1 Flights observed at target airports
2.2 ADS-B message type code and broadcast frequency [1].
2.3 Improvement in drift metric after debuffering.

4.1 ROT and phase division results.
4.2 Regression results for squared takeoff speed and density altitude.
4.3 Regression results for squared touchdown speed and density altitude.
4.4 Performance comparison across models.

A.1 Average approach speed data at KAPA and KGFK.
A.2 Average arrival runway occupancy time at KAPA and KGFK.
A.3 Average exit speed at KAPA and KGFK.
A.4 Average landing speed at KAPA and KGFK.
A.5 Average deceleration at KAPA and KGFK.
A.6 Average departure runway occupancy time at KAPA and KGFK.
A.7 Average departure cruise speed at KAPA and KGFK.
A.8 Average takeoff speed at KAPA and KGFK.
A.9 Average departure hold buffer at KAPA and KGFK.


List of Figures

2.1 Examples of GA airports with ADS-B receiving systems.
2.2 Real-time mapping tool interface.
2.3 Concept map of ADS-B data collection system.
2.4 Simplified Data pipeline in EC2.
2.5 ADS-B data receiving unit and its components.
2.6 Concept Map of ADS-B Reception.
2.7 Typical inter-arrival times of buffered messages.
2.8 Illustration of debuffering algorithm.
2.9 Matching process for merging complete records.
2.10 Data retention comparison.
2.11 Comparison of buffered and debuffered data using integration.
2.12 Correlation coefficients between implicit and buffered/debuffered time intervals for flights from KFRG and KOSU.
2.13 An example of AGL correction.
2.14 Three types of unreliable trajectories.
2.15 An example of touch-and-go flight trajectory.
2.16 Comparison between original trajectory and smoothed trajectory.
2.17 A smoothed trajectory with identified peaks and troughs.
2.18 An example of “takeoff point” and “landing point”.

3.1 A landing flight plotted by the trajectory plotting application.
3.2 An example approach bounding box for runway 17R at KGFK.
3.3 The concept map for operational benchmark interpolation.
3.4 The arrival bounding box for runway 17R and 35L at KGFK.
3.5 Identification of entry and exit point for an arrival flight at KGFK.
3.6 Arrival transitional points at KGFK.
3.7 Departure geofence for runway 17R at KGFK.
3.8 Concept map for function make SRS bounding box.
3.9 An example of takeoff roll identification.
3.10 Departure transitional points at KGFK.

4.1 4500 ft geofence for runway 17R.
4.2 Illustration of the cross point on the hold bar.
4.3 Three takeoff styles at KGFK.
4.4 Runway layout of KAPA.
4.5 Examples of linear regression of squared takeoff speed and density altitude.
4.6 Examples of linear regression of squared touchdown speed and density altitude.
4.7 Illustration of single-scale and multi-scale sliding window.
4.8 LSTM Cell Architecture
4.9 Architectures of LSTM-based models.
4.10 Daily air traffic volume at KGFK.
4.11 SARIMA predictions on test dataset.
4.12 XGBoost predicted and true air traffic volume.
4.13 XGBoost predicted and true air traffic volume.
4.14 Predictions of single-scale LSTM-based model and multi-scale LSTM-based model.


List of Abbreviations

1090ES 1090 MHz Extended Squitter

ACF Autocorrelation Function
ACM Airport Capacity Models
ACM2 Second Generation Airport Capacity Model
ADS-B Automatic Dependent Surveillance–Broadcast
AGL Altitude Above Ground Level
AI Artificial Intelligence
AR Autoregressive
ARIMA Autoregressive Integrated Moving Average
AROT Arrival Runway Occupancy Time
ARTAS Air Traffic Management Surveillance Tracker and Server
ASDE-X Airport Surface Detection Equipment, Model X
ASPM Aviation System Performance Metrics
ATC Air Traffic Controllers
ATM Air Traffic Management
AWS Amazon Web Services

BER Berlin Brandenburg Airport
BOS Boston Logan International Airport

CPR Compact Position Reporting

DELAYS Dynamic Estimation of Landing Aircraft in the Terminal Area System
DFW Dallas/Fort Worth International Airport
DROT Departure Runway Occupancy Time

FAA Federal Aviation Administration
FIFO First-In, First-Out

GA General Aviation
GBDT Gradient Boosted Decision Tree
GNSS Global Navigation Satellite System
GPS Global Positioning System

ILS Instrument Landing System
IMC Instrument Meteorological Conditions
IoT Internet of Things

JFK John F. Kennedy International Airport

KAPA Centennial Airport
KCGS College Park Airport
KFRG Republic Airport
KGFK Grand Forks International Airport
KMNN Marion Municipal Airport
KN51 Solberg Hunterdon Airport
KOSU Ohio State University Airport

LMI Lincoln Laboratory MIT
LOS Levels of Service
LSTM Long Short-Term Memory

MAE Mean Absolute Error
METAR Meteorological Aerodrome Reports
MLP Multi-Layer Perceptron
MSSW-LSTM Multi-Scale Sliding Window LSTM Framework

NextGen Next Generation Air Transportation System
NLP Natural Language Processing

OOOI Out, Off, On, and In

PACF Partial Autocorrelation Function
PPM Pulse-Position Modulation

QAR Quick Access Recorder
QFE Field Elevation Pressure

REDIM Runway Exit Interactive Design Model
RMSE Root Mean Square Error
RNN Recurrent Neural Networks
ROC Rate of Climb
ROT Runway Occupancy Time

SARIMA Seasonal Autoregressive Integrated Moving Average
SDR Software-Defined Radio
SRS Same Runway Separation
STARS Standard Terminal Automation Replacement System
SWIM System Wide Information Management

UAT Universal Access Transceiver
UND University of North Dakota

VFR Visual Flight Rules
VMC Visual Meteorological Conditions

WTS Wake Turbulence Separation

XGBoost eXtreme Gradient Boosting


Chapter 1: Introduction

1.1 Background and Motivation

With global economic growth, airports worldwide have experienced a substantial increase

in civil air travel demand. In 2019, airports accommodated a record 4.5 billion passengers [2], and

by 2024, this number had grown by an additional 171 million, despite the lingering effects of the

COVID-19 pandemic [3]. While much of the attention has been focused on managing air traffic

at large commercial hubs, this upward trend also has implications for General Aviation (GA)

airports. GA airports are typically untowered, operate primarily under Visual Flight Rules (VFR),

and serve a highly diverse range of aircraft types, pilot experience levels, and flight purposes.

Unlike commercial airports, GA facilities support a substantial volume of flight training activities

and unscheduled transient operations, resulting in highly dynamic and less predictable traffic

patterns. Despite this operational complexity, many GA airports lack access to reliable data on

traffic volume, aircraft mix, and operational behavior, making it difficult to support effective

planning, secure funding, and maintain safety.

Combined with the rapid growth in air traffic, the lack of data-driven air traffic management

became a major contributor to a 4.6% increase in delays across Europe in 2006, exceeding

expectations [4]. These delays can lead to financial setbacks, affecting both airlines and

airport operations [5]. GA airports, in particular, face even greater financial

challenges compared to larger commercial airports. With fewer resources available to expand

capacity or implement delay-mitigation strategies, delays at GA airports can become even more

severe, further straining operations and service quality.

Improving airside efficiency is a key strategy for reducing air travel delays. Two critical

factors influencing airside efficiency are runway configuration and air traffic management (ATM),

both of which highlight the increasing need for federal funding to support runway improvements

and the adoption of more advanced ATM technologies. However, ATM systems at small airports

still heavily rely on human resources to manage airspace sectors, making operations more labor-

intensive. Air traffic controllers (ATC) must consider factors such as conflict risk, conflict

resolution, and limited operational resources, further complicating their tasks [6].

To address these challenges, the Next Generation Air Transportation System (NextGen) has

been developed to modernize air traffic management and improve operational efficiency. A key

component of NextGen is Automatic Dependent Surveillance–Broadcast (ADS-B), a satellite-

based surveillance technology designed to replace (or at least supplement in a significant way)

traditional radar. The technical details of this system will be discussed in Chapter 2. Following

the Federal Aviation Administration (FAA) mandate, a large proportion of aircraft have been

equipped with ADS-B transponders [7].

ADS-B enhances communication between pilots and air traffic controllers, making it a

critical component of modern ATC systems. In addition, its open communication protocol allows

ADS-B signals to be received by any party with a compatible receiver, enabling researchers to

collect and analyze flight data from target airports. A detailed explanation of the ADS-B mechanism

and mandate is provided in Chapter 2. However, despite its advantages, ADS-B-based data

collection has not been widely adopted at GA airports. Many of these facilities lack the

infrastructure required to support continuous ADS-B surveillance or to integrate such data into

their operational workflows. As a result, this vacuum at small airports is increasingly being filled

by third-party services that rely on commercial-grade ADS-B receivers, such as Virtower and

1200.aero, although the quality and density of the data these services provide are not

always guaranteed.

Furthermore, federal funding and research efforts have been primarily directed toward large

regional airports, as they handle higher traffic volumes. As a result, existing Airport Capacity

Models (ACM) are developed using data from large airports, failing to account for the unique

operational characteristics of small airports. This discrepancy in capacity modeling makes it

challenging for GA airports to justify their needs when applying for federal funding, further

hindering their ability to improve infrastructure and operational efficiency.

Therefore, it is crucial to establish a dedicated data collection system for small airports

and to integrate these data into the development of next-generation ACM. A more accurate

and representative model would enable improved resource allocation, support federal funding

applications, and enhance ATM at small airports. Beyond supporting capacity modeling, the

collected data would also provide GA airports with a deeper understanding of local flight

operation patterns and the behaviors of flights serving various purposes. This knowledge

can inform a wide range of airport functions, including runway and pavement utilization, air

traffic flow forecasting, revenue modeling (e.g., accurate landing fee assessment), and

long-term infrastructure planning. In doing so, it addresses both strategic and operational challenges,

ultimately contributing to more data-informed and resilient airport systems.

1.2 Problem Statement

Among those most impacted by the surge in air travel demand are GA airports, which face

unique and growing challenges. Unlike larger regional airports, many GA airports operate with

less developed infrastructure, including runway facilities and ATC systems, making it difficult

to accommodate increasing traffic. As a result, they frequently experience capacity constraints,

delays, and congestion. According to data collected from eight European airports, a previous

study reported that between 8.1% and 24.1% of flights were delayed [8]. Additionally, since many

GA airports host flight schools, they must manage a diverse mix of flight operations, including

training flights and regular operations, all while operating with limited runway configurations

and tight budgets. These challenges highlight the urgent need for accurate capacity assessment

and effective airside management to optimize operations and minimize delays. Moreover,

reliable capacity estimates are essential for securing federal funding to support infrastructure

improvements and future expansion.

Compared to larger regional airports, GA airports also face significant challenges in data

collection for capacity prediction. Installing and maintaining ADS-B systems similar to those

used at regional airports requires substantial financial investment, which can be a significant

burden for GA airports already operating with limited funding. Additionally, accurate capacity

estimation relies on the extraction of operational parameters, yet the diverse mix of flight types

and high sensitivity to weather conditions lead to highly variable daily operational patterns. This

variability complicates the derivation of consistent and reliable parameters for capacity modeling.

Furthermore, many GA airports lack ATC towers and automated tracking systems, making

it difficult to accurately count, track, and archive flight trajectory data. This absence of reliable air

traffic volume measurement further complicates efforts to model airport capacity and understand

the local flight behaviors effectively. Adding to these complexities, GA airports with flight

schools face an additional layer of operational variability. Training flights exhibit unique

behavioral patterns, such as repeated circuits, touch-and-go landings, and continuous flight

maneuvers, which are rarely observed in commercial operations. However, existing capacity

modeling methods are primarily designed for commercial flights, making them poorly suited to

account for the distinct characteristics of training operations.

Although data collection can be outsourced to aviation data vendors, data quality cannot be

guaranteed. One major issue is data sparsity: datasets provided by large-scale ADS-B collection

networks often have sampling intervals exceeding 20 seconds, whereas the true broadcast

frequency is approximately 2 Hz. This discrepancy creates significant data gaps, reducing the

accuracy of trajectory reconstruction using rate-of-climb calculations and constraining operational

analysis, particularly in the early stages of flight operations.
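
The scale of this sparsity can be illustrated with simple arithmetic. The following minimal sketch assumes that a vendor dataset keeps exactly one message per sampling interval and that the transponder broadcasts at a constant rate; the function name and both assumptions are illustrative and are not taken from any vendor specification.

```python
def retained_fraction(broadcast_hz: float, sampling_interval_s: float) -> float:
    """Fraction of broadcast messages kept when one message is stored per interval.

    Assumes a constant broadcast rate and exactly one retained message per
    sampling interval (both are simplifying assumptions).
    """
    if broadcast_hz <= 0 or sampling_interval_s <= 0:
        raise ValueError("rate and interval must be positive")
    messages_per_interval = broadcast_hz * sampling_interval_s
    return 1.0 / messages_per_interval


# Approximately 2 Hz broadcast, one retained sample every 20 seconds.
fraction = retained_fraction(2.0, 20.0)
print(fraction)  # 0.025
```

Under these assumptions, roughly one message in forty survives, which is why short events such as the start of a takeoff roll can fall entirely inside a gap.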

Another key concern is data buffering. Due to bandwidth limitations in inexpensive

receiving equipment, flight messages are often buffered and thus assigned delayed

timestamps, which degrades the precision of aviation studies that rely on accurate time duration

measurements. Furthermore, buffering often goes unnoticed in widely used ADS-B datasets

because their large sampling intervals mask it, further compromising the reliability of

operational studies at GA airports.

This project addresses the challenges of collecting ADS-B data at GA airports by

leveraging widely used ADS-B technology, thereby providing essential data for the development

of an improved ACM. The proposed data collection scheme will not only capture the details of

each trajectory but also generate refined statistics for key operational metrics. These metrics

and statistics will serve as critical inputs for the next generation of ACM, which will offer a more

comprehensive understanding of conditions at small airports. Furthermore, this study will explore

the broader applications of ADS-B data at GA airports, shedding light on ATM challenges and

contributing to the advancement of aviation research.

1.3 Research Questions and Approaches

This dissertation applies data science and data engineering techniques to address the

challenges associated with collecting and utilizing ADS-B data at small airports. By leveraging

the data provided by the proposed system, this research explores key aspects of ATM and its

broader implications for aviation research.

This study focuses on three interconnected research questions:

• How can high-resolution ADS-B data be efficiently collected at GA airports at a low cost?

• How can operational metrics be extracted from the collected ADS-B data to support airport

capacity modeling?

• How can ADS-B data contribute to advancements in aviation research or ATM?

Each of these questions is systematically examined throughout this dissertation. As

new challenges emerge, the research framework is adapted to incorporate detailed subsidiary

questions and refined methodologies, ensuring a comprehensive approach to problem-solving.

1.3.1 How can high-resolution ADS-B data be efficiently collected at GA

airports?

1.3.1.1 How can ADS-B data be efficiently and cost-effectively collected at

GA airports?

The ADS-B data collection mechanisms used at large airports cannot be directly applied

to GA airports due to several key challenges, including financial constraints, operational

characteristics, and differing data requirements. Additionally, many GA airports are untowered,

making it difficult to establish a continuous, centralized data collection system. As a result,

although ADS-B transponders are installed on most aircraft, the vast amount of available ADS-B

information remains underutilized.

To address this issue, under the guidance of the FAA, this study proposes an automated

data collection framework deployed on the Amazon Web Services (AWS) platform. This

system utilizes affordable hardware, including data receiving, amplifying, filtering, decoding,

storage, and transmission modules, specifically designed to accommodate the unique operational

characteristics of GA airports. By installing these customized hardware sets at target airports, the

system will enhance signal reception, ensuring a strong and reliable ADS-B data foundation for

subsequent processing.

1.3.1.2 How can the reliability of collected ADS-B timestamps be ensured,

and if unreliable, how can they be improved?

ADS-B data contain critical information for ATM and operational studies, where time-

related accuracy is essential. Therefore, it is necessary to evaluate the reliability of the timestamps

attached to ADS-B messages. Unlike positional and velocity data, which are directly decoded

from raw ADS-B signals, the timestamps are assigned locally by the receiving devices. As a

result, timestamp inconsistencies may arise due to hardware limitations and data buffering effects.

To validate timestamp accuracy, a cross-validation method leveraging the kinematics of

ADS-B messages is proposed. A known issue, referred to as “data buffering”, causes bias in

the intervals between ADS-B data points, disrupting time-sensitive analyses. However, since

different types of ADS-B messages are broadcast at predefined frequencies, their expected

transmission intervals can be used to identify and ameliorate time distortions.

This correction process, named “debuffering”, systematically adjusts timestamp

inconsistencies based on the transponder’s broadcast mechanism, significantly improving the

temporal accuracy of collected ADS-B data. The debuffering technique will be integrated into

the entire processing workflow, ensuring more precise timing information for ATM research and

operational applications.
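
As a rough illustration of the idea, and not the algorithm developed in this dissertation, the sketch below re-spaces runs of implausibly close receiver timestamps using a nominal broadcast interval. The function name debuffer, the 0.5 s nominal interval, the 0.1 s threshold, and the assumption that the last timestamp in a buffered run is trustworthy are all illustrative assumptions.

```python
from typing import List

NOMINAL_INTERVAL_S = 0.5   # assumed nominal spacing between same-type messages
MIN_PLAUSIBLE_GAP_S = 0.1  # assumed gap below which samples are treated as buffered


def debuffer(timestamps: List[float],
             nominal: float = NOMINAL_INTERVAL_S,
             min_gap: float = MIN_PLAUSIBLE_GAP_S) -> List[float]:
    """Re-space runs of implausibly close receiver timestamps.

    The list is walked from newest to oldest. Whenever a sample sits less
    than `min_gap` before its (possibly already corrected) successor, it is
    moved to exactly one nominal interval before that successor.
    """
    fixed = list(timestamps)
    for i in range(len(fixed) - 2, -1, -1):
        if fixed[i + 1] - fixed[i] < min_gap:
            fixed[i] = fixed[i + 1] - nominal
    return fixed


# Three messages released together at about t = 2.0 s are spread back out.
print(debuffer([0.0, 0.5, 1.98, 1.99, 2.0, 2.5]))
```

Walking backward means each correction is anchored to the release time of the buffered run, which is the only instant the receiver observed directly under the stated assumption.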



1.3.2 How can operational metrics be extracted from the collected ADS-B data

to support airport capacity modeling?

1.3.2.1 How can arrival operational metrics be extracted?

The arrival operational metrics in this study are derived from landing flights and will be

integrated into the newer version of the ACM, bringing GA airports into focus. The estimated

capacity will aid GA airports in securing federal funding, while also assisting airport managers in

balancing incoming flights with limited airside resources. Since previous studies and regulations

lack clear definitions of operational parameters, this dissertation provides detailed specifications

for benchmark determination, ensuring that these benchmarks serve as the foundation for

generating accurate metrics.

These benchmarks are established using a bounding box framework. The runway

environment bounding boxes are defined based on runway coordinates, while departure bounding

boxes are constructed following FAA regulations. This approach ensures consistency with GA

airport operations, aligning the methodology with real-world practices. The methodology and

implementation details of these bounding boxes are discussed in Chapter 3.

1.3.2.2 How can departure operational metrics be extracted?

Similar to the extraction of arrival metrics, departure metrics are computed based on a

series of departure benchmarks, including entry points, takeoff positions, and exit points. These

benchmarks divide the takeoff operation into distinct phases, providing a structured approach to

assessing operational efficiency.



While arrival bounding boxes are determined solely by runway configuration, according

to the FAA’s mandate [9], departure bounding boxes must also account for the performance

category of departing aircraft, which influences flight partition distances. To address this

variability, bounding boxes of varying lengths are automatically generated along the runways

using algorithmic methods, ensuring that the benchmarks accurately reflect the operational

characteristics of different aircraft types.
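
A minimal sketch of how such boxes could be generated is given below. It assumes a flat-earth approximation around the runway threshold; the function name runway_box, the category labels, and the box lengths are hypothetical placeholders rather than values taken from FAA guidance or from this dissertation.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius
LatLon = Tuple[float, float]

# Hypothetical box lengths (metres) per performance category.
BOX_LENGTH_M: Dict[str, float] = {"category_a": 900.0, "category_b": 1400.0}


def runway_box(start: LatLon, end: LatLon,
               length_m: float, half_width_m: float) -> List[LatLon]:
    """Return the four (lat, lon) corners of a rectangle that begins at
    `start` and extends `length_m` toward `end` along the runway axis."""
    lat0, lon0 = start
    lat1, lon1 = end
    cos_lat = math.cos(math.radians(lat0))
    # Runway direction in local east/north metres.
    east = math.radians(lon1 - lon0) * EARTH_RADIUS_M * cos_lat
    north = math.radians(lat1 - lat0) * EARTH_RADIUS_M
    norm = math.hypot(east, north)
    ue, un = east / norm, north / norm  # unit vector along the runway
    pe, pn = -un, ue                    # unit vector perpendicular to it

    def to_latlon(e: float, n: float) -> LatLon:
        lat = lat0 + math.degrees(n / EARTH_RADIUS_M)
        lon = lon0 + math.degrees(e / (EARTH_RADIUS_M * cos_lat))
        return (lat, lon)

    offsets = ((0.0, -half_width_m), (0.0, half_width_m),
               (length_m, half_width_m), (length_m, -half_width_m))
    return [to_latlon(a * ue + c * pe, a * un + c * pn) for a, c in offsets]


# One box per category along the same (illustrative) north-pointing runway.
boxes = {cat: runway_box((40.0, -105.0), (40.02, -105.0), length, 50.0)
         for cat, length in BOX_LENGTH_M.items()}
```

The local projection is adequate for boxes a few kilometres long; a geodesic library would be the safer choice for anything larger.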

1.3.3 How can ADS-B data contribute to advancements in aviation research or

ATM?

1.3.3.1 How could ADS-B data reveal the differences between flights with

different missions and performance categories?

Many GA airports serve as flight training hubs, with Grand Forks International Airport

(KGFK) being a prime example, as it hosts the University of North Dakota’s Aerospace Program.

As a result, GA airports must manage a diverse mix of aircraft types and flight missions, including

both training and operational flights. The level of familiarity with the aircraft varies significantly

between student pilots and experienced pilots, impacting aircraft handling and overall efficiency.

Additionally, performance categories play a crucial role in runway operations, as differences in

operating procedures and engine performance directly affect aircraft behavior during takeoff and

landing.

The interactions between different aircraft types and missions influence the efficiency of

airport operations. To better understand these dynamics, this study divides the takeoff procedure



into two distinct phases, each capturing the impact of different factors on runway occupancy

time (ROT). By analyzing these phases separately, this research aims to provide a clearer

understanding of how various operational factors influence airport capacity.

1.3.3.2 How could ADS-B data quantify the relationship between density

altitude and speed benchmarks?

Given the standard lift equation, where lift force is positively correlated with the square

of velocity and air density while other variables remain constant, it can be deduced that density

altitude is positively correlated with the squared velocity under a constant lift condition [10].

Consequently, it is a well-established principle in aviation research that higher density altitude

necessitates a higher velocity for aircraft to take off or land. However, the precise quantitative

relationship between these variables remains undetermined. Since velocity directly impacts

runway distance requirements for acceleration and deceleration, ROT, and fuel consumption,

understanding this relationship is crucial for assessing flight behavior variations across airports

at different elevations and their subsequent impact on ATM.
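
The reasoning above can be written out explicitly. The block below uses the textbook form of the lift equation, with v read as true airspeed; the symbols S (wing area) and C_L (lift coefficient) are notation introduced here for illustration.

```latex
\begin{align}
  L &= \tfrac{1}{2}\,\rho\,v^{2}\,S\,C_L
  && \text{(lift equation)} \\
  v^{2} &= \frac{2L}{\rho\,S\,C_L}
  && \text{(solved for squared velocity)} \\
  v^{2} &\propto \frac{1}{\rho}
  && \text{(for fixed } L,\ S,\ C_L\text{)}
\end{align}
```

Because air density decreases as density altitude increases, the squared velocity required to generate a given lift rises with density altitude.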

This study employs a regression-based approach to validate the linear relationship between

squared velocity and density altitude. Additionally, hypothesis testing is conducted to assess the

significance of density altitude in explaining speed variations, providing valuable insights into

aviation dynamics and operational efficiency at different airports.
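
A minimal sketch of such a test, using only the Python standard library, is shown below. It fits an ordinary least squares line of squared velocity on density altitude and returns the t-statistic of the slope; the function name and the sample values are made up for illustration and are not results from this study.

```python
import math
from typing import Sequence, Tuple


def ols_slope_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, t_statistic), where the t-statistic tests the
    null hypothesis slope = 0 with n - 2 degrees of freedom. A perfect fit
    (zero residuals) would make the standard error zero and is not handled.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope / std_err


# Illustrative inputs: density altitude (ft) and squared speed (kt^2).
density_alt = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
speed_sq = [3605.0, 3695.0, 3805.0, 3895.0, 4005.0]
slope, intercept, t_stat = ols_slope_t(density_alt, speed_sq)
```

The t-statistic would then be compared against the critical value of a Student's t distribution with n - 2 degrees of freedom to decide whether density altitude is a significant predictor.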



1.3.3.3 How could ADS-B facilitate the resource allocation at GA airports?

Resource allocation at small airports is typically planned a day in advance. Key factors such

as weather conditions and expected traffic are assessed the day before to minimize disruptions

and optimize operational efficiency. Since real-time adjustments are challenging due to limited

personnel and resources, pre-planning ensures smooth airport operations. To effectively allocate

ground crew, tow tractors, and fuel, accurate predictions of future air traffic demand are essential.

Therefore, this study aims to develop a method to forecast next-day air traffic volumes, using

KGFK as a case study.

This research considers weather data and historical air traffic volume as key inputs for

forecasting. Weather patterns help estimate expected conditions for the target day, while historical

air traffic data capture seasonal demand variations. To achieve this, multiple machine learning

and deep learning approaches are applied using datasets with varying time spans and time units.

This methodological design ensures a scientific comparison of dataset effectiveness and model

performance, providing a robust framework for next-day air traffic estimation.

1.4 Contributions of this Research

There are thousands of GA airports in the U.S., yet due to limited research attention and

insufficient federal funding, many of these airports struggle to equip and maintain effective

data collection systems. This limitation hinders data-driven ATM at GA airports and creates

challenges in securing federal funding for infrastructure improvements. Additionally, this lack

of data accessibility poses difficulties for research groups seeking high-quality aviation data

from small airports. Although widely used aviation networks exist, their data quality is often



compromised due to data sparsity and potential buffering issues. To address these challenges,

this study proposes a comprehensive data collection system that integrates data acquisition,

processing, analysis, and archiving. This system benefits both GA airport ATC operations and

researchers utilizing small-airport data for aviation studies by providing high-resolution ADS-

B data, data augmentation techniques, operational metric extraction, and novel applications of

ADS-B data.

The high-resolution ADS-B data collection system introduced in this work consists of

cost-effective hardware installed at GA airports, enabling continuous data flow with minimal

disruption by strategically placing receivers near runways. This approach is particularly

beneficial for studies focusing on early-stage departure operations, where significant variations

in track and speed occur. Without dense data collection, critical change points may be lost, which

could compromise the accuracy of operational performance analysis.

To further improve data quality, this study implements an ADS-B data augmentation

process that corrects timestamp inaccuracies caused by buffering effects during data transmission.

The debuffering process, embedded in the decoding phase, systematically adjusts unrealistically

small data intervals, restoring timestamps to their most accurate values. The effectiveness

of debuffering is tested through multiple validation approaches, showing that debuffering

significantly enhances data reliability. Notably, this data augmentation method can be applied

not only to the ADS-B data collected in this study but also to other aviation networks using

similar hardware configurations, eliminating the need for costly hardware upgrades. Improved

data resolution benefits various aviation studies, including altitude trajectory reconstruction using

rate of climb (ROC) and research on ROT, ensuring more precise operational analyses.

With the enhanced dataset, this study extracts comprehensive operational metrics from



ADS-B data collected at GA airports. Unlike conventional datasets, which struggle to capture

high-variance operational behaviors, this system continuously records flight movements, offering

a more detailed and holistic representation of airport operations. Additionally, since previous

studies lack standardized definitions of key operational benchmarks, this work establishes clear

criteria for determining these benchmarks, ensuring accurate and reliable metric extraction.

These metrics are then integrated into the next-generation ACM to estimate airport

capacity, assisting GA airports in optimizing resource allocation and strengthening their case

for federal funding.

Furthermore, this study explores new applications of ADS-B data, such as comparing

flight behavior across different missions and performance categories, quantifying the relationship

between density altitude and speed benchmarks, and using time series forecasting models to

predict next-day air traffic volume. These applications expand the research potential of ADS-B

data and improve understanding of how environmental and operational factors influence airport

performance. Additionally, they provide practical insights for small-airport ATM, helping predict

future travel demand and assess the impact of key operational factors.

By introducing an affordable, high-resolution ADS-B data collection system, refining data

quality through augmentation techniques, and extracting reliable operational metrics, this study

contributes to both aviation research and small-airport ATM, offering valuable insights into

airport capacity modeling, operational efficiency, and data-driven decision-making.



1.5 Organization

The remainder of this work is structured as follows. Chapter 2 presents a detailed

description of the data collection system, with a focus on the data pipeline, which encompasses

data preprocessing, labeling, analysis, and archiving. Following an overview that outlines the

system’s framework and highlights the purpose of each component, the chapter provides a

thorough explanation of each module within the system. Chapter 3 details the methodology

for calculating departure and arrival operational metrics, explaining the underlying procedures

and their significance in capacity estimation. Chapter 4 explores three key applications of

the proposed system, demonstrating its practical value in aviation research and air traffic

management. Chapter 5 synthesizes the key findings from the previous chapters and provides

concluding remarks on the contributions and implications of this study. Additionally, a literature

review is incorporated at the beginning of each chapter to provide the necessary background,

helping readers develop a clearer understanding of current research conditions and existing gaps

in the field.



Chapter 2: ADS-B Data Collection System

2.1 Literature Review

As one of the core components of the Next Generation Air Transportation System (NextGen), ADS-B has been

widely applied across all types of aircraft. The development of ADS-B technology has been

largely driven by the FAA’s priority to enhance aviation safety and reduce air accidents. By enabling

real-time communication of aircraft identification, position, and kinematic data between ground

receivers and airborne systems, ADS-B has successfully enhanced the safety of air travel and

modernized the ATM process [11], resulting in a 12% reduction in the fatal accident rate from

2009 to 2015 [12]. The ICAO has suggested that Mode S 1090 MHz Extended Squitter (1090ES)

receiving stations should be deployed in the Asian-Pacific region, America, and Australia [13],

while a small proportion of 978 MHz Universal Access Transceiver (UAT) receivers remain

operational. Chen et al. pointed out that while 1090ES follows an international standard, it

also leads to channel congestion, whereas UAT can process more data but lacks widespread

adoption [14]. Which of the two frequencies an aircraft uses depends on its operating altitude

above mean sea level (MSL), in accordance with FAA mandates [15]. The widespread adoption of ADS-B technology provides a constant

and abundant data flow for airports, researchers, and aviation enthusiasts.

The fidelity of ADS-B data has been extensively studied. Ali et al. [16] assessed the

integrity of positional data within ADS-B messages, noting that certain Global Navigation



Satellite System (GNSS) receivers onboard aircraft can exhibit anomalies, although the

corresponding ADS-B data remain unaffected. The primary sources of ADS-B errors include

missing altitude data, duplicated messages, and position jumps. Zhang et al. [13] compared

primary radar data and ADS-B data with high-accuracy position data as the ground truth, finding

that ADS-B data exhibits more small error bursts, whereas large error bursts are significantly

fewer, indicating that ADS-B outperforms radar data in accuracy.

ADS-B technology consists of ADS-B In and ADS-B Out functionalities. ADS-B In

receives and decodes external messages broadcast by ground stations or nearby aircraft, while

ADS-B Out transmits encoded messages containing real-time flight information. Leveraging

these two functions, airports have developed large-scale ATC systems to enhance surveillance and

efficiency. One such system is Airport Surface Detection Equipment, Model X (ASDE-X), which

has been deployed at major airports across the United States, including Baltimore/Washington

International Thurgood Marshall Airport, LaGuardia Airport, and Los Angeles International

Airport. Integrating ground surveillance radar systems, ADS-B signals, and additional data

sources, ASDE-X has been proven to enhance runway safety by detecting potential runway

conflicts [17]. Using ASDE-X data from John F. Kennedy International Airport (JFK), Bhadra et

al. measured surface traffic efficiency based on the time flights spent in runway queues, proposing

a virtual queue concept that reduces physical queuing and optimizes financial and environmental

outcomes [18]. Srivastava also used ASDE-X data to analyze queue durations before takeoff at

JFK, later proposing a taxi-out duration prediction model that utilized timestamps, callsigns, and

kinematics, all of which can be obtained from ADS-B. Additionally, Hotle et al. used ASDE-X

data to identify taxi-out benchmarks and predict departure delays [19].

Other systems, such as Standard Terminal Automation Replacement System (STARS) and



Air Traffic Management Surveillance Tracker and Server (ARTAS), are designed to assist ATC

through advanced air traffic surveillance and management [20] [21]. Although these systems

are effective at major U.S. and European airports, they are not warranted at GA airports due

to their lower operational complexity and traffic levels, budget constraints, and infrastructure

limitations. Many GA airports lack ATC facilities, real-time surveillance needs, and centralized

data processing capabilities, making large-scale ADS-B ground station installations impractical.

As a result, ADS-B data collection at small airports is often managed by third-party networks such

as OpenSky, FlightAware, and Flightradar24. These platforms provide flight tracking services

primarily focused on en route flight monitoring, relying largely on hobbyist-built receivers.

Several studies have leveraged third-party ADS-B networks for aviation research. Sun et

al. [22] analyzed a year’s worth of ADS-B data from OpenSky to assess ADS-B equipage trends,

fleet management strategies, and system performance. Schultz et al. [23] used OpenSky ADS-

B data from Zurich Airport to extract airport performance metrics, such as taxi times, runway

occupancy times, and ground trajectories. Similarly, Patroumpas et al. [24] utilized FlightAware

ADS-B data from Europe to detect mobility events by analyzing flight trajectory change points,

introducing noise reduction techniques to enhance trajectory fidelity. However, not all third-

party networks provide complete airport surface coverage. Almost all of these services rely

on hobbyist-built receivers, leading to incomplete flight records and inconsistent data reception.

As a result, third-party ADS-B networks have historically struggled to support GA airports,

necessitating custom-built systems for detailed flight operations studies.

Beyond third-party data sources, alternative data collection methods have been explored

for GA airport surveillance. Some studies investigated acoustic [25] [26] and image-based [27]

sensor technologies for flight detection and tracking. However, ADS-B remains the most ATC-



compliant and widely accepted method. Piracci et al. [28] tested Software-Defined Radio

(SDR) prototypes for evaluating enhanced ADS-B receivers, proposing a multi-channel 1090ES

receiver capable of high-resolution data reception. Gui et al. [29] built 14 ground stations

with omnidirectional antennas, achieving ADS-B reception for air traffic within a 300-kilometer

radius. Mitkas and Lovell [30] developed an antenna array and portable Raspberry Pi-based

ADS-B receivers, successfully collecting high-resolution ADS-B data from six GA airports.

Their ADS-B receiving mechanism forms the foundation of this dissertation’s data collection

framework.

2.2 System Overview

In collaboration with the FAA, this project developed a cost-effective data collection system

capable of collecting, preprocessing, analyzing, and archiving real-time ADS-B data. Unlike the

aforementioned third-party networks, this system is specifically designed to focus on operations

conducted at or near target airports, with an emphasis on data fidelity and integrity. Using

high-resolution ADS-B reception devices, this system is capable of capturing near-airport flight

trajectories. These trajectories are then labeled with the corresponding operations during the

data pipeline, providing essential information for subsequent analyses such as flight performance

comparison and airport performance parameter estimations.

With proven reliability, the system has been installed and is currently operational at

seven airports: Centennial Airport (KAPA), Colorado; College Park Airport (KCGS), Maryland;

Republic Airport (KFRG), New York; Grand Forks International Airport (KGFK), North Dakota;

Marion Municipal Airport (KMNN), Ohio; Solberg-Hunterdon Airport (KN51), New Jersey; and



The Ohio State University Airport (KOSU), Ohio. The airports in our network feature distinct

runway configurations. KGFK has two pairs of parallel runways and accommodates approximately

300,000 annual flight operations, approximately 85% of which are training flights on small

general aviation aircraft operated by the University of North Dakota (UND). The runways at

KFRG intersect to form an X-shape; KOSU includes one pair of parallel runways, along with a

diagonal runway that intersects both. Ambulance helicopter operations can also be observed at

KOSU. KCGS, located in College Park, Maryland, has a single runway. While it does not have

an attached flight school, it does receive occasional traffic from police helicopters. In terms of

geographic diversity, KAPA stands out with an elevation of 5,883 ft, which is several times

that of KGFK, highlighting the heterogeneity within the network.

Given the variations in geographical conditions and runway configurations, the system

ensures a diverse dataset, providing a robust foundation for subsequent analyses. Fig. 2.1 presents

examples of these airports equipped with ADS-B receiving devices, illustrating the range of

runway configurations captured in the database to support broad research applications.

The data flow begins with the customized ADS-B receiver, a Raspberry Pi-based platform.

ADS-B signals at different frequencies are received by antennas and SDRs, and the Raspberry

Pi integrates these signals for subsequent data processing. The antenna serves as the front-

end hardware that captures radio signals transmitted by aircraft transponders. These signals are

weakened when they reach the receivers, so antennas are essential to ensure reliable reception,

especially in near-airport environments. Once the radio signals are captured, the SDR converts the

signals into digital form for processing. SDRs are flexible and programmable,

making them ideal for this task. The collected data is then transmitted to AWS via the MQTT

Internet of Things (IoT) protocol for further processing.



(a) Runway configuration of KCGS. (b) Runway configuration of KFRG.

(c) Runway configuration of KGFK. (d) Runway configuration of KOSU.

Figure 2.1: Examples of GA airports with ADS-B receiving systems.

At the initial stage, raw data from different frequency channels are stored separately in

AWS, facilitating efficient data management in later steps. Subsequently, the system transfers

files from multiple channels into a centralized folder, which functions like the wide end of

a funnel; once data enter this folder, they automatically move down the processing pipeline,

triggering the following procedures.

The first stage of processing is the decoding algorithm. A commonly used and widely



recognized tool for decoding raw ADS-B messages is the pyModeS library in Python. This

package is specifically designed for decoding messages from Mode S transponders and is

therefore adopted in our project. Based on the type code at the beginning of each message, the

pyModeS algorithm identifies the type of information contained in the raw data and translates

ADS-B messages from binary into an ASCII-based format, which is more human-readable. ADS-

B transponders transmit nine different message types, though in practice, most raw messages

contain position or velocity data, with occasional identification or status messages. Consequently,

the decoded output consists primarily of position and velocity information. Because a complete

flight message requires both position and velocity, this step also includes a merging process that

combines messages of different types from the same flight based on the unique ICAO identifier.
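
The project relies on pyModeS for decoding. Purely to illustrate the type-code dispatch described above, the following standard-library sketch reads the downlink format, ICAO address, and type code from a 112-bit message given as hexadecimal text, then groups messages by ICAO address. The function names and category labels are assumptions made for this example, and the sample messages are commonly cited public test vectors rather than data from this study.

```python
from collections import defaultdict
from typing import Dict, List


def summarize(msg_hex: str) -> Dict[str, object]:
    """Extract DF, ICAO address, and type code from a 112-bit Mode S message."""
    raw = bytes.fromhex(msg_hex)
    if len(raw) != 14:
        raise ValueError("expected a 112-bit (28 hex character) message")
    df = raw[0] >> 3              # downlink format: first 5 bits
    icao = raw[1:4].hex().upper() # 24-bit aircraft address
    tc = raw[4] >> 3              # type code: first 5 bits of the ME field
    if df not in (17, 18):
        kind = "not_adsb"
    elif 1 <= tc <= 4:
        kind = "identification"
    elif 5 <= tc <= 8:
        kind = "surface_position"
    elif 9 <= tc <= 18 or 20 <= tc <= 22:
        kind = "airborne_position"
    elif tc == 19:
        kind = "velocity"
    else:
        kind = "other"
    return {"df": df, "icao": icao, "tc": tc, "kind": kind}


def group_by_icao(messages: List[str]) -> Dict[str, List[Dict[str, object]]]:
    """Collect summaries per aircraft so position and velocity can be merged."""
    grouped: Dict[str, List[Dict[str, object]]] = defaultdict(list)
    for msg in messages:
        info = summarize(msg)
        grouped[str(info["icao"])].append(info)
    return dict(grouped)


samples = ["8D4840D6202CC371C32CE0576098",   # identification
           "8D485020994409940838175B284F",   # airborne velocity
           "8D40621D58C382D690C8AC2863A7"]   # airborne position
grouped = group_by_icao(samples)
```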

With proven fidelity, the real-time position and velocity messages broadcast by ADS-B

transponders have significantly benefited air traffic management (ATM) at airports and enabled

a wide range of aviation studies. When combined with the FAA’s registered aircraft database,

aircraft specifications can be easily identified, further broadening the scope of potential research

applications. However, a challenge arises from the “data buffering” phenomenon, caused by the

absence of precise temporal information in raw ADS-B messages. This issue compromises the

accuracy of time-sensitive data during transmission and complicates downstream processing and

analysis. To address this, a debuffering algorithm is applied prior to the merging process, ensuring

a more accurate and reliable dataset for subsequent use. Afterwards, the merging process is also

customized to accommodate the debuffered data, ensuring greater precision. Messages received

within the same batch are processed simultaneously, producing a single decoded dataframe where

each row contains complete position and velocity data for a specific flight at a given time point,

integrating messages from multiple flights. To balance data comprehensiveness with the project’s



requirements, the decoded file contains the following information for each data point: timestamp,

ICAO identification code, latitude, longitude, altitude, groundspeed (in knots),

track, ROC in feet per minute, and callsign. The decoded file is named based on the airport name,

transponder frequency, and reception datetime to maintain consistency in data organization.

The output of the decoding process and associated algorithms is ultimately used to update

the PostgreSQL database. This database functions as long-term storage for flight records and

fulfills historical flight query needs in research. A series of flight statistics, such as flight counts

and message counts, are directly stored in the database. The mapping between flight identifiers

(i.e., ICAO and callsign) and the corresponding aircraft specifications is also included.

More importantly, individual ADS-B messages are archived in the table adsb_messages,

while the complete trajectories—along with the labels generated by the analysis algorithms

(introduced below)—are stored in the table flights. These two tables serve as the primary

data sources for the applications developed in this work.

To update the above-mentioned tables, once decoding is complete, the system checks for

any existing decoded dataframes in the designated directory and then initiates the process of

loading the decoded messages into the PostgreSQL database. This loading process is executed

by a Python-based algorithm that incorporates a series of customized functions for filtering and

merging flight data. In this process, and in alignment with the project’s focus, flight trajectories

are initially trimmed by discarding any data points where the aircraft’s coordinates are more than

7 nautical miles from the subject airport. The flight locations are extracted from the decoded file

names, and each flight coordinate is compared with the center coordinate of the target airport

to determine its proximity. Flights deviating more than 7 nautical miles from the airport for durations

exceeding 600 seconds are treated as two distinct flight segments. Each segment falls within the



study area and captures all operations associated with the target airport. Each flight is also given

a unique flight_id at this step, which makes locating and studying specific flights easier. Because

a single flight may span two consecutive batches, the loader must be able to continue a flight

across them. If a matching flight is found in the flights table, or matching data points already

exist, the flight_id is inherited from the existing record and the new segment is appended to the

earlier portion of the flight, with the corresponding information in the flights table updated

accordingly. If no match is identified, a new unique flight_id is assigned to all data points in

the flight.
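
The trimming and splitting rules described above can be sketched as follows. This is a simplified illustration that uses the great-circle distance to the airport center and the stated thresholds of 7 nautical miles and 600 seconds; the function names and the example coordinates are assumptions, and the flight-continuation lookup against the database is omitted.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_NM = 3440.065          # mean Earth radius in nautical miles
Point = Tuple[float, float, float]  # (unix time in seconds, latitude, longitude)


def haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in nautical miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def split_segments(points: Sequence[Point], airport: Tuple[float, float],
                   radius_nm: float = 7.0, max_gap_s: float = 600.0) -> List[List[Point]]:
    """Drop points beyond `radius_nm`, then start a new segment whenever the
    time gap between consecutive retained points exceeds `max_gap_s`."""
    segments: List[List[Point]] = []
    last_t = None
    for t, lat, lon in sorted(points):
        if haversine_nm(lat, lon, airport[0], airport[1]) > radius_nm:
            continue
        if last_t is None or t - last_t > max_gap_s:
            segments.append([])
        segments[-1].append((t, lat, lon))
        last_t = t
    return segments


airport = (47.95, -97.18)  # illustrative airport center
track = [(0.0, 47.95, -97.18), (10.0, 47.96, -97.18),
         (100.0, 48.50, -97.18),                      # about 33 NM away, dropped
         (1000.0, 47.96, -97.18), (1010.0, 47.95, -97.18)]
segments = split_segments(track, airport)
```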

Once the flight trajectories are reconstructed, the individual ADS-B messages are loaded

into the adsb_messages table in the database for validation purposes and future research. The

decoded files, along with the raw ADS-B messages from different frequencies, are then archived

as backup files in an AWS S3 bucket for long-term storage. This ingestion process is triggered

whenever the previous task has completed and new raw files are detected, with checks made at one-minute intervals.

Since a single flight consists of dozens to thousands of individual messages, a comprehensive

flight analysis is conducted on an hourly basis, a process referred to as the “hourly update”.

The data loading process primarily populates the adsb_messages table, whereas the

hourly update focuses on updating the flights table, which serves as the primary source for

extracting flight details. The flights table contains complete reconstructed trajectories along

with a series of classification labels, facilitating flight filtering and categorization. The complete

flight trajectories are assembled from individual messages stored in adsb_messages, where all

decoded messages with the same flight_id and loaded within a two-hour window are merged

and stored as JSON strings. These JSON structures provide flexibility for further processing and

serve as a standardized format for database storage. An example of a decoded JSON string is



shown below:

{
    "datetime": "1730759325.6904",
    "lat": "39.57158",
    "long": "-104.83833",
    "trk": "2.8125",
    "gs": "7.0",
    "alt": "0.0",
    "roc": "0.0"
}

To ensure data integrity, preliminary checks are applied, such as negative altitude

validation, which detects faulty data points, and on-ground verification, which determines

whether an aircraft has touched down on the runway while operating.

These JSON-formatted flight records form the foundation for all subsequent analyses.

Since unreliable ADS-B data can enter the system for various reasons (which will be elaborated

on in the following subsections), inaccurate flight trajectories can significantly compromise the

performance of downstream studies. To address this, a flight reliability filtering algorithm is

implemented to determine the validity of recorded flights. This filter consists of a series of rule-

based checks, each designed to identify and exclude a specific type of unreliable condition. Only

flights passing all validation rules are assigned a “reliable” label in the flights table, while those

failing any check are flagged as “unreliable”.
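
The structure of such a filter can be sketched as a chain of independent predicates, all of which must pass. The specific rules and thresholds below are hypothetical examples chosen for illustration; only the negative-altitude check is mentioned in the text, and the actual rule set is described in the following subsections.

```python
from typing import Callable, Dict, List, Sequence

Point = Dict[str, float]
Rule = Callable[[Sequence[Point]], bool]

MIN_POINTS = 3  # hypothetical minimum trajectory length


def has_enough_points(traj: Sequence[Point]) -> bool:
    return len(traj) >= MIN_POINTS


def no_negative_altitude(traj: Sequence[Point]) -> bool:
    return all(p["alt"] >= 0.0 for p in traj)


def timestamps_increase(traj: Sequence[Point]) -> bool:
    return all(b["datetime"] > a["datetime"] for a, b in zip(traj, traj[1:]))


RULES: List[Rule] = [has_enough_points, no_negative_altitude, timestamps_increase]


def label_flight(traj: Sequence[Point], rules: Sequence[Rule] = RULES) -> str:
    """Return "reliable" only if every rule passes, otherwise "unreliable"."""
    return "reliable" if all(rule(traj) for rule in rules) else "unreliable"


good = [{"datetime": float(i), "alt": 100.0 * i} for i in range(5)]
bad = good + [{"datetime": 5.0, "alt": -50.0}]
print(label_flight(good), label_flight(bad))
```

Keeping each rule as a separate function makes it straightforward to add or retire a check without touching the others.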

For the reliable flights, an operation identification model is applied using a rule-based



classification method. Since this work focuses on GA airports, which often host flight schools and

observe a large number of training operations, the model must go beyond identifying standard

operations such as takeoff, landing, and taxiing. It must also distinguish more complex training

behaviors composed of these basic actions. For example, a touch-and-go operation consists of a

sequence of a landing immediately followed by a takeoff, and a takeoff followed by a landing is

also treated as a distinct operation. Additionally, aborted landings, caused by runway obstacles

or other factors, result in low-approach operations, which must also be correctly identified.

To minimize the impact of high-frequency noise, Butterworth low-pass filters with varying

filter orders are applied based on the maximum altitude of the flight. The smoothed trajectories

allow for the identification of distinct flight maneuvers, where key operations are extracted by

detecting peaks and troughs in altitude changes. This method not only ensures high classification

accuracy but also enables the identification of multiple operations and change points within a

single flight. The detected operations are then assigned to the predicted_op_type column in

the flights table as integer values, representing different operation categories.
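
A minimal sketch of the smoothing and extrema detection step is given below, assuming NumPy and SciPy are available. The order-selection rule, cutoff frequency, sampling rate, and prominence threshold are hypothetical values chosen for illustration, not the parameters used in this work.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def pick_order(max_alt_ft: float) -> int:
    """Hypothetical mapping from maximum altitude to filter order."""
    return 2 if max_alt_ft < 3000.0 else 4


def altitude_extrema(alt_ft, fs_hz: float = 1.0, cutoff_hz: float = 0.05,
                     prominence_ft: float = 100.0):
    """Low-pass the altitude profile, then locate its peaks and troughs.

    Returns (smoothed altitude, peak indices, trough indices).
    """
    alt = np.asarray(alt_ft, dtype=float)
    b, a = butter(pick_order(alt.max()), cutoff_hz, btype="low", fs=fs_hz)
    smooth = filtfilt(b, a, alt)  # zero-phase filtering keeps extrema in place
    peaks, _ = find_peaks(smooth, prominence=prominence_ft)
    troughs, _ = find_peaks(-smooth, prominence=prominence_ft)
    return smooth, peaks, troughs


# Synthetic profile: two descents and one climb-out, plus high-frequency noise.
t = np.arange(200)
alt = 500 + 500 * np.cos(2 * np.pi * t / 100) + 20 * np.sin(2 * np.pi * t / 4)
smooth, peaks, troughs = altitude_extrema(alt)
```

In this synthetic profile, a trough followed by a peak would correspond to a touch-and-go style pattern; mapping extrema to operation labels is left out of the sketch.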

In addition to analyzing and labeling flight data, a series of real-time statistics are also

updated during this workflow. The flight class for each flight is determined using an ICAO-

to-weight class mapping list, and the count for each weight class is updated every hour.

Simultaneously, this process updates a real-time web dashboard via a DynamoDB table, enabling

continuous monitoring of air traffic across multiple GA airports. The system tracks the total

number of flights over the last 24 hours, categorizing them by airport location while recording

hourly flight counts, timestamps, ICAO identifiers, and other relevant flight attributes. Additional

metrics such as ground operations statistics, on-ground aircraft counts, and real-time ADS-

B message counts are also recorded and displayed on the dashboard, created by the previous



research team of this project. The interface of the mapping tool can be found in Fig. 2.2.

Figure 2.2: Real-time mapping tool interface.

The workflow of this ADS-B data collection system is depicted in Fig. 2.3, and the

simplified data pipeline in the cloud computing unit, the EC2 instance, is shown in Fig. 2.4,

demonstrating how this framework ensures maximum utilization of received ADS-B data,

providing a consistent and reliable data flow while efficiently managing AWS computation

resources. The algorithms and processes described above will be further detailed in the following

subsections.

As of March 14, 2025, at 2:37 PM, hundreds of millions of ADS-B messages have been

captured by our system, and over a million flights have been observed, decoded, filtered, labeled,

and archived in our database, excluding a small proportion of undecoded records, as shown in

Table 2.1.



Figure 2.3: Concept map of ADS-B data collection system.

Figure 2.4: Simplified Data pipeline in EC2.



Table 2.1: Flights observed at target airports

Target Airport                              Flight Count
Centennial Airport (KAPA)                   235568
College Park Airport (KCGS)                 86682
Republic Airport (KFRG)                     32914
Grand Forks International Airport (KGFK)    476671
Marion Municipal Airport (KMNN)             10562
Solberg-Hunterdon Airport (KN51)            289084
Ohio State University Airport (KOSU)        40655

2.3 Raspberry Pi-Based Data Collection Platform

Tasked by the FAA, this project aims to provide operational parameters for evaluating

airport capacity and performance at GA airports. To assess both current and future capacity,

detailed flight records must be analyzed to extract key operational metrics. Several existing data

sources were reviewed before proposing a new data collection system. The Aviation System

Performance Metrics (ASPM) database and the System Wide Information Management (SWIM)

system were examined, but neither provides comprehensive coverage of GA airport operations

[31] [32].

Since the required metrics in this study rely on breakdown of statistics by operation type,

implementing flight detection and counting mechanisms is essential. Previous attempts, aside

from ADS-B technology, have explored various alternative approaches, such as drones and

acoustic arrays [25] [26]. However, these methods have been limited by hardware constraints,

providing only limited operational insights. In contrast, ADS-B has proven to be a more

accessible and efficient alternative due to its high data transmission frequency and fidelity. Its

open communication protocol allows this project to collect and analyze flight data using cost-

effective, easily assembled hardware, which can be built with off-the-shelf components. This



approach significantly reduces both the cost and technical barriers associated with data collection

at GA airports, making high-resolution flight tracking more feasible for research and operational

improvements.

To achieve this, a Raspberry Pi-based data acquisition system was developed. Raspberry Pi

is a compact, low-cost computing platform that has been widely used across various applications,

including gaming stations and household automation [33] [34]. It offers several advantages,

such as proximity between the computation unit and the data receiving unit, which minimizes

signal noise and simplifies wiring. Additionally, Raspberry Pi’s high processing speed and data

reception rate make it well-suited for real-time data acquisition [35]. Running on a Linux-based

operating system, Raspberry Pi also provides flexibility in software configuration and efficient

text-based processing.

Each data receiving module in the system consists of an antenna, an amplifier, a bandpass

filter, and an SDR. ADS-B messages are transmitted using pulse-position modulation (PPM) at

fixed frequencies depending on the transponder type. These signals are captured by strategically

placed antennas with an unobstructed line of sight to optimize reception. The received messages

contain essential flight data, including GPS-derived coordinates and other kinematic information

obtained from onboard instruments. Each 112-bit ADS-B message includes a 24-bit ICAO

aircraft identifier for validation and a 56-bit segment containing flight status information.
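The frame layout can be illustrated with a short sketch. It assumes the standard Mode S extended squitter layout (5-bit downlink format, 3-bit capability, 24-bit ICAO address, 56-bit message field, 24-bit parity); the function name `split_adsb_frame` and the dictionary keys are hypothetical, and the sample frame is a commonly used textbook example rather than a message from this project.

```python
def split_adsb_frame(hex_msg):
    """Split a 112-bit (28 hex character) extended squitter into fields."""
    if len(hex_msg) != 28:
        raise ValueError("expected 28 hex characters (112 bits)")
    bits = bin(int(hex_msg, 16))[2:].zfill(112)
    return {
        "downlink_format": int(bits[0:5], 2),   # bits 1-5
        "capability": int(bits[5:8], 2),        # bits 6-8
        "icao": hex_msg[2:8].upper(),           # bits 9-32, 24-bit address
        "me_hex": hex_msg[8:22].upper(),        # bits 33-88, 56-bit message
        "type_code": int(bits[32:37], 2),       # first 5 bits of the message
        "parity_hex": hex_msg[22:28].upper(),   # bits 89-112
    }


frame = split_adsb_frame("8D4840D6202CC371C32CE0576098")
```

For this sample frame, the downlink format is 17 and the type code is 4, which identifies an aircraft identification message.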

To ensure optimal signal quality, an amplifier is used to boost the strength of the

received signal, followed by a bandpass filter that eliminates noise and interference from any part of

the spectrum away from the target band(s). The filtered signal is then processed by an SDR,

where demodulation occurs. Here, it is important to distinguish this signal decoding process

from the message decoding process described in the system overview section. In the SDR step,



the received ADS-B signals are converted from radio frequency waves into digital data. The

SDR extracts Mode S ADS-B transponder messages, parsing essential flight status information.

However, since ADS-B messages do not inherently include timestamps, Unix timestamps (also

known as Epoch timestamps) are assigned upon reception to provide temporal accuracy.

This timestamping process is widely employed by ADS-B receivers. However, due

to computational constraints, it can lead to data buffering—a phenomenon where messages

accumulate before processing, resulting in timestamp inaccuracies. A solution to mitigate this

issue will be introduced in the next section. Given that 77% of GA aircraft are equipped

with 1090 MHz ADS-B transponders, 22% use 978 MHz UAT, and 1% operate dual-frequency

transponders, each receiving unit is designed with two identical SDR-based systems, each

dedicated to a fixed frequency to ensure comprehensive signal capture. An example of a data

receiving unit is shown in Fig. 2.5a, while its hardware components are illustrated in Fig. 2.5b.

(a) ADS-B data receiving unit at KFRG. (b) Components of data receiving units.

Figure 2.5: ADS-B data receiving unit and its components.

Considering the limited bandwidth of Raspberry Pis, the decoding, analysis, and

visualization processes are conducted in AWS to efficiently handle the large and continuous data

flow. Piracci et al. [28] have demonstrated that this ADS-B signal collection system maintains a



stable performance even under heavy signal congestion and jamming. While distant signals may

experience interference, the on-site installation of receiver units minimizes these effects, ensuring

reliable data reception and processing. The overall conceptual diagram of ADS-B data reception

is illustrated in Fig. 2.6.

Figure 2.6: Concept Map of ADS-B Reception.



2.4 Decoding Mechanism

2.4.1 Fundamentals of ADS-B Broadcast

Aircraft equipped with ADS-B “Out” technology continuously broadcast essential flight

information, including three-dimensional position data (latitude, longitude, and altitude),

telemetry (ground speed, track, and rate of climb), and identifying details such as the aircraft’s

unique ICAO address and, in some cases, its tail number or call sign. The frequency of these

transmissions depends on the message type, which is specified by the type code within the

message.

In this project, three primary message types are of particular interest: surface messages,

airborne position messages, and airborne velocity messages, all of which are transmitted at a

rate of 2 Hz (twice per second). Position messages encode three-dimensional location data using

the Compact Position Reporting (CPR) format, which enhances local accuracy and transmission

efficiency. CPR achieves this by encoding coordinates in two alternating odd and even grid

systems, allowing for precise global positioning through the convergence of these grids [1].

Velocity messages include key motion parameters such as ground speed (in knots), track, and

ROC in feet per minute. A more comprehensive mapping between message types and type

codes is provided in Table 2.2.
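The type-code ranges listed in Table 2.2 can be expressed as a small lookup. The following is a sketch only; the function name `message_category` and the returned labels are hypothetical.

```python
def message_category(type_code):
    """Map an ADS-B type code to the message categories of Table 2.2."""
    if 1 <= type_code <= 4:
        return "aircraft identification"
    if 5 <= type_code <= 8:
        return "surface position"
    if 9 <= type_code <= 18 or 20 <= type_code <= 22:
        return "airborne position"
    if type_code == 19:
        return "airborne velocity"
    if type_code == 28:
        return "aircraft status"
    if type_code == 29:
        return "target states and status"
    if type_code == 31:
        return "operational status"
    return "other"  # type codes not listed in Table 2.2


categories = [message_category(tc) for tc in (4, 6, 11, 19, 21)]
```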

Since an aircraft can only broadcast one type of message at a time, position and velocity

data are transmitted separately while in flight. However, when an aircraft is on the ground,

position and velocity information are consolidated and sent together in surface messages.

Regardless of the message type, aircraft in motion are expected to broadcast at 2 Hz, whereas



Table 2.2: ADS-B message type code and broadcast frequency [1].

Messages                  TC            Ground (still)  Ground (moving)  Airborne
Aircraft identification   1–4           0.1 Hz          0.2 Hz           0.2 Hz
Surface position          5–8           0.2 Hz          2 Hz             –
Airborne position         9–18, 20–22   –               –                2 Hz
Airborne velocity         19            –               –                2 Hz
Aircraft status           28            0.2 Hz if no alert or code change; 1.25 Hz if alert or code changes
Target states and status  29            –               –                0.8 Hz
Operational status        31            0.2 Hz          0.4 Hz if no change in integrity or accuracy; 1.25 Hz if integrity or accuracy changes

stationary aircraft transmit at a reduced rate. The time intervals between consecutive messages of

the same type should not be significantly shorter than 0.5 seconds under normal conditions. This

interval can fluctuate slightly due to random dithering, a mechanism designed to minimize the

risk of message collisions from multiple aircraft transmitting simultaneously [1].

As previously noted, raw ADS-B data, as broadcast by aircraft, do not contain any absolute

or relative timestamps. Since precise timing is essential for deriving kinematic properties such

as velocity and ROC, temporal information is added by the Raspberry Pi processors based on the

local system clock. This ensures that each message is timestamped at the moment it is received,

allowing for time-based analyses. Once all necessary metadata have been appended, the encoded

messages are transmitted to the AWS system, where they await further processing and decoding.

2.4.2 Timestamp Debuffering

The traditional decoding algorithm processes raw messages in multiple steps, beginning

with message type classification. In the first stage, position information from airborne position



messages and surface messages is extracted, along with the corresponding timestamps. Similarly,

velocity data is decoded from airborne velocity messages. To construct complete messages that

include timestamps, velocity, and position, several post-processing steps are applied: timestamps

are rounded to the nearest integer, duplicate messages of the same type with identical rounded

timestamps are removed, and finally, messages of different types are merged to form a complete

data point. This structured approach ensures that each decoded message contains all necessary

flight details while maintaining data consistency.
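The rounding-based post-processing can be sketched as follows. This is an illustration under stated assumptions, not the project's decoder: records are assumed to arrive as `(timestamp, kind, fields)` tuples with `kind` equal to `"position"` or `"velocity"`, the most recent duplicate is the one kept, and only seconds containing both a position and a velocity record yield a complete data point. The function name `amalgamate_by_rounding` is hypothetical.

```python
import math


def amalgamate_by_rounding(records):
    """Merge position and velocity records that share a rounded timestamp.

    records: iterable of (timestamp_s, kind, fields) in arrival order,
        where kind is "position" or "velocity" and fields is a dict.
    """
    latest = {}
    for timestamp, kind, fields in records:
        second = math.floor(timestamp + 0.5)  # round to the nearest integer second
        latest[(second, kind)] = fields       # a later duplicate replaces an earlier one

    merged = []
    for second in sorted({sec for sec, _ in latest}):
        position = latest.get((second, "position"))
        velocity = latest.get((second, "velocity"))
        if position is not None and velocity is not None:
            merged.append({"timestamp": second, **position, **velocity})
    return merged


records = [
    (10.1, "position", {"lat": 1.0}),
    (10.3, "position", {"lat": 2.0}),   # same rounded second, replaces the first
    (10.4, "velocity", {"gs": 90}),
    (11.2, "position", {"lat": 3.0}),   # no velocity in second 11, not merged
]
complete = amalgamate_by_rounding(records)
```

The example makes the data loss visible: two distinct position records collapse into one, and the position at 11.2 s is discarded for lack of a matching velocity record.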

Theoretically, this decoding method should yield messages that closely match the ground

truth, assuming the raw messages and attached timestamps are accurate. In practice, however,

significant discrepancies are observed. Fig. 2.7 illustrates a typical message stream, showing the

inter-arrival times between consecutive decoded messages. The larger intervals predominantly

align with multiples of 0.5 seconds, with slight variations likely caused by a built-in dithering

mechanism intended to reduce the risk of message collisions. However, there are also

unrealistically short intervals—less than 0.5 seconds—that contradict the expected ADS-B

transmission behavior. The larger intervals suggest instances where intermediate messages were

lost, while the smaller ones indicate that timestamps may have been buffered before being

released.

Two primary factors contribute to erroneous timestamp intervals. First, the demodulation

and decoding process operates within a bandwidth-limited queue, meaning that messages

do not exit the system with the same timing intervals as they entered. Second, the local

processing unit (e.g., the Raspberry Pi) has limited computational capacity, leading to additional

buffering and potential delays in message processing. Furthermore, message collisions during

peak traffic periods, particularly at lower altitudes near the ground where aircraft are more



Figure 2.7: Typical inter-arrival times of buffered messages.

densely concentrated, further distort the received message stream and introduce additional

inconsistencies.

It should be noted that there are expensive hardware solutions to mitigate data buffering

issues. High-end demodulation hardware, where the decoding is accomplished in hardware via

a field-programmable gate array (FPGA) rather than in software, often includes the capability

to tag outgoing data with GPS timestamps. This ensures that the relative time intervals inferred

from such hardware closely match the actual intervals at which the radio frequency messages

were received, aligning with the expected transmission pattern. However, the methods discussed

in this work are intended for scenarios where only widely available, cost-effective hobbyist-level

equipment is used. Notably, this type of equipment forms the backbone of most crowd-sourced

ADS-B data repositories, making this an important application. Furthermore, a growing number

of small businesses are marketing ADS-B data products and analytical solutions to small airports,

and their hardware and software platforms may not be significantly more sophisticated than those

built with hobbyist-level technology.



To address the data buffering issue, a debuffering algorithm is developed in this project

to improve the accuracy of recorded timestamps by repositioning data points to their most likely

true temporal locations. This approach consists of two key components: data debuffering and data

amalgamation. The fundamental principles of the debuffering algorithm are straightforward:

• Any timestamp assigned to an ADS-B message must be later than the actual time at

which the message reached the receiver due to processing delays caused by demodulation,

decoding, and queueing.

• Any consecutive messages with timestamp differences of less than 0.5 seconds were not

actually transmitted at such short intervals; rather, their recorded timing reflects throughput

limitations of the decoding process rather than the actual transmission intervals.

Based on these principles, we propose shifting assigned timestamps backward in time using

a first-in, first-out (FIFO) queue that runs in reverse order. This queue is designed with a capacity

of two messages per second, aligning with the standard transmission rate of most ADS-B message

types. While this method does not guarantee perfect realignment with actual arrival times, our

hypothesis is that, on a larger scale, it produces timestamps that are significantly closer to the

truth than the original buffered data. The overall process is illustrated in Fig. 2.8.

In Fig. 2.8, the top timeline represents the original timestamps assigned to received ADS-B

messages, where several intervals between messages are less than 0.5 seconds. Moving from right

to left, these timestamps are adjusted through a FIFO queue with a capacity of two messages per

second, resulting in the refined timestamps shown in the bottom timeline. Let $t_i$ and $s_i$ represent

the buffered and debuffered timestamps, respectively, of message $i$. The FIFO queue then operates

by processing messages in reverse chronological order and applying the following recursion:



Figure 2.8: Illustration of debuffering algorithm.

$$ s_i = \min\{t_i,\; s_{i+1} - 0.5\} \tag{2.1} $$
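The recursion in Eq. 2.1 can be sketched in a few lines of Python. The boundary condition is an assumption here: the last message in the stream keeps its buffered timestamp, and all earlier messages are processed from last to first. The function name `debuffer` and the `min_gap` parameter are hypothetical.

```python
def debuffer(timestamps, min_gap=0.5):
    """Apply s_i = min(t_i, s_{i+1} - min_gap) in reverse chronological order.

    timestamps: buffered reception times in seconds, in ascending order.
    Returns the debuffered timestamps in the same order.
    """
    if not timestamps:
        return []
    debuffered = list(timestamps)  # the last element is left unchanged (assumed boundary)
    for i in range(len(debuffered) - 2, -1, -1):
        debuffered[i] = min(debuffered[i], debuffered[i + 1] - min_gap)
    return debuffered


# Three messages released in a burst, followed by a well-spaced one.
buffered = [10.0, 10.125, 10.25, 11.0]
corrected = debuffer(buffered)
```

In this example the first two timestamps are pushed back to 9.25 s and 9.75 s, so that every consecutive pair is at least 0.5 seconds apart and no timestamp is moved later than its buffered value.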

The original decoder’s method for amalgamating messages implicitly assumes that

buffering has occurred, as it relies on rounded timestamps to merge position and velocity

messages. When duplicate information arises due to rounding, only the most recent data point

is retained. While effective for handling buffered data, this approach preserves the inaccuracies

introduced by buffered timestamps, and it also loses data when originally distinct frames are

amalgamated through rounding. A key challenge emerges when applying the debuffering

algorithm: once timestamps are corrected, the resulting data points are spaced at least 0.5 seconds

apart. This adjustment effectively shifts many original timestamps back by around 0.5 seconds.

However, since the previous rounding-based amalgamation process also altered timestamps, its

application becomes inconsistent with the fidelity required for this study.

To address this, improving timestamp accuracy through debuffering necessitates a revised

approach to message amalgamation at a higher resolution. Given that ADS-B messages with

different type codes are transmitted independently at varying frequencies, an additional step is



introduced—stratifying messages by type code before applying debuffering. This adjustment

ensures that the temporal alignment of messages is preserved according to their original

transmission characteristics. Consequently, two distinct debuffering algorithm variants have been

developed to optimize data integrity:

• Debuffer complete messages which are already amalgamated. This approach is particularly

useful for archival data where complete messages have been saved, but the original raw

ADS-B messages are no longer available. It benefits users whose data collection systems

lack structured archival storage, such as data warehouses or data lakes, or those who do not

have access to raw ADS-B messages.

• Stratify by message type before debuffering, then amalgamate. In this variant, raw ADS-

B messages are first classified by type code in the decoder, debuffered separately, and

then combined using a refined amalgamation process (described in the next section). This

method better reconstructs the original transmission timeline and is expected to yield more

accurate results when raw ADS-B messages are available.

These two approaches provide flexibility based on the available data and the specific

requirements of different users, ensuring that ADS-B timestamps can be corrected for various

applications with different scenarios.

The first 5 bits of the message section of an ADS-B message indicate the message type.

To decode a position from its CPR format, a pair of odd and even position messages is

required. There are two types of position message: surface position and airborne position.

For surface position messages, the altitude and rate of climb

are set to 0 by default. Surface position messages can also describe the movements and ground



track, which are also known as surface velocity. Then, the surface and airborne velocities are

decoded by the software. Unlike surface velocity, airborne velocity messages are separate from

airborne position messages. The rounding method of amalgamation relies on the premise that,

once message timestamps are rounded to integer seconds, there will be the necessary odd/even

position records, velocity records, etc., in the same group, to allow them to be merged to form a

complete record. There can be excess messages of various types encountered during this process,

and in this case older versions are dropped. Hence, this method has the added disadvantage that

data are lost that would have had a higher chance of being retained if the timestamps

had been pushed back closer to the truth. When this rounding process is used and the resulting

complete records are then debuffered, the approach is called “post-decoder debuffering”.

2.4.3 Message Amalgamation for In-Decoder Debuffering

Another variant of the debuffering approach is named “in-decoder debuffering.” In this

method, each message type is debuffered separately before being decoded and merged into a

complete record. The process begins with debuffering and decoding surface and airborne position

messages, which serve as “anchors” on the debuffered timeline. These anchor points are then

used to attach any missing information, such as velocity data, to form complete records.

Since surface position messages already contain velocity information, their timestamps

remain unchanged even when debuffered separately. As a result, surface velocity and surface

position messages can be directly merged without any data loss. In contrast, for airborne velocity

and position messages, instead of rounding timestamps and discarding data, the algorithm

searches for the nearest airborne velocity message within a 0.5-second range for each position



record. Fig. 2.9 illustrates this process and its various scenarios.

Figure 2.9: Matching process for merging complete records.

In this figure, the blue circles denote the positions on the timeline where the debuffered

and merged position records fall, while the red circles represent velocity records that need to

be paired with a corresponding position record. Each position record is assigned to its nearest

velocity record, provided that the velocity record falls within the 0.5-second window. If no such

velocity record is found, only the position is retained, as position data are considered the most

critical component of the message stream. In cases where velocity data are missing, they can be

approximated using numerical differentiation of adjacent position records under the assumption

that speed does not change drastically over small time intervals. Any velocity records that do not

have an associated position record within the allowable range are ignored, likely due to message

garbling or missing data.
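The matching step can be sketched as a nearest-neighbor search on sorted timestamps. This is a minimal illustration with assumptions labeled: both timestamp lists are already debuffered and sorted, the 0.5-second window is treated as inclusive, and a velocity record may be attached to more than one position record. The function name `attach_velocity` is hypothetical.

```python
from bisect import bisect_left


def attach_velocity(position_times, velocity_times, window=0.5):
    """Return, for each position timestamp, the index of the nearest
    velocity timestamp within `window` seconds, or None if there is none.

    Both inputs must be sorted in ascending order.
    """
    matches = []
    for p in position_times:
        k = bisect_left(velocity_times, p)
        # Only the neighbors on either side of p can be the nearest record.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(velocity_times)]
        best = min(candidates, key=lambda j: abs(velocity_times[j] - p), default=None)
        if best is not None and abs(velocity_times[best] - p) <= window:
            matches.append(best)
        else:
            matches.append(None)  # position retained without velocity
    return matches


position_times = [0.0, 0.5, 1.0, 3.0]
velocity_times = [0.25, 1.125, 2.0]
matched = attach_velocity(position_times, velocity_times)
```

Here the position at 3.0 s keeps no velocity, and the velocity record at 2.0 s is never attached to any position, which corresponds to the ignored velocity records described above.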

A key reason for prioritizing position records over velocity records, and in particular,

retaining their timestamps, is that an aircraft’s position changes constantly in flight, whereas

velocity tends to be more stable. Slight adjustments to the timing of velocity records are unlikely

to significantly affect accuracy. This principle is also followed in the rounding process used in

the post-decoder debuffering variant.

One significant advantage of the in-decoder approach is that it ensures the retention of all



position records, which might otherwise be lost due to rounding in post-decoder debuffering.

By preserving as many velocity records as possible and improving timestamp alignment, this

method generates a greater number of complete records. Fig. 2.10 demonstrates the extent to

which additional data are retained when using in-decoder debuffering.

Figure 2.10: Data retention comparison.

In this figure, the difference in height between the blue and orange bars represents data

loss for complete flights due to timestamp rounding in post-decoder debuffering. The green

and yellow bars show a similar comparison for post-takeoff messages, highlighting that data

loss is most significant for surface messages. Since debuffering effectively “spreads out” surface

messages, data integrity improvements can be directly validated through altitude integration plots.

A more comprehensive quantitative validation is presented in the following subsection.



2.4.4 Validation Approaches for Debuffering Performances

Complete ADS-B messages contain both barometric altitude data, in units of feet, and

ROC, in units of feet per minute. At first glance, it might appear that there is redundancy here,

as the altitude data should be the integral of the rate of climb data. However, there are instances

where the rate of climb data in an aircraft’s message stream seem to be reasonable, while the

altitude data do not. In such cases, it would make sense to construct an alternate version of the

altitude data by integrating the rate of climb data. The complication is that one needs to know

the time intervals over which to perform the integration. One way to validate the debuffering

algorithm is to perform this rate of climb integration on flights whose altitude data is also reliable.

The time intervals can be inferred from the consecutive differences in timestamps. If debuffering

is doing a good job, then the debuffered timestamps should better replicate the altitude data (taken

to be the “truth”) than the originally buffered timestamps would.

Examples of applying this idea to some individual flights are shown in Fig. 2.11. The

numerical integration is performed using the trapezoidal rule, which performed better than the

rectangular rule and Simpson’s rule on over a hundred flights (because rates of climb can change

quickly), although similar results can be obtained using Simpson’s rule.

Fig. 2.11a shows a case of a single takeoff. The red line is the ground truth altitude data

reported in the ADS-B data stream. The yellow line shows the results of integrating rate of climb

to estimate altitude, based on the original buffered timestamps. Finally, the green line represents

doing the same thing but with debuffered timestamps, and it is clearly more representative of the

truth.

Fig. 2.11b highlights a single flight that takes off and climbs, leveling off at just over 1000



(a) Altitude accuracy improvement with debuffering.

(b) Altitude integration with buffered data.

(c) Altitude integration with debuffered data.

Figure 2.11: Comparison of buffered and debuffered data using integration.



ft, where it dwells for a short period, and then climbs again to just over 2000 ft. The integration

step is inaccurate at the beginning of this exercise, because the timestamps were buffered, and this

error remains for the rest of the trajectory. It is notable that in Fig. 2.11b, an incoherent section,

which is caused by buffered timestamps, exists at the beginning of the takeoff operation. The

same phenomenon can be found in many other takeoffs at KOSU, while the buffered timestamps

in landings tend to be uniformly distributed along the time axis. A possible reason is that the

layout of KOSU runways could lead to a higher signal density at the takeoff end, especially in

the area below 1500 ft near KOSU. Both the original trajectory and the integrated trajectory

are improved by applying debuffering, and thus a much better match is produced, as shown

in Fig. 2.11c.

For a more comprehensive test, 100 flights are selected to test this integrated altitude

comparison. Of these flights, 75 of them are single takeoffs, and the rest are touch-and-go flights.

The recorded altitudes in the received messages are deemed as the ground truth. Two cases are

reconstructed for each flight: first, reconstruct trajectories with ROC and the buffered timestamps,

and compare to the altitude data (assessed at those same buffered timestamps); second, perform

the same operation with the debuffered timestamps. The error metric in each case is the final

vertical distance (in units of feet) of altitude deviation between the true and the integrated altitude.

This measurement is called “drift” in this study because it represents the final extent by which

the two profiles have deviated.
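The drift computation can be sketched as follows, using trapezoidal integration of ROC. Several details are assumptions: ROC is in feet per minute and timestamps are in seconds, the integration starts from the first reported altitude, and drift is signed as integrated minus reported altitude. The improvement is taken as the reduction in absolute drift, which is consistent with the rows of Table 2.3. All function names are hypothetical.

```python
def integrate_altitude(timestamps, roc_fpm, start_altitude_ft):
    """Trapezoidal integration of rate of climb (ft/min) over timestamps (s)."""
    altitudes = [start_altitude_ft]
    for i in range(1, len(timestamps)):
        dt_min = (timestamps[i] - timestamps[i - 1]) / 60.0
        mean_roc = 0.5 * (roc_fpm[i] + roc_fpm[i - 1])
        altitudes.append(altitudes[-1] + mean_roc * dt_min)
    return altitudes


def drift(timestamps, roc_fpm, reported_altitude_ft):
    """Final deviation (ft) between integrated and reported altitude."""
    integrated = integrate_altitude(timestamps, roc_fpm, reported_altitude_ft[0])
    return integrated[-1] - reported_altitude_ft[-1]


def improvement(debuffered_drift, buffered_drift):
    """Positive when debuffering reduces the magnitude of the drift."""
    return abs(buffered_drift) - abs(debuffered_drift)


# Constant 600 ft/min climb sampled every 30 s: the profiles agree exactly.
example_drift = drift([0.0, 30.0, 60.0], [600.0, 600.0, 600.0], [0.0, 300.0, 600.0])
```

The same `drift` function would be evaluated twice per flight, once with buffered and once with debuffered timestamps, and the two values passed to `improvement`.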

For the single takeoff flights, the experiment is started at the point of takeoff. The touch-

and-go flights all start and end at 0 ft altitude above ground level (AGL). Altitude integration

is more challenging for such flights, because it is our experience that the ROC data are biased

towards positive rates of climb. Thus, even though the actual flight profile would suggest that



the troughs of the altitude profile should be very close to 0 ft AGL, the integrated data show

their altitudes increasing over time, contributing to the final value of drift. This is ameliorated

somewhat, but not entirely, by the debuffering process. Other possible explanations include

differences in “visibility” of the transmitting antenna on the aircraft in climbing and descending

attitudes, which could also depend on the receiver antenna placement. At this point, this

phenomenon is not completely understood, and more inquiry is required.

Table 2.3 shows a sample of 40 out of these 100 flights. Any row with a positive sign for

the value of the improvement indicates that the drift derived from debuffered timestamps is better

(i.e., less) than what would have been obtained with the original buffered timestamps. For these

results, all the debuffered drifts are calculated using in-decoder debuffered data.

Among the takeoff flights, 52 of the 75 tested takeoffs have smaller

debuffered drifts. The average improvement is 40.77 ft. Only 2 cases have similar drifts in the

buffered and debuffered trajectories. For the rest of the cases, the buffered drifts and debuffered

drifts have an average difference of 12.63 ft, which is even smaller than the altitude resolution

of 1090 MHz devices (25 ft). Moreover, 12 of the 15 cases with buffered drifts over 100 ft have

smaller debuffered drifts. This result indicates that debuffering has the potential to reduce drift for

flights with large initial integration deviation. As for the touch-and-go flights, 80% of the flights’

drifts are improved by debuffering. Compared to takeoffs, the touch-and-goes’ drifts are much

larger; some of them are over 1000 ft, while the others are over 100 ft. The reason is that longer

flight durations allow more drift to accumulate.

Despite the ADS-B message stream not containing explicit time information, it is possible

to approximate the elapsed time between consecutive messages using the other telemetry data in

those messages. For message $i$, we denote the latitude $\mathrm{lat}_i$, longitude $\mathrm{lon}_i$, and speed $v_i$. Since



aircraft tend not to exhibit large values of jerk at any phase of operations, it is assumed here

that the acceleration is constant between two consecutive messages, so that the mean velocity

between the two instances can be calculated by:

$$ v_m \simeq \frac{v_{i+1} + v_i}{2}. \tag{2.2} $$

The elapsed time between the two messages can then be estimated by:

$$ \Delta t \approx \frac{d(\mathrm{lat}_i, \mathrm{lon}_i, \mathrm{lat}_{i+1}, \mathrm{lon}_{i+1})}{v_m}, \tag{2.3} $$

where $d(\mathrm{lat}_i, \mathrm{lon}_i, \mathrm{lat}_{i+1}, \mathrm{lon}_{i+1})$ is a distance measurement based on the two pairs of latitude and

longitude. We call this estimate of the time interval the “implicit” time interval for this pair of

messages. For the short time intervals expected in this application, the Haversine distance formula

should be sufficiently accurate, and that is what was used to produce the following results. It

should be noted that, of course, these position data are collected by the aircraft with some error

distribution. If those errors are strongly correlated between consecutive measurements, then this

distance estimate will be quite accurate; otherwise, it is less so.
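The implicit time interval can be sketched as below. The mean speed is taken as the average of the two reported speeds, as implied by the constant-acceleration assumption. Other details are assumptions for illustration: speeds are in knots, coordinates are in degrees, a mean Earth radius of 6371 km is used, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0     # mean Earth radius (assumed constant)
KNOT_TO_MPS = 1852.0 / 3600.0    # one knot in metres per second


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def implicit_interval_s(lat1, lon1, v1_kt, lat2, lon2, v2_kt):
    """Estimate elapsed seconds between two messages from distance and mean speed."""
    mean_speed_mps = 0.5 * (v1_kt + v2_kt) * KNOT_TO_MPS
    if mean_speed_mps <= 0:
        return None  # a stationary aircraft gives no usable estimate
    return haversine_m(lat1, lon1, lat2, lon2) / mean_speed_mps


# Two points 0.001 degrees of longitude apart on the equator, flown at 100 kt.
dt = implicit_interval_s(0.0, 0.0, 100.0, 0.0, 0.001, 100.0)
```

In this example the points are roughly 111 m apart, giving an implicit interval of a little over 2 seconds.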

This method of validation, then, is to compare the values of ∆t estimated from each pair

of consecutive messages, with the differences in the recorded timestamps (either buffered or

debuffered). The validation hypothesis here is that debuffering is improving the accuracy of

the timestamps if the sample correlation between the implicit time intervals and the debuffered

time intervals is higher than that between the implicit time intervals and the buffered time intervals.
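The comparison can be sketched with a plain Pearson correlation over consecutive time intervals. This is an illustration only: the function names are hypothetical, and the sample numbers are made up for the example rather than taken from any recorded flight.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either sequence is constant


def intervals(timestamps):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def debuffering_helps(implicit_dt, buffered_ts, debuffered_ts):
    """True if debuffered intervals correlate better with the implicit ones."""
    r_buffered = pearson(implicit_dt, intervals(buffered_ts))
    r_debuffered = pearson(implicit_dt, intervals(debuffered_ts))
    return r_debuffered > r_buffered


# Illustrative values only.
implicit_dt = [0.5, 1.0, 0.5, 1.5]
buffered_ts = [0.25, 0.5, 1.5, 2.25, 3.5]
debuffered_ts = [0.0, 0.5, 1.5, 2.0, 3.5]
helps = debuffering_helps(implicit_dt, buffered_ts, debuffered_ts)
```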

Fig. 2.12 shows the results of this validation step for 50 representative flights. The blue

bars represent the sample correlations between intervals formed from the original buffered



timestamps, and the implicit intervals constructed from Eq. 2.3.

Figure 2.12: Correlation coefficients between implicit and buffered/debuffered time intervals for
flights from KFRG and KOSU.

Compared to drift measurements, this error metric provides a more stable validation

approach because it does not involve accumulative errors. Each implicit and explicit time interval

pair contains information from only two closely spaced data points, ensuring that the implicit and

explicit timestamps remain independent, thus preventing bias accumulation.

As part of a broader numerical experiment, all flights in our database without significant data

gaps were analyzed for two small airports: KOSU in Ohio and KFRG in New York. Flights

with large time gaps were excluded because such gaps render implicit timestamps unreliable,

and debuffering has no impact over extended missing intervals. Among the 77 flights from

KOSU, only one exhibited a lower debuffered correlation coefficient than its buffered counterpart.

Similarly, for the 384 flights from KFRG, 96.61% demonstrated improved correlation coefficients

following post-decoder debuffering. Notably, 100 of these flights involved single takeoff

operations, with 99 showing better post-decoder coefficients than their buffered versions.

Additionally, 63 of them had higher in-decoder debuffered coefficients compared to post-decoder

results. On average, post-decoder debuffering improved correlation coefficients by 25.8%, while

in-decoder debuffering yielded an even greater enhancement of 30.24%.

Table 2.3: Improvement in drift metric after debuffering.

Flight ID   Debuffered Drift (ft)   Buffered Drift (ft)   Improvement (ft)
356015                     -29.11                -72.56              43.45
356185                      42.87                 48.80               5.93
356930                     182.90                349.12             166.21
380627                      -2.23                -30.43              28.20
283714                      -7.43               -197.40             189.97
282936                      -3.79                -20.38              16.58
284363                      26.14               -377.91             351.77
271063                      -5.21                 -0.65              -4.55
282983                     -30.04                -39.14               9.10
149569                      -0.24                 -0.87               0.63
284168                      20.24                 25.31               5.07
284451                     -38.95                -57.16              18.22
283423                     243.98                247.98               4.00
282676                     139.93                142.03               2.10
283123                     106.60                106.58              -0.02
283900                     -70.10                -63.77              -6.33
270408                     -12.18                -56.22              44.04
284314                     -48.43                -23.00             -25.43
275489                       7.19                -33.34              26.14
282259                      36.81                 24.76             -12.05
283207                       5.02                  8.47               3.45
378372                     -11.75                  1.03             -10.72
376508                     -64.25                -70.18               5.93
376662                     -29.39                -30.75               1.37
378598                    -237.54               -246.57               9.03
376660                     -10.64                -16.63               5.99
376792                      49.36                 72.40              23.04
377324                    -300.37               -275.52             -24.84
377198                      62.66                 71.84               9.18
377754                     -21.67                -27.36               5.69
377547                    -107.07               -102.62              -4.45
378488                     -16.98                -14.25              -2.72
377035                     -37.27                 48.30              11.03
377752                      25.87                 48.51              22.63
376918                     -33.63                -72.12              38.49
377176                      48.56                 91.04              42.48
381389                     -50.25                -13.72             -36.52
378880                     -69.37                -84.71              15.34
378789                     -16.64                -24.48               7.83
381209                       2.27                -12.44              10.16
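
The Improvement column in Table 2.3 is consistent, to within rounding, with the reduction in absolute drift, that is, |buffered drift| minus |debuffered drift|. This reading is inferred from the tabulated values rather than stated explicitly, so the sketch below should be taken as an assumption about how the column was formed; the function name is hypothetical.

```python
def drift_improvement_ft(debuffered_ft, buffered_ft):
    """Reduction in absolute drift (ft); negative means debuffering was worse."""
    return abs(buffered_ft) - abs(debuffered_ft)


# Three rows copied from Table 2.3: (flight ID, debuffered drift, buffered drift).
sample_rows = [
    (356015, -29.11, -72.56),
    (284363, 26.14, -377.91),
    (284314, -48.43, -23.00),
]

for flight_id, debuffered, buffered in sample_rows:
    gain = drift_improvement_ft(debuffered, buffered)
    print(f"{flight_id}: {gain:.2f} ft")
```

Under this reading, a positive value means the debuffered drift is closer to zero than the buffered drift, regardless of sign.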

In summary, post-decoder debuffering significantly improves timestamp accuracy based

on two validation techniques that use information from the ADS-B data stream as ground

truth. Furthermore, in-decoder debuffering outperforms the post-decoder approach on average.

However, while in-decoder debuffering provides the best results, it requires access to the original

ADS-B messages before they have been amalgamated. In cases where only archived, pre-decoded

data is available, the post-decoder method remains a superior alternative to using buffered

timestamps.

The observations in this study indicate that some data streams contain substantial temporal

gaps, likely due to antenna occlusion caused by building obstruction or suboptimal positioning

relative to runway geometry. Any ADS-B data collection effort would benefit from using an

externally mounted antenna with a clear line of sight to the entire airfield. Additionally, kinematic

estimates over long data gaps should be interpreted cautiously, as aircraft speed and

heading cannot be assumed to remain constant over extended periods.
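
A simple screen for such gaps could look like the following sketch. The threshold `max_gap_s` and the function name are hypothetical; the value used to exclude flights in this study is not stated here, so the default below is only a placeholder.

```python
def find_gaps(timestamps, max_gap_s=10.0):
    """Return (index, gap) for each consecutive pair separated by more than max_gap_s.

    `timestamps` is assumed to be sorted in ascending order, in seconds.
    The index refers to the earlier message of the pair.
    """
    gaps = []
    for i, (t0, t1) in enumerate(zip(timestamps, timestamps[1:])):
        if t1 - t0 > max_gap_s:
            gaps.append((i, t1 - t0))
    return gaps


def has_significant_gap(timestamps, max_gap_s=10.0):
    """True if the flight contains at least one gap longer than max_gap_s."""
    return bool(find_gaps(timestamps, max_gap_s))
```

Flights for which `has_significant_gap` returns true could then be set aside before computing implicit intervals, since kinematic estimates across those gaps are unreliable.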

Future improvements to the ADS-B communications protocol could consider incorporating

standardized timing information in a subset of transmissions, enabling direct measurement of

transmission and decoding lags. A potential experiment to refine timestamp accuracy would

involve logging transponder-generated timestamps within the cockpit of a known aircraft and

comparing these records with the received and decoded ADS-B data. However, given the curren