ABSTRACT

Title of Dissertation: LEVERAGING ADS-B: A FRAMEWORK FOR DATA COLLECTION AND AIRSIDE OPERATION METRICS ANALYSIS AT SMALL AIRPORTS

Zhuoxuan Cao
Doctor of Philosophy, 2025

Dissertation Directed by: Professor David Lovell
Department of Civil and Environmental Engineering

With the growing global demand for air travel, General Aviation (GA) airports are facing significant challenges. Unlike larger airports, many GA airports operate with limited infrastructure, leading to issues such as delays and congestion. Managing a mix of flight types, including training and regular flights, within tight budget constraints and limited runway capacity further complicates operations. Effective management and reliable capacity estimation are crucial, especially as these airports often depend on federal funding for future expansions. However, the lack of effective data collection mechanisms and equipment makes it difficult to implement data-driven management strategies or accurately estimate capacity, particularly given the complexities of handling diverse flight operations.

Tasked by the Federal Aviation Administration (FAA), this project addresses the capacity estimation challenges at GA airports using Automatic Dependent Surveillance–Broadcast (ADS-B) technology. It proposes a comprehensive data pipeline and analysis system hosted on Amazon Web Services (AWS) to collect, decode, filter, analyze, and archive flight data. This system facilitates the extraction of key operational metrics for advanced capacity modeling. To ensure precise parameter extraction, the framework incorporates a rule-based model for accurate operation type classification. Additionally, a novel signal enhancement method is introduced to improve ADS-B data quality, ensuring more reliable and consistent flight trajectory timestamps.
To support the development of the second generation of the Airport Capacity Model (ACM2) and define the required operational metrics, this work provides specifications for bounding boxes at target airports and establishes key operational benchmarks. The methodologies for calculating departure and arrival operational metrics based on the benchmarks are also detailed.

Leveraging the advantages of the proposed data analysis system, this study demonstrates various applications of ADS-B data analysis. These include performance comparisons between flights with different operational purposes, correlations between squared flight speeds at various phases and density altitude, and time series predictions of air traffic flow at specific airports.

By addressing these challenges, this project has the potential to significantly enhance the accuracy of capacity estimation across thousands of GA airports while delivering reliable aviation data and actionable insights to both the aviation research community and GA airport stakeholders.

LEVERAGING ADS-B: A FRAMEWORK FOR DATA COLLECTION AND AIRSIDE OPERATION METRICS ANALYSIS AT SMALL AIRPORTS

by

Zhuoxuan Cao

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy
2025

Advisory Committee:
Professor David Lovell, Chair/Advisor
Professor Seth Young
Professor Timothy Horiuchi, Dean’s Representative
Professor Paul Schonfeld
Assistant Professor Alexander Estes
Associate Professor Xianfeng (Terry) Yang

© Copyright by Zhuoxuan Cao 2025

Dedication

To my family, whose unwavering support and encouragement made this journey possible.

Acknowledgments

I would like to express my deepest gratitude to my advisor, Dr. David Lovell, whose unwavering support throughout my Ph.D. journey has made it possible for me to come this far in academia.
Although I have not always been perfect in my work, his guidance has taught me the importance of rigor and creativity in both research and professional life. When I began my Ph.D., I was not a particularly confident person, but his encouragement helped me realize that I am capable of achieving much more than I had originally believed.

I am also sincerely thankful to Dr. Seth Young for his guidance during the completion of my project. His professional advice helped me overcome several technical challenges, and I am especially grateful for his support during the Christmas holidays, when we worked together to ensure everything was submitted before the deadline.

My sincere thanks also go to Dr. Paul Schonfeld, Dr. Alexander Estes, and Dr. Terry Yang for serving as members of my dissertation committee, and to Dr. Timothy Horiuchi for stepping in as the Dean’s Representative at the last minute.

I am truly grateful to Ms. Anna Damm for her administrative support. From the moment I was admitted to the program to the final steps of graduation, she helped me handle countless forms and procedures. Without her help, I would not have made it through.

I also want to thank my fellow Ph.D. colleagues for their companionship and advice during my studies and internship. In particular, I appreciate Zheyu Li for sharing housing with me during my internship, and Dr. Yeming Hao for offering invaluable advice throughout both my academic and personal life.

Finally, I owe my heartfelt thanks to my family, including Mikey and Dave, for their unwavering support and presence throughout this journey. And most of all, to my fiancée, Mia Zhang, thank you for your extraordinary patience and love. You’re the one who truly endured the full chaos of my Ph.D. life: the stress, the scattered sleep schedule, the emotional ups and downs, the unwashed dishes, and those late-night League of Legends sessions. Honestly, you deserve a degree for this too.
To all of you: this achievement is as much yours as it is mine.

Table of Contents

Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

Chapter 1: Introduction
  1.1 Background and Motivation
  1.2 Problem Statement
  1.3 Research Questions and Approaches
    1.3.1 How can high-resolution ADS-B data be efficiently collected at GA airports?
    1.3.2 How can operational metrics be extracted from the collected ADS-B data to support airport capacity modeling?
    1.3.3 How can ADS-B data contribute to advancements in aviation research or ATM?
  1.4 Contributions of this Research
  1.5 Organization

Chapter 2: ADS-B Data Collection System
  2.1 Literature Review
  2.2 System Overview
  2.3 Raspberry Pi-Based Data Collection Platform
  2.4 Decoding Mechanism
    2.4.1 Fundamentals of ADS-B Broadcast
    2.4.2 Timestamp Debuffering
    2.4.3 Message Amalgamation for In-Decoder Debuffering
    2.4.4 Validation Approaches for Debuffering Performances
  2.5 Unreliable Flight Filter
  2.6 Operation Identification

Chapter 3: Operational Metric Extraction
  3.1 Literature Review
  3.2 Arrival Operational Metrics
    3.2.1 Approach Speed
    3.2.2 Arrival Runway Occupancy Time (AROT)
    3.2.3 Exit Speed
    3.2.4 Landing Speed
    3.2.5 Arrival Deceleration & Arrival Buffer
  3.3 Departure Operational Metrics
    3.3.1 Departure Runway Occupancy Time (DROT)
    3.3.2 Departure Cruise Speed
    3.3.3 Takeoff Speed & Departure Hold Buffer

Chapter 4: Applications of the Collected ADS-B Data
  4.1 Literature Review
  4.2 A Comparative Analysis of Flights with Different Aircraft Types and Missions
    4.2.1 Background
    4.2.2 Methodology
    4.2.3 Results and Discussion
  4.3 Identifying the Relationship between Air Density Altitude and Speed Metrics
    4.3.1 Background
    4.3.2 Methodology
    4.3.3 Results and Discussion
  4.4 Time Series Forecasting of Next-Day Air Traffic Volume with Multi-Scale Machine Learning
    4.4.1 Background
    4.4.2 Methodology
    4.4.3 Results and Discussion

Chapter 5: Summary, Conclusions, and Limitations
  5.1 Summary
  5.2 Conclusions
  5.3 Limitations and Future Work

Appendix A: Operational Metrics Results
  A.1 Results of Average Approach Speed
  A.2 Results of Arrival Runway Occupancy Time
  A.3 Results of Exit Speed
  A.4 Results of Landing Speed
  A.5 Results of Arrival Deceleration
  A.6 Results of Departure Runway Occupancy Time
  A.7 Results of Cruise Departure Speed
  A.8 Results of Takeoff Speed
  A.9 Results of Departure Hold Buffer

Bibliography

List of Tables

2.1 Flights observed at target airports
2.2 ADS-B message type code and broadcast frequency [1].
2.3 Improvement in drift metric after debuffering.
4.1 ROT and phase division results.
4.2 Regression results for squared takeoff speed and density altitude.
4.3 Regression results for squared touchdown speed and density altitude.
4.4 Performance comparison across models.
A.1 Average approach speed data at KAPA and KGFK.
A.2 Average arrival runway occupancy time at KAPA and KGFK.
A.3 Average exit speed at KAPA and KGFK.
A.4 Average landing speed at KAPA and KGFK.
A.5 Average deceleration at KAPA and KGFK.
A.6 Average departure runway occupancy time at KAPA and KGFK.
A.7 Average departure cruise speed at KAPA and KGFK.
A.8 Average takeoff speed at KAPA and KGFK.
A.9 Average departure hold buffer at KAPA and KGFK.

List of Figures

2.1 Examples of GA airports with ADS-B receiving systems.
2.2 Real-time mapping tool interface.
2.3 Concept map of ADS-B data collection system.
2.4 Simplified Data pipeline in EC2.
2.5 ADS-B data receiving unit and its components.
2.6 Concept Map of ADS-B Reception.
2.7 Typical inter-arrival times of buffered messages.
2.8 Illustration of debuffering algorithm.
2.9 Matching process for merging complete records.
2.10 Data retention comparison.
2.11 Comparison of buffered and debuffered data using integration.
2.12 Correlation coefficients between implicit and buffered/debuffered time intervals for flights from KFRG and KOSU.
2.13 An example of AGL correction.
2.14 Three types of unreliable trajectories.
2.15 An example of touch-and-go flight trajectory.
2.16 Comparison between original trajectory and smoothed trajectory.
2.17 A smoothed trajectory with identified peaks and troughs.
2.18 An example of “takeoff point” and “landing point”.
3.1 A landing flight plotted by the trajectory plotting application.
3.2 An example approach bounding box for runway 17R at KGFK.
3.3 The concept map for operational benchmark interpolation.
3.4 The arrival bounding box for runway 17R and 35L at KGFK.
3.5 Identification of entry and exit point for an arrival flight at KGFK.
3.6 Arrival transitional points at KGFK.
3.7 Departure geofence for runway 17R at KGFK.
3.8 Concept map for function make SRS bounding box.
3.9 An example of takeoff roll identification.
3.10 Departure transitional points at KGFK.
4.1 4500 ft geofence for runway 17R.
4.2 Illustration of the cross point on the hold bar.
4.3 Three takeoff styles at KGFK.
4.4 Runway layout of KAPA.
4.5 Examples of linear regression of squared takeoff speed and density altitude.
4.6 Examples of linear regression of squared touchdown speed and density altitude.
4.7 Illustration of single-scale and multi-scale sliding window.
4.8 LSTM Cell Architecture
4.9 Architectures of LSTM-based models.
4.10 Daily air traffic volume at KGFK.
4.11 SARIMA predictions on test dataset.
4.12 XGBoost predicted and true air traffic volume.
4.13 XGBoost predicted and true air traffic volume.
4.14 Predictions of single-scale LSTM-based model and multi-scale LSTM-based model.

List of Abbreviations

1090ES      1090MHz Extended Squitter
ACF         Autocorrelation Function
ACM         Airport Capacity Models
ACM2        Second Generation Airport Capacity Model
ADS-B       Automatic Dependent Surveillance–Broadcast
AGL         Altitude Above Ground Level
AI          Artificial Intelligence
AR          Autoregressive
ARIMA       Autoregressive Integrated Moving Average
AROT        Arrival Runway Occupancy Time
ARTAS       Air Traffic Management Surveillance Tracker and Server
ASDE-X      Airport Surface Detection Equipment, Model X
ASPM        Aviation System Performance Metrics
ATC         Air Traffic Controllers
ATM         Air Traffic Management
AWS         Amazon Web Services
BER         Brandenburg International Airport
BOS         Boston Logan International Airport
CPR         Compact Position Reporting
DELAYS      Dynamic Estimation of Landing Aircraft in the Terminal Area System
DFW         Dallas/Fort Worth Airport
DROT        Departure Runway Occupancy Time
FAA         Federal Aviation Administration
FIFO        First-In, First-Out
GA          General Aviation
GBDT        Gradient Boosted Decision Tree
GNSS        Global Navigation Satellite System
GPS         Global Positioning System
ILS         Instrument Landing System
IMC         Instrument Meteorological Conditions
IoT         Internet of Things
JFK         John F. Kennedy International Airport
KAPA        Centennial Airport
KCGS        College Park Airport
KFRG        Republic Airport
KGFK        Grand Forks International Airport
KMNN        Marion Municipal Airport
KN51        Solberg Hunterdon Airport
KOSU        Ohio State University Airport
LMI         Lincoln Laboratory MIT
LOS         Levels of Service
LSTM        Long Short-Term Memory
MAE         Mean Absolute Error
METAR       Meteorological Aerodrome Reports
MLP         Multi-Layer Perceptron
MSSW-LSTM   Multi-Scale Sliding Window LSTM Framework
NextGen     Next Generation Air Transportation System
NLP         Natural Language Processing
OOOI        Out, Off, On, and In
PACF        Partial Autocorrelation Function
PPM         Pulse-Position Modulation
QAR         Quick Access Recorder
QFE         Field Elevation Pressure
REDIM       Runway Exit Interactive Design Model
RMSE        Root Mean Square Error
RNN         Recurrent Neural Networks
ROC         Rate of Climb
ROT         Runway Occupancy Time
SARIMA      Seasonal Autoregressive Integrated Moving Average
SDR         Software-Defined Radio
SRS         Same Runway Separation
STARS       Standard Terminal Automation Replacement System
SWIM        System Wide Information Management
UAT         Universal Access Transceiver
UND         University of North Dakota
VFR         Visual Flight Rules
VMC         Visual Meteorological Conditions
WTS         Wake Turbulence Separation
XGBoost     eXtreme Gradient Boosting

Chapter 1: Introduction

1.1 Background and Motivation

With global economic growth, airports worldwide have experienced a substantial increase in civil air travel demand. In 2019, airports accommodated a record 4.5 billion passengers [2], and by 2024, this number had grown by an additional 171 million, despite the lingering effects of the COVID-19 pandemic [3]. While much of the attention has been focused on managing air traffic at large commercial hubs, this upward trend also has implications for General Aviation (GA) airports.
GA airports are typically untowered, operate primarily under Visual Flight Rules (VFR), and serve a highly diverse range of aircraft types, pilot experience levels, and flight purposes. Unlike commercial airports, GA facilities support a substantial volume of flight training activities and unscheduled transient operations, resulting in highly dynamic and less predictable traffic patterns. Despite this operational complexity, many GA airports lack access to reliable data on traffic volume, aircraft mix, and operational behavior, making it difficult to support effective planning, secure funding, and maintain safety.

Combined with the rapid growth in air traffic, the lack of data-driven air traffic management became a major contributor to a 4.6% increase in delays across Europe in 2006, exceeding expectations [4]. These delays can lead to financial setbacks, affecting both airlines and airport operations [5]. GA airports, in particular, face even greater financial challenges than larger commercial airports. With fewer resources available to expand capacity or implement delay-mitigation strategies, delays at GA airports can become even more severe, further straining operations and service quality.

Improving airside efficiency is a key strategy for reducing air travel delays. Two critical factors influencing airside efficiency are runway configuration and air traffic management (ATM), both of which highlight the increasing need for federal funding to support runway improvements and the adoption of more advanced ATM technologies. However, ATM systems at small airports still rely heavily on human resources to manage airspace sectors, making operations more labor-intensive. Air traffic controllers (ATC) must consider factors such as conflict risk, conflict resolution, and limited operational resources, further complicating their tasks [6].
To address these challenges, the Next Generation Air Transportation System (NextGen) has been developed to modernize air traffic management and improve operational efficiency. A key component of NextGen is Automatic Dependent Surveillance–Broadcast (ADS-B), a satellite-based surveillance technology designed to replace, or at least substantially supplement, traditional radar. Following the Federal Aviation Administration (FAA) mandate, a large proportion of aircraft have been equipped with ADS-B transponders [7]. ADS-B enhances communication between pilots and air traffic controllers, making it a critical component of modern ATC systems. In addition, its open communication protocol allows ADS-B signals to be received by any party with a compatible receiver, enabling researchers to collect and analyze flight data from target airports. The ADS-B mechanism and mandate are explained in detail in Chapter 2.

However, despite its advantages, ADS-B-based data collection has not been widely adopted at GA airports. Many of these facilities lack the infrastructure required to support continuous ADS-B surveillance or to integrate such data into their operational workflows. This vacuum at small airports is increasingly being filled by third-party services that rely on commercial-grade ADS-B receivers, such as Virtower and 1200.aero, although the quality and density of the data provided by these services are not always guaranteed.

Furthermore, federal funding and research efforts have been primarily directed toward large regional airports, as they handle higher traffic volumes. As a result, existing Airport Capacity Models (ACM) are developed using data from large airports, failing to account for the unique operational characteristics of small airports.
This discrepancy in capacity modeling makes it challenging for GA airports to justify their needs when applying for federal funding, further hindering their ability to improve infrastructure and operational efficiency.

Therefore, it is crucial to establish a dedicated data collection system for small airports and to integrate these data into the development of next-generation ACM. A more accurate and representative model would enable improved resource allocation, support federal funding applications, and enhance ATM at small airports. Beyond supporting capacity modeling, the collected data would also provide GA airports with a deeper understanding of local flight operation patterns and the behaviors of flights serving various purposes. This knowledge can inform a wide range of airport functions, including runway and pavement utilization, air traffic flow forecasting, revenue modeling (e.g. accurate landing fee assessment) and long-term infrastructure planning. In doing so, it addresses both strategic and operational challenges, ultimately contributing to more data-informed and resilient airport systems.

1.2 Problem Statement

Among those most impacted by the surge in air travel demand are GA airports, which face unique and growing challenges. Unlike larger regional airports, many GA airports operate with less developed infrastructure, including runway facilities and ATC systems, making it difficult to accommodate increasing traffic. As a result, they frequently experience capacity constraints, delays, and congestion. According to data collected from eight European airports, a previous study reported that between 8.1% and 24.1% of flights were delayed [8]. Additionally, since many GA airports host flight schools, they must manage a diverse mix of flight operations, including training flights and regular operations, all while operating with limited runway configurations and tight budgets.
These challenges highlight the urgent need for accurate capacity assessment and effective airside management to optimize operations and minimize delays. Moreover, reliable capacity estimates are essential for securing federal funding to support infrastructure improvements and future expansion.

Compared to larger regional airports, GA airports also face significant challenges in data collection for capacity prediction. Installing and maintaining ADS-B systems similar to those used at regional airports requires substantial financial investment, which can be a significant burden for GA airports already operating with limited funding. Additionally, accurate capacity estimation relies on the extraction of operational parameters, yet the diverse mix of flight types and high sensitivity to weather conditions lead to highly variable daily operational patterns. This variability complicates the derivation of consistent and reliable parameters for capacity modeling. Furthermore, many GA airports lack ATC towers and automated tracking systems, making it difficult to accurately count, track, and archive flight trajectory data. This absence of reliable air traffic volume measurement further complicates efforts to model airport capacity and to understand local flight behavior.

Adding to these complexities, GA airports with flight schools face an additional layer of operational variability. Training flights exhibit unique behavioral patterns, such as repeated circuits, touch-and-go landings, and continuous flight maneuvers, which are rarely observed in commercial operations. However, existing capacity modeling methods are primarily designed for commercial flights, making them poorly suited to account for the distinct characteristics of training operations.

Although data collection can be outsourced to aviation data vendors, data quality cannot be guaranteed.
One major issue is data sparsity: datasets provided by large-scale ADS-B collection networks often have sampling intervals exceeding 20 seconds, whereas the true broadcast frequency is approximately 2 Hz. This discrepancy creates significant data gaps, reducing the accuracy of trajectory reconstruction based on rate-of-climb calculations and constraining operational analysis, particularly in the early stages of flight operations. Another key concern is data buffering. Due to bandwidth limitations in inexpensive receiving equipment, flight messages are often buffered and thus assigned delayed timestamps, affecting the precision of aviation studies that rely on accurate time duration measurements. Furthermore, buffered data is often overlooked in widely used ADS-B datasets due to large data intervals, further compromising the reliability of operational studies at GA airports.

This project addresses the challenges of collecting ADS-B data at GA airports by leveraging widely used ADS-B technology, thereby providing essential data for the development of an improved ACM. The proposed data collection scheme will not only capture the details of each trajectory but also generate refined statistics for key operational metrics. These metrics and statistics will serve as critical inputs for the next generation of ACM, which will offer a more comprehensive understanding of conditions at small airports. Furthermore, this study will explore the broader applications of ADS-B data at GA airports, shedding light on ATM challenges and contributing to the advancement of aviation research.

1.3 Research Questions and Approaches

This dissertation applies data science and data engineering techniques to address the challenges associated with collecting and utilizing ADS-B data at small airports. By leveraging the data provided by the proposed system, this research explores key aspects of ATM and its broader implications for aviation research.
This study focuses on three interconnected research questions:

• How can high-resolution ADS-B data be efficiently collected at GA airports at a low cost?
• How can operational metrics be extracted from the collected ADS-B data to support airport capacity management?
• How can ADS-B data contribute to advancements in aviation research or ATM?

Each of these questions is systematically examined throughout this dissertation. As new challenges emerge, the research framework is adapted to incorporate detailed subsidiary questions and refined methodologies, ensuring a comprehensive approach to problem-solving.

1.3.1 How can high-resolution ADS-B data be efficiently collected at GA airports?

1.3.1.1 How can ADS-B data be efficiently and cost-effectively collected at GA airports?

The ADS-B data collection mechanisms used at large airports cannot be directly applied to GA airports due to several key challenges, including financial constraints, operational characteristics, and differing data requirements. Additionally, many GA airports are untowered, making it difficult to establish a continuous, centralized data collection system. As a result, although ADS-B transponders are installed on most aircraft, the vast amount of available ADS-B information remains underutilized.

To address this issue, under the guidance of the FAA, this study proposes an automated data collection framework deployed on the Amazon Web Services (AWS) platform. This system utilizes affordable hardware, including data receiving, amplifying, filtering, decoding, storage, and transmission modules, specifically designed to accommodate the unique operational characteristics of GA airports. By installing these customized hardware sets at target airports, the system will enhance signal reception, ensuring a strong and reliable ADS-B data foundation for subsequent processing.

1.3.1.2 How can the reliability of collected ADS-B timestamps be ensured, and if unreliable, how can they be improved?
ADS-B data contain critical information for ATM and operational studies, where time-related accuracy is essential. Therefore, it is necessary to evaluate the reliability of the timestamps attached to ADS-B messages. Unlike positional and velocity data, which are directly decoded from raw ADS-B signals, the timestamps are assigned locally by the receiving devices. As a result, timestamp inconsistencies may arise due to hardware limitations and data buffering effects.

To validate timestamp accuracy, a cross-validation method leveraging the kinematics of ADS-B messages is proposed. A known issue, referred to as “data buffering”, causes bias in the intervals between ADS-B data points, disrupting time-sensitive analyses. However, since different types of ADS-B messages are broadcast at predefined frequencies, their expected transmission intervals can be used to identify and ameliorate time distortions. This correction process, named “debuffering”, systematically adjusts timestamp inconsistencies based on the transponder’s broadcast mechanism, significantly improving the temporal accuracy of collected ADS-B data. The debuffering technique will be integrated into the entire processing workflow, ensuring more precise timing information for ATM research and operational applications.

1.3.2 How can operational metrics be extracted from the collected ADS-B data to support airport capacity modeling?

1.3.2.1 How to extract arrival operational metrics?

The arrival operational metrics in this study are derived from landing flights and will be integrated into the newer version of the ACM, bringing GA airports into focus. The estimated capacity will aid GA airports in securing federal funding, while also assisting airport managers in balancing incoming flights with limited airside resources.
Since previous studies and regulations lack clear definitions of operational parameters, this dissertation provides detailed specifications for benchmark determination, ensuring that these benchmarks serve as the foundation for generating accurate metrics. These benchmarks are established using a bounding box framework. The runway environment bounding boxes are defined based on runway coordinates, while departure bounding boxes are constructed following FAA regulations. This approach ensures consistency with GA airport operations, aligning the methodology with real-world practices. The methodology and implementation details of these bounding boxes are discussed in Chapter 3.

1.3.2.2 How to extract departure operational metrics?

Similar to the extraction of arrival metrics, departure metrics are computed based on a series of departure benchmarks, including entry points, takeoff positions, and exit points. These benchmarks divide the takeoff operation into distinct phases, providing a structured approach to assessing operational efficiency.

While arrival bounding boxes are determined solely by runway configuration, departure bounding boxes must, according to the FAA’s mandate [9], also account for the performance category of departing aircraft, which influences flight partition distances. To address this variability, bounding boxes of varying lengths are automatically generated along the runways using algorithmic methods, ensuring that the benchmarks accurately reflect the operational characteristics of different aircraft types.

1.3.3 How can ADS-B data contribute to advancements in aviation research or ATM?

1.3.3.1 How could ADS-B data reveal the differences between flights with different missions and performance categories?

Many GA airports serve as flight training hubs, with Grand Forks International Airport (KGFK) being a prime example, as it hosts the University of North Dakota’s Aerospace Program.
As a result, GA airports must manage a diverse mix of aircraft types and flight missions, including both training and operational flights. The level of familiarity with the aircraft varies significantly between student pilots and experienced pilots, impacting aircraft handling and overall efficiency. Additionally, performance categories play a crucial role in runway operations, as differences in operating procedures and engine performance directly affect aircraft behavior during takeoff and landing.

The interactions between different aircraft types and missions influence the efficiency of airport operations. To better understand these dynamics, this study divides the takeoff procedure into two distinct phases, each capturing the impact of different factors on runway occupancy time (ROT). By analyzing these phases separately, this research aims to provide a clearer understanding of how various operational factors influence airport capacity.

1.3.3.2 How could ADS-B data quantify the relationship between density altitude and speed benchmarks?

In the standard lift equation, lift force is proportional to the product of air density and the square of velocity when all other variables are held constant. Because air density decreases as density altitude increases, it can be deduced that density altitude is positively correlated with squared velocity under a constant lift condition [10]. Consequently, it is a well-established principle in aviation research that higher density altitude necessitates a higher velocity for aircraft to take off or land. However, the precise quantitative relationship between these variables remains undetermined. Since velocity directly impacts ROT, fuel consumption, and the runway distance required for acceleration and deceleration, understanding this relationship is crucial for assessing flight behavior variations across airports at different elevations and their subsequent impact on ATM.
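The reasoning above can be written out as a short derivation. This is a sketch under the usual steady-flight assumption that lift balances weight; the symbols S (wing reference area), C_L (lift coefficient), and W (aircraft weight) are introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
L &= \tfrac{1}{2}\,\rho\,v^{2}\,S\,C_L, \\
L = W \;\Longrightarrow\; v^{2} &= \frac{2W}{\rho\,S\,C_L} \;\propto\; \frac{1}{\rho}
\quad \text{(for fixed } W,\ S,\ C_L\text{)}.
\end{align}
```

Since air density \rho falls as density altitude rises, the squared true airspeed needed to generate a given lift rises with density altitude. The derivation fixes only the direction of the relationship. Whether a linear form in density altitude is an adequate approximation over the observed range is an empirical question.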
This study employs a regression-based approach to validate the linear relationship between squared velocity and density altitude. Additionally, hypothesis testing is conducted to assess the significance of density altitude in explaining speed variations, providing valuable insights into aviation dynamics and operational efficiency at different airports.

1.3.3.3 How could ADS-B facilitate the resource allocation at GA airports?

Resource allocation at small airports is typically planned a day in advance. Key factors such as weather conditions and expected traffic are assessed the day before to minimize disruptions and optimize operational efficiency. Since real-time adjustments are challenging due to limited personnel and resources, pre-planning ensures smooth airport operations. To effectively allocate ground crew, tow tractors, and fuel, accurate predictions of future air traffic demand are essential. Therefore, this study aims to develop a method to forecast next-day air traffic volumes, using KGFK as a case study.

This research considers weather data and historical air traffic volume as key inputs for forecasting. Weather patterns help estimate expected conditions for the target day, while historical air traffic data capture seasonal demand variations. To achieve this, multiple machine learning and deep learning approaches are applied using datasets with varying time spans and time units. This methodological design ensures a scientific comparison of dataset effectiveness and model performance, providing a robust framework for next-day air traffic estimation.

1.4 Contributions of this Research

There are thousands of GA airports in the U.S., yet due to limited research attention and insufficient federal funding, many of these airports struggle to equip and maintain effective data collection systems. This limitation hinders data-driven ATM at GA airports and creates challenges in securing federal funding for infrastructure improvements.
Additionally, this lack of data accessibility poses difficulties for research groups seeking high-quality aviation data from small airports. Although widely used aviation networks exist, their data quality is often compromised due to data sparsity and potential buffering issues. To address these challenges, this study proposes a comprehensive data collection system that integrates data acquisition, processing, analysis, and archiving. This system benefits both GA airport ATC operations and researchers utilizing small-airport data for aviation studies by providing high-resolution ADS-B data, data augmentation techniques, operational metric extraction, and novel applications of ADS-B data.

The high-resolution ADS-B data collection system introduced in this work consists of cost-effective hardware installed at GA airports. Receivers are strategically placed near runways, enabling continuous data flow with minimal disruption. This approach is particularly beneficial for studies focusing on early-stage departure operations, where significant variations in track and speed occur. Without dense data collection, critical change points may be lost, which could compromise the accuracy of operational performance analysis.

To further improve data quality, this study implements an ADS-B data augmentation process that corrects timestamp inaccuracies caused by buffering effects during data transmission. The debuffering process, embedded in the decoding phase, systematically adjusts unrealistically small data intervals, restoring timestamps to their most plausible values. The effectiveness of debuffering is tested through multiple validation approaches, which show that debuffering significantly improves data reliability. Notably, this data augmentation method can be applied not only to the ADS-B data collected in this study but also to other aviation networks using similar hardware configurations, eliminating the need for costly hardware upgrades.
Improved data resolution benefits various aviation studies, including altitude trajectory reconstruction using rate of climb (ROC) and research on ROT, ensuring more precise operational analyses.

With the enhanced dataset, this study extracts comprehensive operational metrics from ADS-B data collected at GA airports. Unlike conventional datasets, which struggle to capture high-variance operational behaviors, this system continuously records flight movements, offering a more detailed and holistic representation of airport operations. Additionally, since previous studies lack standardized definitions of key operational benchmarks, this work establishes clear criteria for determining these benchmarks, ensuring accurate and reliable metric extraction. These metrics are then integrated into the next-generation ACM to estimate airport capacity, assisting GA airports in optimizing resource allocation and strengthening their case for federal funding.

Furthermore, this study explores new applications of ADS-B data, such as comparing flight behavior across different missions and performance categories, quantifying the relationship between density altitude and speed benchmarks, and using time series forecasting models to predict next-day air traffic volume. These applications expand the research potential of ADS-B data and improve understanding of how environmental and operational factors influence airport performance. Additionally, they provide practical insights for small-airport ATM, helping predict future travel demand and assess the impact of key operational factors.

By introducing an affordable, high-resolution ADS-B data collection system, refining data quality through augmentation techniques, and extracting reliable operational metrics, this study contributes to both aviation research and small-airport ATM, offering valuable insights into airport capacity modeling, operational efficiency, and data-driven decision-making.
1.5 Organization

The remainder of this work is structured as follows. Chapter 2 presents a detailed description of the data collection system, with a focus on the data pipeline, which encompasses data preprocessing, labeling, analysis, and archiving. Following an overview that outlines the system’s framework and highlights the purpose of each component, the chapter provides a thorough explanation of each module within the system. Chapter 3 details the methodology for calculating departure and arrival operational metrics, explaining the underlying procedures and their significance in capacity estimation. Chapter 4 explores three key applications of the proposed system, demonstrating its practical value in aviation research and air traffic management. Chapter 5 synthesizes the key findings from the previous chapters and provides concluding remarks on the contributions and implications of this study. Additionally, a literature review is incorporated at the beginning of each chapter to provide the necessary background, helping readers develop a clearer understanding of current research conditions and existing gaps in the field.

Chapter 2: ADS-B Data Collection System

2.1 Literature Review

As one of the core components of the NextGen transportation system, ADS-B has been widely applied across all types of aircraft. The development of ADS-B technology has been largely driven by the FAA’s priority to enhance aviation safety and reduce air accidents. By enabling real-time communication of aircraft identification, position, and kinematic data between ground receivers and airborne systems, ADS-B has successfully enhanced the safety of air travel and modernized the ATM process [11], resulting in a 12% reduction in the fatal accident rate from 2009 to 2015 [12].
The ICAO has suggested that Mode S 1090 MHz Extended Squitter (1090ES) receiving stations should be deployed in the Asia-Pacific region, America, and Australia [13], while a small proportion of 978 MHz Universal Access Transceiver (UAT) receivers remain operational. Chen et al. pointed out that while 1090ES follows an international standard, it also leads to channel congestion, whereas UAT can process more data but lacks widespread adoption [14]. Which of the two frequencies an aircraft uses depends on the mean sea level (MSL) altitude at which it operates, in compliance with FAA mandates [15]. The widespread adoption of ADS-B technology provides a constant and abundant data flow for airports, researchers, and aviation enthusiasts.

The fidelity of ADS-B data has been extensively studied. Ali et al. [16] assessed the integrity of positional data within ADS-B messages, noting that certain Global Navigation Satellite System (GNSS) receivers onboard aircraft can exhibit anomalies, although the corresponding ADS-B data remain unaffected. The primary sources of ADS-B errors include missing altitude data, duplicated messages, and position jumps. Zhang et al. [13] compared primary radar data and ADS-B data with high-accuracy position data as the ground truth, finding that ADS-B data exhibit more small error bursts but significantly fewer large error bursts, indicating that ADS-B outperforms radar data in accuracy.

ADS-B technology consists of ADS-B In and ADS-B Out functionalities. ADS-B In receives and decodes external messages broadcast by ground stations or nearby aircraft, while ADS-B Out transmits encoded messages containing real-time flight information. Leveraging these two functions, airports have developed large-scale ATC systems to enhance surveillance and efficiency.
One such system is Airport Surface Detection Equipment, Model X (ASDE-X), which has been deployed at major airports across the United States, including Baltimore/Washington International Thurgood Marshall Airport, LaGuardia Airport, and Los Angeles International Airport. Integrating ground surveillance radar systems, ADS-B signals, and additional data sources, ASDE-X has been proven to enhance runway safety by detecting potential runway conflicts [17]. Using ASDE-X data from John F. Kennedy International Airport (JFK), Bhadra et al. measured surface traffic efficiency based on the time flights spent in runway queues, proposing a virtual queue concept that reduces physical queuing and optimizes financial and environmental outcomes [18]. Srivastava also used ASDE-X data to analyze queue durations before takeoff at JFK, later proposing a taxi-out duration prediction model that utilized timestamps, callsigns, and kinematics, all of which can be obtained from ADS-B. Additionally, Hotle et al. used ASDE-X data to identify taxi-out benchmarks and predict departure delays [19].

Other systems, such as the Standard Terminal Automation Replacement System (STARS) and the Air Traffic Management Surveillance Tracker and Server (ARTAS), are designed to assist ATC through advanced air traffic surveillance and management [20] [21]. Although these systems are effective at major U.S. and European airports, they are not warranted at GA airports because of lower operational complexity and traffic levels, budget constraints, and infrastructure limitations. Many GA airports lack ATC facilities, real-time surveillance needs, and centralized data processing capabilities, making large-scale ADS-B ground station installations impractical. As a result, ADS-B data collection at small airports is often managed by third-party networks such as OpenSky, FlightAware, and Flightradar24.
These platforms provide flight tracking services primarily focused on en route flight monitoring, relying largely on hobbyist-built receivers.

Several studies have leveraged third-party ADS-B networks for aviation research. Sun et al. [22] analyzed a year’s worth of ADS-B data from OpenSky to assess ADS-B equipage trends, fleet management strategies, and system performance. Schultz et al. [23] used OpenSky ADS-B data from Zurich Airport to extract airport performance metrics, such as taxi times, runway occupancy times, and ground trajectories. Similarly, Patroumpas et al. [24] utilized FlightAware ADS-B data from Europe to detect mobility events by analyzing flight trajectory change points, introducing noise reduction techniques to enhance trajectory fidelity. However, not all third-party networks provide complete airport surface coverage. Almost all of these services rely on hobbyist-built receivers, leading to incomplete flight records and inconsistent data reception. As a result, third-party ADS-B networks have historically struggled to support GA airports, necessitating custom-built systems for detailed flight operations studies.

Beyond third-party data sources, alternative data collection methods have been explored for GA airport surveillance. Some studies investigated acoustic [25] [26] and image-based [27] sensor technologies for flight detection and tracking. However, ADS-B remains the most ATC-compliant and widely accepted method. Piracci et al. [28] tested Software-Defined Radio (SDR) prototypes for evaluating enhanced ADS-B receivers, proposing a multi-channel 1090ES receiver capable of high-resolution data reception. Gui et al. [29] built 14 ground stations with omnidirectional antennas, achieving ADS-B reception for air traffic within a 300-kilometer radius. Mitkas and Lovell [30] developed an antenna array and portable Raspberry Pi-based ADS-B receivers, successfully collecting high-resolution ADS-B data from six GA airports.
Their ADS-B receiving mechanism forms the foundation of this dissertation’s data collection framework.

2.2 System Overview

In collaboration with the FAA, this project developed a cost-effective data collection system capable of collecting, preprocessing, analyzing, and archiving real-time ADS-B data. Unlike the aforementioned third-party networks, this system is specifically designed to focus on operations conducted at or near target airports, with an emphasis on data fidelity and integrity. Using high-resolution ADS-B reception devices, this system is capable of capturing near-airport flight trajectories. These trajectories are then labeled with the corresponding operations during the data pipeline, providing essential information for subsequent analyses such as flight performance comparison and airport performance parameter estimations.

With proven reliability, the system has been installed and is currently operational at seven airports: Centennial Airport (KAPA), Colorado; College Park Airport (KCGS), Maryland; Republic Airport (KFRG), New York; Grand Forks International Airport (KGFK), North Dakota; Marion Municipal Airport (KMNN), Ohio; Solberg-Hunterdon Airport (KN51), New Jersey; and The Ohio State University Airport (KOSU), Ohio.

The airports in our network feature distinct runway configurations. KGFK has two pairs of parallel runways and accommodates approximately 300,000 flight operations annually, approximately 85% of which are training flights on small general aviation aircraft operated by the University of North Dakota (UND). The runways at KFRG intersect to form an X-shape. KOSU includes one pair of parallel runways, along with a diagonal runway that intersects both. Ambulance helicopter operations can also be observed at KOSU. KCGS, located in College Park, Maryland, has a single runway. While it does not have an attached flight school, it does receive occasional traffic from police helicopters.
In terms of geographic diversity, KAPA stands out with an elevation of 5,883 ft, substantially higher than that of KGFK, highlighting the heterogeneity within the network. Given the variations in geographical conditions and runway configurations, the system ensures a diverse dataset, providing a robust foundation for subsequent analyses. Fig. 2.1 presents examples of these airports equipped with ADS-B receiving devices, illustrating the range of runway configurations captured in the database to support broad research applications.

Figure 2.1: Examples of GA airports with ADS-B receiving systems. (a) Runway configuration of KCGS. (b) Runway configuration of KFRG. (c) Runway configuration of KGFK. (d) Runway configuration of KOSU.

The data flow begins with the customized ADS-B receiver, a Raspberry Pi-based platform. ADS-B signals at different frequencies are received by antennas and SDRs, and the Raspberry Pi integrates these signals for subsequent data processing. The antenna serves as the front-end hardware that captures radio signals transmitted by aircraft transponders. These signals are weakened by the time they reach the receivers, so well-placed antennas are essential to ensure reliable reception, especially in near-airport environments. Once the radio signals are captured, the SDR converts the signals into digital form for processing. SDRs are flexible and programmable, making them well suited to this task. The collected data is then transmitted to AWS via the MQTT Internet of Things (IoT) protocol for further processing.

At the initial stage, raw data from different frequency channels are stored separately in AWS, facilitating efficient data management in later steps. Subsequently, the system transfers files from multiple channels into a centralized folder, which functions like the wide end of a funnel: once data enter this folder, they automatically move down the processing pipeline, triggering the following procedures.
The first stage of processing is the decoding algorithm. A commonly used and widely recognized tool for decoding raw ADS-B messages is the pyModeS library in Python. This package is specifically designed for decoding messages from Mode S transponders and is therefore adopted in our project. Based on the type code at the beginning of each message, the pyModeS algorithm identifies the type of information contained in the raw data and translates ADS-B messages from binary into an ASCII-based format, which is more human-readable. ADS-B transponders transmit nine different message types, though in practice, most raw messages contain position or velocity data, with occasional identification or status messages. Consequently, the decoded output consists primarily of position and velocity information. Because a complete flight message requires both position and velocity, this step also includes a merging process that combines messages of different types from the same flight based on the unique ICAO identifier.

With proven fidelity, the real-time position and velocity messages broadcast by ADS-B transponders have significantly benefited air traffic management (ATM) at airports and enabled a wide range of aviation studies. When combined with the FAA’s registered aircraft database, aircraft specifications can be easily identified, further broadening the scope of potential research applications. However, a challenge arises from the “data buffering” phenomenon, which stems from the absence of precise temporal information in raw ADS-B messages. This issue compromises the accuracy of time-sensitive data during transmission and complicates downstream processing and analysis. To address this, a debuffering algorithm is applied prior to the merging process, ensuring a more accurate and reliable dataset for subsequent use. The merging process is in turn customized to accommodate the debuffered data, ensuring greater precision.
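The debuffering idea described above can be illustrated with a minimal sketch. It is not the algorithm used in this work; it only shows the general principle under stated assumptions. The function name `debuffer`, the 0.5-second nominal broadcast interval, and the 0.1-second burst threshold are all hypothetical choices made for illustration. The sketch assumes a sorted list of receiver timestamps for one message type from one aircraft, and it treats any run of messages with implausibly small gaps as a buffered burst whose last timestamp is the most trustworthy.

```python
def debuffer(timestamps, nominal=0.5, min_gap=0.1):
    """Re-space bursts of near-simultaneous receiver timestamps.

    timestamps: sorted Unix times (seconds) for one message type of one aircraft.
    nominal:    assumed broadcast interval in seconds (hypothetical value).
    min_gap:    gaps below this are treated as buffering artifacts (hypothetical).
    """
    ts = list(timestamps)
    out = ts[:]
    n = len(ts)
    i = 0
    while i < n:
        # Extend j to the end of a run of implausibly small gaps.
        j = i
        while j + 1 < n and ts[j + 1] - ts[j] < min_gap:
            j += 1
        if j > i:
            # Messages i..j were released together; spread them backwards
            # from the arrival time of the last message in the burst.
            for k in range(i, j + 1):
                out[k] = ts[j] - (j - k) * nominal
        i = j + 1
    return out


if __name__ == "__main__":
    stamped = [0.0, 0.5, 2.0, 2.01, 2.02, 2.5]
    print(debuffer(stamped))
```

This simplified version does not guard against a re-spaced burst overlapping earlier messages, and it ignores the jitter that real transponders apply to their broadcast intervals. A production implementation would need to handle both.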
Messages received within the same batch are processed simultaneously, producing a single decoded dataframe that integrates messages from multiple flights, where each row contains complete position and velocity data for a specific flight at a given time point. To balance data comprehensiveness with the project’s requirements, the decoded file contains the following information for each data point: timestamp, ICAO identification code, latitude, longitude, altitude, groundspeed (in knots), track, ROC (in feet per minute), and callsign. The decoded file is named based on the airport name, transponder frequency, and reception datetime to maintain consistency in data organization.

The output of the decoding process and associated algorithms ultimately feeds the PostgreSQL database. This database functions as long-term storage for flight records and fulfills historical flight query needs in research. A series of flight statistics, such as flight counts and message counts, are directly stored in the database. The mapping between flight identifiers (i.e., ICAO and callsign) and the corresponding aircraft specifications is also included. More importantly, individual ADS-B messages are archived in the table adsb_messages, while the complete trajectories, along with the labels generated by the analysis algorithms (introduced below), are stored in the table flights. These two tables serve as the primary data sources for the applications developed in this work.

To update the above-mentioned tables, once decoding is complete, the system checks for any existing decoded dataframes in the designated directory and then initiates the process of loading the decoded messages into the PostgreSQL database. This loading process is executed by a Python-based algorithm that incorporates a series of customized functions for filtering and merging flight data.
In this process, and in alignment with the project’s focus, flight trajectories are initially trimmed by discarding any data points where the aircraft’s coordinates are more than 7 nautical miles from the subject airport. The target airport is identified from the decoded file name, and each flight coordinate is compared with the center coordinate of that airport to determine its proximity. Flights deviating more than 7 nautical miles from the airport for durations exceeding 600 seconds are treated as two distinct flight segments. Each segment falls within the study area and captures all operations associated with the target airport.

A flight_id is a unique identifier assigned to each individual flight to make locating and studying specific flights easier, and it is assigned at this step as well. Because a single flight may be processed in two consecutive batches, such flights must be continued across batches. If a matching flight is found in the flights table or among the relevant data points, the flight_id is inherited from the existing record, and the new flight segments are appended to the first half of the flight, with corresponding information in the flights table updated accordingly. If no match is identified, a new unique flight_id is assigned to all data points in the flight. Once the flight trajectories are reconstructed, the individual ADS-B messages are loaded into the adsb_messages table in the database for validation purposes and future research. The decoded files, along with the raw ADS-B messages from different frequencies, are then archived as backup files in an AWS S3 bucket for long-term storage. This ingestion process is triggered whenever the previous task has completed and new raw files are detected after a one-minute interval.

Since a single flight consists of dozens to thousands of individual messages, a comprehensive flight analysis is conducted on an hourly basis, a process referred to as the “hourly update”.
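The trimming and segmentation rules described above (a 7-nautical-mile radius and a 600-second threshold) can be sketched as follows. This is an illustrative sketch rather than the project's loading code. The function names, the tuple layout `(timestamp, lat, lon)`, and the use of the haversine formula are assumptions. The sketch also treats any gap longer than 600 seconds between consecutive in-range points as a segment boundary, which simplifies the rule in the text because it cannot distinguish an excursion outside the radius from a reception gap.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius expressed in nautical miles


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def trim_and_split(points, airport, radius_nm=7.0, max_gap_s=600.0):
    """Keep points within radius_nm of the airport and split on long gaps.

    points:  list of (timestamp, lat, lon) tuples sorted by timestamp.
    airport: (lat, lon) of the airport reference point.
    Returns a list of segments, each a list of points.
    """
    inside = [p for p in points if distance_nm(p[1], p[2], *airport) <= radius_nm]
    segments = []
    for p in inside:
        # Continue the current segment unless the time gap is too long.
        if segments and p[0] - segments[-1][-1][0] <= max_gap_s:
            segments[-1].append(p)
        else:
            segments.append([p])
    return segments


if __name__ == "__main__":
    airport = (40.0, -100.0)  # hypothetical airport reference point
    track = [
        (0, 40.00, -100.0),
        (10, 40.05, -100.0),   # about 3 NM away: kept
        (20, 40.20, -100.0),   # about 12 NM away: discarded
        (700, 40.05, -100.0),  # back in range after a long gap: new segment
        (710, 40.00, -100.0),
    ]
    for seg in trim_and_split(track, airport):
        print(len(seg), seg[0][0], seg[-1][0])
```

In a pipeline like the one described, each returned segment would then either inherit an existing flight_id or receive a new one.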
The data loading process primarily populates the adsb_messages table, whereas the hourly update focuses on updating the flights table, which serves as the primary source for extracting flight details. The flights table contains complete reconstructed trajectories along with a series of classification labels, facilitating flight filtering and categorization.

The complete flight trajectories are assembled from individual messages stored in adsb_messages, where all decoded messages with the same flight_id and loaded within a two-hour window are merged and stored as JSON strings. These JSON structures provide flexibility for further processing and serve as a standardized format for database storage. An example of a decoded JSON string is shown below:

    {
      "datetime": "1730759325.6904",
      "lat": "39.57158",
      "long": "-104.83833",
      "trk": "2.8125",
      "gs": "7.0",
      "alt": "0.0",
      "roc": "0.0"
    }

To ensure data integrity, preliminary checks are applied, such as negative altitude validation, which detects faulty data points, and on-ground verification, which determines whether an aircraft has touched down on the runway while operating. These JSON-formatted flight records form the foundation for all subsequent analyses.

Since unreliable ADS-B data can enter the system for various reasons (which will be elaborated on in the following subsections), inaccurate flight trajectories can significantly compromise the performance of downstream studies. To address this, a flight reliability filtering algorithm is implemented to determine the validity of recorded flights. This filter consists of a series of rule-based checks, each designed to identify and exclude a specific type of unreliable condition. Only flights passing all validation rules are assigned a “reliable” label in the flights table, while those failing any check are flagged as “unreliable”.

For the reliable flights, an operation identification model is applied using a rule-based classification method.
Since this work focuses on GA airports, which often host flight schools and observe a large number of training operations, the model must go beyond identifying standard operations such as takeoff, landing, and taxiing. It must also distinguish more complex training behaviors composed of these basic actions. For example, a touch-and-go operation consists of a landing immediately followed by a takeoff, and a takeoff followed by a landing is also treated as a distinct operation. Additionally, aborted landings, caused by runway obstacles or other factors, result in low-approach operations, which must also be correctly identified. To minimize the impact of high-frequency noise, Butterworth low-pass filters with varying filter orders are applied based on the maximum altitude of the flight. The smoothed trajectories allow for the identification of distinct flight maneuvers, where key operations are extracted by detecting peaks and troughs in altitude changes. This method not only ensures high classification accuracy but also enables the identification of multiple operations and change points within a single flight. The detected operations are then assigned to the predicted_op_type column in the flights table as integer values, representing different operation categories.

In addition to analyzing and labeling flight data, a series of real-time statistics are also updated during this workflow. The weight class of each flight is determined using an ICAO-to-weight-class mapping list, and the count for each weight class is updated every hour. Simultaneously, this process updates a real-time web dashboard via a DynamoDB table, enabling continuous monitoring of air traffic across multiple GA airports. The system tracks the total number of flights over the last 24 hours, categorizing them by airport location while recording hourly flight counts, timestamps, ICAO identifiers, and other relevant flight attributes.
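The smoothing and extremum-detection step described above can be illustrated with a small, dependency-free sketch. It is a simplification and not the classification model used in this work. It uses a first-order Butterworth low-pass filter only, whereas the text describes varying filter orders. It assumes the altitude series has already been resampled to a uniform rate, and the cutoff frequency, sampling rate, and function names are hypothetical. The filter is applied forward and then backward so that smoothed peaks are not shifted in time.

```python
import math


def butter1_lowpass(x, cutoff_hz, fs_hz):
    """First-order Butterworth low-pass via the bilinear transform (single pass)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)  # pre-warped cutoff
    b = k / (k + 1.0)                          # b0 == b1
    a1 = (k - 1.0) / (k + 1.0)
    y = []
    prev_x = prev_y = x[0]  # start from steady state to limit the start-up transient
    for xn in x:
        yn = b * (xn + prev_x) - a1 * prev_y
        y.append(yn)
        prev_x, prev_y = xn, yn
    return y


def lowpass_zero_phase(x, cutoff_hz, fs_hz):
    """Filter forward, then backward, to cancel the phase delay."""
    forward = butter1_lowpass(x, cutoff_hz, fs_hz)
    backward = butter1_lowpass(forward[::-1], cutoff_hz, fs_hz)
    return backward[::-1]


def find_extrema(y):
    """Return indices of strict local maxima (peaks) and minima (troughs)."""
    peaks, troughs = [], []
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] > y[i + 1]:
            peaks.append(i)
        elif y[i - 1] > y[i] < y[i + 1]:
            troughs.append(i)
    return peaks, troughs


if __name__ == "__main__":
    # Synthetic altitude profile (ft): climb, descend to the runway, climb again.
    alt = [500.0 * (1 - math.cos(2 * math.pi * n / 100)) for n in range(200)]
    smooth = lowpass_zero_phase(alt, cutoff_hz=0.05, fs_hz=1.0)
    print(find_extrema(smooth))
```

In this synthetic profile, a trough between two peaks is the kind of pattern a rule set might associate with a touch-and-go. Mapping extrema to operation labels requires additional rules, such as on-ground checks and position relative to the runway, which are not shown here.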
Additional metrics such as ground operations statistics, on-ground aircraft counts, and real-time ADS-B message counts are also recorded and displayed on the dashboard, which was created by the previous research team of this project. The interface of the mapping tool can be found in Fig. 2.2.

Figure 2.2: Real-time mapping tool interface.

The workflow of this ADS-B data collection system is depicted in Fig. 2.3, and the simplified data pipeline in the cloud computing unit, the EC2 instance, is shown in Fig. 2.4. Together, they demonstrate how this framework ensures maximum utilization of received ADS-B data, providing a consistent and reliable data flow while efficiently managing AWS computation resources. The algorithms and processes described above will be further detailed in the following subsections.

As of March 14, 2025, at 2:37 PM, hundreds of millions of ADS-B messages have been captured by our system, and over a million flights have been observed, decoded, filtered, labeled, and archived in our database, excluding a small proportion of undecoded records, as shown in Table 2.1.

Figure 2.3: Concept map of ADS-B data collection system.

Figure 2.4: Simplified data pipeline in EC2.

Table 2.1: Flights observed at target airports

Target Airport                                   Flight Count
Centennial Airport (KAPA)                        235568
College Park Airport (KCGS)                      86682
Republic Airport (KFRG)                          32914
Grand Forks International Airport (KGFK)         476671
Marion Municipal Airport (KMNN)                  10562
Solberg-Hunterdon Airport (KN51)                 289084
Ohio State University Airport (KOSU)             40655

2.3 Raspberry Pi-Based Data Collection Platform

Tasked by the FAA, this project aims to provide operational parameters for evaluating airport capacity and performance at GA airports. To assess both current and future capacity, detailed flight records must be analyzed to extract key operational metrics. Several existing data sources were reviewed before proposing a new data collection system.
The Aviation System Performance Metrics (ASPM) database and the System Wide Information Management (SWIM) system were examined, but neither provides comprehensive coverage of GA airport operations [31] [32]. Since the required metrics in this study rely on a breakdown of statistics by operation type, implementing flight detection and counting mechanisms is essential. Previous attempts, aside from ADS-B technology, have explored various alternative approaches, such as drones and acoustic arrays [25] [26]. However, these methods have been limited by hardware constraints, providing only limited operational insights. In contrast, ADS-B has proven to be a more accessible and efficient alternative due to its high data transmission frequency and fidelity. Its open communication protocol allows this project to collect and analyze flight data using cost-effective, easily assembled hardware, which can be built with off-the-shelf components. This approach significantly reduces both the cost and technical barriers associated with data collection at GA airports, making high-resolution flight tracking more feasible for research and operational improvements.

To achieve this, a Raspberry Pi-based data acquisition system was developed. The Raspberry Pi is a compact, low-cost computing platform that has been widely used across various applications, including gaming stations and household automation [33] [34]. It offers several advantages, such as proximity between the computation unit and the data receiving unit, which minimizes signal noise and simplifies wiring. Additionally, the Raspberry Pi’s high processing speed and data reception rate make it well-suited for real-time data acquisition [35]. Running on a Linux-based operating system, the Raspberry Pi also provides flexibility in software configuration and efficient text-based processing. Each data receiving module in the system consists of an antenna, an amplifier, a bandpass filter, and an SDR.
ADS-B messages are transmitted using pulse-position modulation (PPM) at fixed frequencies depending on the transponder type. These signals are captured by strategically placed antennas with an unobstructed line of sight to optimize reception. The received messages contain essential flight data, including GPS-derived coordinates and other kinematic information obtained from onboard instruments. Each 112-bit ADS-B message includes a 24-bit ICAO aircraft identifier for validation and a 56-bit segment containing flight status information. To ensure optimal signal quality, an amplifier is used to boost the received signal, followed by a bandpass filter that eliminates noise and interference from any part of the spectrum away from the target band(s). The filtered signal is then processed by an SDR, where demodulation occurs. Here, it is important to distinguish this signal decoding process from the message decoding process described in the system overview section. In the SDR step, the received ADS-B signals are converted from radio frequency waves into digital data. The SDR extracts Mode S ADS-B transponder messages, parsing essential flight status information. However, since ADS-B messages do not inherently include timestamps, Unix timestamps (also known as Epoch timestamps) are assigned upon reception to provide a time reference. This timestamping process is widely employed by ADS-B receivers. However, due to computational constraints, it can lead to data buffering, a phenomenon where messages accumulate before processing, resulting in timestamp inaccuracies. A solution to mitigate this issue will be introduced in the next section.

Given that 77% of GA aircraft are equipped with 1090 MHz ADS-B transponders, 22% use 978 MHz UAT, and 1% operate dual-frequency transponders, each receiving unit is designed with two identical SDR-based systems, each dedicated to a fixed frequency to ensure comprehensive signal capture.
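The 112-bit message layout described above can be illustrated with a minimal sketch. The function name and the returned keys are hypothetical, the 5-bit downlink format field comes from the Mode S frame layout rather than from this text, and the sample frame is a commonly cited example from the ADS-B decoding literature rather than a message collected by this project.

```python
def parse_extended_squitter(hex_msg: str) -> dict:
    """Split a 112-bit frame (28 hex characters) into its main fields."""
    if len(hex_msg) != 28:
        raise ValueError("expected 28 hex characters (112 bits)")
    bits = int(hex_msg, 16)
    # The 56-bit message segment sits just above the 24 trailing parity bits.
    me = (bits >> 24) & ((1 << 56) - 1)
    return {
        "downlink_format": bits >> 107,             # first 5 bits of the frame
        "icao": f"{(bits >> 80) & 0xFFFFFF:06X}",   # 24-bit aircraft address
        "type_code": me >> 51,                      # first 5 bits of the segment
        "message": f"{me:014X}",                    # 56-bit segment as hex
    }


# Sample frame (an assumption for illustration, not project data).
frame = parse_extended_squitter("8D4840D6202CC371C32CE0576098")
print(frame)
```

The type code extracted here is the field that later determines whether a message is treated as a position, velocity, or identification message.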
An example of a data receiving unit is shown in Fig. 2.5a, while its hardware components are illustrated in Fig. 2.5b.

(a) ADS-B data receiving unit at KFRG. (b) Components of data receiving units.
Figure 2.5: ADS-B data receiving unit and its components.

Considering the limited bandwidth of Raspberry Pis, the decoding, analysis, and visualization processes are conducted in AWS to efficiently handle the large and continuous data flow. Piracci et al. [28] have demonstrated that this ADS-B signal collection system maintains stable performance even under heavy signal congestion and jamming. While distant signals may experience interference, the on-site installation of receiver units minimizes these effects, ensuring reliable data reception and processing. The overall conceptual diagram of ADS-B data reception is illustrated in Fig. 2.6.

Figure 2.6: Concept map of ADS-B reception.

2.4 Decoding Mechanism

2.4.1 Fundamentals of ADS-B Broadcast

Aircraft equipped with ADS-B “Out” technology continuously broadcast essential flight information, including three-dimensional position data (latitude, longitude, and altitude), telemetry (ground speed, track, and rate of climb), and identifying details such as the aircraft’s unique ICAO address and, in some cases, its tail number or call sign. The frequency of these transmissions depends on the message type, which is specified by the type code within the message. In this project, three primary message types are of particular interest: surface messages, airborne position messages, and airborne velocity messages, all of which are transmitted at a rate of 2 Hz (twice per second) while the aircraft is moving. Position messages encode three-dimensional location data using the Compact Position Reporting (CPR) format, which enhances local accuracy and transmission efficiency. CPR achieves this by encoding coordinates in two alternating odd and even grid systems, allowing for precise global positioning through the convergence of these grids [1].
Velocity messages include key motion parameters such as ground speed (in knots), track, and ROC (in feet per minute). A more comprehensive mapping between message types and type codes is provided in Table 2.2.

Since an aircraft can only broadcast one type of message at a time, position and velocity data are transmitted separately while in flight. However, when an aircraft is on the ground, position and velocity information are consolidated and sent together in surface messages. Regardless of the message type, aircraft in motion are expected to broadcast at 2 Hz, whereas stationary aircraft transmit at a reduced rate. The time intervals between consecutive messages of the same type should not be significantly shorter than 0.5 seconds under normal conditions. This interval can fluctuate slightly due to random dithering, a mechanism designed to minimize the risk of message collisions from multiple aircraft transmitting simultaneously [1].

Table 2.2: ADS-B message type code and broadcast frequency [1].

Messages                   TC             Ground (still)   Ground (moving)   Airborne
Aircraft identification    1–4            0.1 Hz           0.2 Hz            0.2 Hz
Surface position           5–8            0.2 Hz           2 Hz              –
Airborne position          9–18, 20–22    –                –                 2 Hz
Airborne velocity          19             –                –                 2 Hz
Aircraft status            28             0.2 Hz if no alert or code change; 1.25 Hz if alert or code changes (all states)
Target states and status   29             –                –                 0.8 Hz
Operational status         31             0.2 Hz           0.4 Hz if no change in integrity or accuracy; 1.25 Hz if integrity or accuracy changes (moving and airborne)

As previously noted, raw ADS-B data, as broadcast by aircraft, do not contain any absolute or relative timestamps. Since precise timing is essential for deriving kinematic properties such as velocity and ROC, temporal information is added by the Raspberry Pi processors based on the local system clock. This ensures that each message is timestamped at the moment it is received, allowing for time-based analyses.
Once all necessary metadata have been appended, the encoded messages are transmitted to the AWS system, where they await further processing and decoding.

2.4.2 Timestamp Debuffering

The traditional decoding algorithm processes raw messages in multiple steps, beginning with message type classification. In the first stage, position information from airborne position messages and surface messages is extracted, along with the corresponding timestamps. Similarly, velocity data are decoded from airborne velocity messages. To construct complete messages that include timestamps, velocity, and position, several post-processing steps are applied: timestamps are rounded to the nearest integer, duplicate messages of the same type with identical rounded timestamps are removed, and finally, messages of different types are merged to form a complete data point. This structured approach ensures that each decoded message contains all necessary flight details while maintaining data consistency.

Theoretically, this decoding method should yield messages that closely match the ground truth, assuming the raw messages and attached timestamps are accurate. In practice, however, significant discrepancies are observed. Fig. 2.7 illustrates a typical message stream, showing the inter-arrival times between consecutive decoded messages. The larger intervals predominantly align with multiples of 0.5 seconds, with slight variations likely caused by a built-in dithering mechanism intended to reduce the risk of message collisions. However, there are also unrealistically short intervals (less than 0.5 seconds) that contradict the expected ADS-B transmission behavior. The larger intervals suggest instances where intermediate messages were lost, while the smaller ones indicate that timestamps may have been buffered before being released.

Two primary factors contribute to erroneous timestamp intervals.
First, the demodulation and decoding process operates within a bandwidth-limited queue, meaning that messages do not exit the system with the same timing intervals as they entered. Second, the local processing unit (e.g., the Raspberry Pi) has limited computational capacity, leading to additional buffering and potential delays in message processing. Furthermore, message collisions during peak traffic periods, particularly at lower altitudes near the ground where aircraft are more densely concentrated, further distort the received message stream and introduce additional inconsistencies.

Figure 2.7: Typical inter-arrival times of buffered messages.

It should be noted that there are expensive hardware solutions to mitigate data buffering issues. High-end demodulation hardware, where the decoding is accomplished in hardware via a field-programmable gate array (FPGA) rather than in software, often includes the capability to tag outgoing data with GPS timestamps. This ensures that the relative time intervals inferred from such hardware closely match the actual intervals at which the radio frequency messages were received, aligning with the expected transmission pattern. However, the methods discussed in this work are intended for scenarios where only widely available, cost-effective hobbyist-level equipment is used. Notably, this type of equipment forms the backbone of most crowd-sourced ADS-B data repositories, making this an important application. Furthermore, a growing number of small businesses are marketing ADS-B data products and analytical solutions to small airports, and their hardware and software platforms may not be significantly more sophisticated than those built with hobbyist-level technology.

To address the data buffering issue, a debuffering algorithm is developed in this project to improve the accuracy of recorded timestamps by repositioning data points to their most likely true temporal locations.
This approach consists of two key components: data debuffering and data amalgamation. The fundamental principles of the debuffering algorithm are straightforward:

• Any timestamp assigned to an ADS-B message must be later than the actual time at which the message reached the receiver, because of processing delays caused by demodulation, decoding, and queueing.

• Any consecutive messages with timestamp differences of less than 0.5 seconds were not actually transmitted at such short intervals; their recorded timing reflects throughput limitations of the decoding process, not the actual transmission intervals.

Based on these principles, we propose shifting assigned timestamps backward in time using a first-in, first-out (FIFO) queue that runs in reverse order. This queue is designed with a capacity of two messages per second, aligning with the standard transmission rate of most ADS-B message types. While this method does not guarantee perfect realignment with actual arrival times, our hypothesis is that, on a larger scale, it produces timestamps that are significantly closer to the truth than the original buffered data. The overall process is illustrated in Fig. 2.8.

Figure 2.8: Illustration of debuffering algorithm.

In Fig. 2.8, the top timeline represents the original timestamps assigned to received ADS-B messages, where several intervals between messages are less than 0.5 seconds. Moving from right to left, these timestamps are adjusted through a FIFO queue with a capacity of two messages per second, resulting in the refined timestamps shown in the bottom timeline. Let $t_i$ and $s_i$ represent the buffered and debuffered timestamps, respectively, of message $i$. The FIFO queue then operates by processing messages in reverse chronological order and applying the following recursion:
$$ s_i = \min\{t_i,\; s_{i+1} - 0.5\} \qquad (2.1) $$

The original decoder’s method for amalgamating messages implicitly assumes that buffering has occurred, as it relies on rounded timestamps to merge position and velocity messages. When duplicate information arises due to rounding, only the most recent data point is retained. While effective for handling buffered data, this approach preserves the inaccuracies introduced by buffered timestamps, and it also loses data when originally distinct frames are amalgamated through rounding.

A key challenge emerges when applying the debuffering algorithm: once timestamps are corrected, the resulting data points are spaced at least 0.5 seconds apart. This adjustment effectively shifts many original timestamps back by around 0.5 seconds. However, since the previous rounding-based amalgamation process also altered timestamps, its application becomes inconsistent with the fidelity required for this study.

To address this, improving timestamp accuracy through debuffering necessitates a revised approach to message amalgamation at a higher resolution. Given that ADS-B messages with different type codes are transmitted independently at varying frequencies, an additional step is introduced: stratifying messages by type code before applying debuffering. This adjustment ensures that the temporal alignment of messages is preserved according to their original transmission characteristics. Consequently, two distinct debuffering algorithm variants have been developed to optimize data integrity:

• Debuffer complete messages which are already amalgamated. This approach is particularly useful for archival data where complete messages have been saved, but the original raw ADS-B messages are no longer available. It benefits users whose data collection systems lack structured archival storage, such as data warehouses or data lakes, or those who do not have access to raw ADS-B messages.

• Stratify by message type before debuffering, then amalgamate.
In this variant, raw ADS-B messages are first classified by type code in the decoder, debuffered separately, and then combined using a refined amalgamation process (described in the next section). This method better reconstructs the original transmission timeline and is expected to yield more accurate results when raw ADS-B messages are available.

These two approaches provide flexibility based on the available data and the specific requirements of different users, ensuring that ADS-B timestamps can be corrected across a range of applications and scenarios.

The first 5 bits of the message section of an ADS-B message indicate the message type. In particular, a pair of odd/even position messages is required to properly decode a position from its CPR format. Note that there are two types of position messages: surface position and airborne position. For surface position messages, the altitude and rate of climb default to 0. Surface position messages also describe the movement and ground track, which are known together as surface velocity. The surface and airborne velocities are then decoded by the software. Unlike surface velocity, airborne velocity messages are separate from airborne position messages.

The rounding method of amalgamation relies on the premise that, once message timestamps are rounded to integer seconds, the necessary odd/even position records, velocity records, etc., will fall in the same group, allowing them to be merged into a complete record. Excess messages of various types can be encountered during this process, in which case older versions are dropped. Hence, this method has the added disadvantage that data are lost; data that would have had a higher chance of being retained if the timestamps had been pushed back closer to the truth. When this rounding process is used and the resulting complete records are then debuffered, the approach is called “post-decoder debuffering”.
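The reverse-order recursion of Eq. 2.1, together with the stratify-then-debuffer idea, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's decoder: the function names, the `(timestamp, category)` record layout, and the category labels are all hypothetical, and the input is assumed to be sorted by buffered timestamp.

```python
from collections import defaultdict

MIN_GAP_S = 0.5  # queue capacity of two messages per second


def debuffer(timestamps, min_gap=MIN_GAP_S):
    """Apply s_i = min(t_i, s_{i+1} - min_gap), walking from last to first."""
    shifted = list(timestamps)
    for i in range(len(shifted) - 2, -1, -1):
        shifted[i] = min(shifted[i], shifted[i + 1] - min_gap)
    return shifted


def debuffer_stratified(records, min_gap=MIN_GAP_S):
    """Debuffer each message category separately.

    records: time-ordered list of (timestamp, category) pairs.
    Returns the debuffered timestamps in the original record order.
    """
    index_by_category = defaultdict(list)
    for idx, (_, category) in enumerate(records):
        index_by_category[category].append(idx)

    result = [None] * len(records)
    for indices in index_by_category.values():
        shifted = debuffer([records[i][0] for i in indices], min_gap)
        for i, s in zip(indices, shifted):
            result[i] = s
    return result


# Three messages stamped 0.25 s apart are pushed back to 0.5 s spacing.
print(debuffer([10.0, 10.25, 10.5, 12.0]))
```

Because the last timestamp in each stream is never moved, only earlier messages are shifted backward, which matches the principle that an assigned timestamp can be late but not early.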
2.4.3 Message Amalgamation for In-Decoder Debuffering

Another variant of the debuffering approach is named “in-decoder debuffering.” In this method, each message type is debuffered separately before being decoded and merged into a complete record. The process begins with debuffering and decoding surface and airborne position messages, which serve as “anchors” on the debuffered timeline. These anchor points are then used to attach any missing information, such as velocity data, to form complete records. Since surface position messages already contain velocity information, their timestamps remain unchanged even when debuffered separately. As a result, surface velocity and surface position messages can be directly merged without any data loss. In contrast, for airborne velocity and position messages, instead of rounding timestamps and discarding data, the algorithm searches for the nearest airborne velocity message within a 0.5-second range for each position record. Fig. 2.9 illustrates this process and its various scenarios.

Figure 2.9: Matching process for merging complete records.

In this figure, the blue circles denote the positions on the timeline where the debuffered and merged position records fall, while the red circles represent velocity records that need to be paired with a corresponding position record. Each position record is assigned to its nearest velocity record, provided that the velocity record falls within the 0.5-second window. If no such velocity record is found, only the position is retained, as position data are considered the most critical component of the message stream. In cases where velocity data are missing, they can be approximated using numerical differentiation of adjacent position records under the assumption that speed does not change drastically over small time intervals. Any velocity records that do not have an associated position record within the allowable range are ignored, likely due to message garbling or missing data.
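The matching step described above can be sketched with a sorted-list search. This is an illustrative sketch only: the function name is hypothetical, the 0.5-second window is treated as inclusive, and a single velocity record is allowed to serve more than one position record; the text does not specify either detail.

```python
from bisect import bisect_left


def attach_velocity(position_times, velocity_times, window=0.5):
    """For each position timestamp, find the nearest velocity timestamp.

    Both lists hold debuffered timestamps in seconds; velocity_times must
    be sorted. Returns one entry per position: the index of the matched
    velocity record, or None when nothing lies within the window.
    """
    matches = []
    for t in position_times:
        j = bisect_left(velocity_times, t)
        # Only the neighbours on either side of t can be the nearest record.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(velocity_times)]
        best = min(candidates, key=lambda k: abs(velocity_times[k] - t), default=None)
        if best is not None and abs(velocity_times[best] - t) <= window:
            matches.append(best)
        else:
            matches.append(None)  # keep the position record without velocity
    return matches


positions = [0.0, 0.5, 1.0, 3.0]
velocities = [0.25, 1.125, 5.0]
print(attach_velocity(positions, velocities))
```

In this example the velocity record at 5.0 s is never matched, which corresponds to the unpaired velocity records that the text says are ignored.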
A key reason for prioritizing position records over velocity records, and in particular, retaining their timestamps, is that an aircraft’s position changes constantly in flight, whereas velocity tends to be more stable. Slight adjustments to the timing of velocity records are unlikely to significantly affect accuracy. This principle is also followed in the rounding process used in the post-decoder debuffering variant.

One significant advantage of the in-decoder approach is that it ensures the retention of all position records, which might otherwise be lost due to rounding in post-decoder debuffering. By preserving as many velocity records as possible and improving timestamp alignment, this method generates a greater number of complete records. Fig. 2.10 demonstrates the extent to which additional data are retained when using in-decoder debuffering.

Figure 2.10: Data retention comparison.

In this figure, the difference in height between the blue and orange bars represents data loss for complete flights due to timestamp rounding in post-decoder debuffering. The green and yellow bars show a similar comparison for post-takeoff messages, highlighting that data loss is most significant for surface messages. Since debuffering effectively “spreads out” surface messages, data integrity improvements can be directly validated through altitude integration plots. A more comprehensive quantitative validation is presented in the following subsection.

2.4.4 Validation Approaches for Debuffering Performances

Complete ADS-B messages contain both barometric altitude data, in units of feet, and ROC, in units of feet per minute. At first glance, it might appear that there is redundancy here, as the altitude data should be the integral of the rate of climb data. However, there are instances where the rate of climb data in an aircraft’s message stream seem to be reasonable, while the altitude data do not.
In such cases, it would make sense to construct an alternate version of the altitude data by integrating the rate of climb data. The complication is that one needs to know the time intervals over which to perform the integration.

One way to validate the debuffering algorithm is to perform this rate of climb integration on flights whose altitude data are also reliable. The time intervals can be inferred from the consecutive differences in timestamps. If debuffering is doing a good job, then the debuffered timestamps should better replicate the altitude data (taken to be the “truth”) than the originally buffered timestamps would. Examples of applying this idea to some individual flights are shown in Fig. 2.11. The numerical integration is performed using the trapezoidal rule, which performed better than the rectangular rule and Simpson’s rule on over a hundred flights (because rates of climb can change quickly), although similar results can be found using Simpson’s rule.

Fig. 2.11a shows a case of a single takeoff. The red line is the ground truth altitude data reported in the ADS-B data stream. The yellow line shows the results of integrating rate of climb to estimate altitude, based on the original buffered timestamps. Finally, the green line represents doing the same thing but with debuffered timestamps, and it is clearly more representative of the truth.

(a) Altitude accuracy improvement with debuffering. (b) Altitude integration with buffered data. (c) Altitude integration with debuffered data.
Figure 2.11: Comparison of buffered and debuffered data using integration.

Fig. 2.11b highlights a single flight that takes off and climbs, leveling off at just over 1000 ft, where it dwells for a short period, and then climbs again to just over 2000 ft. The integration step is inaccurate at the beginning of this exercise, because the timestamps were buffered, and this error remains for the rest of the trajectory. It is notable that in Fig.
2.11b, an incoherent section, which is caused by buffered timestamps, exists at the beginning of the takeoff operation. The same phenomenon can be found in many other takeoffs at KOSU, while the buffered timestamps in landings tend to be uniformly distributed along the time axis. A possible reason is that the layout of the KOSU runways could lead to a higher signal density at the takeoff end, especially in the area below 1500 ft near KOSU. Both the original trajectory and the integrated trajectory are improved by applying debuffering, and thus a much better match is produced, as shown in Fig. 2.11c.

For a more comprehensive test, 100 flights are selected for this integrated altitude comparison. Of these flights, 75 are single takeoffs, and the rest are touch-and-go flights. The recorded altitudes in the received messages are taken as the ground truth. Two cases are reconstructed for each flight: first, trajectories are reconstructed from ROC and the buffered timestamps and compared to the altitude data (assessed at those same buffered timestamps); second, the same operation is performed with the debuffered timestamps. The error metric in each case is the final vertical distance (in units of feet) between the true and the integrated altitude. This measurement is called “drift” in this study because it represents the final extent by which the two profiles have deviated.

For the single takeoff flights, the experiment is started at the point of takeoff. The touch-and-go flights all start and end at 0 ft altitude above ground level (AGL). Altitude integration is more challenging for such flights, because it is our experience that the ROC data are biased towards positive rates of climb. Thus, even though the actual flight profile would suggest that the troughs of the altitude profile should be very close to 0 ft AGL, the integrated data show their altitudes increasing over time, contributing to the final value of drift.
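The drift metric can be sketched with a trapezoidal integration of ROC over the timestamp differences. This is a minimal sketch under stated assumptions: the function names are hypothetical, timestamps are in seconds, ROC is in feet per minute, and drift is signed as integrated altitude minus reported altitude, a convention chosen here for illustration because the text does not state one.

```python
def integrate_altitude(timestamps_s, roc_fpm, start_altitude_ft):
    """Rebuild an altitude profile from ROC using the trapezoidal rule."""
    altitude = [start_altitude_ft]
    for i in range(1, len(timestamps_s)):
        dt_min = (timestamps_s[i] - timestamps_s[i - 1]) / 60.0  # seconds to minutes
        mean_roc = 0.5 * (roc_fpm[i] + roc_fpm[i - 1])
        altitude.append(altitude[-1] + mean_roc * dt_min)
    return altitude


def drift_ft(timestamps_s, roc_fpm, reported_altitude_ft):
    """Final gap between the integrated and the reported altitude profiles."""
    integrated = integrate_altitude(timestamps_s, roc_fpm, reported_altitude_ft[0])
    return integrated[-1] - reported_altitude_ft[-1]


# Toy profile: ROC ramps from 0 to 600 ft/min, then holds for one minute.
times = [0.0, 60.0, 120.0]
roc = [0.0, 600.0, 600.0]
reported = [0.0, 300.0, 1000.0]
print(drift_ft(times, roc, reported))
```

Running the same function once with buffered and once with debuffered timestamps gives the two drift values that are compared for each flight.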
This upward bias in the integrated altitude is ameliorated somewhat, but not entirely, by the debuffering process. Other possible explanations include differences in “visibility” of the transmitting antenna on the aircraft in climbing and descending attitudes, which could also depend on the receiver antenna placement. At this point, this phenomenon is not completely understood, and more inquiry is required.

Table 2.3 shows a sample of 40 out of these 100 flights. Any row with a positive sign for the value of the improvement indicates that the drift derived from debuffered timestamps is better (i.e., less) than what would have been obtained with the original buffered timestamps. For these results, all the debuffered drifts are calculated using in-decoder debuffered data.

Among the takeoff flights, 52 of the 75 tested takeoffs have smaller debuffered drifts. The average improvement is 40.77 ft. Only 2 cases have similar drifts in the buffered and debuffered trajectories. For the rest of the cases, the buffered drifts and debuffered drifts have an average difference of 12.63 ft, which is even smaller than the altitude resolution of 1090 MHz devices (25 ft). Moreover, 12 of the 15 cases with buffered drifts over 100 ft have smaller debuffered drifts. This result indicates that debuffering has the potential to ease drifts for flights with large initial integration deviation.

As for the touch-and-go flights, 80% of the flights’ drifts are improved by debuffering. Compared to takeoffs, the touch-and-goes’ drifts are much larger; some of them are over 1000 ft, while the others are over 100 ft. The reason is that longer flight durations allow more drift to accumulate.

Although the ADS-B message stream does not contain explicit time information, it is possible to approximate the elapsed time between consecutive messages using the other telemetry data in those messages. For message $i$, we denote the latitude $\mathrm{lat}_i$, longitude $\mathrm{lon}_i$, and speed $v_i$.
Since aircraft tend not to exhibit large values of jerk at any phase of operations, it is assumed here that the acceleration is constant between two consecutive messages, so that the mean velocity between the two instants can be calculated by:

$$ v_m \simeq \frac{v_{i+1} + v_i}{2}. \qquad (2.2) $$

The elapsed time between the two messages can then be estimated by:

$$ \Delta t \approx \frac{d(\mathrm{lat}_i, \mathrm{lon}_i, \mathrm{lat}_{i+1}, \mathrm{lon}_{i+1})}{v_m}, \qquad (2.3) $$

where $d(\mathrm{lat}_i, \mathrm{lon}_i, \mathrm{lat}_{i+1}, \mathrm{lon}_{i+1})$ is a distance measurement based on the two pairs of latitude and longitude. We call this estimate of the time interval the “implicit” time interval for this pair of messages. For the short time intervals expected in this application, the Haversine distance formula should be sufficiently accurate, and that is what was used to produce the following results. It should be noted that, of course, these position data are collected by the aircraft with some error distribution. If those errors are strongly correlated between consecutive measurements, then this distance estimate will be quite accurate; otherwise, it is less so.

This method of validation, then, is to compare the values of $\Delta t$ estimated from each pair of consecutive messages with the differences in the recorded timestamps (either buffered or debuffered). The validation hypothesis here is that debuffering is improving the accuracy of the timestamps if the sample correlation between the implicit time intervals and the debuffered timestamp intervals is higher than that between the implicit time intervals and the buffered timestamp intervals.

Fig. 2.12 shows the results of this validation step for 50 representative flights. The blue bars represent the sample correlations between intervals formed from the original buffered timestamps and the implicit intervals constructed from Eq. 2.3.

Figure 2.12: Correlation coefficients between implicit and buffered/debuffered time intervals for flights from KFRG and KOSU.
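The implicit time interval can be sketched as a Haversine distance divided by the mean of the two reported speeds. This is an illustration under stated assumptions: the function names are hypothetical, speeds are taken to be ground speeds in knots, the Earth is treated as a sphere with a mean radius of about 3440.065 nautical miles, and pairs with zero mean speed are returned as `None` because no interval can be inferred from them.

```python
import math

EARTH_RADIUS_NM = 3440.065  # assumed mean Earth radius in nautical miles


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def implicit_interval_s(lat1, lon1, v1_kt, lat2, lon2, v2_kt):
    """Elapsed seconds implied by the distance flown at the mean speed."""
    v_mean = 0.5 * (v1_kt + v2_kt)  # constant-acceleration assumption
    if v_mean <= 0:
        return None  # stationary pair: no interval can be inferred
    hours = haversine_nm(lat1, lon1, lat2, lon2) / v_mean
    return hours * 3600.0


# One arc-minute of latitude (about 1 NM) flown at 120 kt takes about 30 s.
dt = implicit_interval_s(40.0, -83.0, 120.0, 40.0 + 1 / 60, -83.0, 120.0)
print(round(dt, 2))
```

The list of implicit intervals for a flight can then be correlated against the buffered and the debuffered timestamp differences to obtain the two coefficients compared per flight.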
Compared to drift measurements, this error metric provides a more stable validation approach because it does not involve accumulative errors. Each implicit and explicit time interval pair contains information from only two closely spaced data points, ensuring that the implicit and explicit timestamps remain independent, thus preventing bias accumulation.

As part of a broader numerical experiment, all flights in our database without significant data gaps are analyzed for two small airports: KOSU in Ohio and KFRG in New York. Flights with large time gaps were excluded because such gaps render implicit timestamps unreliable, and debuffering has no impact over extended missing intervals. Among the 77 flights from KOSU, only one exhibited a lower debuffered correlation coefficient than its buffered counterpart. Similarly, for the 384 flights from KFRG, 96.61% demonstrated improved correlation coefficients following post-decoder debuffering. Notably, 100 of these flights involved single takeoff operations, with 99 showing better post-decoder coefficients than their buffered versions. Additionally, 63 of them had higher in-decoder debuffered coefficients compared to post-decoder results. On average, post-decoder debuffering improved correlation coefficients by 25.8%, while in-decoder debuffering yielded an even greater enhancement of 30.24%.

Table 2.3: Improvement in drift metric after debuffering.
Flight ID   Debuffered Drift (ft)   Buffered Drift (ft)   Improvement (ft)
356015      -29.11                  -72.56                43.45
356185      42.87                   48.80                 5.93
356930      182.90                  349.12                166.21
380627      -2.23                   -30.43                28.20
283714      -7.43                   -197.40               189.97
282936      -3.79                   -20.38                16.58
284363      26.14                   -377.91               351.77
271063      -5.21                   -0.65                 -4.55
282983      -30.04                  -39.14                9.10
149569      -0.24                   -0.87                 0.63
284168      20.24                   25.31                 5.07
284451      -38.95                  -57.16                18.22
283423      243.98                  247.98                4.00
282676      139.93                  142.03                2.10
283123      106.60                  106.58                -0.02
283900      -70.10                  -63.77                -6.33
270408      -12.18                  -56.22                44.04
284314      -48.43                  -23.00                -25.43
275489      7.19                    -33.34                26.14
282259      36.81                   24.76                 -12.05
283207      5.02                    8.47                  3.45
378372      -11.75                  1.03                  -10.72
376508      -64.25                  -70.18                5.93
376662      -29.39                  -30.75                1.37
378598      -237.54                 -246.57               9.03
376660      -10.64                  -16.63                5.99
376792      49.36                   72.40                 23.04
377324      -300.37                 -275.52               -24.84
377198      62.66                   71.84                 9.18
377754      -21.67                  -27.36                5.69
377547      -107.07                 -102.62               -4.45
378488      -16.98                  -14.25                -2.72
377035      -37.27                  48.30                 11.03
377752      25.87                   48.51                 22.63
376918      -33.63                  -72.12                38.49
377176      48.56                   91.04                 42.48
381389      -50.25                  -13.72                -36.52
378880      -69.37                  -84.71                15.34
378789      -16.64                  -24.48                7.83
381209      2.27                    -12.44                10.16

In summary, post-decoder debuffering significantly improves timestamp accuracy based on two validation techniques that use information from the ADS-B data stream as ground truth. Furthermore, in-decoder debuffering consistently outperforms the post-decoder approach. However, while in-decoder debuffering provides the best results, it requires access to the original ADS-B messages before they have been amalgamated. In cases where only archived, pre-decoded data are available, the post-decoder method remains a superior alternative to using buffered timestamps.

The observations in this study indicate that some data streams contain substantial temporal gaps, likely due to antenna occlusion caused by building obstruction or suboptimal positioning relative to runway geometry.
Any ADS-B data collection effort would benefit from using an externally mounted antenna with a clear line of sight to the entire airfield. Additionally, kinematic estimates over long data gaps should be interpreted cautiously, as aircraft speed, velocity, and heading cannot be assumed to remain constant over extended periods. Future improvements to the ADS-B communications protocol could consider incorporating standardized timing information in a subset of transmissions, enabling direct measurement of transmission and decoding lags. A potential experiment to refine timestamp accuracy would involve logging transponder-generated timestamps within the cockpit of a known aircraft and comparing these records with the received and decoded ADS-B data. However, given the curren