ABSTRACT

Title of Dissertation: EMPOWERING TRAFFIC OPERATIONS AND SAFETY WITH TRANSPORTATION BIG DATA: PRACTICE SCAN, METHODOLOGY, AND APPLICATIONS

Mofeng Yang, Doctor of Philosophy, 2022

Dissertation directed by: Professor Paul Schonfeld, Department of Civil and Environmental Engineering

In the past two decades, along with the technological advancement in mobile sensors and mobile networks, transportation big data, such as probe vehicle data and mobile device location data (MDLD), have been growing dramatically in terms of the spatiotemporal coverage of population and its mobility. These data sources have shown great potential for large-scale and near real-time transportation applications to support travel behavior analysis, travel demand modeling, traffic operations and safety analyses. The objectives of this dissertation are to (1) comprehensively examine the state-of-the-practice applications and the state-of-the-art models developed based on emerging transportation big data, (2) identify key metrics, and (3) establish a series of big-data driven frameworks to enhance traffic operations and safety. Three main sections are included.

The first section of this dissertation presents a literature review on models, tools, and metrics used for various levels of traffic analysis, and analyzes a survey distributed to transportation professionals to quantify the importance of these key metrics for improving traffic operations and safety. Based on the literature review and survey insights, two big-data driven frameworks are proposed to address both traffic operations and safety issues.

In the second section of this dissertation, a big-data driven framework is developed to improve the accuracy and reliability of emergency medical services (EMS) and trauma triage decisions for elderly persons at crash sites.
The proposed framework integrates transportation big data sources from both the demand side (such as traffic volumes and time-dependent vehicle speeds obtained from large-scale probe vehicles) and the supply side (i.e., transportation network features), as well as publicly available statewide crash data linked with health-related data such as EMS and hospital records. Decision tree models are adopted to simulate the decision-making process because of their wide application, proven predictive capability, and interpretability. With records of over 55,000 elderly patients, results demonstrate that the proposed framework contributes to enhanced EMS decision and trauma triage accuracy for the elderly and to saving more lives in severe vehicle crashes.

In the third section of this dissertation, a big-data driven framework is proposed for estimating a critical operational metric, namely vehicle volume, on an all-street network, and further estimating the pedestrian and bicyclist crashes at all intersections. This framework employs a series of cloud-based computational algorithms to extract multimodal trajectories and trip rosters from terabytes of MDLD. A scalable map matching and routing algorithm is then applied to snap and route vehicle trajectories to the roadway network. The observed vehicle counts on each roadway segment are weighted and calibrated against ground truth control totals, i.e., Annual Vehicle Miles of Travel (AVMT) and Annual Average Daily Traffic (AADT). The proposed framework is built on Amazon Web Services (AWS) and leverages cloud computing techniques to estimate vehicle volumes for all roadway segments in the state of Maryland using MDLD for the entire year of 2019. The estimated vehicle volume is further integrated with statewide crash records to estimate the pedestrian and bicyclist crashes at all intersections with statistical models.
Results indicate that the proposed framework can produce reliable estimates of vehicle volume and of pedestrian and bicyclist crashes, while also demonstrating transferability and generalization ability.

In summary, this dissertation comprehensively examines the literature on transportation big data applications and proposes two big-data driven frameworks demonstrated with two real-world case studies. Results reveal the feasibility and advantages of empowering traffic operations and safety analysis with transportation big data.

EMPOWERING TRAFFIC OPERATIONS AND SAFETY WITH TRANSPORTATION BIG DATA: PRACTICE SCAN, METHODOLOGY AND APPLICATIONS

by Mofeng Yang

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2022

Advisory Committee:
Professor Paul Schonfeld, Chair, Department of Civil and Environmental Engineering
Professor Kathleen Stewart, Dean’s Representative, Department of Geographical Sciences
Professor Ali Haghani, Department of Civil and Environmental Engineering
Assistant Professor Taylor Oshan, Department of Geographical Sciences
Assistant Professor Chenfeng Xiong, Department of Civil and Environmental Engineering, Villanova University

© Copyright by Mofeng Yang 2022

Dedication

To my beloved parents Kun Yang and Peifan Li, and my wife Zhiyue Xia. And to my grandparents in heaven.

Acknowledgements

This dissertation is partially funded by the U.S. Department of Transportation (U.S. DOT), Federal Highway Administration (FHWA), Maryland Department of Transportation State Highway Administration (MDOT SHA), and Maryland Transportation Institute (MTI). Opinions herein do not necessarily represent the views of the research sponsors. The author is responsible for the statements in the thesis.

“Night is now falling. So ends this day.
The road is now calling, and I must away.”

These lyrics are from the song “The Last Goodbye” from the movie “The Hobbit”. Just like Bilbo Baggins, though instead of reclaiming the Lonely Mountain from the dragon Smaug, I was sitting on a plane at Beijing airport, waiting for my unknown journey to the other side of the Pacific Ocean. Now, four years have passed, and I am here writing this acknowledgement to summarize this “unexpected journey”.

First, I would like to express my sincere gratitude to my advisor, Dr. Paul Schonfeld, for his continuous guidance since I joined the program in 2018. Dr. Schonfeld became my advisor in June 2022 and was also a committee member for my master’s thesis in 2020. I really appreciate all the support and guidance Dr. Schonfeld has provided. I would also like to express my special thanks to Dr. Kathleen Stewart, not only for being the dean’s representative on my dissertation committee, but also for her guidance through research collaborations in the past few years. Dr. Stewart’s research has always been among my top references when conducting my own research. In the meantime, I would also like to thank all my doctoral dissertation committee members, Dr. Ali Haghani, Dr. Taylor Oshan, and Dr. Chenfeng Xiong, for offering their valuable comments to improve my research. I am extremely grateful to Dr. Haghani for his support when my previous advisor left the university. I sincerely thank Dr. Oshan for the perfect course I took with him at the Department of Geographical Sciences as well as for serving on my master’s thesis committee. A special thanks to Dr. Chenfeng Xiong. The past four years of working together under Dr. Xiong’s supervision will be an unforgettable experience that I will always cherish. I would like to thank Dr. Lei Zhang for giving me the opportunity to come to the U.S. and for involving me in research projects.
I want to thank my colleagues at Maryland Transportation: Jina Mahmoudi, Sepehr Ghader, Aref Darzi, Minha Lee, Weiyi Zhou, Aliakbar Kabiri, Songhua Hu, Guangchen Zhao, Weiyu Luo, Mohammad Ashoori, Saeed Saleh, Asal Tabrizi.

Last, I would like to thank my parents, Kun Yang and Peifan Li, for always respecting my opinions and decisions. Thanks to my grandparents, Shizhen Yang and Daohua Chen, who passed away in 2021 and 2022. It was, is, and will always be a pity in my life that I couldn’t be with you at the very last moments of your lives. Thanks to my wife, Zhiyue Xia, who always supports me whenever I meet obstacles.

Though not knowing the journey and where it leads, I embrace it, and I welcome every moment of it. Just like what it says in the song:

“And though where the road then takes me. I cannot tell. We came all this way. But now comes the day. To bid you farewell”

Mofeng Yang, at Greenbelt, MD

Table of Contents

Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Chapter 1: Introduction
1.1 Background
1.2 Objectives
1.3 Contributions
1.3.1 Uniqueness of the Data
1.3.2 Methodology Innovations
1.3.3 Applications
1.4 Organization
Chapter 2: Literature Review
2.1 Transportation Big Data Applications
2.1.1 GPS Data
2.1.2 Cellular and Sighting Data
2.1.3 Location-based Service Data
2.2 Models and Algorithms for Transportation Big Data
2.2.1 Trip End Identification
2.2.2 Travel Mode Imputation
2.3 Transportation Big Data for Traffic Operations and Safety
2.3.1 State-of-the-Practice on Crash Scene Decision Makings
2.3.2 Estimating Vehicle Volume based on Transportation Big Data
2.3.3 Pedestrian and Bicyclist Crashes Estimation Methods
Chapter 3: Identification of Metrics Used for Various Levels of Traffic Analysis
3.1 Models, Tools, and Metrics for Various Levels of Traffic operations and Safety Analysis
3.2 Operations Practice Scan Survey
3.3 Survey Results
3.4 Performance Metrics Flowchart
3.4 Summary
Chapter 4: Supporting Triage Decisions for High-Risk Trauma Patients at Crash Sites with Location Data
4.1 Introduction
4.2 The Big-Data Driven Framework Integrating Transportation and Health Data
4.2.1 Integrated Transportation and Health Data
4.2.2 Modeling Method
4.3 Results and Discussions
4.4 Conclusions
Chapter 5: A Big-Data Driven Framework for Estimating Vehicle Volume on Mobile Device Location Data
5.1 Problem Statement
5.2 The Big-Data Driven Framework for Estimating Vehicle Volume and Pedestrian and Bicyclist Crashes
5.2.1 The Framework
5.2.2 Trip Identification and Travel Mode Imputation
5.2.3 Scalable Map Matching and Routing via Cloud Computing
5.2.4 Weighting
5.2.4 Volume Calibration
5.3 Vehicle Volume Estimation Case Study: the State of Maryland
5.3.1 Data
5.3.2 Vehicle Volume Estimation Results
5.4 Conclusion
Chapter 6: Modeling Pedestrians and Bicyclist Crashes with Transportation Big Data
6.1 Pedestrian and Bicyclist Crashes Estimation
6.1.1 Poisson and NB Models
6.1.1 ZIP and ZINB Models
6.2 Pedestrian and Bicyclist Crashes Estimation Case Study: the State of Maryland
6.2.1 Data
6.2.2 Model Development
6.2.3 Pedestrian and Bicyclist Crash Estimation Results
6.2.4 Assessment of Contribution of the Vehicle Volume and Pedestrian and Bicyclist Volume Estimated by MDLD to Model Performance
6.3 Conclusions and Discussions
Chapter 7: Conclusions, Limitations and Future Works
7.1 Conclusions
7.2 Limitations
7.3 Future Works
Appendix I. MDOT SHA Operations Practice Scan Survey
I.1 Survey
I.2 Survey Results
Bibliography

List of Tables

Table 2-1. Studies on Travel Mode Imputation Methods.
Table 2-2. State-of-the-Art Methodologies of Trauma Triage
Table 2-3. Examples of Past Studies on Pedestrian and Bicyclist Safety Models
Table 3-1. State-of-the-Practice Models, Tools, and Metrics for Various Levels of Traffic operations and Safety Analysis
Table 3-2. States for which the Respondents Work.
Table 4-1. Descriptive Statistics of the CODES Data
Table 4-2. Decision Tree Model Evaluations
Table 4-3. Model Performance Measures and Comparison
Table 5-1. Volume Calibration Results Comparison by Link Type
Table 5-2. Volume Calibration Results by Urban/Rural Status.
Table 6-1. Level of Traffic Stress Correspondence Table
Table 6-2. Frequency of Pedestrian/Bicyclist Crashes at Maryland Intersections in 2019
Table 6-3. Independent Variables for Pedestrian/Bicyclist Crash Frequency Models
Table 6-4. Results of the Pedestrian/Bicyclist Crash Frequency Models
Table 6-5. Model Improvement Assessment Based on LBS Variables

List of Figures

Figure 1-1. Dissertation Outline.
Figure 3-1. Agencies that the Respondents Work at.
Figure 3-2. Projects that the Respondents Work on.
Figure 3-3. Projects that the Respondents Work on.
Figure 4-1. The Big-Data Driven Framework for Integrating Transportation and Health Data.
Figure 4-2. CODES Data in the state of Maryland.
Figure 4-3. Annual Average Daily Traffic in the state of Maryland.
Figure 4-4. The Decision Tree Model of EMS Triage Using the Integrated Data
Figure 4-5. The Decision Tree Model of Trauma Triage Using the Integrated Data
Figure 5-1. The Big-Data Driven Framework for Estimating Vehicle Volume and Pedestrian and Bicyclist Crashes
Figure 5-2. The Data-Driven Travel Mode Share Estimation Framework
Figure 5-3. Distribution of Distance between Link Nodes in the OSM Network
Figure 5-4. Example of Map Matching and Routing.
Figure 5-5. Mobile Device Location Data around the State of Maryland.
Figure 5-6. Number of Lanes and Speed Limits in OSM
Figure 5-7. (a) Weighted Vehicle Volume in Training Set; (b) Calibrated Vehicle Volume in Training Set; (c) Weighted Vehicle Volume in Testing Set; (d) Calibrated Vehicle Volume in Testing Set.
Figure 5-8. Volume Calibration Results Comparison by Link Type.
Figure 5-9. Volume Calibration Results Comparison by Urban/Rural Status.
Figure 5-10. Visualization of Calibrated Vehicle Volume. (a) the State of Maryland; (b) Washington D.C.; (c) Baltimore City; (d) Hagerstown, MD.
Figure 6-1. LTS Examples (Source: http://www.northeastern.edu/peter.furth/research/level-of-traffic-stress)
Figure 6-2. LTS Estimates for: (a) the state of Maryland; (b) University of Maryland College Park Campus; (c) Baltimore City
Figure 6-3. ZINB model performance.

List of Abbreviations

AADT Annual Average Daily Traffic
ANN Artificial Neural Networks
AVMT Annual Vehicle Miles of Travel
AWS Amazon Web Services
BI Bayesian Inference
BMC Baltimore Metropolitan Council
BN Bayesian Network
BTS Bureau of Transportation Statistics
CART Classification and Regression Tree
CASI Computer-Assisted Self-Interview
CATI Computer-Assisted Telephone Interview
CATT Center for Advanced Transportation Technology
CBSA Core-based Statistical Area
CDR Call Detail Record
CNN Convolutional Neural Network
CODES Crash Outcome Data Evaluation System
DBSCAN Density-based Spatial Clustering of Applications with Noise
DHMM Discrete Hidden Markov Model
DNN Deep Neural Networks
DMV Washington Metropolitan Area
EMS Emergency Medical Service
FHWA Federal Highway Administration
GPS Global Positioning System
HPMS Highway Performance Monitoring System
KNN K-Nearest Neighbors
LBS Location-based Service
LPR License Plate Recognition
LRI Location Recording Interval
MDOT Maryland Department of Transportation
MDOT SHA Maryland Department of Transportation State Highway Administration
MLP Multi-Layer Perceptron
MaaS Mobility-as-a-Service
MTI Maryland Transportation Institute
MWCOG Metropolitan Washington Council of Governments
NASS National Automotive Sampling System
NB Negative Binomial
NHTS National Household Travel Survey
NHTSA National Highway Traffic Safety Administration
NTM National Transit Map
OD Origin and Destination
OSM OpenStreetMap
PAPI Paper-And-Pencil Interview
PCMDL Passively Collected Mobile Device Location
RETTS-A Rapid Emergency Triage and Treatment System
RF Random Forest
RITIS Regional Integrated Transportation Information System
SHA State Highway Administration
SMOTE Synthetic Minority Over-sampling Technique
SVC Support Vector Classifier
SVM Support Vector Machine
TAZ Traffic Analysis Zone
TPB Transportation Planning Board
TRBAM Transportation Research Board Annual Meeting
UMD University of Maryland
U.S. the United States
USDOE the United States Department of Energy
USDOT the United States Department of Transportation
XGB eXtreme Gradient Boosting
ZIP Zero-Inflated Poisson
ZINB Zero-Inflated Negative Binomial

Chapter 1: Introduction

1.1 Background

The current federal transportation legislation, "Moving Ahead for Progress in the 21st Century" (MAP-21), which was signed into law on July 6, 2012, advances statewide and metropolitan planning processes to incorporate a comprehensive performance-based approach to decision-making. Typical performance metrics in planning for traffic operations and safety include changes in vehicle trips, vehicle miles traveled, emissions reduction, travel time savings, improvements in travel time reliability, energy consumption reduction, noise impacts, safety impacts, monetary values of these changes, and lists of traffic operations equipment and costs. In addition to traditional data sources such as loop detector data and video-based traffic counts, emerging transportation big data such as probe vehicle data, connected vehicle data, and passively collected mobile device location data have enabled large-scale and near real-time models and methods to support operation, safety, demand, and planning analyses. More specifically, it is now possible to tell which users and which origin-destination pairs are using a particular transportation facility.

In the past two decades, along with the technological advancement in mobile sensors and mobile networks, transportation big data, such as probe vehicle data and mobile device location data (MDLD), have been growing dramatically in terms of the spatiotemporal coverage of population and its mobility. Initially, these data sources were considered supplements to travel surveys and travel behavior analysis.
A series of practices and research studies have demonstrated the effectiveness of such data in enhancing traditional travel surveys and have revealed their great potential to replace travel surveys [1, 2]. At the same time, obtaining travel statistics solely from these data sources is also worth investigating in order to reduce labor and cost compared to travel surveys.

Apart from supporting travel surveys and travel behavior analysis, in recent years, transportation big data have also been leveraged in large-scale and near real-time transportation applications for traffic operations and safety analysis. For instance, vehicle volume, as a critical operational metric, is the fundamental basis for traffic signal control, transportation project prioritization, road maintenance plans, and more. Traditional methods of quantifying vehicle volume rely on manual counting, video cameras, and loop detectors at a limited number of locations. These efforts require significant labor and cost for large-scale implementations. Researchers and private sector companies have explored alternative solutions such as probe vehicle data, which still suffer from low penetration rates. With the introduction of transportation big data, vehicle volume can be estimated from terabytes of movement data for a larger geographical area with a larger sample size.

1.2 Objectives

The objective of this dissertation is to comprehensively examine the state-of-the-practice transportation big data applications and to develop state-of-the-art big-data driven frameworks that fully leverage the potential of transportation big data and cloud computing techniques to improve traffic operations and safety analysis.
In order to fulfill the research objective, four tasks are identified: (1) evaluating the state-of-the-practice applications and the state-of-the-art methods based on transportation big data and identifying the key research gap; (2) designing and distributing a survey to transportation professionals to identify key metrics for varying levels of traffic analysis focusing on traffic operations and safety; (3) developing a big-data driven framework that leverages the instantaneous vehicle speed estimated nearly in real-time from large-scale probe vehicle data to enhance Emergency Medical Services (EMS) and trauma triage for elderly persons at crash sites; (4) developing and validating a big-data driven framework that estimates vehicle volume on an all-street network and quantifies the pedestrian and bicyclist crashes at all intersections.

1.3 Contributions

The main contributions of this dissertation can be classified into three aspects: (1) Uniqueness of the data; (2) Methodology innovations; (3) Applications.

1.3.1 Uniqueness of the Data

This dissertation utilizes multiple transportation big data sources and develops in-house data collection methods for model development and validation. Though extensive studies have leveraged transportation big data such as MDLD and probe vehicle data, the transportation big data used in this dissertation include both probe vehicle data from the Regional Integrated Transportation Information System (RITIS) developed by the Center for Advanced Transportation Technology (CATT) and large-scale anonymized MDLD from the Maryland Transportation Institute (MTI). Both of the aforementioned data sources cover the entire U.S. and are able to capture high-granularity spatiotemporal traffic and movement across the country. Therefore, the methodology frameworks proposed in this dissertation have the advantage that they can be generalized and expanded to the entire U.S.
In addition to these external data sources, an in-house data collection method is achieved by developing one of the most advanced Mobility-as-a-Service (MaaS) mobile applications, incenTrip. incenTrip collects travel behavior data with user-confirmed ground truth information including trip origin and destination, departure time, and travel mode. The intermediate locations of each trip are also recorded for model development.

Apart from transportation big data, this dissertation also collects multimodal transportation networks and transportation statistics from multiple authoritative sources, such as OpenStreetMap (OSM), the Maryland Department of Transportation State Highway Administration (MDOT SHA), the United States Department of Transportation (USDOT) Bureau of Transportation Statistics (BTS), and the USDOT National Transit Map (NTM). These publicly available data sources are used together with the transportation big data sources for model development and validation.

1.3.2 Methodology Innovations

The methodology contribution of this dissertation focuses on the following aspects:

(1) Integrating transportation big data with health-related data: This study is among the first to develop and demonstrate a methodological framework for integrating transportation-sector data with health-related data to support various decision-making scenarios in transportation safety, emergency responses, and trauma-care triage.

(2) Computation algorithms for deriving travel behavior data from large-scale MDLD: The computation algorithms proposed in this study are developed in order to derive travel behavior data, such as activity locations and travel modes, from large-scale MDLD. They are calibrated and validated against the existing travel surveys and annual vehicle miles of travel in order to serve real-world research needs.
Also, these algorithms are further compiled and integrated into a data pipeline on the Amazon Web Services (AWS) platform, which fully leverages the computing power and scalability of cloud computing techniques.

(3) Scalable map matching and routing, weighting, and calibration algorithms that have superior transferability and generalization ability: The big-data framework proposed to estimate vehicle volume and pedestrian and bicyclist crashes can be applied to every state in the U.S. The algorithms are scalable, and the data sources proposed for weighting and calibration are generally available across the U.S.

1.3.3 Applications

The frameworks proposed in this dissertation have been calibrated, validated, and ultimately deployed in several real-world applications and have great potential for boosting future transportation big data applications. These applications cover a wide range of topics, including travel demand management (incenTrip, https://incentrip.org), traffic operations and safety (Vulnerable Road User Density Exposure Dashboard project, https://mti.umd.edu/sdi), public health and mobility (University of Maryland COVID-19 Impact Analysis Platform, https://data.covid.umd.edu), and travel behavior analysis and surveys (Next Generation National Household Travel Survey National Passenger/Truck Origin-Destination Data, https://nhts.ornl.gov/od).

1.4 Organization

Figure 1-1. Dissertation Outline.

The outline of this dissertation is shown in Figure 1-1. Chapter 2 provides a comprehensive literature review of the state-of-the-practice applications of, and the state-of-the-art methodologies applied to, transportation big data. Chapter 3 first scans the existing tools and metrics for traffic operations and safety analysis.
Then a survey is designed and distributed to transportation professionals to identify key metrics for traffic operations and safety analyses. Chapter 4 develops a framework that leverages large-scale probe vehicle data to improve the accuracy and reliability of emergency medical services (EMS) and trauma triage decisions for the elderly population in crashes. Chapter 5 proposes a big-data driven framework that leverages cloud computing techniques to digest terabytes of transportation big data and produce an important operational metric, vehicle volume, on the all-street network; Chapter 6 further estimates the corresponding pedestrian and bicyclist crashes. Finally, Chapter 7 summarizes the conclusions and suggests future research directions.

Chapter 2: Literature Review

2.1 Transportation Big Data Applications

In this section, existing applications based on transportation big data are reviewed. The transportation big data are categorized into three types: Global Positioning System (GPS) data, cellular and sighting data, and Location-based Service (LBS) data, the majority of which can also be categorized as Mobile Device Location Data (MDLD). The applications for each type are reviewed and the state-of-the-art methods are summarized.

2.1.1 GPS Data

The earliest and most widely used type of transportation big data is that obtained from GPS technology, where personal longitudinal location data are collected via GPS data loggers. Since the mid-1990s, researchers have investigated the possibility of using GPS data to enhance the quality of travel surveys. The initial version of the GPS data logger could only be installed in a vehicle and was charged by the vehicle’s battery [3-11]. The vehicle location was recorded every second while the vehicle was moving [6].
This approach can significantly improve the spatiotemporal accuracy of travel surveys by recording the exact origin and destination as well as the trip start and end times, but it only captures vehicle trips. Later, wearable GPS data loggers allowed respondents to carry the devices so that trips made by non-vehicle travel modes could also be recorded [12-15]. Some travel surveys utilized both in-vehicle and wearable GPS data loggers to take advantage of both technologies [16-18].

Since GPS data can offer accurate device locations, access to individual-level trajectories is highly restricted. Therefore, individual-level GPS data are also aggregated by private sector companies to reveal travel demand without raising privacy concerns. For instance, INRIX Traffic collects GPS probe data from commercial vehicle fleets, connected vehicles, and mobile device applications [19]. RITIS also started to incorporate the probe vehicle data into its commercial products. The data can be further aggregated to the link or corridor level to provide real-time estimates of traffic speed and travel time [20-22]. Nonetheless, the low penetration rate (i.e., 2%-10%) of the commercial probe vehicle data remains the core challenge in drawing the whole picture of travel patterns.

2.1.2 Cellular and Sighting Data

Since mobile devices, such as smartphones and tablets, have gained in popularity, investigations into individual-level mobility patterns have become more practical. Cellular data, which are generated through communication between cellphones and cell towers when a phone call or a text message is made by the phone [23], have shown great value in supporting large-scale travel demand analysis. In general, cellular data can be categorized into Call Detail Records (CDRs) and sightings [24]. CDR data provide details on calls and messages, such as timestamp, duration, and locations of routing cell towers.
Therefore, the location information of CDR data fully depends on the density of the cellular network and does not reflect the actual location of the device [24]. Similarly, sightings are also generated through communication with cell towers, but the actual location of the device is determined via triangulation [24]. Both types of cellular data have been widely used in studying human mobility patterns in the past two decades. For instance, Gonzalez et al. combined two sets of CDRs to explore individual mobility patterns; one is composed of six months of records for 100,000 randomly selected anonymous individuals and the other is a complementary dataset capturing the locations of 206 mobile phone users every two hours for one week [25]. Further studies on human mobility have been conducted based on similar datasets [26-33]. Cellular data have also been widely applied in other research areas such as social networks, residential location, and socioeconomic level [34-36]. Despite the large volume of data, cellular data are limited by their spatial and temporal resolution, which is determined by the density of cell towers and user cellphone usage [37]. On the positive side, cellular data do not require advanced phones and should raise less concern about user privacy.

2.1.3 Location-based Service Data

Another type of transportation big data is Location-based Service (LBS) data, in which spatial information is generated when a mobile application updates the device’s location using the most accurate source available among the existing location sensors, such as Wi-Fi, Bluetooth, cellular tower, and GPS [24, 38]. Compared to the CDR data, the LBS data can reflect the exact location of mobile devices and thus provide invaluable location information describing individual-level mobility patterns [24, 25-27, 38, 39].

Many applications have been developed using the LBS data. For instance, a recent smartphone-enhanced travel survey conducted in the U.S.
used a mobile application, rMove, developed by Resource Systems Group (RSG), to collect high-frequency location data and let respondents recall their trips by showing the trajectories in rMove [40-43]. AirSage leveraged LBS data to develop a traffic platform that can estimate traffic flow, speed, congestion, and road user sociodemographics for every road and time of day [44]. The Maryland Transportation Institute (MTI) at the University of Maryland (UMD) developed the COVID-19 Impact Analysis Platform (data.covid.umd.edu) to provide insight on COVID-19’s impact on mobility, health, economy, and society across the U.S. [45-47].

In summary, transportation big data used in the literature differ in terms of spatiotemporal coverage of the population and its mobility, as well as data quality, e.g., spatial accuracy and location recording interval (LRI) [48, 49]. GPS data in general have the highest spatial accuracy (e.g., 10 meters) and the lowest LRI (usually 1 second), but usually cover only a small percentage of the population, and thus cannot reflect population-level travel behavior without a statistical weighting process. Therefore, most GPS data are used as supplementary data sources for regional travel surveys. Cellular data and LBS data have significantly higher spatiotemporal coverage of the population than GPS data because of the high penetration rate of cellphones and mobile devices in the U.S. However, the ground truth information is usually missing. The LRI for both types of data is high and varies widely depending on mobile device usage [49]. In addition, although cellular data may have higher coverage, the spatial accuracy of the data and the temporal frequency of the pins are inferior to those of the LBS data. This is because cellular technology relies on the density of cell towers and does not reflect the actual location of the devices.
Also, cellular data are generated based on calls and messages or a network-driven event, which might lead to a lower number of events.

2.2 Models and Algorithms for Transportation Big Data

2.2.1 Trip End Identification

The trip end identification algorithm for high-frequency data, i.e., GPS data, has been well studied and used in practical applications [48]. The traditional way to obtain accurate trip ends is the rule-based trip end identification method. This type of method designs rules and parameters based on domain knowledge. The trip ends are obtained by applying the rules to location data point by point while examining the interrelation between two consecutive location points. The parameters used in these rules, such as dwell time and speed, are mostly defined by domain knowledge [50-57]. In recent years, some researchers have also leveraged supervised machine learning models as a supplement to the rule-based methods, which classify each location point as static or moving [58-60]. Different clustering methods are also applied to obtain trip ends by first identifying people’s activity locations from the location data [61-64]. A recent study utilized a spatiotemporal clustering method with three combined optimization models to detect trip ends [64].

In recent years, there has also been a special focus on deriving trip ends from LBS data. A “Divide, Conquer and Integrate” (DCI) framework was proposed to process the LBS data to extract mobility patterns in the Puget Sound region [39]. The proposed framework combined a rule-based method and an incremental clustering method to handle the bi-modally distributed LBS data. The results were aggregated at the census tract level and compared with household travel surveys.

2.2.2 Travel Mode Imputation

After the trip ends are identified, it is also important to impute the travel mode for each trip to obtain multimodal travel patterns.
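As an illustration of the rule-based trip end identification reviewed in Section 2.2.1, the following minimal Python sketch scans consecutive location points, flags low-speed runs, and keeps those that last at least a minimum dwell time as stays; trips are then the gaps between consecutive stays. The names `Point`, `find_stays`, and `trips_from_stays`, and the thresholds of 1 m/s and 300 s, are hypothetical choices made for illustration and are not taken from any of the cited studies.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Point:
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points in meters."""
    r = 6_371_000.0  # mean Earth radius in meters
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(h))


def find_stays(points, max_stay_speed=1.0, min_dwell_s=300.0):
    """Return (first_index, last_index) pairs of stays.

    A stay is a run of consecutive points whose point-to-point speed is at
    most max_stay_speed (m/s) and whose total duration is at least
    min_dwell_s (s). Both thresholds are illustrative assumptions.
    """
    stays = []
    run_start = None  # index of the first point of the current low-speed run
    for i in range(1, len(points)):
        dt = points[i].t - points[i - 1].t
        speed = haversine_m(points[i - 1], points[i]) / dt if dt > 0 else 0.0
        if speed <= max_stay_speed:
            if run_start is None:
                run_start = i - 1
        elif run_start is not None:
            # The low-speed run ended at point i - 1; keep it if long enough.
            if points[i - 1].t - points[run_start].t >= min_dwell_s:
                stays.append((run_start, i - 1))
            run_start = None
    # Close a run that is still open at the end of the trace.
    if run_start is not None and points[-1].t - points[run_start].t >= min_dwell_s:
        stays.append((run_start, len(points) - 1))
    return stays


def trips_from_stays(stays):
    """Each trip runs from the last point of one stay to the first point of the next."""
    return [(stays[k][1], stays[k + 1][0]) for k in range(len(stays) - 1)]


# Synthetic trace: 10 minutes static, 5 minutes moving north, 10 minutes static.
trace = [Point(60.0 * k, 39.0, -76.9) for k in range(11)]
trace += [Point(600.0 + 60.0 * k, 39.0 + 0.005 * k, -76.9) for k in range(1, 6)]
trace += [Point(900.0 + 60.0 * k, 39.025, -76.9) for k in range(1, 11)]

stays = find_stays(trace)
trips = trips_from_stays(stays)
print(stays, trips)
```

A point-by-point rule of this kind works well when the recording interval is short; with sparse LBS or cellular observations, the speed between consecutive points becomes unreliable, which is one reason the literature turns to clustering-based alternatives.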
Travel mode imputation can be categorized mainly into two approaches: (1) the trip-based approach; and (2) the segment/point-based approach. The trip-based approach is based on the already identified trip ends, where each trip has only one travel mode to be imputed. The segment/point-based approach separates the trip into fixed-length segments (time or distance) or single points and then imputes the travel mode for each segment or point [49]. The segments/points with the same travel mode are then merged to form a single-mode trip. Both approaches have used similar features to distinguish between different travel modes.

Table 2-1. Studies on Travel Mode Imputation Methods.

Author | LRI | Model | Main Features | Modes | Acc.
Gong et al. 2012 [54] | / | Rules | Speed, Acceleration, Transit Stations, Transit Network | Drive, Train, Bus, Walk, Bike, Static | 82.6%
Stenneth et al. 2011 [65] | 30 s | RF | Speed, Acceleration, Heading change, Bus location, Transit Network | Drive, Bus, Train, Walk, Bike, Static | 93.7%
Bruunauer et al. 2013 [66] | 1-10 s | MLP | Speed, Acceleration, Bendiness | Drive, Bus, Train, Walk, Bike | 92.0%
Xiao et al. 2015 [68] | 1 s | BN | Speed, Acceleration, Trip Distance | Drive, Bus, Walk, Bike, E-Bike | 92.0%
Nitsche et al. 2014 [67] | 1 s | DHMM | Speed, Acceleration, Direction | Drive, Bus, Motorcycle, Train, Tram, Subway, Walk, Bike | 65% - 95%
Dabiri and Heaslip 2018 [71] | 1-5 s | CNN | Speed, Acceleration, Jerk, Bearing Rate | Drive, Bus, Train, Walk, Bike | 84.8%
Bachir et al. 2019 [32] | / | BI | Road and Rail Trip Counts | Road, Rail | /
Vaughan et al. 2020 [73] | / | DNN | Speed, Trip Distance, Land Use, Time of Day | Drive, Bus, Active (Walk, Bike) | 87%
Burkhard et al. 2020 [49] | 1 s subsampled to 5 min | KNN, RF, etc. | Speed, Public Transport Stops, and Lines | Drive, Train, Tram, Bus, Walk, Bike | /
Breyer et al. 2021 [74] | / | KNN, etc. | Road and Train Route Geometry | Road, Train | 95.5%

* RF: Random Forest; MLP: Multi-Layer Perceptron; BN: Bayesian Network; DHMM: Discrete Hidden Markov Model; CNN: Convolutional Neural Network; BI: Bayesian Inference; DNN: Deep Neural Network; KNN: K-Nearest Neighbors

Table 2-1 summarizes typical methods and features that are used for travel mode imputation. According to the literature reviews by Huang et al. 2019 and Burkhard et al. 2020, typical features include speed and acceleration [49, 54, 65-73]. Specifically, when the LRI is less than 10 seconds, the speed (and speed variation) and acceleration features, which can be computed solely from the location data, are more important in differentiating among travel modes. When the LRI is relatively high, such as 30 s, additional features can be added to maintain the same level of accuracy, such as real-time transit information [65], multimodal transportation network [49, 54, 65, 74], and sociodemographic information [70, 73].

However, most of these studies tested the algorithms using low-LRI GPS data samples, which have frequent observations. Limited effort has been devoted to developing suitable algorithms for cellular data or LBS data that suffer from the high-LRI issue. Burkhard et al. examined the spatial accuracy and LRI required to accurately detect travel mode from high-LRI MDLDs by subsampling low-LRI GPS data [49]. They concluded that the LRI should be less than a minute to ensure travel mode imputation accuracy. Bachir et al. developed a Bayesian Inference (BI) method to separate road and rail modes from the CDR data in the Greater Paris region by leveraging the road and rail trip counts from the travel survey [32]. Vaughan et al. trained a Deep Neural Network (DNN) model to separate drive, bus, and active modes with artificial CDR traces reconstructed from the travel survey data [73]. The model was then applied to real-world CDR data to obtain travel mode shares. Breyer et al.
developed multiple classification methods using labeled CDR data to separate only the road and train modes between two OD pairs [74]. The major limitation of these studies is that either the study area is small (e.g., an OD pair or a region) or the method only separates easy-to-detect modes (e.g., road versus rail). As Huang et al. 2019 mentioned in their review [48], supervised machine learning methods have not been fully exploited yet due to the lack of ground truth labeled data, and might be worth investigating for MDLDs, especially for cellular data and LBS data. Besides, rather than identifying only easy-to-detect modes (e.g., rail versus road), their review suggests including more mode categories.

2.3 Transportation Big Data for Traffic Operations and Safety

2.3.1 State-of-the-Practice on Crash Scene Decision-Making

For a long time, traffic safety research has emphasized older people’s limitations when discussing the contributing factors of crash injury severity, such as traffic conditions, roadway geometries, land use type, environmental conditions, and driver characteristics, through different statistical analyses [75-79]. Numerous methods have also been developed to predict crash injury severity from crash report data. Although crash severity prediction models have come into common discussion during the last decade in policy, research, and practice, they still suffer from a lack of clarity and accuracy regarding their interpretation and data availability. In turn, this limits the applicability of these methods to real-time decision-making.
In reality, two major decisions need to be made at the crash scene:

• whether an EMS is needed: when someone is injured in a vehicle crash, the responding EMS providers must provide emergency care at the scene and then transport the patient to healthcare based on the injury severity [80];

• whether an injured person should be triaged to a trauma center: if an EMS is dispatched, the EMS providers must not only determine the severity of the injury and initiate medical management, but also identify the most appropriate transport destination facility through a process called “field triage” [81].

Table 2-2. State-of-the-Art Methodologies of Trauma Triage

Authors | Method | Outcome | Factors | Performance
Scheetz et al. 2007 [88] | Decision Tree | Trauma or Non-trauma | Age, Gender, Height, Light, Glasgow coma scale, Injury severity, etc. | 95.15% SE* and 76.47% SP* for severe injury, 83.1% SE* and 81.5% SP* for moderate injury
Wang et al. 2009 [83] | Rule-based | Trauma or Non-trauma | Glasgow coma scale, Blood pressure, Respiratory rate, Crash characteristics, Estimated traffic speed, etc. | /
Sasser et al. 2012 [84] & Davidson et al. 2014 [85] | Rule-based | Trauma or Non-trauma | Same as Wang et al. 2009 | /
Newgard et al. 2016 [89] | Decision Tree | Trauma or Non-trauma | Age, Gender, Glasgow coma scale, Blood pressure, Respiratory rate, Mechanism of injury, etc. | 92.1% SE* and 41.5% SP*
Atiksawedparit et al. 2019 [91] | Statistical model | Severe injury and death | Age, Gender, Body mass index (BMI), Crash characteristics, EMS response time, Mechanism of injury, Physiological status, etc. | 90.2% SE* and 75.9% SP* for severe injury, 98.7% SE* and 68.8% SP* for death
Van Rein et al. 2019 [92] | Statistical model | Injury severity score | Age, Glasgow coma scale, Blood pressure, Mechanism criteria, Penetrating injury, etc. | 88.8% SE* and 50.0% SP*
Van der Sluijs et al. 2019 [90] | Decision Tree | Injury severity score | Age, Gender, Glasgow coma scale, Blood pressure, Mechanism of injury, Injury type, etc. | Not reported
Magnusson et al. 2020 [93] | Rule-based | RETTS-A* triage levels | Dispatch medical index (DMI) including Chest pain, Extremity, Respiratory difficulties, etc. | 81.0% SE* and 64.0% SP*
Shanahan et al. 2021 [94] | Statistical model | Injury severity score | Same as Van Rein et al. 2019 | 83.0% SE* and 50.0% SP*

* SE: Sensitivity, also called true positive rate or recall.
* SP: Specificity, also called true negative rate.
* RETTS-A: The Rapid Emergency Triage and Treatment System.

After an EMS team arrives at the scene, field triage decisions need to be made by EMS providers to determine whether the injured occupants should be sent to a trauma center. A recent study used the National Automotive Sampling System (NASS) to study the factors affecting triage decisions. Its results indicate that although injury severity and resulting mortality among the older group (age > 60) were higher than for younger counterparts, the older group was less likely to be transported to a trauma center [81, 82]. These findings emphasize that appropriate triage decisions can save lives, especially among older people. With these considerations, several field triage decision guidelines were developed for reference. The universally used Field Triage Decision Scheme was revised by a National Expert Panel organized by the Centers for Disease Control and Prevention, where comprehensive crash and health-related data were used [83]. In 2012, the National Center for Injury Prevention and Control and the Division of Injury Response, in collaboration with the National Highway Traffic Safety Administration (NHTSA), Office of Emergency Medical Services, and in association with the American College of Surgeons, Division of Research and Optimal Patient Care, also released the Guidelines for Field Triage of Injured Patients [84, 85]. These represent the latest development in rule-based guidelines for field triage. Several studies were also conducted to validate these guidelines [86, 87].
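The sensitivity (SE) and specificity (SP) values reported in Table 2-2 both follow from a 2×2 confusion matrix of actual versus predicted triage outcomes. The minimal Python sketch below shows the calculation on made-up labels; the function name and the data are hypothetical. The under-triage rate is taken here as 1 − SE, which is one common convention in the triage literature and is an assumption rather than a definition given in the cited studies.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = needs trauma center)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed severe cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # unnecessary referrals
    se = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return se, sp


# Made-up example: 4 patients who needed a trauma center, 6 who did not.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

se, sp = sensitivity_specificity(actual, predicted)
under_triage = 1.0 - se  # assumed convention: share of severe cases not referred
print(f"SE={se:.3f}, SP={sp:.3f}, under-triage={under_triage:.3f}")
```

Because the two measures trade off against each other, a method with very high sensitivity and low specificity (as in several rows of Table 2-2) avoids under-triage at the cost of sending many patients with minor injuries to trauma centers.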
Apart from these guidelines, data mining and decision tree methods, such as Classification and Regression Tree (CART) [88, 89] and gradient boosting decision tree [90], have also been widely used by researchers to predict trauma triage decisions. These studies are reviewed in Table 2-2.

While several studies have touched on the strong association between crash severity and traffic conditions (e.g., [83]), few studies have linked trauma triage decision-making with transportation domain knowledge and/or transportation-sector data. With the recent engineering advances in transportation big data and data-driven analytical methods, transportation-sector data have become increasingly available, in terms of both data coverage and timeliness, to support real-time or near real-time decisions such as EMS and trauma triage. Motivated by this cross-disciplinary research need, this study aims at filling the data gap with integrated transportation and health data. It contributes to the existing literature with a methodological framework that integrates relevant transportation-sector data sources, including network characteristics, traffic volumes, and historical travel speed at the crash scenes, into health-related decisions. Such integration is also expected to improve accuracy compared to existing studies (as shown in Table 2-2). This study then empirically tests the framework on EMS and trauma triage decision scenarios using Maryland datasets. Decision tree models are adopted due to their wide applications and proven capability in prediction. Results demonstrate that the integrated transportation and health data contribute to enhanced prediction accuracy, reduced under-triage for the elderly, and more lives saved from vehicle crashes.

2.3.2 Estimating Vehicle Volume based on Transportation Big Data

Traditional methods of quantifying vehicle volume rely on manual counting, video cameras, and loop detectors at a limited number of locations.
Expanding these efforts requires significant labor and cost. Researchers and private sector companies have also explored alternative solutions such as probe vehicle data, which still suffer from a low penetration rate. In recent years, along with the technological advancement in mobile sensors and mobile networks, Mobile Device Location Data (MDLD) have been growing dramatically in terms of the spatiotemporal coverage of the population and its mobility. Three ways of estimating vehicle volumes are reviewed below.

Loop detectors are widely used to record traffic volumes and occupancy levels. These sensors are usually buried under the pavement to detect the change in inductance caused by the presence of a vehicle. Kwon et al. 2003 developed an algorithm using data from single loop detectors to estimate truck traffic volumes [95]. The results showed a 5.7% error compared with the ground truth highway data. Loop detector data were also applied together with probe vehicle data to estimate queue length [96] and vehicle volume at a city-wide scale [97]. Although loop detectors have proven effective in estimating vehicle volume, their high installation and maintenance costs limit their ability to be scaled up to cover the entire transportation network. Therefore, loop detector datasets are often incomplete and mostly unavailable at minor arterials and local streets.

In the past two decades, MDLD have gained significant attention and have been utilized for estimating various traffic characteristics, including vehicle volumes. With the development of MDLD, estimating vehicle volumes at the city scale became a reality. Probe vehicles can record their trajectory data with high granularity (e.g., 1 Hz). Based on the trajectory data obtained from probe vehicles, a wide range of methods can be used by researchers to solve transportation problems. Zhao et al.
proposed novel methods to estimate queue length and vehicle volume based on probability theory without prior information about the penetration rate or queue length distribution [98]. Guo et al. estimated vehicle volume and queue length at signalized intersections and proposed a new framework to optimize traffic signal control operations [99]. Sekuła et al. applied several machine learning and neural network models to estimate historical hourly vehicle volume between sparsely located sensors based on the probe vehicle data [100]. Shockwave theories were also applied to probe vehicle data by a few studies [101, 102].

Many studies have been conducted focusing on estimating traffic flow and detecting congestion using cellular data [103, 104]. Xing et al. utilized CDR with the Time Difference of Arrival (TDOA) positioning technique to estimate multimodal traffic volumes on different types of urban roadways by identifying three modes of travel, namely drive alone, carpooling, and bus [105]. The results showed that, compared with the ground truth vehicle volume obtained from License Plate Recognition (LPR) cameras, the mean relative error was in the range of 17.1% to 25.7%, depending on the roadway type.

Despite significant advances in positioning techniques, cellular data still suffer from low accuracy, whereas LBS data have a noticeable advantage because they use multiple sources to accurately locate the user, a feature that has resulted in increased usage of this type of data by researchers and the private sector for estimating vehicle volume. Fan et al. developed a computing framework alongside a heuristic map matching algorithm to estimate Vehicle Miles of Travel (VMT) and AADT for the state of Maryland using INRIX data [106]. The results showed an R² of 0.878 when fitting the estimated AADT with the ground truth AADT.
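The two accuracy measures quoted above, the mean relative error against ground truth counts and the R² between estimated and ground truth AADT, can be computed as in the following minimal Python sketch. The function names and the sample numbers are hypothetical and are not data from the cited studies. R² is taken here to be the squared Pearson correlation, i.e., the R² of a simple linear fit between the two series, which is an assumption about how the cited studies computed it.

```python
from statistics import mean


def mean_relative_error(observed, estimated):
    """Average of |estimated - observed| / observed over all locations."""
    return mean(abs(e - o) / o for o, e in zip(observed, estimated))


def r_squared_linear_fit(observed, estimated):
    """R-squared of a simple linear regression of observed on estimated values.

    For a one-variable linear fit this equals the squared Pearson correlation.
    """
    mx, my = mean(estimated), mean(observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(estimated, observed))
    sxx = sum((x - mx) ** 2 for x in estimated)
    syy = sum((y - my) ** 2 for y in observed)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical AADT values (vehicles per day) at four count stations.
ground_truth = [1000, 2000, 3000, 4000]
estimate = [1100, 1900, 3300, 3600]

mre = mean_relative_error(ground_truth, estimate)
r2 = r_squared_linear_fit(ground_truth, estimate)
print(f"MRE={mre:.4f}, R2={r2:.4f}")
```

The two measures answer different questions: R² only reflects how well the estimates track the ground truth linearly, so a systematically biased estimate can still score a high R², whereas the mean relative error penalizes such bias directly.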
Moreover, a number of state agencies conducted rigorous evaluations of vehicle volume obtained through traditional methods as well as from MDLD obtained by private sector companies. They found the latter to be a promising source for supplementing current surveys and traditional methods [107].

2.3.3 Pedestrian and Bicyclist Crashes Estimation Methods

According to the National Highway Traffic Safety Administration’s Traffic Safety Facts 2019 Report, pedestrian and bicyclist fatalities accounted for nearly 20% of all traffic crash-related deaths in the U.S. in 2019 [108]. In Maryland alone, 3,136 pedestrian crashes and 848 bicycle crashes occurred in 2019, of which over 90% of pedestrian crashes and over 80% of bicycle crashes resulted in injuries or fatalities [109, 110]. Approximately one out of every four individuals killed in traffic crashes in Maryland was a pedestrian [109].

Studies on pedestrian and bicyclist safety issues are abundant. They identify key contributing factors to pedestrian- and bicyclist-involved crashes as well as suitable methodologies for crash frequency analysis. To address the fundamental issues typically associated with crash frequency data, previous research studies have employed various methodologies to analyze pedestrian- and bicyclist-involved crash frequency. Many factors have been suggested to play a role in pedestrian and bicyclist crashes, including those representing pedestrian and bicyclist risk exposure [111-116], land use and the built environment [113, 117-120], and sociodemographic/socioeconomic status [113, 117, 118, 120, 121]. Among those, one of the important factors is vehicle volume, which significantly correlates with the frequency of pedestrian and bicyclist crashes. In this case, vehicle volume estimated from the MDLD can be integrated into existing traffic safety modelling methods to estimate pedestrian and bicyclist crashes for all intersections to promote traffic safety analysis.
According to Lord and Mannering [122], one of the main issues characterizing crash frequency data is overdispersion, which happens when the variance of the crash counts is considerably larger than the mean. The other issue that usually affects crash frequency data is having excess zeros, which happens when crash counts contain a significant number of zero values [116, 122].

To predict pedestrian and bicyclist crash frequency at intersections, Saad et al. [115] used bicycle crowdsourced data from Strava [123] and developed a negative binomial (NB) model. They found that the frequency of bicycle crashes at intersections was positively associated with intersection size, the intersection being a signalized intersection, the number of intersection legs being four (compared to three-legged intersections), as well as total entering vehicle volume. The study also indicated that the frequency of bicycle crashes at intersections was negatively associated with the presence of a bike lane at those intersections. Raihan et al. [116] used a zero-inflated negative binomial (ZINB) model to develop crash modification factors (CMFs) for bicyclist crashes in Florida’s urban areas. They found that road design characteristics such as lane width and speed limit had positive effects on reducing bicycle crashes. Lower bicycle crash probabilities on segments were associated with increased bicycle activity. However, increased bicycle activity was associated with higher bicycle crash probabilities at intersections. Increased bicycle crash probabilities at intersections were also associated with the number of bus stops within the intersection influence area. Ukkusuri et al. [117] examined the role of various built environment, land use, road network, and sociodemographic factors as well as key exposure measures including traffic volume, transit ridership, and proportion of nonmotorized trip-makers in the frequency of total, injury-causing, and fatal pedestrian crashes.
The study employed NB and ZINB models to estimate crash frequency and found that increased numbers of total and/or fatal pedestrian crashes were associated with increased proportions of industrial and commercial land use, increased transit ridership, increased numbers of subway stations, increased proportions of intersections with four and five approaches, increased proportions of primary roads without access restriction, and increased number of lanes. Sanders et al. [119] employed Poisson regression to examine the role of various factors in pedestrian exposure at intersections as well as bicycle exposure at various road segments in Seattle, Washington. They found that variables representing population and land use (i.e., number of households, number of commercial properties, and the presence of a university near the intersection) were significantly associated with pedestrian exposure at intersections. Moreover, bicycle exposure was associated with the number of bicycle lanes on the road segment and land use variables such as the presence of a university or a school near the count location. The findings of that study provided insights into the factors affecting pedestrian and bicyclist risk exposure, which is a key contributing factor to pedestrian and bicyclist crashes. Jestico et al. [124] used a crowdsourced bicycling incident dataset for the Capital Regional District in British Columbia, Canada, to identify design attributes associated with unsafe intersections between multi-use trails and roads. NB regression was used to model the links between the number of bicycle crashes and near-miss incidents and the infrastructure characteristics at multi-use trail-road intersections. The results showed that factors associated with bicycle incident frequency at multi-use trail-road intersections included bicycling volumes, vehicle volumes, and trail sight distance.
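The two data issues raised by Lord and Mannering [122], overdispersion and excess zeros, can be screened with simple descriptive checks before choosing among Poisson, NB, and zero-inflated specifications. The minimal Python sketch below compares the sample variance with the mean and compares the observed share of zeros with the share a Poisson distribution with the same mean would imply, exp(−mean). The function name and the crash counts are hypothetical, and the checks are informal diagnostics rather than the formal tests used in the cited studies.

```python
from math import exp
from statistics import mean, variance


def count_diagnostics(counts):
    """Descriptive checks for overdispersion and excess zeros in count data."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 in the denominator)
    return {
        "mean": m,
        "variance": v,
        # A Poisson variable has variance equal to its mean, so a ratio well
        # above 1 points to overdispersion.
        "dispersion_index": v / m,
        "zero_share_observed": sum(1 for c in counts if c == 0) / len(counts),
        # Share of zeros expected under a Poisson distribution with mean m.
        "zero_share_poisson": exp(-m),
    }


# Hypothetical crash counts at 20 intersections over a study period.
crashes = [0] * 14 + [1, 1, 2, 3, 5, 8]

diag = count_diagnostics(crashes)
for name, value in diag.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample the variance is several times the mean and zeros are far more frequent than a Poisson model would predict, which is the kind of pattern that motivates NB and ZINB models in the studies reviewed here.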
Many other studies also investigated factors affecting pedestrian and bicyclist safety risk exposure and modeled pedestrian- and bicyclist-involved crash frequency. The key contributing factors affecting pedestrian/bicyclist safety exposure and crash frequency that emerge from the literature include: sociodemographic and socioeconomic factors such as proportion of the population by race or age group [113, 117, 118, 120, 121]; land use and built environment factors such as population density, employment density, activity diversity, bus stop density, and ratio of residential, industrial, and commercial uses [113, 118-120]; and traffic- and travel-related factors such as vehicle, pedestrian, and bicycle volumes as exposure measures [111-116]. Further, the literature review reveals that the most prominent methodologies applied to pedestrian and bicyclist crash frequency analysis are Poisson regression, negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression, and zero-inflated negative binomial (ZINB) regression [111, 120, 122-125]. Poisson regression is usually considered the starting point in crash frequency modeling [111]. Moreover, while both ZIP and ZINB regression have frequently been applied in empirical research to account for the preponderance of zeros observed in crash count data, ZINB regression is applicable to count data that exhibit both overdispersion and excess zeros [116].

Table 2-3. Examples of Past Studies on Pedestrian and Bicyclist Safety Models
Study | Unit of Analysis | Study Area | Safety Measure | Methodology | Key Exposure Measure(s)
Ukkusuri et al. 2012 [117] | Census tract, zip code | New York City (NYC), NY | Total pedestrian crashes, severe crashes, and fatal crashes | NB, ZINB | Traffic volume, pedestrian activity, operating speeds
Hosseinpour et al. 2012 [111] | Road segment | Federal Road Network, Malaysia | Frequency of pedestrian crashes | Poisson, NB, ZIP, ZINB | Motorized traffic volume
Lee et al. 2015 [121] | Zip code | Various locations in FL | Pedestrian crashes per crash location zip code, crash-involved pedestrians per residence zip code | Bayesian Poisson lognormal simultaneous equations spatial error model | Log of population, log of vehicle miles traveled
Sanders et al. 2017 [119] | Intersection, road segment | Seattle, WA | Pedestrian and bicyclist counts | Poisson model | - (a)
Jestico et al. 2017 [124] | Multi-use trail intersection | Capital Regional District, British Columbia, Canada | Frequency of bicyclist crash and near-miss incidents | NB | Pedestrian, bicyclist, and vehicle volumes
Xie et al. 2017 [113] | Grid cell (300×300 ft²) | Manhattan (NYC), NY | Pedestrian crash cost | Tobit model | Vehicle miles traveled, taxi trips, subway ridership
Mansfield et al. 2018 [120] | Census tract | United States | Frequency of pedestrian fatalities | NB, ZINB, ZINB mixed model | Vehicle miles traveled density (thousand VMT/mi²) by roadway functional class
Saad et al. 2019 [115] | Intersection | Orange County, FL | Frequency of bicycle crashes | NB | Total entering volume, bicycle volume
Raihan et al. 2019 [116] | Intersection, road segment | Urban areas, FL | Bicycle crash modification factors | ZINB | Bicycle activity (Strava volumes) [122]
Lee et al. 2019 [125] | Intersection | Orange and Seminole Counties, FL | Pedestrian crashes | NB, ZINB | Observed and predicted pedestrian trips
Notes: (a) This was an exposure study; therefore, the exposure measures were the response variables in the models (i.e., pedestrian and bicyclist counts).

Considering the factors and methodologies used in exposure and crash analyses for vulnerable road users, Table 2-3 summarizes a few previous pedestrian and bicyclist safety studies. Overall, the literature review reveals that while pedestrian and bicyclist safety risk analyses are becoming more data-driven, the use of consistent and reliable exposure data, such as crowdsourced big data, in pedestrian and bicyclist crash analyses remains scarce, particularly with regard to pedestrians.
This study aims to address that gap in empirical research by using mobile device location big data in the analysis of pedestrian and bicyclist crashes.

Chapter 3: Identification of Metrics Used for Various Levels of Traffic Analysis

3.1 Models, Tools, and Metrics for Various Levels of Traffic Operations and Safety Analysis

This section reviews the state-of-the-practice models, tools, and metrics developed by state agencies such as Departments of Transportation (DOTs), Metropolitan Planning Organizations (MPOs), and universities for planning and designing transportation projects while considering systemic feasibility and efficiency, including for traffic operations and safety. Transportation project decisions require cooperative actions across various organizations, offices, and working groups within an organization when the plans cover different municipal areas or techniques governed by multiple authorities. Many different tools and methods are available to support the quantitative analysis of Transportation Systems Management and Operations (TSM&O) and traffic operations strategies in planning and programming. Based on the Applying Analysis Tools in Planning for Operations report by the U.S. Department of Transportation’s (USDOT) Federal Highway Administration (FHWA), the following tools can be used for analyzing strategies at various levels of the planning process:
• Sketch planning and prioritization tools for highway needs inventory (e.g., Tool for Operations Benefit-Cost Analysis – TOPS-BC, MOSAIC)
• Travel demand models (e.g., MSTM, BMC InSITE, MWCOG model) with postprocessors (e.g., Intelligent Transportation System (ITS) Deployment Analysis System – IDAS)
• Analytical tools (e.g., Highway Capacity Manual and traffic signal optimization tools)
• Microscopic simulation models (e.g., VISSIM, AIMSUN)
• Mesoscopic simulation models (e.g., DTALite, DynusT)

Table 3-1.
State-of-the-Practice Models, Tools, and Metrics for Various Levels of Traffic Operations and Safety Analysis
State/Agency | Model | Descriptions
Sketch-Planning Tools
FHWA | ITS Deployment Analysis System (IDAS) | The objective of IDAS is to estimate the impacts and costs resulting from the deployment of various ITS components.
Northeastern Illinois | IDAS | IDAS is used to evaluate four types of ITS deployment: electronic toll collection, freeway variable message signs, electronic transit fare collection system, and transit vehicle signal priority.
Ohio-Kentucky-Indiana | IDAS | The components of the Advanced Regional Traffic Interactive Management Information System (ARTIMIS) are evaluated using IDAS, including closed-circuit TV cameras, electronic dynamic message signs, traveler advisory telephone service, highway advisory radio, freeway service patrol vans, ramp and reference markers, vehicle detectors, total station electronic surveying equipment, and an operations control center.
Michigan | IDAS | The components of the Temporary Traffic Management System (TTMS) are investigated, including closed-circuit TV cameras, portable dynamic message signs, detection devices for traffic queueing and construction zones, video monitoring stations, telephone/web-based traveler information, and a traffic management center.
Florida DOT | Florida Standard Urban Transportation Model Structure (FSUTMS) | FSUTMS can produce various performance measures including vehicle miles of travel, vehicle hours of travel, average speed, number of accidents, fuel consumption, monetary benefits to users and/or agency, and emissions.
CalTrans | California Life-Cycle Benefit/Cost (Cal-B/C) | Cal-B/C uses a set of spreadsheet-based tools that cover multi-modal analysis of highway, transit, bicycle, pedestrian, ITS, operational improvement, and passenger rail projects.
University of South Florida | Trip Reduction Impacts of Mobility Management Strategies (TRIMMS) | TRIMMS quantifies the net social benefits of a wide range of transportation demand management initiatives in terms of emissions reductions, accident reductions, congestion reductions, excess fuel consumption, and adverse global climate change impacts by estimating changes in travel behavior.
New York State DOT | ITS Options Analysis Model (ITSOAM) | ITSOAM has three components: Delay Model, Safety Model, and Environmental Benefits Model.
Post-Processing Analysis
Florida DOT | Integrated Regional Information Sharing and Decision Support System (IRISDS) | IRISDS is a web-based platform that provides decision support for estimating and predicting system performance using data mining techniques, traffic analysis, and simulation modeling.
Florida DOT | Florida ITS Evaluation (FITSEVAL) | FITSEVAL evaluates the benefits and costs of thirteen different ITS deployment alternatives; it can assess the mobility, safety, environmental, and monetary benefits and produces estimates of the present worth and benefit-cost ratios of ITS.
Florida DOT | ITS Data Capture and Performance Management (ITSDCAP) | ITSDCAP conducts ITS evaluations based on ITS data; four types of ITS can be evaluated: incident management, ramp metering, smart work zone, and road weather information system.
Virginia DOT | Virginia System Operations Performance Reports (VSOPR) | VSOPR assesses four categories of measures: Traffic, Incidents, Traveler information, and ITS device reliability.
Wisconsin DOT | Summary of ITS evaluation methods | The evaluation process consists of nine steps and assesses four types of measures: Performance metrics, Benefits valuation measures, Net benefits, and B/C ratio.
Multi-Dimensional Models
Florida DOT | FITSEVAL | FITSEVAL uses the output of the FSUTMS modeling environment under CUBE, which quantifies Congestion/Mobility, Safety, Environmental and energy, and Agency and user costs measures.
Oregon DOT | Analysis and modeling tools | Statewide Integrated Model (SWIM), SWIM2, Land Use Scenario DevelopR (LUSDR), DTA, VISSIM, etc.
Maryland DOT | InSITE ABM-DTALite and SILK AgBM-DTALite | InSITE ABM-DTALite is the result of integrating a DTA tool based on an existing DTALite model that covers the InSITE ABM. SILK AgBM-DTALite is an agent-based microsimulation travel demand model.
Maryland DOT | Maryland Integrated Travel Analysis Modeling System (MITAMS) | MITAMS has a special focus on various applications ranging from short-term to long-term applications.
Ohio DOT and Kentucky Transportation Cabinet | ARTIMIS | ARTIMIS aims to optimize freeway system efficiency, improve safety, and benefit air quality. It includes over 80 cameras, 57 center-lane miles of fiber-optic cable, approximately 1100 detectors, and numerous freeway message signs in Cincinnati.
University of Florida | Corridor Simulation (CORSIM/TSIS) | The Traffic Software Integrated System (TSIS) integrates with the microscopic TRAF tools of CORSIM, namely FRESIM for freeway simulation and NETSIM for surface arterials and network simulation.

Based on the FHWA’s Operations Benefit/Cost Analysis Desk Reference, there are in general three types of tools:
• Sketch-planning tools can provide a simple, quick, and low-cost estimation of operational strategy benefits and costs. Examples include spreadsheets that rely on generally available data as well as static cause-effect relations between strategies and their impacts. These tools are usually inexpensive to use but carry a higher risk of inaccuracy.
• Post-processing analysis tools seek to link the evaluation of operations with the travel demand, network data, and performance measure outputs from regional travel demand and simulation models.
They are often more capable of assessing the impacts of route, mode, or temporal shifts than sketch-planning methods but tend to cost more.
• Multi-dimensional models are the most complex and costly, but typically provide a high level of confidence in the accuracy of the results. They are often used to integrate various analyses (e.g., a travel demand model and a DTA simulation) to estimate the full range of impacts of operations strategies or transportation projects.

Table 3-1 summarizes the state-of-the-practice models, tools, and metrics used by various DOTs and MPOs. These models and tools rely largely on traditional transportation data collection methods, such as loop detectors and manual counting.

3.2 Operations Practice Scan Survey

Based on the literature review in Section 3.1, the selection of analysis, modeling, and simulation tools and the corresponding performance metrics varies by stage of the transportation planning and operations process and should serve the analytical purpose at hand. At the long-range planning stage, it is impractical to apply the most complex tool for each conceived traffic operations project. Sketch planning tools or travel demand model postprocessing tools may be more suitable. At the Transportation Improvement Program (TIP) and project planning stages, mesoscopic and microscopic traffic simulation tools may be considered for traffic operations project studies. Multi-scenario and multi-resolution tools for estimating travel reliability impact under different weather and accident conditions may also be added at these stages to provide more comprehensive information to support decision-making. Post-project evaluation could rely on existing performance monitoring dashboard tools such as the Regional Integrated Transportation Information System (RITIS).
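The stage-by-stage guidance above can be read as a simple lookup from planning stage to candidate tool families. The following is an illustrative sketch only: the stage keys and the `candidate_tools` helper are hypothetical names introduced here, and the groupings restate the preceding paragraph rather than any agency standard.

```python
# Hypothetical encoding of the stage-to-tool guidance described in the text.
TOOLS_BY_STAGE = {
    "long_range_planning": [
        "sketch planning tools",
        "travel demand model postprocessing tools",
    ],
    "tip_and_project_planning": [
        "mesoscopic traffic simulation tools",
        "microscopic traffic simulation tools",
        "multi-scenario and multi-resolution reliability tools",
    ],
    "post_project_evaluation": [
        "performance monitoring dashboards (e.g., RITIS)",
    ],
}


def candidate_tools(stage):
    """Return the candidate tool families for a planning stage.

    Raises ValueError for a stage that is not in the lookup, so that a
    typo in the stage name is not silently treated as "no tools".
    """
    try:
        return list(TOOLS_BY_STAGE[stage])
    except KeyError:
        known = ", ".join(sorted(TOOLS_BY_STAGE))
        raise ValueError(f"unknown stage {stage!r}; expected one of: {known}")


for stage_name in TOOLS_BY_STAGE:
    print(stage_name, "->", "; ".join(candidate_tools(stage_name)))
```

A lookup of this kind only narrows the options; the final choice would still depend on data availability, budget, and the questions a given project needs to answer.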
To obtain a standard workflow for prioritizing tools and metrics for transportation planning and operations, a dedicated survey was designed to collect insights from stakeholders, including transportation practitioners from federal, state, and local agencies and private-sector professionals with experience in performance evaluation of transportation projects. The objective of the survey is to help understand what performance measures are needed to make decisions at the planning, construction, and operations stages of a transportation project. Different types of projects and the level of analyses and metrics required to make reasonable recommendations have been identified and reviewed. A flowchart framework that documents the best practice metrics used in evaluating projects for different stages of planning and operations processes was produced to support transportation planners and engineers in their decision-making.

In this survey, three major stages of a general transportation project are identified: Feasibility & Planning, Design & Construction, and Maintenance & Operations. Under each stage, the performance metrics that have been used to evaluate projects are listed. Based on these presumptions and categorization, the survey questions focus on understanding (1) best practices in performance metrics chosen for the evaluation of projects at different stages of the traffic operations planning process; (2) the usefulness of the metrics; and (3) challenges and potential solutions for data and additional metrics that would offer insights. The web-based survey collected 78 usable responses.

3.3 Survey Results

Table 3-2. States for which the Respondents Work.
State | Number of Respondents
Georgia | 1
Mississippi | 1
North Carolina | 1
Nebraska | 2
Maryland | 50
Pennsylvania | 3
South Carolina | 1
Washington | 3
Virginia | 3
Wyoming | 2
Maryland, Virginia, District of Columbia | 3

Figure 3-1. Agencies that the Respondents Work at.

Figure 3-2. Projects that the Respondents Work on.

The detailed survey and results are documented in Appendix I. As shown in Table 3-2, Figure 3-1, and Figure 3-2, a total of 79 respondents completed the survey, either online, as distributed by the Maryland Department of Transportation State Highway Administration (MDOT SHA), or in person using electronic devices during the Transportation Research Board Annual Meeting (TRBAM). Table 3-2 shows the states in which the respondents currently work (nine of them did not select a location). Figure 3-1 summarizes the agency distribution of the respondents from across the U.S. Most of the respondents are from Counties (26), Local Municipalities (17), State Departments of Transportation (14), and Private Consulting Firms (9), while the remaining (13) are from other organizations. As shown in Figure 3-2, the respondents have mixed backgrounds, with 37 of them working most frequently on highway projects, 16 on arterial projects, 13 on pedestrian and bike projects, and 8 on transit-related projects (four of them did not select a project type).

Figure 3-3. Projects that the Respondents Work on.

3.4 Performance Metrics Flowchart

Based on the survey results from 79 respondents, a flowchart (see Figure 3-3) is produced that documents the best practice performance metrics used in prioritizing and evaluating transportation projects. The complete survey questionnaire and results can be found in Appendix I. In the Feasibility and Planning stage, the project type was further refined to reflect industry needs, knowing that best-practice metrics would differ structurally across project types.
Six types of transportation projects were identified separately:
• Mobility: focused on reducing congestion delays, typically capacity improvements, micro-mobility infrastructure, transit solutions, etc.;
• Reliability: focused on maximizing existing operations, such as technology deployments to manage the transportation system more effectively;
• Safety: focused on systematically and holistically promoting safety, using metrics such as the severity of crashes, high rate of crashes, vulnerable user interactions with vehicles, and freight design concerns;
• Environmental: focused on managing environmental impact, sustainability, energy/emissions, and public health. Metrics could also include stream restoration and flooding mitigation;
• Socio-economic: focused on economic revitalization, food desert programs, equity-related projects, etc.;
• Recreational: focused on trails, visitor rest stops, etc.

Based on these predefined project types, typical performance metrics used for each type of project were reviewed and listed in the survey questions to facilitate the post-processing of responses. Respondents were then asked to rank how frequently they use these performance metrics when performing a planning-level analysis or feasibility assessment. In case metrics were missing from the list, the respondents were asked to fill in an open-ended section with the metrics they felt were relevant to the question.

As shown in the flowchart, the more frequently used performance metrics during the feasibility and planning stage were identified based on the responses. Below is a summary of the major findings:
• For mobility-related projects, “Delay”, “Travel Time” and “Volume-to-Capacity Ratio” are the three most frequently used performance metrics. Respondents also suggest metrics such as “Government Operations (e.g., Resource allocation, Master plan conformance, Equipment availability)” and “Multimodal Mobility (e.g., Mode share, Transfer time, Bicycle network)”.
• For reliability-related projects, in addition to the commonly used “Travel Time Index”, the “Planning Time Index” and “Total Trip Time by Modes” are also frequently used. Respondents also suggest metrics such as “Congestion Impact (e.g., Delays, Buffer index, Wait times)” and “Safety Impact (e.g., Level of comfort/safety)”.
• For safety-related projects, the typical performance metrics include “Crash Reduction”, “Conflict Reduction” and “Fatality Reduction”. Respondents emphasized the importance of “Pedestrian and Bicyclist Safety (e.g., Pedestrian movement)”. Some other frequently used performance metrics mentioned by the respondents include “Incident Rate (e.g., Incident rate per mile)” and “Speed Limits & Speed of Nearby Traffic”.
• For environment-related projects, “Natural Resource Impact” and “Emission Reduction” are deemed frequently used. Respondents also suggest “Hazardous Impacts (e.g., Flood planning, stormwater planning)”, “Environmental Impacts (e.g., Exposure per person to emission, Noise impact)”, and “Cost of Environmental Testing”.
• For socio-economic-related projects, “Land Use”, “Employment”, and “Regional Economic Development” are deemed frequently used. Respondents also suggest “Community Impacts (e.g., Community revitalization, Older adult demographic)” and “Access to Public Transportation”.
• For recreation-related projects, “Number of Trail Users”, “Visitation” and “Recreation Event Participation” are deemed necessary for decision-making. Respondents also suggest “Trail Conditions (e.g., Width of trails, Nexus to other network, Barrier separation)”.

In the Design and Construction stage, the actual design and construction plan for the project should be the main consideration. Therefore, when the project moves to the Design and Construction stage, most performance metrics are used to determine whether the project has critical failure points.
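Several of the mobility and reliability metrics named in the Feasibility and Planning findings have simple operational definitions. The sketch below is a minimal illustration, not a computation taken from the survey: it assumes the commonly used definitions in which the Travel Time Index is the mean travel time divided by the free-flow travel time and the Planning Time Index is the 95th-percentile travel time divided by the free-flow travel time. The function names, the nearest-rank percentile rule, and all sample values are hypothetical.

```python
import statistics


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile (integer pct, 1-100) by the nearest-rank rule."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def travel_time_index(travel_times, free_flow_time):
    """Mean observed travel time relative to free-flow travel time."""
    return statistics.fmean(travel_times) / free_flow_time


def planning_time_index(travel_times, free_flow_time):
    """95th-percentile observed travel time relative to free-flow travel time."""
    return percentile_nearest_rank(travel_times, 95) / free_flow_time


def volume_to_capacity(volume, capacity):
    """Volume-to-capacity ratio for a segment (same units for both inputs)."""
    return volume / capacity


# Hypothetical peak-period travel times (minutes) on one corridor.
observed_times = [10, 10, 11, 12, 12, 13, 14, 15, 18, 20]
free_flow_minutes = 10.0

print("TTI:", travel_time_index(observed_times, free_flow_minutes))
print("PTI:", planning_time_index(observed_times, free_flow_minutes))
print("v/c:", volume_to_capacity(1800, 2000))
```

All three quantities depend only on travel times (or speeds) and volumes, which is why those inputs matter so much for planning-level analysis.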
In the survey, respondents are asked to rank the performance metrics used to examine project performance. The survey results helped identify four major performance metrics to support decision-making: “Project Cost”, “Cost/Benefits Ratio”, “Public Opposition”, and “Major Design Flaw”. Furthermore, a list of standard performance metrics was suggested by respondents based on their own experience. These standard performance metrics included “Is the mix of projects to be funded annually a reasonable distribution across modes?”, “Is the project still within anticipated cost?”, “Adequate Public Facilities Ordinance (APFO)”, “Cost and O&M Projects”, “Travel Time”, and “Level of Service”.

During the Maintenance and Operation stage, the focus shifts to how to maintain and operate the project at the expected levels. Based on the results, these can be measured with “On-time Performance”, “Alternative Routes”, “Bridge Condition”, “State of Good Repair”, “Age of Transit Fleet”, “Surface Condition”, “Signage Availability”, “Sufficient Funding”, “Clear Marking (e.g., marking for crosswalks, travel lanes)”, “Reporting Issues” and “Priority Lists”.

3.5 Summary

In summary, this chapter presents a dedicated survey to help understand what performance measures are needed to make decisions at the planning, construction, and operations stages of a transportation project. Different types of projects and the level of analyses and metrics required to make reasonable recommendations are identified and reviewed. A flowchart framework that documents the best practice metrics used in evaluating projects for different stages of planning and operations processes is produced to support transportation planners and engineers in their decision-making. Among the six types of projects, “Mobility” and “Safety” projects are the two most closely related to traffic operations and safety research, which aims at congestion reduction and safety improvement.
Based on the flowchart, for “Mobility” projects, delay, travel time, and volume-to-capacity ratio are the most frequently used metrics. For “Safety” projects, crash/fatality rate and crash/fatality reduction are of the greatest concern to transportation professionals when planning transportation projects. To quantify the importance of these performance metrics, three main types of upstream data are needed, namely vehicle volume, travel time (or traffic speed), and crash rate (or crash count). As indicated in the literature, instead of collecting data manually or using traditional technologies, transportation big data can be used to estimate these upstream data. In this dissertation, probe vehicle data in RITIS and large-scale MDLD are used to develop big-data driven frameworks that support crash-related decisions and estimate vehicle volumes and pedestrian and bicyclist crashes, which ultimately support “Mobility” and “Safety” transportation projects.

Chapter 4: Supporting Triage Decisions for High-Risk Trauma Patients at Crash Sites with Location Data

4.1 Introduction

As identified in Chapter 3, when planning for “Safety”-related transportation projects, the most frequently considered and most important metrics are crash/fatality rate and crash/fatality reduction. This chapter focuses on incorporating one type of transportation big data, probe vehicle data, together with annual average daily traffic data to support crash-related decisions. More specifically, it aims to estimate the emergency medical services (EMS) and trauma triage decisions at the crash scene in order to reduce the fatality rate caused by severe injuries. Although research on improving EMS efficiency features prominently in contemporary healthcare service and traffic safety discussions, its roots can be traced back to the 2000s, when epidemiologists, health researchers, and practitioners stressed the vulnerability of older persons involved in traffic crashes [126-128].
When a vehicle crash occurs, the decision for EMS and field trauma triage must be made in a timely manner to save lives, especially for elderly persons [129]. According to a Population Bulletin, “Aging in the United States,” the number of US seniors (age 65+) is projected to nearly double from 52 million in 2018 to 95 million by 2060, and the 65-and-older age group’s share of the total population will rise from 16 percent to 23 percent. As they have passed through each major stage of life, baby boomers have brought both risks and challenges to transportation, infrastructure, and healthcare institutions. “Under-triage” often occurs in the medical decision process, where a large proportion of seriously injured older patients are transported to non-trauma hospitals or fail to be transported at all [89, 130-134]. This leads to a significant mismatch between the supply side from hospitals and the demand side from those patients. This gap not only degrades health outcomes but also impairs overall well-being in the longer term [75, 76]. On the other hand, inappropriately transferring patients to trauma centers may also waste time and public resources that could be used to help other crash victims. Thus, it is important for policymakers, health-related practitioners, and scholars to be equipped with a tool for better evaluating and analyzing the efficiency of the current EMS system in the era of data-driven problem-solving.

4.2 The Big-Data Driven Framework Integrating Transportation and Health Data

Figure 4-1. The Big-Data Driven Framework for Integrating Transportation and Health Data.

The overall big-data driven framework is illustrated in Figure 4-1. It consists of two main pillars. Pillar 1 is the integration of the transportation big data and health datasets (shown on the left of Figure 4-1).
Health and safety-related data, such as crash, EMS, and hospital triage records, are typically stored in the Crash Outcome Data Evaluation System (CODES) in Maryland. These CODES data are mapped to the roadway network (the HERE network is used for its availability) using their geolocation information and then joined with datasets from the transportation sector through spatiotemporal map matching. The two most important transportation datasets, i.e., the traffic volumes and the historical time-dependent travel speeds, are obtained at the roadway segment level from the Annual Average Daily Traffic (AADT) dataset and large-scale probe vehicle data sources (available from RITIS, the Regional Integrated Transportation Information System). These two transportation datasets are then matched to the network using the Traffic Message Channel (TMC). With the integrated dataset, information beyond the crashes themselves becomes available, including EMS involvement, hospital triage, and traffic and roadway conditions at the crash scene (volumes, travel speed, weather, road surface conditions, etc.). Pillar 2 of the framework employs the integrated dataset to evaluate decision-making. Decision tree classification models are developed to model EMS and trauma decisions. This big-data driven framework enables the analysis of a wide spectrum of modeling methods in various application contexts.

4.2.1 Integrated Transportation and Health Data

In this research, the CODES dataset was integrated with transportation data, including the roadway network, link-level volume information (i.e., Annual Average Daily Traffic, AADT), and link-level observed travel speed information obtained from probe vehicles. In this section, the three major data sources are described.

Figure 4-2. CODES Data in the state of Maryland.

Table 4-1.
Descriptive Statistics of the CODES Data
Variable Name | Type | Description | Values
sex | Binary | Gender | 1=female (53.5%); 2=male (45.8%); 99=unknown (0.7%)
speed_limi | Categorical | Speed limit | 5 mph (2.5%); 10 mph (2.3%); 15 mph (3.3%); 20 mph (0.9%); 25 mph (16.2%); 30 mph (13.9%); 35 mph (16.8%); 40 mph (13%); 45 mph (8.2%); 50 mph (7.2%); 55 mph (11.9%); 60 mph (0.3%); 65 mph (2.8%); 70 mph (0.5%)
age | Numeric | Age | min=65; max=109; mean=73; standard deviation=6.9
eldveh | Binary | If the person shared a vehicle with an elderly person | A constant value of 1
damage | Binary | If the vehicle is disabled or destroyed due to the crash | 1=yes (43.8%); 0=no (56.2%)
eject | Binary | If the person is ejected | 1=yes (0.4%); 0=no (99.6%)
notbelted | Binary | If the belt is used improperly | 1=yes (2.7%); 0=no (97.3%)
light_code | Categorical | Light condition | 0=Not Applicable (1.3%); 1=Daylight (77%); 3=Dark Lights On (13%); 4=Dark No Lights (3.9%); 5=Dawn (1.4%); 6=Dusk (2.4%); 7=Dark-Unknown Lighting (0.7%); 88=Other (0.2%); 99=Unknown (0.2%)
collision_ | Categorical | Collision type | 0=Not Applicable (1.3%); 1=Head On (2.6%); 2=Head On Left Turn (7.2%); 3=Same Direction Rear End (29.7%); 4=Same Direction Rear End Right Turn (3.9%); 5=Same Direction Rear End Left Turn (1.4%); 6=Opposite Direction Sideswipe (1.3%); 7=Same Direction Sideswipe (6.5%); 8=Same Direction Right Turn (2.3%); 9=Same Direction Left Turn (2.5%); 10=Same Direction Both Left Turn (0.4%); 11=Same Movement Angle (20.3%); 12=Angle Meets Right Turn (0.6%); 13=Angle Meets Left Turn (0.7%); 14=Angle Meets Left Turn Head On (0.4%); 15=Opposite Direction Both Left Turn (0.2%); 17=Single Vehicle (11.8%); 88=Other (10.8%); 99=Unknown (0.3%)
harm_event | Categorical | Subjects involved in accidents | 0=Not Applicable (0.9%); 1=Other Vehicle (78.1%); 2=Parked Vehicle (4.5%); 3=Pedestrian (3.9%); 4=Bicycle (0.6%); 5=Other Pedalcycle (0.05%); 6=Other Conveyance (0.05%); 7=Railway Train (0.004%); 8=Animal (0.8%); 9=Fixed Object (7.2%); 10=Other Object (0.5%); 11=Overturn (0.1%); 12=Spilled Cargo (0.03%); 13=Jackknife (0.01%); 14=Units Separated (0.01%); 15=Other Non-Collision (0.2%); 16=Off Road (1.5%); 17=Downhill Roadway (0.01%); 18=Explosion or Fire (0.1%); 19=Backing (0.09%); 20=U-turn (0.01%); 21.15=Immersion (0.01%); 22.15=Fell Jumped from Motor Vehicle (0.04%); 23.15=Thrown or Falling Object (0.06%); 88=Other (0.3%); 99=Unknown (0.08%)
belt_use | Categorical | Type of belts | 1=Combined lap-shoulder protection (83.1%); 2=shoulder only (7.9%); 8=lap only (7.7%); 0=no restraint use (1.2%)
emstrans (Output) | Binary | EMS decision | 1=sent via EMS (18%); 0=not transported via EMS (82%)
trauma (Output) | Binary | Trauma triage decision | 1=sent to trauma center (0.8%); 0=not sent to trauma center (99.2%)

1) CODES data: The crash data are collected from the Crash Outcome Data Evaluation System (CODES). The dataset includes crash scene information of car crashe