ABSTRACT 

 
Title of Dissertation: EMPOWERING TRAFFIC OPERATIONS 

AND SAFETY WITH TRANSPORTATION 

BIG DATA: PRACTICE SCAN, 

METHODOLOGY, AND APPLICATIONS 

  
 Mofeng Yang, Doctor of Philosophy, 2022 

  
Dissertation directed by: Professor Paul Schonfeld, Department of Civil 

and Environmental Engineering 

 
In the past two decades, along with the technological advancement in mobile sensors and mobile 

networks, transportation big data, such as probe vehicle data and mobile device location data 

(MDLD), have been growing dramatically in terms of the spatiotemporal coverage of population 

and its mobility. These data sources have shown their great potential for large-scale and near real-

time transportation applications to support travel behavior analysis, travel demand modeling, 

traffic operations and safety analyses. The objectives of this dissertation are to (1) 

comprehensively examine the state-of-the-practice applications and the state-of-the-art models 

developed based on emerging transportation big data, (2) identify key metrics, and (3) establish a 

series of big-data driven frameworks to enhance traffic operations and safety. Three main sections 

are included.  

The first section of this dissertation presents a literature review on models, tools, and 

metrics used for various levels of traffic analysis, and analyzes a survey distributed to 


transportation professionals to quantify the importance of these key metrics for improving traffic 

operations and safety. Based on the literature review and survey insights, two big-data driven 

frameworks are proposed accordingly to address both traffic operations and safety issues.  

In the second section of this dissertation, a big-data driven framework is developed which 

aims at improving the accuracy and reliability of emergency medical services (EMS) and trauma 

triage decisions for elderly persons at crash sites. The proposed framework integrates 

transportation big data sources from both the demand side (such as traffic volumes, and time-

dependent vehicle speeds obtained from large-scale probe vehicles) and the supply side (i.e., 

transportation network features), as well as publicly available statewide crash data with health-

related data such as EMS and hospital records. Decision tree models are adopted to simulate 

the decision-making process due to their wide application, proven predictive capability, and 

interpretability. With records of over 55,000 elderly patients, results demonstrate that the proposed 

framework enhances the accuracy of EMS and trauma triage decisions for the elderly and can 

help save more lives in severe vehicle crashes.  

In the third section of this dissertation, a big-data driven framework is proposed for 

estimating a critical operational metric, namely vehicle volume, on an all-street network, and 

further estimating the pedestrian and bicyclist crashes at all intersections. This framework employs 

a series of cloud-based computational algorithms to extract multimodal trajectories and trip rosters 

from terabytes of MDLD. A scalable map matching and routing algorithm is then applied to snap 

and route vehicle trajectories to the roadway network. The observed vehicle counts on each 

roadway segment are weighted and calibrated against ground truth control totals, i.e., Annual 

Vehicle Miles of Travel (AVMT), and Annual Average Daily Traffic (AADT). The proposed 

framework is built on Amazon Web Services (AWS) and leverages cloud computing techniques 


to estimate vehicle volumes for all roadway segments in the state of Maryland using MDLD for 

the entire year 2019. The estimated vehicle volume is further integrated with statewide crash 

records to estimate the pedestrian and bicyclist crashes at all intersections with statistical models. 

Results indicate that the proposed framework can produce reliable estimates of vehicle volume and 

of pedestrian and bicyclist crashes, while also demonstrating its transferability and 

generalization ability. 

In summary, this dissertation comprehensively examines the literature on transportation 

big data applications and proposes two big-data driven frameworks demonstrated with two real-

world case studies. Results reveal the feasibility and advantages of empowering traffic operations 

and safety analysis with transportation big data. 

 
EMPOWERING TRAFFIC OPERATIONS AND SAFETY WITH 

TRANSPORTATION BIG DATA: PRACTICE SCAN, METHODOLOGY AND 

APPLICATIONS 

 
by 

 
Mofeng Yang 

 
Dissertation submitted to the Faculty of the Graduate School of the 

University of Maryland, College Park, in partial fulfillment 

of the requirements for the degree of 

Doctor of Philosophy 

2022 

 
Advisory Committee: 
Professor Paul Schonfeld, Chair, Department of Civil and Environmental Engineering 
Professor Kathleen Stewart, Dean’s Representative, Department of Geographical Sciences 

Professor Ali Haghani, Department of Civil and Environmental Engineering 

Assistant Professor Taylor Oshan, Department of Geographical Sciences 

Assistant Professor Chenfeng Xiong, Department of Civil and Environmental Engineering, Villanova University 

 
© Copyright by 

Mofeng Yang 

2022 

 

 
Dedication 

 
To my beloved parents Kun Yang and Peifan Li, and my wife Zhiyue Xia. 

 
And to my grandparents in heaven. 



 
Acknowledgements 

This dissertation is partially funded by the U.S. Department of Transportation (U.S. DOT), the Federal 

Highway Administration (FHWA), the Maryland Department of Transportation State Highway 

Administration (MDOT SHA), and the Maryland Transportation Institute (MTI). Opinions herein do 

not necessarily represent the views of the research sponsors. The author is responsible for the 

statements in the thesis.  

“Night is now falling. 

So ends this day. 

The road is now calling, and I must away.” 

These lyrics are from the song “The Last Goodbye” from the movie “The Hobbit”. Just 

like Bilbo Baggins, instead of reclaiming the Lonely Mountain from the dragon Smaug, I was 

sitting in the plane at Beijing airport, waiting for my unknown journey to the other side of the 

Pacific Ocean. Now, four years have passed, and I am here writing this acknowledgement to summarize 

this “unexpected journey”.  

First, I would like to express my sincere gratitude to my advisor, Dr. Paul Schonfeld for 

his continuous guidance since I joined the program in 2018. Dr. Schonfeld became my advisor in 

June 2022 and was also a committee member for my master's thesis in 2020. I really appreciate 

all the support and guidance Dr. Schonfeld has provided. 

I would also like to express my special thanks to Dr. Kathleen Stewart, not only for being 

the dean’s representative of my dissertation committee, but also for the guidance through research 

collaborations in the past few years. Dr. Stewart’s research works have always been my top 

references when conducting my own research. In the meantime, I would also like to thank all my 

doctoral dissertation committee members:  Dr. Ali Haghani, Dr. Taylor Oshan, and Dr. Chenfeng 



 
Xiong for offering me their valuable comments to improve my research. I am extremely grateful 

to Dr. Haghani for his support when my previous advisor left the university. I sincerely appreciate 

Dr. Oshan for the perfect course I took with him at the Department of Geographical Sciences, as 

well as for serving on my master's thesis committee. A special thanks to Dr. Chenfeng Xiong. The past 

four years of working together under Dr. Xiong's supervision will be an unforgettable experience 

that I will always cherish. I would like to thank Dr. Lei Zhang for giving me the opportunity to 

come to the U.S. and for getting me involved in research projects. 

I want to thank my colleagues at the Maryland Transportation Institute: Jina Mahmoudi, Sepehr Ghader, 

Aref Darzi, Minha Lee, Weiyi Zhou, Aliakbar Kabiri, Songhua Hu, Guangchen Zhao, Weiyu Luo, 

Mohammad Ashoori, Saeed Saleh, and Asal Tabrizi.  

Last, I would like to thank my parents, Kun Yang and Peifan Li, for always respecting my 

opinions and decisions. Thanks to my grandparents, Shizhen Yang and Daohua Chen, who passed 

away in 2021 and 2022. It was, is and will always be a pity in my life that I couldn’t be with you 

at the very last moments of your lives. Thanks to my wife, Zhiyue Xia, who always supports me 

whenever I meet obstacles. 

Though not knowing the journey and where it leads, I embrace it, and I welcome every 

moment of it. Just like what it says in the song: 

“And though where the road then takes me. 

I cannot tell. 

We came all this way. 

But now comes the day. 

To bid you farewell” 

Mofeng Yang, at Greenbelt, MD 



 
 Table of Contents  

 
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Chapter 1: Introduction
1.1 Background
1.2 Objectives
1.3 Contributions
1.3.1 Uniqueness of the Data
1.3.2 Methodology Innovations
1.3.3 Applications
1.4 Organization
Chapter 2: Literature Review
2.1 Transportation Big Data Applications
2.1.1 GPS Data
2.1.2 Cellular and Sighting Data
2.1.3 Location-based Service Data
2.2 Models and Algorithms for Transportation Big Data
2.2.1 Trip End Identification
2.2.2 Travel Mode Imputation
2.3 Transportation Big Data for Traffic Operations and Safety
2.3.1 State-of-the-Practice on Crash Scene Decision Makings
2.3.2 Estimating Vehicle Volume based on Transportation Big Data
2.3.3 Pedestrian and Bicyclist Crashes Estimation Methods
Chapter 3: Identification of Metrics Used for Various Levels of Traffic Analysis
3.1 Models, Tools, and Metrics for Various Levels of Traffic Operations and Safety Analysis
3.2 Operations Practice Scan Survey
3.3 Survey Results
3.4 Performance Metrics Flowchart
3.4 Summary
Chapter 4: Supporting Triage Decisions for High-Risk Trauma Patients at Crash Sites with Location Data
4.1 Introduction
4.2 The Big-Data Driven Framework Integrating Transportation and Health Data
4.2.1 Integrated Transportation and Health Data
4.2.2 Modeling Method
4.3 Results and Discussions
4.4 Conclusions
Chapter 5: A Big-Data Driven Framework for Estimating Vehicle Volume on Mobile Device Location Data
5.1 Problem Statement
5.2 The Big-Data Driven Framework for Estimating Vehicle Volume and Pedestrian and Bicyclist Crashes
5.2.1 The Framework
5.2.2 Trip Identification and Travel Mode Imputation
5.2.3 Scalable Map Matching and Routing via Cloud Computing
5.2.4 Weighting
5.2.4 Volume Calibration
5.3 Vehicle Volume Estimation Case Study: the State of Maryland
5.3.1 Data
5.3.2 Vehicle Volume Estimation Results
5.4 Conclusion
Chapter 6: Modeling Pedestrians and Bicyclist Crashes with Transportation Big Data
6.1 Pedestrian and Bicyclist Crashes Estimation
6.1.1 Poisson and NB Models
6.1.1 ZIP and ZINB Models
6.2 Pedestrian and Bicyclist Crashes Estimation Case Study: the State of Maryland
6.2.1 Data
6.2.2 Model Development
6.2.3 Pedestrian and Bicyclist Crash Estimation Results
6.2.4 Assessment of Contribution of the Vehicle Volume and Pedestrian and Bicyclist Volume Estimated by MDLD to Model Performance
6.3 Conclusions and Discussions
Chapter 7: Conclusions, Limitations and Future Works
7.1 Conclusions
7.2 Limitations
7.3 Future Works
Appendix I. MDOT SHA Operations Practice Scan Survey
I.1 Survey
I.2 Survey Results
Bibliography



 
List of Tables 

Table 2-1. Studies on Travel Mode Imputation Methods
Table 2-2. State-of-the-Art Methodologies of Trauma Triage
Table 2-3. Examples of Past Studies on Pedestrian and Bicyclist Safety Models
Table 3-1. State-of-the-Practice Models, Tools, and Metrics for Various Levels of Traffic Operations and Safety Analysis
Table 3-2. States for which the Respondents Work
Table 4-1. Descriptive Statistics of the CODES Data
Table 4-2. Decision Tree Model Evaluations
Table 4-3. Model Performance Measures and Comparison
Table 5-1. Volume Calibration Results Comparison by Link Type
Table 5-2. Volume Calibration Results by Urban/Rural Status
Table 6-1. Level of Traffic Stress Correspondence Table
Table 6-2. Frequency of Pedestrian/Bicyclist Crashes at Maryland Intersections in 2019
Table 6-3. Independent Variables for Pedestrian/Bicyclist Crash Frequency Models
Table 6-4. Results of the Pedestrian/Bicyclist Crash Frequency Models
Table 6-5. Model Improvement Assessment Based on LBS Variables

 

 
List of Figures 

Figure 1-1. Dissertation Outline
Figure 3-1. Agencies that the Respondents Work at
Figure 3-2. Projects that the Respondents Work on
Figure 3-3. Projects that the Respondents Work on
Figure 4-1. The Big-Data Driven Framework for Integrating Transportation and Health Data
Figure 4-2. CODES Data in the state of Maryland
Figure 4-3. Annual Average Daily Traffic in the state of Maryland
Figure 4-4. The Decision Tree Model of EMS Triage Using the Integrated Data
Figure 4-5. The Decision Tree Model of Trauma Triage Using the Integrated Data
Figure 5-1. The Big-Data Driven Framework for Estimating Vehicle Volume and Pedestrian and Bicyclist Crashes
Figure 5-2. The Data-Driven Travel Mode Share Estimation Framework
Figure 5-3. Distribution of Distance between Link Nodes in the OSM Network
Figure 5-4. Example of Map Matching and Routing
Figure 5-5. Mobile Device Location Data around the State of Maryland
Figure 5-6. Number of Lanes and Speed Limits in OSM
Figure 5-7. (a) Weighted Vehicle Volume in Training Set; (b) Calibrated Vehicle Volume in Training Set; (c) Weighted Vehicle Volume in Testing Set; (d) Calibrated Vehicle Volume in Testing Set
Figure 5-8. Volume Calibration Results Comparison by Link Type
Figure 5-9. Volume Calibration Results Comparison by Urban/Rural Status
Figure 5-10. Visualization of Calibrated Vehicle Volume. (a) the State of Maryland; (b) Washington D.C.; (c) Baltimore City; (d) Hagerstown, MD
Figure 6-1. LTS Examples (Source: http://www.northeastern.edu/peter.furth/research/level-of-traffic-stress)
Figure 6-2. LTS Estimates for: (a) the state of Maryland; (b) University of Maryland College Park Campus; (c) Baltimore City
Figure 6-3. ZINB model performance

  

 
List of Abbreviations 

AADT   Annual Average Daily Traffic 

ANN   Artificial Neural Networks 

AVMT   Annual Vehicle Miles of Travel 

AWS   Amazon Web Services 

BI   Bayesian Inference 

BMC   Baltimore Metropolitan Council 

BN   Bayesian Network 

BTS   Bureau of Transportation Statistics 

CART    Classification and Regression Tree 

CASI   Computer-Assisted Self-Interview 

CATI   Computer-Assisted Telephone Interview 

CATT   Center for Advanced Transportation Technology 

CBSA   Core-based Statistical Area 

CDR   Call Detail Record 

CNN   Convolutional Neural Network 

CODES  Crash Outcome Data Evaluation System 

DBSCAN  Density-based Spatial Clustering of Applications with Noise 

DHMM  Discrete Hidden Markov Model 

DNN   Deep Neural Networks 

DMV   Washington Metropolitan Area 

EMS   Emergency Medical Service 

FHWA   Federal Highway Administration 

GPS   Global Positioning System 

HPMS   Highway Performance Monitoring System 

KNN   K-Nearest Neighbors 

LBS   Location-based Service 

LPR   License Plate Recognition 

LRI   Location Recording Interval 

MDOT    Maryland Department of Transportation 

MDOT SHA  Maryland Department of Transportation State Highway Administration 

MLP   Multi-Layer Perceptron 

MaaS   Mobility-as-a-Service 

MTI   Maryland Transportation Institute 

MWCOG  Metropolitan Washington Council of Governments 

NASS   National Automotive Sampling System 

NB   Negative Binomial 



 
NHTS   National Household Travel Survey 

NHTSA  National Highway Traffic Safety Administration 

NTM   National Transit Map 

OD   Origin and Destination 

OSM   OpenStreetMap 

PAPI   Paper-And-Pencil Interview 

PCMDL  Passively Collected Mobile Device Location 

RETTS-A  Rapid Emergency Triage and Treatment System 

RF   Random Forest 

RITIS   Regional Integrated Transportation Information System 

SHA   State Highway Administration 

SMOTE  Synthetic Minority Over-sampling Technique 

SVC   Support Vector Classifier 

SVM   Support Vector Machine 

TAZ   Traffic Analysis Zone 

TPB    Transportation Planning Board 

TRBAM   Transportation Research Board Annual Meeting 

UMD   University of Maryland 

U.S.   the United States 

USDOE  the United States Department of Energy 

USDOT  the United States Department of Transportation 

XGB   eXtreme Gradient Boosting  

ZIP   Zero-Inflated Poisson 

ZINB   Zero-Inflated Negative Binomial 

 

 
Chapter 1: Introduction 

1.1 Background 

The current federal transportation legislation, "Moving Ahead for Progress in the 21st Century" 

(MAP-21), which was signed into law on July 6, 2012, advances statewide and metropolitan 

planning processes to incorporate a comprehensive performance-based approach to decision-

making. Typical performance metrics in planning for traffic operations and safety include changes 

in vehicle trips, vehicle miles traveled, emissions reduction, travel time savings, improvements in 

travel time reliability, energy consumption reduction, noise impacts, safety impacts, monetary 

values of these changes, and lists of traffic operations equipment and costs. In addition to 

traditional data sources such as loop detector data, and video-based traffic counts, emerging 

transportation big data such as probe vehicle data, connected vehicle data, and passively collected 

mobile device location data have enabled large-scale and near real-time models and methods to 

support operation, safety, demand, and planning analyses. More specifically, it is now possible to 

tell which users and which origin-destination pairs are using a particular transportation facility.  

In the past two decades, along with the technological advancement in mobile sensors and 

mobile networks, transportation big data, such as probe vehicle data and mobile device location 

data (MDLD), have been growing dramatically in terms of the spatiotemporal coverage of 

population and its mobility. Initially, these data sources were considered supplements to travel 

surveys and travel behavior analysis. A series of practices and research studies have demonstrated 

the effectiveness of such data in enhancing traditional travel surveys and revealed their great 

potential to replace travel surveys [1, 2]. At the same time, obtaining travel statistics solely based 



 
on these data sources is also worth investigating in order to reduce labor and cost compared to 

travel surveys.  

Apart from supporting travel surveys and travel behavior analysis, in recent years, 

transportation big data have also been leveraged in large-scale and near real-time transportation 

applications for traffic operations and safety analysis. For instance, vehicle volume, as a critical 

operational metric, is the fundamental basis for traffic signal control, transportation project 

prioritization, road maintenance plans, and more. Traditional methods of quantifying vehicle 

volume rely on manual counting, video cameras, and loop detectors at a limited number of 

locations. These efforts require significant labor and cost for large-scale implementations. 

Researchers and private sector companies have explored alternative solutions such as probe 

vehicle data, which still suffer from low penetration rates. With the introduction of transportation 

big data, vehicle volume can be estimated from terabytes of movement data for a larger 

geographical area with a larger sample size. 

1.2 Objectives 

The objective of this dissertation is to comprehensively examine the state-of-the-practice 

transportation big data applications and to develop state-of-the-art big-data driven frameworks that 

fully leverage the potential of transportation big data and cloud computing techniques to improve 

traffic operations and safety analysis. In order to fulfill the research objective, four tasks are 

identified: (1) evaluating the state-of-the-practice applications and the state-of-the-art methods 

based on transportation big data and identifying the key research gap; (2) designing and 

distributing a survey to transportation professionals to identify key metrics for various levels of 

traffic analysis focusing on traffic operations and safety; (3) developing a big-data driven 

framework that leverages the instantaneous vehicle speed estimated nearly in real-time from large-



 
scale probe vehicle data to enhance Emergency Medical Services (EMS) and trauma triage for 

elderly persons at crash sites; (4) developing and validating a big-data driven framework that 

estimates vehicle volume on an all-street network and quantifies the pedestrian and bicyclist 

crashes at all intersections. 

1.3 Contributions 

 
The main contributions of this dissertation can be classified into three aspects: (1) Uniqueness of 

the data; (2) Methodology innovations; (3) Applications. 

1.3.1 Uniqueness of the Data  

This dissertation utilizes multiple transportation big data sources and develops in-house data 

collection methods for model development and validation. Though extensive studies have 

leveraged transportation big data such as MDLD and probe vehicles data, transportation big data 

used in this dissertation include both probe vehicle data from the Regional Integrated 

Transportation Information System (RITIS) developed by the Center for Advanced Transportation 

Technology (CATT) and large-scale anonymized MDLD from the Maryland Transportation 

Institute (MTI). Both of the aforementioned data sources cover the entire U.S. and are able to 

capture high-granularity spatiotemporal traffic and movement across the country. Therefore, the 

methodology frameworks proposed in this dissertation have the advantage of generalization to 

expand to the entire U.S. In addition to these external data sources, an in-house data collection 

method is implemented through incenTrip, one of the most advanced Mobility-as-a-Service (MaaS) mobile 

applications. incenTrip collects travel behavior data with user-confirmed ground truth 

information including trip origin and destination, departure time, and travel mode. The 

intermediate locations of each trip are also recorded for model development.  



 
Apart from transportation big data, this dissertation also collects multimodal transportation 

networks and transportation statistics from multiple authorized sources, such as 

OpenStreetMap (OSM), the Maryland Department of Transportation State Highway 

Administration (MDOT SHA), the United States Department of Transportation (USDOT) Bureau of 

Transportation Statistics (BTS), and the USDOT National Transit Map (NTM). These publicly available 

data sources are used together with the transportation big data sources for model development 

and validation. 

1.3.2 Methodology Innovations 

The methodological contributions of this dissertation focus on the following aspects:  

(1) Integrating transportation big data with health-related data: This study is among the 

first to develop and demonstrate a methodological framework for integrating transportation-sector 

data with health-related data to support various decision-making scenarios in transportation safety, 

emergency responses, and trauma-care triage.  

(2) Computation algorithms for deriving travel behavior data from large-scale MDLD: 

The computation algorithms proposed in this study are developed in order to derive travel behavior 

data, such as activity locations, and travel modes, from large-scale MDLD. They are calibrated 

and validated against the existing travel surveys and annual vehicle miles of travel in order to serve 

real-world research needs. Also, these algorithms are further compiled and integrated into a data 

pipeline in the Amazon Web Service (AWS) platform which fully leverages the computing power 

and scalability of cloud computing techniques.  

(3) Scalable map matching and routing, weighting, and calibration algorithms that have 

superior transferability and generalization ability: The big-data framework proposed to estimate 

vehicle volume and pedestrian and bicyclist crashes can be applied to every state in the U.S. The 



 
algorithms are scalable, and the data sources proposed for weighting and calibration are generally 

available across the U.S. 

1.3.3 Applications 

The frameworks proposed in this dissertation have been calibrated, validated, and ultimately 

deployed in several real-world applications and have huge potential for boosting future 

transportation big data applications. These applications cover a wide range of topics, including 

travel demand management (incenTrip, https://incentrip.org), traffic operations and safety 

(Vulnerable Road User Density Exposure Dashboard project, https://mti.umd.edu/sdi), public 

health and mobility (University of Maryland COVID-19 Impact Analysis Platform, 

https://data.covid.umd.edu), travel behavior analysis and surveys (Next Generation National 

Household Travel Survey National Passenger/Truck Origin-Destination Data, 

https://nhts.ornl.gov/od).  

1.4 Organization 

 



 
Figure 1-1. Dissertation Outline. 

 
This dissertation is organized as shown in Figure 1-1. Chapter 2 provides a 

comprehensive literature review of the state-of-the-practice applications that use transportation big 

data and the state-of-the-art methodologies applied to them. Chapter 3 first scans 

the existing tools and metrics for traffic operations and safety analysis. Then a survey is designed 

and distributed to transportation professionals to identify key metrics for traffic operations and 

safety analyses. Chapter 4 develops a framework that leverages large-scale probe vehicle data to 



 
improve the accuracy and reliability of emergency medical services (EMS) and trauma triage 

decision scenarios for the elderly population in crashes. Chapter 5 proposes a big data-driven 

framework that leverages cloud computing techniques to digest terabytes of 

transportation big data and produce an important operational metric, vehicle volume, on the 

all-street network. Chapter 6 then uses this metric to estimate the corresponding pedestrian and 

bicyclist crashes. Finally, Chapter 7 summarizes the conclusions and suggests future research directions. 

 

 
Chapter 2: Literature Review 

2.1 Transportation Big Data Applications 

In this section, existing applications based on transportation big data are reviewed. The 

transportation big data are categorized into three types: Global Positioning System (GPS) data, 

cellular and sighting data, and Location-based Service (LBS) data, most of which can also be 

categorized as Mobile Device Location Data (MDLD). The applications for each type are reviewed 

and the state-of-the-art methods are summarized. 

2.1.1 GPS Data 

The earliest and most widely used type of transportation big data is that obtained from GPS 

technology, where personal longitudinal location data is collected via GPS data loggers. Since the 

mid-1990s, researchers began investigating the possibility of using GPS data to enhance the quality 

of travel surveys. The initial version of the GPS data logger could only be installed in a vehicle 

and charged by the vehicle’s battery [3-11]. The vehicle location was recorded every second 

when the vehicle was moving [6]. This approach can significantly improve the spatiotemporal 

accuracy of travel surveys by recording the exact origin and destination as well as the trip start and 

end times, but it only captures vehicle trips. Later, wearable GPS data loggers allowed respondents 

to carry the devices so that trips made by non-vehicle travel modes could also be recorded [12-15]. 

Some travel surveys utilized both in-vehicle and wearable GPS data loggers to take advantage of 

both technologies [16-18]. 

Since the GPS data can offer accurate locations of the devices, access to individual-level 

trajectories is highly restricted. Therefore, the individual-level GPS data are also aggregated by 

private sector companies to reveal travel demand without raising privacy concerns. For instance, 



 
INRIX Traffic collects GPS probe data from commercial vehicle fleets, connected vehicles, and 

mobile device applications [19]. RITIS also started to incorporate the probe vehicle data into their 

commercial products. The data can be further aggregated to the link or corridor level to provide 

a real-time estimation of traffic speed and travel time [20-22]. Nonetheless, the low penetration 

rate (i.e., 2%-10%) of the commercial probe vehicle data remains the core challenge with respect 

to drawing the whole picture of travel patterns.  
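
To illustrate why the penetration rate matters, the following minimal sketch scales an observed probe vehicle count to a total volume estimate. It assumes that probe vehicles are a uniform random sample of all vehicles and that the penetration rate is known; the function name and the numbers are hypothetical and are not taken from any of the cited studies.

```python
def expand_probe_count(probe_count: float, penetration_rate: float) -> float:
    """Scale an observed probe-vehicle count to an estimated total volume.

    Assumption (illustrative only): probes are a uniform random sample,
    so total volume ~= probe_count / penetration_rate.
    """
    if not 0.0 < penetration_rate <= 1.0:
        raise ValueError("penetration_rate must be in (0, 1]")
    return probe_count / penetration_rate


# The same 30 observed probes imply very different volumes at the two ends
# of the 2%-10% penetration range mentioned above.
low = expand_probe_count(30, 0.10)   # assumed 10% penetration
high = expand_probe_count(30, 0.02)  # assumed 2% penetration
print(round(low), round(high))
```

Because the estimate is inversely proportional to the penetration rate, a small error in an already small rate translates into a large error in the estimated volume.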

2.1.2 Cellular and Sighting Data 

Since mobile devices, such as smartphones and tablets, have gained in popularity, investigations 

into individual-level mobility patterns have become more practical. The cellular data, which are 

generated through communication between cellphones and cell towers when a phone call or a text 

message is made by the phone [23], have shown their great value in supporting large-scale travel 

demand analysis. In general, the cellular data can be categorized into Call Detail Record (CDR) 

and sightings [24]. Call Detail Record (CDR) data provide details on calls and messages, such as 

timestamp, duration, and locations of routing cell towers. Therefore, the location information of 

CDR data fully depends on the density of the cellular network and does not reflect the actual 

location of the device [24]. Similarly, sightings are also generated through communication with 

cell towers, but the actual location of the device is determined via triangulation [24]. Both 

types of cellular data have been widely used in studying human mobility patterns in the past two 

decades. For instance, Gonzalez, et al. combined two sets of CDRs to explore individual mobility 

patterns; one is composed of six months of records for 100,000 randomly selected anonymous 

individuals and the other is a complementary dataset capturing the locations of 206 mobile phone 

users every two hours for one week [25]. Further studies on human mobility have been conducted 

based on similar datasets [26-33]. Cellular data have also been widely applied in other research 



 
areas such as social networks, residential location, and socioeconomic level [34-36]. Despite the 

large volume of data, cellular data are limited by their spatial and temporal resolution, which is 

determined by the density of cell towers and user cellphone usage [37]. However, on the positive 

side, cellular data require less advanced phones and should raise less concern about user privacy. 

2.1.3 Location-based Service Data 

Another type of transportation big data is Location-based Service (LBS) data, in which spatial 

information is generated when a mobile application updates the device’s location with the most 

accurate sources, based on the existing location sensors such as Wi-Fi, Bluetooth, cellular tower, 

and GPS [24, 38]. Compared to the CDR data, the LBS data can reflect the exact location of mobile 

devices and thus provide invaluable location information describing individual-level mobility 

patterns [24, 25-27, 38, 39]. Many applications have been developed using the LBS data. For 

instance, a recent smartphone-enhanced travel survey conducted in the U.S. used a mobile 

application, rMove, developed by Resource Systems Group (RSG), to collect high-frequency 

location data and let respondents recall their trips by showing the trajectories in rMove [40-43]. 

AirSage leveraged LBS data to develop a traffic platform that can estimate traffic flow, speed, 

congestion, and road user sociodemographics for every road and time of day [44]. The Maryland 

Transportation Institute (MTI) at the University of Maryland (UMD) developed the COVID-19 

Impact Analysis Platform (data.covid.umd.edu) to provide insight on COVID-19’s impact on 

mobility, health, economy, and society across the U.S. [45-47]. 

In summary, transportation big data used in the literature are different in terms of 

spatiotemporal coverage of population and its mobility, as well as data quality, e.g., spatial 

accuracy and location recording interval (LRI) [48, 49]. The GPS data in general have the highest 

spatial accuracy (e.g., 10 meters) and the lowest LRI (usually 1 second), but usually cover only a 



 
small percentage of the population, and thus cannot reflect population-level travel behavior 

without a statistical weighting process. Therefore, most of the GPS data are used as supplementary 

data sources for regional travel surveys. The cellular data and LBS data have significantly higher 

spatiotemporal coverage of the population than the GPS data because of the high penetration rate 

of cellphones and mobile devices in the U.S. However, the ground truth information is usually 

missing. The LRI for both types of data is high and varies widely depending on mobile 

device usage [49]. In addition, although cellular data may have 

higher coverage, the spatial accuracy of the data and the temporal frequency of the pins are inferior 

to the LBS data. This is because cellular technology relies on the density of cell towers and does 

not reflect the actual location of the devices. Also, cellular data are generated based on calls and 

messages or a network-driven event which might lead to a lower number of events. 
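
The location recording interval (LRI) compared above can be computed directly from the timestamps of one device. The sketch below is a minimal illustration with made-up timestamps; the function name and the sample data are assumptions for this example, not part of any dataset used in this dissertation.

```python
from datetime import datetime
from statistics import median


def location_recording_intervals(timestamps):
    """Return the gaps, in seconds, between consecutive sightings of one device."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


# Hypothetical sightings: a 1 Hz GPS logger versus an irregular LBS device.
gps = [datetime(2022, 1, 1, 8, 0, s) for s in range(5)]
lbs = [datetime(2022, 1, 1, 8, m) for m in (0, 1, 9, 30)]

gps_lri = location_recording_intervals(gps)
lbs_lri = location_recording_intervals(lbs)
print(median(gps_lri), median(lbs_lri))
```

In this toy example the GPS intervals are constant, whereas the LBS intervals range from one minute to more than twenty minutes, which mirrors the larger LRI variation described above.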

2.2 Models and Algorithms for Transportation Big Data 

2.2.1 Trip End Identification 

The trip end identification algorithm for high-frequency data, i.e., GPS data, has been well-studied 

and used in practical applications [48]. To obtain accurate trip ends, the traditional way is the rule-

based trip end identification method. This type of method designs rules and parameters based on 

domain knowledge. The trip ends are obtained by applying the rules to location data point by point 

and at the same time examining the interrelation between two consecutive location points. The 

parameters used in these rules are mostly defined by domain knowledge, such as dwell time and 

speed [50-57]. In recent years, some researchers also leveraged supervised machine learning 

models as a supplement to the rule-based methods, which classify each location point as static or 

moving [58-60]. Different clustering methods are also applied to obtain trip ends by first 



 
identifying people’s activity locations from the location data [61-64]. A recent study utilized a 

spatiotemporal clustering method with three combined optimization models to detect trip ends [64]. 

In recent years, there was also a special focus on deriving the trip ends from LBS data. A “Divide, 

Conquer and Integrate” (DCI) framework was proposed to process the LBS data to extract mobility 

patterns in the Puget Sound region [39]. The proposed framework combined a rule-based method 

and incremental clustering method to handle the bi-modally distributed LBS data. The results were 

aggregated at the census tract level and compared with household travel surveys. 
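
A rule-based trip end identification method of the kind described above can be sketched as follows. This is a simplified illustration rather than the algorithm of any cited study: the distance and dwell-time thresholds, the function name, and the use of projected coordinates in meters are all assumptions made for the example.

```python
from math import hypot


def find_stays(points, max_radius_m=100.0, min_dwell_s=300.0):
    """Detect stays (candidate trip ends) in a time-ordered trajectory.

    points: list of (t_seconds, x_m, y_m) in a projected coordinate system.
    A stay is a run of consecutive points within max_radius_m of the run's
    first point whose duration is at least min_dwell_s.
    Returns a list of (arrival_time, departure_time) tuples.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        t0, x0, y0 = points[i]
        j = i
        # Extend the run while the next point stays close to the anchor point.
        while j + 1 < n and hypot(points[j + 1][1] - x0,
                                  points[j + 1][2] - y0) <= max_radius_m:
            j += 1
        if points[j][0] - t0 >= min_dwell_s:
            stays.append((t0, points[j][0]))
            i = j + 1
        else:
            i += 1
    return stays


# Hypothetical trajectory: dwell near (0, 0), drive, then dwell near (3000, 0).
track = [(0, 0, 0), (200, 10, 0), (400, 5, 5), (460, 500, 0),
         (520, 1500, 0), (580, 3000, 0), (700, 3010, 5), (1000, 2995, -3)]
stays = find_stays(track)
# A trip runs from the departure of one stay to the arrival at the next.
trips = [(stays[k][1], stays[k + 1][0]) for k in range(len(stays) - 1)]
print(stays, trips)
```

The two parameters play the role of the dwell time and distance rules mentioned above; in practice they would need to be calibrated to the LRI and spatial accuracy of the data source.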

2.2.2 Travel Mode Imputation 

After the trip ends are identified, it is also important to impute the travel mode for each trip to 

obtain multimodal travel patterns. Travel mode imputation can be categorized mainly into two 

approaches: (1) trip-based approach; and (2) segment/point-based approach. The trip-based 

approach is based on the already identified trip ends, where each trip has only one travel mode to 

be imputed. The segment/point-based approach separates the trip into fixed-length segments (time 

or distance) or a single point and then imputes the travel mode for each segment or point [49]. 

Then the segments/points with the same travel mode are further merged to form a single-mode trip. 

Both previous trip-based approaches and segment/point-based approaches have used similar 

features in order to distinguish between different travel modes.
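
The merging step of the segment/point-based approach can be sketched as below, assuming that a mode label has already been imputed for each fixed-length segment. The function name, the 60-second segment length, and the labels are hypothetical choices for illustration.

```python
from itertools import groupby


def merge_segments(segment_modes, segment_length_s=60):
    """Merge consecutive fixed-length segments that share an imputed mode.

    segment_modes: time-ordered list of mode labels, one per segment.
    Returns a list of (mode, start_s, end_s) single-mode trips.
    """
    trips = []
    start = 0
    for mode, run in groupby(segment_modes):
        end = start + len(list(run)) * segment_length_s
        trips.append((mode, start, end))
        start = end
    return trips


# Six hypothetical one-minute segments: walk to a stop, ride a bus, walk again.
labels = ["walk", "walk", "bus", "bus", "bus", "walk"]
print(merge_segments(labels))
```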



 
Table 2-1. Studies on Travel Mode Imputation Methods. 

| Author | LRI | Model | Main Features | Modes | Accuracy |
|---|---|---|---|---|---|
| Gong et al. 2012 [54] | / | Rules | Speed, Acceleration, Transit Stations, Transit Network | Drive, Train, Bus, Walk, Bike, Static | 82.6% |
| Stenneth et al. 2011 [65] | 30 s | RF | Speed, Acceleration, Heading change, Bus location, Transit Network | Drive, Bus, Train, Walk, Bike, Static | 93.7% |
| Bruunauer et al. 2013 [66] | 1-10 s | MLP | Speed, Acceleration, Bendiness | Drive, Bus, Train, Walk, Bike | 92.0% |
| Xiao et al. 2015 [68] | 1 s | BN | Speed, Acceleration, Trip Distance | Drive, Bus, Walk, Bike, E-Bike | 92.0% |
| Nitsche et al. 2014 [67] | 1 s | DHMM | Speed, Acceleration, Direction | Drive, Bus, Motorcycle, Train, Tram, Subway, Walk, Bike | 65% - 95% |
| Dabiri and Heaslip 2018 [71] | 1-5 s | CNN | Speed, Acceleration, Jerk, Bearing Rate | Drive, Bus, Train, Walk, Bike | 84.8% |
| Bachir et al. 2019 [32] | / | BI | Road and Rail Trip Counts | Road, Rail | / |
| Vaughan et al. 2020 [73] | / | DNN | Speed, Trip Distance, Land Use, Time of Day | Drive, Bus, Active (Walk, Bike) | 87% |
| Burkhard et al. 2020 [49] | 1 s subsampled to 5 min | KNN, RF, etc. | Speed, Public Transport Stops, and Lines | Drive, Train, Tram, Bus, Walk, Bike | / |
| Breyer et al. 2021 [74] | / | KNN etc. | Road and Train Route Geometry | Road, Train | 95.5% |

* RF: Random Forest; MLP: Multi-Layer Perceptron; BN: Bayesian Network; DHMM: Discrete Hidden Markov Model; CNN: Convolutional Neural Network; BI: Bayesian Inference; DNN: Deep Neural Network



 
Table 2-1 summarizes typical methods and features that are used for travel mode 

imputation. According to the literature review done by Huang et al. 2019 and Burkhard et al. 2020, 

it can be observed that typical features include speed and acceleration [49, 54, 65-73]. Specifically, 

when the LRI is less than 10 seconds, the speed (speed variation) and acceleration features are 

more important in differentiating among different travel modes, which can be imputed solely from 

the data. When the LRI is relatively high, such as 30 s, additional features can be added to maintain 

the same level of accuracy such as real-time transit information [65], multimodal transportation 

network [49, 54, 65, 74], and sociodemographic information [70, 73]. However, most of these 

studies tested the algorithms using the low-LRI GPS data sample, which has frequent observations. 

Limited efforts have been spent on developing suitable algorithms for cellular data or LBS data 

that suffer from the high-LRI issue. Burkhard et al. examined the required spatial accuracy and 

LRI to accurately detect travel mode from the high-LRI MDLDs by subsampling the low-LRI GPS 

data [49]. They concluded that the LRI should be less than a minute to ensure the travel mode 

imputation accuracy. Bachir et al. developed a Bayesian Inference (BI) method to separate road 

and rail modes from the CDR data in the Greater Paris region by leveraging the road and rail trip 

counts from the travel survey [32]. Vaughan et al. trained a Deep Neural Network (DNN) model 

to separate drive, bus, and active modes with artificial CDR traces reconstructed from the travel 

survey data [73]. The model is applied to the real-world CDR data to obtain travel mode shares. 

Breyer et al. developed multiple classification methods using labeled CDR data to separate only 

the road and train modes between two OD pairs [74]. The major limitation of these studies is that 

either the study area is small (e.g., an OD pair or a region) or the method only separates easy-to-

detect modes (e.g., Road versus Rail). As Huang et al. 2019 mentioned in their review [48], the 

supervised machine learning methods have not been fully exploited yet due to the lack of ground 



 
truth labeled data, and might be worth investigating for MDLDs, especially for the cellular data 

and LBS data. Besides, rather than identifying only easy-to-detect modes (e.g., rail versus road), their 

review suggests including more mode categories. 
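
The speed and acceleration features that recur in Table 2-1 can be derived from consecutive location points alone. The following is a minimal sketch under stated assumptions: points are (time, latitude, longitude) tuples with strictly increasing timestamps, distance is the great-circle (haversine) distance on a spherical Earth, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def speed_and_acceleration(points):
    """points: time-ordered list of (t_seconds, lat, lon).

    Returns (speeds in m/s, accelerations in m/s^2) from finite differences.
    """
    speeds = [haversine_m(la1, lo1, la2, lo2) / (t2 - t1)
              for (t1, la1, lo1), (t2, la2, lo2) in zip(points, points[1:])]
    accels = []
    for k in range(1, len(speeds)):
        # Time between the midpoints of the two intervals behind each speed.
        dt = (points[k + 1][0] - points[k - 1][0]) / 2
        accels.append((speeds[k] - speeds[k - 1]) / dt)
    return speeds, accels


# Three hypothetical points along the equator, 10 s apart.
pts = [(0, 0.0, 0.0), (10, 0.0, 0.001), (20, 0.0, 0.003)]
speeds, accels = speed_and_acceleration(pts)
print(speeds, accels)
```

With a long LRI the straight-line distance between sightings underestimates the distance actually traveled, which is one reason such features lose discriminating power for high-LRI data.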

2.3 Transportation Big Data for Traffic Operations and Safety 

2.3.1 State-of-the-Practice on Crash Scene Decision Making  

 
Traffic safety research has long emphasized older people’s limitations when discussing 

the contributing factors of crash injury severity, such as traffic conditions, roadway geometries, 

land use type, environmental conditions, and driver characteristics through different statistical 

analyses [75-79]. Also, numerous methods have been developed to predict the crash injury severity 

with the crash report data. Although crash severity prediction models have come into common 

discussion during the last decade in policy, research, and practice, they still suffer 

from a lack of clarity and accuracy regarding their interpretation and data availability. This in turn 

limits the ability to apply these methods to real-time decision-making. In reality, two major 

decisions need to be made at the crash scene:  

• whether an EMS is needed: when someone is injured in a vehicle crash, the responding 

EMS providers must provide emergency care at the scene and then transport the patient 

to a healthcare facility based on the injury severity [80]; 

• whether an injured person should be triaged to the trauma centers: If an EMS is 

dispatched, the EMS providers must not only determine the severity of the injury and 

initiate medical management, but also identify the most appropriate transport 

destination facility through a process called “field triage” [81].  

 

 
Table 2-2. State-of-the-Art Methodologies of Trauma Triage 

| Authors | Method | Outcome | Factors | Performance |
|---|---|---|---|---|
| Scheetz et al. 2007 [88] | Decision Tree | Trauma or Non-trauma | Age, Gender, Height, Light, Glasgow coma scale, Injury severity, etc. | 95.15% SE and 76.47% SP for severe injury; 83.1% SE and 81.5% SP for moderate injury |
| Wang et al. 2009 [83] | Rule-based | Trauma or Non-trauma | Glasgow coma scale, Blood pressure, Respiratory rate, Crash characteristics, Estimated traffic speed, etc. | / |
| Sasser et al. 2012 [84] & Davidson et al. 2014 [85] | Rule-based | Trauma or Non-trauma | Same as Wang et al. 2009 | / |
| Newgard et al. 2016 [89] | Decision Tree | Trauma or Non-trauma | Age, Gender, Glasgow coma scale, Blood pressure, Respiratory rate, Mechanism of injury, etc. | 92.1% SE and 41.5% SP |
| Atiksawedparit et al. 2019 [91] | Statistical model | Severe injury and death | Age, Gender, Body mass index (BMI), Crash characteristics, EMS response time, Mechanism of injury, Physiological status, etc. | 90.2% SE and 75.9% SP for severe injury; 98.7% SE and 68.8% SP for death |
| Van Rein et al. 2019 [92] | Statistical model | Injury severity score | Age, Glasgow coma scale, Blood pressure, Mechanism criteria, Penetrating injury, etc. | 88.8% SE and 50.0% SP |
| Van der Sluijs et al. 2019 [90] | Decision Tree | Injury severity score | Age, Gender, Glasgow coma scale, Blood pressure, Mechanism of injury, Injury type, etc. | Not reported |
| Magnusson et al. 2020 [93] | Rule-based | RETTS-A triage levels | Dispatch medical index (DMI) including Chest pain, Extremity, Respiratory difficulties, etc. | 81.0% SE and 64.0% SP |
| Shanahan et al. 2021 [94] | Statistical model | Injury severity score | Same as Van Rein et al. 2019 | 83.0% SE and 50.0% SP |

* SE: Sensitivity, also called true positive rate or recall.  

* SP: Specificity, also called true negative rate.  

* RETTS-A: The Rapid Emergency Triage and Treatment System. 

 

 
After an EMS team arrives at the scene, field triage decisions need to be made by EMS 

providers to determine whether the injured occupants should be sent to a trauma center. A recent 

study used the National Automotive Sampling System (NASS) to study the factors affecting triage 

decisions. The results indicate that although injury severity and resulting mortality among the older 

group (age > 60) were higher than among their younger counterparts, the older group was less likely to be 

transported to a trauma center [81, 82]. These findings emphasize that appropriate triage decisions 

can save lives, especially among older people. With these considerations, several field 

triage decision guidelines were developed for reference. 

The universally used Field Triage Decision Scheme was revised by a National Expert Panel 

organized by the Centers for Disease Control and Prevention, where comprehensive crash and 

health-related data were used [83]. In 2012, the National Center for Injury Prevention and Control 

and the Division of Injury Response, in collaboration with the National Highway Traffic Safety 

Administration (NHTSA), Office of Emergency Medical Services, and in association with the 

American College of Surgeons, Division of Research and Optimal Patient Care also released the 

Guidelines for Field Triage of Injured Patients [84, 85]. These represent the latest development on 

rule-based guidelines for field triage. Several studies were also conducted focusing on validating 

these guidelines [86, 87]. Apart from these guidelines, data mining and decision tree methods, such 

as Classification and Regression Tree (CART) [88, 89] and gradient boosting decision tree [90] 

were also largely used by researchers to predict the trauma triage decisions. These studies are 

reviewed in Table 2-2.  
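
The sensitivity (SE) and specificity (SP) values reported in Table 2-2 follow from the confusion matrix of a binary triage decision. The sketch below shows the calculation on made-up labels; the function name and the data are hypothetical and do not reproduce any result from the cited studies.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (SE, SP) for binary labels.

    y_true / y_pred: sequences of 1 (trauma center needed) or 0 (not needed).
    SE = TP / (TP + FN), SP = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Ten hypothetical patients: four truly need a trauma center, six do not.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
se, sp = sensitivity_specificity(actual, predicted)
print(f"SE = {se:.2%}, SP = {sp:.2%}")
```

A missed severe case lowers SE while an unnecessary trauma center transport lowers SP, so the two metrics capture the two kinds of triage error separately.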

While several studies have touched on the strong association between crash severity and 

traffic conditions (e.g., [83]), few studies have linked trauma triage decision-making with 

transportation domain knowledge and/or transportation-sector data. With the recent engineering 



 
advances in transportation big data and data-driven analytical methods, transportation-sector data 

have become increasingly available, in terms of both the data coverage and the timeliness to support 

real-time or near real-time decisions such as EMS and trauma triage. Motivated by this 

cross-disciplinary research need, this study aims at filling the data gap with integrated transportation 

and health data. It contributes to the existing literature with a methodological framework that 

integrates relevant transportation-sector data sources including network characteristics, traffic 

volumes, and historical travel speed at the crash scenes into the health-related decisions. Such 

integration is also believed to contribute to an enhanced accuracy compared to existing studies (as 

shown in Table 2-2). This study then empirically tests the framework on EMS and trauma triage 

decision scenarios using Maryland datasets. Decision tree models are adopted due to their wide 

applications and proven capability in prediction. Results demonstrate that the integrated 

transportation and health data contribute to enhanced prediction accuracy, reducing under-triage 

for the elderly, and saving more lives from vehicle crashes. 

2.3.2 Estimating Vehicle Volume based on Transportation Big Data 

Traditional methods of quantifying vehicle volume rely on manual counting, video cameras, and 

loop detectors at a limited number of locations. These efforts require significant labor and cost for 

expansions. Researchers and private sector companies have also explored alternative solutions 

such as probe vehicle data, which still suffer from a low penetration rate. In recent years, along 

with the technological advancement in mobile sensors and mobile networks, Mobile Device 

Location Data (MDLD) have been growing dramatically in terms of the spatiotemporal coverage 

of the population and its mobility. Three ways of estimating vehicle volumes are reviewed below. 

Loop detectors are widely used to record traffic volumes and occupancy levels. These 

sensors are usually buried under the pavements to detect the induction change from the presence 



 
of a vehicle. Kwon et al. 2003 developed an algorithm using data from single loop detectors to 

estimate truck traffic volumes [95]. The results showed a 5.7% error compared with the ground 

truth highway data. Loop detector data were also applied together with probe vehicle data to 

estimate queue length [96] and vehicle volume at a city-wide scale [97]. Although proven to be 

efficient in estimating vehicle volume, the high installation and maintenance costs of loop detectors 

limit their capability of being scaled up to cover the entire transportation network. Therefore, loop 

detector datasets are often incomplete and mostly unavailable at minor arterials and local streets. 

In the past two decades, MDLD have gained significant attention and have been utilized 

for estimating various traffic characteristics, including vehicle volumes. With the development of 

MDLD, estimating vehicle volumes at the city scale became a reality. Probe vehicles can record 

their trajectory data with high granularity (e.g., 1 Hz). Based on the trajectory data obtained from 

probe vehicles, a wide range of methods can be used by researchers to solve transportation 

problems. Zhao et al. proposed novel methods to estimate queue length and vehicle volume based 

on the probability theory without prior information about the penetration rate or queue length 

distribution [98]. Guo et al. estimated vehicle volume and queue length at signalized intersections 

and proposed a new framework to optimize traffic signal control operations [99]. Sekuła et al. 

applied several machine learning and neural network models to estimate historical hourly vehicle volume 

between sparsely located sensors based on the probe vehicle data [100]. Shockwave theories were 

also applied to probe vehicle data by a few studies [101, 102]. 

Many studies have been conducted focusing on estimating traffic flow and detecting 

congestion using cellular data [103, 104]. Xing et al. utilized CDR with the Time Difference of 

Arrival (TDOA) positioning technique in order to estimate multimodal traffic volumes on different 

types of urban roadways by identifying three modes of travel – namely, drive alone, carpooling, 



 
and bus [105]. The results showed that compared with the ground truth vehicle volume obtained 

from License Plate Recognition (LPR) cameras, the mean relative error was in the range of 17.1% 

to 25.7%, depending on the roadway type. Despite significant advances in positioning techniques, 

cellular data still suffer from low accuracy issues, whereas LBS data have a noticeable advantage 

due to utilizing different sources to accurately locate the user – a feature that has resulted in 

increased usage of this type of data by researchers and the private sector for estimating vehicle 

volume. Fan et al. developed a computing framework alongside a heuristic map matching 

algorithm to estimate Vehicle Miles of Travel (VMT) and Annual Average Daily Traffic (AADT) for the state of Maryland using 

INRIX data [106]. The results showed an R² of 0.878 when fitting the estimated AADT against the 

ground truth AADT. Moreover, a number of state agencies conducted rigorous evaluations of 

vehicle volume obtained through traditional methods as well as from MDLD obtained by private 

sector companies. They found the latter to be a promising source for supplementing current surveys 

and traditional methods [107].  
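
The two accuracy measures quoted above, mean relative error and R², can be computed as in the following sketch. The volumes are invented for illustration, the function names are hypothetical, and the cited studies may define or compute these measures differently (for example, R² from a fitted regression line rather than from the raw estimates).

```python
def mean_relative_error(estimated, observed):
    """Average of |estimate - observation| / observation."""
    return sum(abs(e - o) / o for e, o in zip(estimated, observed)) / len(observed)


def r_squared(estimated, observed):
    """Coefficient of determination of the estimates against the observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for e, o in zip(estimated, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical ground truth counts and model estimates (vehicles per day).
observed = [1000, 2000, 4000, 8000]
estimated = [1100, 1800, 4400, 7600]
mre = mean_relative_error(estimated, observed)
r2 = r_squared(estimated, observed)
print(f"MRE = {mre:.2%}, R2 = {r2:.3f}")
```

The relative error weights every location equally, whereas R² is dominated by high-volume locations, so the two measures can rank the same estimates differently.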

2.3.3 Pedestrian and Bicyclist Crashes Estimation Methods 

According to the National Highway Traffic Safety Administration’s Traffic Safety Facts 2019 

Report, pedestrian and bicyclist fatalities in 2019 accounted for nearly 20% of all traffic 

crash-related deaths in the U.S. [108]. In Maryland alone in 2019, 3,136 pedestrian crashes and 

848 bicycle crashes occurred, where over 90% of pedestrian crashes and over 80% of bicycle 

crashes resulted in injuries or fatalities [109, 110]. Approximately one out of every four individuals 

killed in traffic crashes in Maryland was a pedestrian [109]. 

Studies on pedestrian and bicyclist safety issues are abundant. They identify key 

contributing factors to pedestrian- and bicyclist-involved crashes as well as suitable methodologies 

for crash frequency analysis. To address the fundamental issues typically associated with crash 



 
frequency data, previous research studies have employed various methodologies to analyze 

pedestrian- and bicyclist-involved crash frequency. Many factors have been suggested to play a 

role in pedestrian and bicyclist crashes, including those representing pedestrian and bicyclist risk 

exposure [111-116], land use and the built environment [113, 117-120], and 

sociodemographic/socioeconomic status [113, 117, 118, 120, 121]. Among those, one of the 

important factors is vehicle volume, which significantly correlates with the frequency of pedestrian 

and bicyclist crashes. In this case, vehicle volume estimated from the MDLD can be integrated 

into existing traffic safety modelling methods to estimate pedestrian and bicyclist crashes for all 

intersections to promote traffic safety analysis. 

According to Lord and Mannering [122], one of the main issues characterizing crash 

frequency data is overdispersion, which happens when the variance of the crash counts 

is considerably larger than the mean. The other issue that usually affects crash frequency data is 

having excess zeros, which happens when crash counts contain a significant number of zero values 

[116, 122]. To predict pedestrian and bicyclist crash frequency at intersections, Saad et al. [115] 

used bicycle crowdsourced data from Strava [123] and developed a negative binomial (NB) model. 

They found that the frequency of bicycle crashes at intersections was positively associated with 

intersection size, the intersection being a signalized intersection, the number of intersection legs 

being four (compared to three-legged intersections), as well as total entering vehicle volume. The 

study also indicated that the frequency of bicycle crashes at intersections was negatively associated 

with the presence of a bike lane at those intersections. Raihan et al. [116] used a zero-inflated 

negative binomial (ZINB) model to develop crash modification factors (CMFs) for bicyclist 

crashes in Florida’s urban areas. They found that road design characteristics such as lane width 

and speed limit had positive effects on reducing bicycle crashes. Lower bicycle crash probabilities 



 
on segments were associated with increased bicycle activity. However, increased bicycle activity 

was associated with higher bicycle crash probabilities at intersections. Increased bicycle crash 

probabilities at intersections were also associated with the number of bus stops within the 

intersection influence area. Ukkusuri et al. [117] examined the role of various built 

environment, land use, road network, and sociodemographic factors as well as key exposure 

measures including traffic volume, transit ridership, and proportion of nonmotorized trip-makers 

in the frequency of total, injury-causing, and fatal pedestrian crashes. The study employed NB and 

ZINB models to estimate crash frequency and found that increased numbers of total and/or fatal 

pedestrian crashes were associated with increased proportions of industrial and commercial land 

use, increased transit ridership, increased numbers of subway stations, increased proportions of 

intersections with four and five approaches, increased proportions of primary roads without access 

restriction, and increased number of lanes. Sanders et al. [119] employed Poisson regression to 

examine the role of various factors in pedestrian exposure at intersections as well as bicycle 

exposure at various road segments in Seattle, Washington. They found that variables representing 

population and land use (i.e., number of households, number of commercial properties, and the 

presence of a university near the intersection) were significantly associated with pedestrian 

exposure at intersections. Moreover, bicycle exposure was associated with the number of bicycle 

lanes on the road segment and land use variables such as the presence of a university or a school 

near the count location. The findings of that study provided insights into the factors affecting 

pedestrian and bicyclist risk exposure, which is a key contributing factor to pedestrian and bicyclist 

crashes. Jestico et al. [124] used a crowdsourced bicycling incident dataset for the Capital Regional 

District in British Columbia, Canada, to identify design attributes associated with unsafe 

intersections between multi-use trails and roads. NB regression was used to model the links 



 
between the number of bicycle crashes and near-miss incidents and the infrastructure 

characteristics at multi-use trail-road intersections. The results showed that factors associated with 

bicycle incident frequency at multi-use trail-road intersections included bicycling volumes, vehicle 

volumes, and trail sight distance. 

Many other studies also investigated factors affecting pedestrian and bicyclist safety risk 

exposure and modeled pedestrian- and bicyclist-involved crash frequency. The key contributing 

factors affecting pedestrian/bicyclist safety exposure and crash frequency that emerge from the 

literature include: sociodemographic and socioeconomic factors such as proportion of the 

population by race or age group [113, 117, 118, 120, 121]; land use and built environment factors 

such as population density, employment density, activity diversity, bus stop density, and ratio of 

residential, industrial, and commercial uses [113, 118-120]; and traffic- and travel-related factors 

such as vehicle, pedestrian, and bicycle volumes as exposure measures [111-116]. 

Further, the literature review reveals that the most prominent methodologies that have been 

applied to pedestrian and bicyclist crash frequency analysis are Poisson regression, negative 

binomial (NB) regression, zero-inflated Poisson (ZIP) regression, and zero-inflated negative 

binomial (ZINB) regression [111, 120, 122-125]. The Poisson regression is usually considered the 

starting point in crash frequency modeling [111]. Moreover, while the ZIP and ZINB regression 

methodologies have frequently been applied in empirical research to account for the 

preponderance of zeros observed in crash count data, the ZINB regression is applicable for count 

data that exhibit both overdispersion and excess zeros issues [116]. 
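
As a rough illustration of the two data issues discussed above, the following sketch computes a variance-to-mean ratio, compares the observed share of zeros with the share implied by a Poisson model with the same mean, and writes out NB and ZINB probability mass functions. The crash counts, the dispersion parameter alpha, and the zero-inflation probability pi are made-up values, not data or estimates from the studies reviewed here. The NB2 parameterization (variance = mu + alpha * mu^2) is assumed for illustration; the reviewed studies may use other forms.

```python
import math
from statistics import mean, pvariance


def dispersion_diagnostics(counts):
    """Return simple checks for overdispersion and excess zeros."""
    m = mean(counts)
    v = pvariance(counts)
    zero_share = sum(1 for c in counts if c == 0) / len(counts)
    return {
        "mean": m,
        "variance": v,
        "variance_to_mean": v / m,           # > 1 suggests overdispersion
        "observed_zero_share": zero_share,
        "poisson_zero_share": math.exp(-m),  # P(Y = 0) under Poisson(mean)
    }


def nb_pmf(y, mu, alpha):
    """NB2 pmf with mean mu and variance mu + alpha * mu**2 (alpha > 0)."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, pi):
    """ZINB pmf: a structural zero with probability pi, otherwise NB2."""
    base = (1.0 - pi) * nb_pmf(y, mu, alpha)
    return pi + base if y == 0 else base


# Hypothetical crash counts at ten intersections (illustration only).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 7]
diagnostics = dispersion_diagnostics(counts)
```

In this toy sample the variance is several times the mean and the observed zero share exceeds the Poisson-implied share, which is the pattern that motivates NB and zero-inflated models.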

 

 
Table 2-3. Examples of Past Studies on Pedestrian and Bicyclist Safety Models 

| Study | Unit of Analysis | Study Area | Safety Measure | Methodology | Key Exposure Measure(s) |
|---|---|---|---|---|---|
| Ukkusuri et al. 2012 [117] | Census tract, zip code | New York City (NYC), NY | Total pedestrian crashes, severe crashes, and fatal crashes | NB, ZINB | Traffic volume, pedestrian activity, operating speeds |
| Hosseinpour et al. 2012 [111] | Road segment | Federal Road Network, Malaysia | Frequency of pedestrian crashes | Poisson, NB, ZIP, ZINB | Motorized traffic volume |
| Lee et al. 2015 [121] | Zip code | Various locations in FL | Pedestrian crashes per crash location zip code, crash-involved pedestrians per residence zip code | Bayesian Poisson lognormal simultaneous equations spatial error model | Log of population, log of vehicle miles traveled |
| Sanders et al. 2017 [119] | Intersection, road segment | Seattle, WA | Pedestrian and bicyclist counts | Poisson model | - a |
| Jestico et al. 2017 [124] | Multi-use trail intersection | Capital Regional District, British Columbia, Canada | Frequency of bicyclist crash and near miss incidents | NB | Pedestrian, bicyclist, and vehicle volumes |
| Xie et al. 2017 [113] | Grid cell (300×300 ft²) | Manhattan (NYC), NY | Pedestrian crash cost | Tobit model | Vehicle miles traveled, taxi trips, subway ridership |
| Mansfield et al. 2018 [120] | Census tract | United States | Frequency of pedestrian fatalities | NB, ZINB, ZINB mixed model | Vehicle miles traveled density (thousand VMT/mi²) by roadway functional class |
| Saad et al. 2019 [115] | Intersection | Orange County, FL | Frequency of bicycle crashes | NB | Total entering volume, bicycle volume |
| Raihan et al. 2019 [116] | Intersection, road segment | Urban areas, FL | Bicycle crash modification factors | ZINB | Bicycle activity (Strava volumes) [122] |
| Lee et al. 2019 [125] | Intersection | Orange and Seminole Counties, FL | Pedestrian crashes | NB, ZINB | Observed and predicted pedestrian trips |

Notes: a: This was an exposure study; therefore, the exposure measures were the response variables in the models (i.e., pedestrian and bicyclist counts). 

 

 
Considering factors and methodologies used in exposure and crash analyses for vulnerable 

road users, Table 2-3 summarizes a few previous pedestrian and bicyclist safety studies. Overall, 

the literature review reveals that while pedestrian and bicyclist safety risk analyses are becoming 

more data-driven, usage of consistent and reliable exposure data such as crowdsourced big data in 

conducting pedestrian and bicyclist crash analyses remains scarce, particularly with regard to 

pedestrians. This study aims to address that gap in empirical research by using mobile device 

location data (MDLD) in the analysis of pedestrian and bicyclist crashes. 

  

 
Chapter 3: Identification of Metrics Used for Various Levels of Traffic 

Analysis 

3.1 Models, Tools, and Metrics for Various Levels of Traffic Operations and Safety Analysis 

This section reviews the state-of-the-practice models, tools, and metrics developed by State 

agencies such as the Department of Transportation (DOT) and Metropolitan Planning 

Organizations (MPOs) or universities for planning and designing transportation projects while 

considering systemic feasibility and efficiency, including for traffic operations and safety. 

Transportation project decisions require cooperative actions across various organizations, offices, 

and working groups within an organization when the plans cover different municipal areas or 

techniques governed by multiple authorities. Many different tools and methods are available to 

support the quantitative analysis of TSM&O and traffic operations strategies in planning and 

programming. Based on the U.S. Department of Transportation’s (USDOT) Federal Highway 

Administration’s (FHWA) Applying Analysis Tools in Planning for Operations report, the 

following tools can be used for analyzing strategies at various levels of the planning process.  

• Sketch planning and prioritization tools for highway needs inventory (e.g., Tool for 

Operations Benefit-Cost Analysis – TOPS-BC, MOSAIC) 

• Travel demand models (e.g., MSTM, BMC InSITE, MWCOG model) with postprocessors 

(e.g., Intelligent Transportation System (ITS) Deployment Analysis System – IDAS) 

• Analytical tools (e.g., Highway Capacity Manual and traffic signal optimization tools) 

• Microscopic simulation models (e.g., VISSIM, AIMSUN) 

• Mesoscopic simulation models (e.g., DTALite, DynusT) 

 

 
Table 3-1. State-of-the-Practice Models, Tools, and Metrics for Various Levels of Traffic Operations and Safety Analysis 

| State/Agency | Model | Descriptions |
|---|---|---|
| Sketch-Planning Tools | | |
| FHWA | ITS Deployment Analysis System (IDAS) | The objective of IDAS is to estimate the impacts and costs resulting from the deployment of various ITS components. |
| Northeastern Illinois | IDAS | IDAS is used to evaluate four types of ITS deployment: electronic toll collection, freeway variable message signs, electronic transit fare collection system, and transit vehicle signal priority. |
| Ohio-Kentucky-Indiana | IDAS | The components of the Advanced Regional Traffic Interactive Management Information System (ARTIMIS) are evaluated using IDAS, including closed-circuit TV cameras, electronic dynamic message signs, traveler advisory telephone service, highway advisory radio, freeway service patrol vans, ramp and reference markers, vehicle detectors, total station electronic surveying equipment, and operations control center. |
| Michigan | IDAS | The components of the Temporary Traffic Management System (TTMS) are investigated, including closed-circuit TV cameras, portable dynamic message signs, detection devices for traffic queueing and construction zones, video monitoring stations, telephone/web-based traveler information, and a traffic management center. |
| Florida DOT | Florida Standard Urban Transportation Model Structure (FSUTMS) | FSUTMS can produce various performance measures including vehicle miles of travel, vehicle hours of travel, average speed, number of accidents, fuel consumption, monetary benefits to users and/or agency, and emissions. |
| CalTrans | California Life-Cycle Benefit/Cost (Cal-B/C) | Cal-B/C uses a set of spreadsheet-based tools that cover multi-modal analysis of highway, transit, bicycle, pedestrian, ITS, operational improvement, and passenger rail projects. |
| University of South Florida | Trip Reduction Impacts of Mobility Management Strategies (TRIMMS) | TRIMMS allows quantifying the net social benefits of a wide range of transportation demand management initiatives in terms of emissions reductions, accident reductions, congestion reductions, excess fuel consumption, and adverse global climate change impacts by estimating changes in travel behavior. |
| New York State DOT | ITS Options Analysis Model (ITSOAM) | ITSOAM has three components including Delay Model, Safety Model, and Environmental Benefits Model. |
| Post-Processing Analysis | | |
| Florida DOT | Integrated Regional Information Sharing and Decision Support System (IRISDS) | IRISDS is a web-based platform that provides decision support for estimating and predicting system performance using data mining techniques, traffic analysis, and simulation modeling. |
| Florida DOT | Florida ITS Evaluation (FITSEVAL) | FITSEVAL evaluates the benefits and costs of thirteen different ITS deployment alternatives, can assess the mobility, safety, environmental, and monetary benefits, and produces estimates of the present-worth and benefit-cost ratios of ITS. |
| Florida DOT | ITS Data Capture and Performance Management (ITSDCAP) | ITSDCAP conducts ITS evaluations based on ITS data. Four types of ITS can be evaluated, including incident management, ramp metering, smart work zone, and road weather information system. |
| Virginia DOT | Virginia System Operations Performance Reports (VSOPR) | VSOPR assesses four categories of measures including Traffic, Incidents, Traveler information, and ITS device reliability. |
| Wisconsin DOT | Summary of ITS evaluation methods | The evaluation process consists of nine steps and assesses four types of measures, including Performance metrics, Benefits valuation measures, Net benefits, and B/C ratio. |
| Multi-Dimensional Models | | |
| Florida DOT | FITSEVAL | FITSEVAL uses the output of the FSUTMS modeling environment under CUBE, which quantifies Congestion/Mobility, Safety, Environmental and energy, and Agency and user costs measures. |
| Oregon DOT | Analysis and modeling tools | Statewide Integrated Model (SWIM), SWIM2, Land Use Scenario DevelopR (LUSDR), DTA, VISSIM, etc. |
| Maryland DOT | InSITE ABM-DTALite and SILK AgBM-DTALite | InSITE ABM-DTALite is the result of integrating a DTA tool based on an existing DTALite model that covers the InSITE ABM. SILK AgBM-DTALite is an agent-based microsimulation travel demand model. |
| Maryland DOT | Maryland Integrated Travel Analysis Modeling System (MITAMS) | MITAMS has a special focus on various applications ranging from short-term to long-term applications. |
| Ohio DOT and Kentucky Transportation Cabinet | ARTIMIS | ARTIMIS aims to optimize freeway system efficiency, improve safety, and benefit air quality. It includes over 80 cameras, 57 center-lane miles of fiber-optic cable, approximately 1100 detectors, and numerous freeway message signs in Cincinnati. |
| University of Florida | Corridor Simulation (CORSIM/TSIS) | The Traffic Software Integrated System (TSIS) integrates with the microscopic TRAF tools of CORSIM, namely FRESIM for freeway simulation and NETSIM for surface arterials and network simulation. |



 
Based on the FHWA’s Operations Benefit/Cost Analysis Desk Reference, there are in 

general three types of tools: 

• Sketch-planning tools can provide a simple, quick, and low-cost estimation of operational 

strategy benefits and costs. Examples include spreadsheets that rely on generally available 

data as well as static cause-effect relations between strategies and their impacts. Usually, 

these are inexpensive to use but carry a higher risk of inaccuracy. 

• Post-processing analysis tools seek to link the evaluation of operations with the travel 

demand, network data, and performance measure outputs from regional travel demand and 

simulation models. They are often more capable of assessing the impacts of the route, mode, 

or temporal shifts than sketch-planning methods but tend to cost more. 

• Multi-dimensional models are the most complex and costly, but typically provide a high 

level of confidence in the accuracy of the results. They are often used to integrate various 

analyses (e.g., a travel demand model and a DTA simulation) to estimate the full range of 

impacts of operations strategies or transportation projects. 

Table 3-1 summarizes the state-of-the-practice models, tools, and metrics used by various 

DOTs and MPOs. These models and tools largely rely on traditional transportation data 

collection methods, such as loop detectors and manual counting. 

3.2 Operations Practice Scan Survey 

Based on the literature review in Section 3.1, the selection of analysis, modeling, and simulation 

tools and the corresponding performance metrics varies across the stages of the transportation 

planning and operations process and should serve the analytical purpose at hand. At the long-range planning 

stage, it is impractical to apply the most complex tool for each conceived traffic operations project. 

Sketch planning tools or travel demand model postprocessing tools may be more suitable. At the 



 
Transportation Improvement Program (TIP) and project planning stages, mesoscopic and 

microscopic traffic simulation tools may be considered for traffic operations project studies. Multi-

scenario and multi-resolution tools for estimating travel reliability impact under different weather 

and accident conditions may also be added at these stages to provide more comprehensive 

information to support decision-making. Post-project evaluation could rely on existing 

performance monitoring dashboard tools such as the Regional Integrated Transportation 

Information System (RITIS).  

To obtain a standard workflow for prioritizing tools and metrics for transportation planning 

and operations, a dedicated survey is designed to collect insights from stakeholders including 

transportation practitioners from federal, state, and local agencies and other private-sector 

professionals with experience with performance evaluation of transportation projects. The 

objective of the survey is to help understand what performance measures are needed to make 

decisions at the planning, construction, and operations stages of a transportation project. Different 

types of projects and the level of analyses and metrics required to make reasonable 

recommendations have been identified and reviewed. A flowchart framework that documents the 

best practice metrics used in evaluating projects for different stages of planning and operations 

processes was produced to support transportation planners and engineers in their decision-making.  

In this survey, three major stages of the general transportation project are identified, 

including: Feasibility & Planning, Design & Construction, and Maintenance & Operations. Under 

each stage, various performance metrics that have been used to evaluate the project are listed. Based 

on these presumptions and categorization, the survey questions focus on understanding (1) best 

practices in performance metrics chosen for the evaluation of projects at different stages of the 

traffic operations planning process; (2) the usefulness of the metrics; and (3) challenges and 



 
potential solutions for data and additional metrics that would offer insights. The survey collected 

78 usable responses from the web-based survey. 

3.3 Survey Results 

Table 3-2. States for which the Respondents Work. 

| State | Number of Respondents | State | Number of Respondents |
|---|---|---|---|
| Georgia | 1 | Mississippi | 1 |
| North Carolina | 1 | Nebraska | 2 |
| Maryland | 50 | Pennsylvania | 3 |
| South Carolina | 1 | Washington | 3 |
| Virginia | 3 | Wyoming | 2 |
| Maryland, Virginia, District of Columbia | 3 | | |

 
Figure 3-1. Agencies that the Respondents Work at. 

 
Figure 3-2. Projects that the Respondents Work on. 



 
The detailed survey and results are documented in Appendix I. As shown in Table 3-2, 

Figure 3-1, and Figure 3-2, a total of 79 respondents completed the survey, either online, as 

distributed by the Maryland Department of Transportation State Highway Administration 

(MDOT SHA), or in person using electronic devices during the Transportation Research 

Board Annual Meeting (TRBAM). Table 3-2 shows the states for which the respondents are 

currently working (nine of them did not select locations). Figure 3-1 summarizes the agency 

distribution of the respondents from across the U.S. Most of the respondents are from Counties 

(26), Local Municipalities (17), State Departments of Transportation (14), and Private Consulting 

Firms (9), while the remaining (13) are from other organizations. As shown in Figure 3-2, the 

respondents have mixed backgrounds, with 37 of them working most frequently on highway 

projects, 16 on arterial projects, 13 on pedestrian and bike projects, and 8 on transit-related projects 

(four of them did not select projects). 



 
Figure 3-3. Projects that the Respondents Work on. 



 
3.4 Performance Metrics Flowchart 

Based on the survey results from 79 respondents, a flowchart (see Figure 3-3) is produced that 

documents the best practice performance metrics used in prioritizing and evaluating transportation 

projects. The complete survey questionnaire and results can be found in Appendix I. 

In the Feasibility and Planning stage, the project type was further refined to reflect industry 

needs, knowing that best-practice metrics would differ structurally within different project types. 

Six types of transportation projects were identified separately: 

• Mobility: focused on reducing congestion delays, typically through capacity improvements, 

micro-mobility infrastructure, transit solutions, etc.; 

• Reliability: focused on maximizing existing operations, such as technology deployments 

to manage the transportation system more effectively;  

• Safety: focused on systematically and holistically promoting safety, using metrics such as 

the severity of crashes, high rate of crashes, vulnerable user interactions with vehicles, and 

freight design concerns;  

• Environmental:  focused on managing environmental impact, sustainability, 

energy/emissions, and public health. Metrics could also include stream restoration and 

flooding mitigation;  

• Socio-economic: focused on economic revitalization, food desert programs, equity-related, 

etc.;  

• Recreational: focused on trails, visitor rest stops, etc.  

Based on these predefined project types, typical performance metrics used for each type of 

project were reviewed and listed in the survey questions to facilitate the post-processing of 

responses. Respondents were then asked to rank how frequently they use these performance metrics when 



 
performing a planning-level analysis or feasibility assessment. In case metrics were missing from 

the list, the respondents were asked to fill in an open-ended section with the metrics they felt were 

relevant to the question. As shown in the flowchart, the most frequently used performance metrics 

for the Feasibility and Planning stage were identified from the responses. Below 

is a summary of the major findings: 

• For mobility-related projects, “Delay”, “Travel Time” and “Volume-to-Capacity Ratio” 

are the three most frequently used performance metrics. Respondents also suggest metrics 

such as “Government Operations (e.g., Resource allocation, Master plan conformance, 

Equipment availability)” and “Multimodal Mobility (e.g., Mode share, Transfer time, 

Bicycle network)”. 

• For reliability-related projects, in addition to the commonly used “Travel Time Index”, the 

“Planning Time Index” and “Total Trip Time by Modes” are also frequently used. 

Respondents also suggest metrics such as “Congestion Impact (e.g., Delays, Buffer index, 

Wait times)” and “Safety Impact (e.g., Level of comfort/safety)”. 

• For safety-related projects, the typical performance metrics include “Crash Reduction”, 

“Conflict Reduction” and “Fatality Reduction”. Respondents emphasized the importance 

of “Pedestrian and Bicyclist Safety (e.g., Pedestrian movement)”. Some other frequently 

used performance metrics mentioned by the respondents include “Incident Rate (e.g., 

Incident rate per mile)” and “Speed Limits & Speed of Nearby Traffic”. 

• For environment-related projects, “Natural Resource Impact” and “Emission Reduction” 

are deemed frequently used. Respondents also suggest “Hazardous Impacts (e.g., Flood 

planning, stormwater planning)”, “Environmental Impacts (e.g., Exposure per person to 

emission, Noise impact)”, and “Cost of Environmental Testing”. 



 
• For socio-economic-related projects, “Land Use”, “Employment”, and “Regional 

Economic Development” are deemed frequently used. Respondents also suggest 

“Community Impacts (e.g., Community revitalization, Older adult demographic)” and 

“Access to Public Transportation”. 

• For recreation-related projects, “Number of Trail Users”, “Visitation” and “Recreation 

Event Participation” are deemed necessary for decision-making. Respondents also suggest 

“Trail Conditions (e.g., Width of trails, Nexus to other network, Barrier separation)”. 
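
Several of the mobility and reliability metrics named in the list above have widely used definitions that are easy to state in code. The sketch below follows common FHWA-style definitions: travel time index as mean travel time over free-flow travel time, planning time index as 95th-percentile travel time over free-flow travel time, and buffer index as the extra time between the 95th percentile and the mean, relative to the mean. The survey itself did not prescribe formulas, the nearest-rank percentile is one of several possible conventions, and the sample travel times, volume, and capacity are hypothetical.

```python
import math
from statistics import mean


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def reliability_metrics(travel_times_min, free_flow_min):
    """Delay and reliability indices from observed travel times (minutes)."""
    avg = mean(travel_times_min)
    p95 = percentile_nearest_rank(travel_times_min, 95)
    return {
        "delay_min": max(0.0, avg - free_flow_min),
        "travel_time_index": avg / free_flow_min,
        "planning_time_index": p95 / free_flow_min,
        "buffer_index": (p95 - avg) / avg,
    }


def volume_to_capacity(volume_vph, capacity_vph):
    """Volume-to-capacity ratio for flows in vehicles per hour."""
    return volume_vph / capacity_vph


# Hypothetical travel times (minutes) on one corridor, free-flow = 10 min.
travel_times = [10, 10, 11, 12, 12, 13, 14, 15, 18, 25]
metrics = reliability_metrics(travel_times, free_flow_min=10)
vc_ratio = volume_to_capacity(1800, 2000)
```

With these made-up inputs the travel time index is 1.4 and the planning time index is 2.5, meaning a traveler would budget 2.5 times the free-flow time to arrive on time on most days.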

In the Design and Construction stage, the actual design and construction plan for the project should 

be the main consideration. Therefore, when the project moves to the Design and Construction stage, 

most performance metrics are used to determine whether the project has critical failure points. In 

the survey, respondents are asked to rank the performance metrics used to examine project 

performance. The survey results helped identify four major performance metrics to support the 

decision-making, including “Project Cost”, “Cost/Benefits Ratio”, “Public Opposition”, and 

“Major Design Flaw”. Furthermore, a list of standard performance metrics was also suggested by 

respondents based on their own experience. These standard performance metrics included “Is the 

mix of projects to be funded annually a reasonable distribution across modes?”, “Is the project still 

within anticipated cost?”, “Adequate Public Facilities Ordinance (APFO)”, “Cost and O&M 

Projects”, “Travel Time”, and “Level of Service”.  

During the Maintenance and Operation stage, the focus shifts to how to maintain and 

operate the project at the expected levels. Based on the results, these can be measured with “On-

time Performance”, “Alternative Routes”, “Bridge Condition”, “State of Good Repair”, “Age of 

Transit Fleet”, “Surface Condition”, “Signage Availability”, “Sufficient Funding”, “Clear Marking 

(e.g., marking for crosswalks, travel lanes)”, “Reporting Issues”, and “Priority Lists”. 



 
3.5 Summary 

In summary, this chapter presents a dedicated survey to help understand what performance 

measures are needed to make decisions at the planning, construction, and operations stages of a 

transportation project. Different types of projects and the level of analyses and metrics required to 

make reasonable recommendations are identified and reviewed. A flowchart framework that 

documents the best practice metrics used in evaluating projects for different stages of planning and 

operations processes is produced to support transportation planners and engineers in their decision-

making.  

Among six types of projects, “Mobility” and “Safety” projects are the two types that are 

closely related to traffic operations and safety research which aims at congestion reduction and 

safety improvement. Based on the flowchart, it can be seen that for “Mobility” projects, delay, 

travel time, and volume-to-capacity ratio metrics are deemed most frequently used. For “Safety” 

projects, crash/fatality rate and crash/fatality reduction are of greatest concern to 

transportation professionals when planning transportation projects. To quantify the 

importance of these performance metrics, three main types of upstream data are needed, namely 

vehicle volume, travel time (or traffic speed), and crash rate (or crash count). As indicated in the 

literature, instead of collecting data manually or using traditional technologies, transportation big 

data can be used to estimate these upstream data. In this dissertation, probe vehicle data in RITIS 

and large-scale MDLD are used to develop big-data driven frameworks to support crash-related 

decisions, estimate vehicle volumes, and model pedestrian and bicyclist crashes, which ultimately support 

“Mobility” and “Safety” transportation projects. 
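
To make the link between the upstream data and a safety metric concrete, the sketch below computes crash rates from crash counts and volumes using the conventional exposure-based formulas: crashes per 100 million vehicle miles traveled on a segment, and crashes per million entering vehicles at an intersection. The function names and input numbers are hypothetical, and the later chapters of this dissertation may normalize crashes differently.

```python
def segment_crash_rate(crashes, aadt, years, length_miles):
    """Crashes per 100 million vehicle miles traveled (VMT) on a segment."""
    vmt = aadt * 365 * years * length_miles
    return crashes * 100_000_000 / vmt


def intersection_crash_rate(crashes, entering_aadt, years):
    """Crashes per million entering vehicles (MEV) at an intersection."""
    entering_vehicles = entering_aadt * 365 * years
    return crashes * 1_000_000 / entering_vehicles


# Hypothetical inputs: 30 crashes in 3 years on a 2-mile segment with an
# AADT of 20,000; 12 crashes in 3 years at an intersection with 25,000
# entering vehicles per day.
segment_rate = segment_crash_rate(30, 20_000, 3, 2.0)
intersection_rate = intersection_crash_rate(12, 25_000, 3)
```

Both rates depend directly on the volume estimate in the denominator, which is why the quality of the volume data matters for safety analysis.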



 
Chapter 4: Supporting Triage Decisions for High-Risk Trauma Patients 

at Crash Sites with Location Data 

4.1 Introduction 

As identified in Chapter 3, when planning for “Safety”-related transportation projects, the most 

frequently considered and important metrics are crash/fatality rate and crash/fatality reduction. This chapter 

focuses on incorporating one type of transportation big data, probe vehicle data, as well as 

annual average daily traffic data to support crash-related decisions. More specifically, it aims to model 

emergency medical services (EMS) and trauma triage decisions at the crash scene in order to 

reduce the fatality rate caused by severe injuries. 

Although research on improving EMS efficiency features prominently in contemporary healthcare 

service and traffic safety discussions, its roots can be traced back to the 2000s, when 

epidemiologists, health researchers, and practitioners stressed the vulnerability of older 

persons involved in traffic crashes [126-128]. When a vehicle crash occurs, the decision 

for EMS and field trauma triage must be made in a timely manner to save lives, especially for elderly 

persons [129]. According to the Population Bulletin “Aging in the United States,” the number of 

US seniors (age 65+) is projected to nearly double from 52 million in 2018 to 95 million by 2060, 

and the 65-and-older age group’s share of the total population will rise from 16 percent to 23 

percent. As they have passed through each major stage of life, baby boomers have brought both 

risks and challenges to transportation, infrastructure, and healthcare institutions. 

“Under-triage” often occurs in the medical decision process, where a large proportion of 

seriously injured older patients are transported to non-trauma hospitals or fail to be transported at 

all [89, 130-134]. This leads to a significant mismatch between the supply side from hospitals and 



 
the demand side from those patients. This gap not only degrades health outcomes but 

also impacts overall well-being in the longer term [75, 76]. On the other hand, inappropriately transferring 

under-triaged patients to trauma centers might also waste time and public 

resources that could be used to help other crash victims. Thus, it is important for policymakers, 

health-related practitioners and scholars to be equipped with a tool for better evaluating and 

analyzing the efficiency of the current EMS system in the era of data-driven problem-solving.  

4.2 The Big-Data Driven Framework Integrating Transportation and Health Data 

 
Figure 4-1. The Big-Data Driven Framework for Integrating Transportation and Health Data. 

The overall big-data driven framework is illustrated in Figure 4-1. It consists of two main pillars. 

Pillar 1 is the integration of the transportation big data and health datasets (shown on the left of 

Figure 4-1). Health and safety-related data, such as crash, EMS, and hospital triage records, are 

typically stored in the Crash Outcomes Data Evaluation System (CODES) in Maryland. These 

CODES data are mapped to the roadway network (the HERE network is used because of its availability) using 

their geolocation information and then joined with datasets from the transportation sector through 



 
spatiotemporal map matching. The two most important transportation datasets, i.e., traffic 

volumes and historical time-dependent travel speeds, are obtained at the roadway segment level 

from the Annual Average Daily Traffic (AADT) dataset and large-scale probe vehicle data sources 

(available from RITIS, the Regional Integrated Transportation Information System). These two 

transportation datasets are then matched to the network using Traffic Message Channel (TMC) codes. 
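
The matching step can be pictured as a keyed lookup: once a crash record is snapped to a roadway segment, it carries a TMC code and a timestamp, and these select the AADT for that segment and the probe speed for that segment and time bin. The sketch below assumes crashes have already been map-matched to TMC segments. The field names, TMC codes, 15-minute bin width, and values are hypothetical; they do not reflect the actual schema of CODES, RITIS, or the AADT dataset.

```python
from datetime import datetime

BIN_MINUTES = 15  # assumed speed aggregation interval


def time_bin(ts):
    """Floor a timestamp to the start of its BIN_MINUTES interval."""
    minute = (ts.minute // BIN_MINUTES) * BIN_MINUTES
    return ts.replace(minute=minute, second=0, microsecond=0)


def join_crashes(crashes, aadt_by_tmc, speed_by_tmc_bin):
    """Attach AADT and time-dependent speed to each crash record."""
    joined = []
    for crash in crashes:
        tmc = crash["tmc"]
        key = (tmc, time_bin(crash["time"]))
        joined.append({
            **crash,
            "aadt": aadt_by_tmc.get(tmc),            # None if unmatched
            "speed_mph": speed_by_tmc_bin.get(key),  # None if no probe data
        })
    return joined


# Hypothetical records for illustration only.
crashes = [
    {"crash_id": 1, "tmc": "110+04567", "time": datetime(2018, 5, 1, 8, 22)},
    {"crash_id": 2, "tmc": "110-04999", "time": datetime(2018, 5, 1, 17, 48)},
]
aadt_by_tmc = {"110+04567": 42000}
speed_by_tmc_bin = {("110+04567", datetime(2018, 5, 1, 8, 15)): 31.5}
records = join_crashes(crashes, aadt_by_tmc, speed_by_tmc_bin)
```

Keeping unmatched records with empty values, rather than dropping them, makes it possible to report how many crashes lack volume or speed coverage.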

With the integrated dataset, information beyond crashes themselves becomes available, 

including the EMS involvement, hospital triage, as well as traffic and roadway conditions at the 

crash scene (volumes, travel speed, weather, road surface conditions, etc.). Pillar 2 of the 

framework employs the integrated dataset to evaluate decision-making. Decision tree 

classification models are developed to model EMS and trauma triage decisions. This big-data driven framework 

enables the analysis of a wide spectrum of modeling methods in various application contexts. 
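
As a toy illustration of how a decision tree separates triage outcomes, the sketch below finds the single best threshold split on one numeric feature by minimizing weighted Gini impurity, which is the building block of CART-style trees. The feature (travel speed at the crash scene), the labels (1 = transported to a trauma center), and the data are invented for illustration. The text above does not state which tree algorithm, features, or splitting criterion were used, so this should not be read as the framework's implementation.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= t' split."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: speed at the crash scene and the triage outcome.
speeds_mph = [15, 20, 25, 30, 45, 50, 55, 60]
trauma_center = [0, 0, 0, 1, 1, 1, 1, 1]
threshold, impurity = best_split(speeds_mph, trauma_center)
```

A full tree repeats this search recursively over all features within each resulting subset until a stopping rule is met.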

4.2.1 Integrated Transportation and Health Data  

In this research, the CODES dataset was integrated with transportation data, including the roadway network, 

link-level volume information (i.e., Annual Average Daily Traffic, AADT), and link-level observed 

travel speed information obtained from probe vehicle data. In this section, the three major 

data sources are described. 



 
Figure 4-2. CODES Data in the state of Maryland. 

Table 4-1. Descriptive Statistics of the CODES Data 

Variable Name | Type | Description

sex | Binary | Gender: 1 = female (53.5%); 2 = male (45.8%); 99 = unknown (0.7%)

speed_limi | Categorical | Speed limit: 5 mph (2.5%); 10 mph (2.3%); 15 mph (3.3%); 20 mph (0.9%); 25 mph (16.2%); 30 mph (13.9%); 35 mph (16.8%); 40 mph (13%); 45 mph (8.2%); 50 mph (7.2%); 55 mph (11.9%); 60 mph (0.3%); 65 mph (2.8%); 70 mph (0.5%)

age | Numeric | Age: min = 65; max = 109; mean = 73; standard deviation = 6.9

eldveh | Binary | If the person shared a vehicle with an elderly person: a constant value of 1

damage | Binary | If the vehicle is disabled or destroyed due to the crash: 1 = yes (43.8%); 0 = no (56.2%)

eject | Binary | If the person is ejected: 1 = yes (0.4%); 0 = no (99.6%)

notbelted | Binary | If the belt is used improperly: 1 = yes (2.7%); 0 = no (97.3%)

light_code | Categorical | Light condition: 0 = Not Applicable (1.3%); 1 = Daylight (77%); 3 = Dark Lights On (13%); 4 = Dark No Lights (3.9%); 5 = Dawn (1.4%); 6 = Dusk (2.4%); 7 = Dark - Unknown Lighting (0.7%); 88 = Other (0.2%); 99 = Unknown (0.2%)

collision_ | Categorical | Collision type: 0 = Not Applicable (1.3%); 1 = Head On (2.6%); 2 = Head On Left Turn (7.2%); 3 = Same Direction Rear End (29.7%); 4 = Same Direction Rear End Right Turn (3.9%); 5 = Same Direction Rear End Left Turn (1.4%); 6 = Opposite Direction Sideswipe (1.3%); 7 = Same Direction Sideswipe (6.5%); 8 = Same Direction Right Turn (2.3%); 9 = Same Direction Left Turn (2.5%); 10 = Same Direction Both Left Turn (0.4%); 11 = Same Movement Angle (20.3%); 12 = Angle Meets Right Turn (0.6%); 13 = Angle Meets Left Turn (0.7%); 14 = Angle Meets Left Turn Head On (0.4%); 15 = Opposite Direction Both Left Turn (0.2%); 17 = Single Vehicle (11.8%); 88 = Other (10.8%); 99 = Unknown (0.3%)

harm_event | Categorical | Subjects involved in accidents: 0 = Not Applicable (0.9%); 1 = Other Vehicle (78.1%); 2 = Parked Vehicle (4.5%); 3 = Pedestrian (3.9%); 4 = Bicycle (0.6%); 5 = Other Pedalcycle (0.05%); 6 = Other Conveyance (0.05%); 7 = Railway Train (0.004%); 8 = Animal (0.8%); 9 = Fixed Object (7.2%); 10 = Other Object (0.5%); 11 = Overturn (0.1%); 12 = Spilled Cargo (0.03%); 13 = Jackknife (0.01%); 14 = Units Separated (0.01%); 15 = Other Non-Collision (0.2%); 16 = Off Road (1.5%); 17 = Downhill Roadway (0.01%); 18 = Explosion or Fire (0.1%); 19 = Backing (0.09%); 20 = U-turn (0.01%); 21.15 = Immersion (0.01%); 22.15 = Fell Jumped from Motor Vehicle (0.04%); 23.15 = Thrown or Falling Object (0.06%); 88 = Other (0.3%); 99 = Unknown (0.08%)

belt_use | Categorical | Type of belts: 1 = Combined lap-shoulder protection (83.1%); 2 = shoulder only (7.9%); 8 = lap only (7.7%); 0 = no restraint use (1.2%)

emstrans (Output) | Binary | EMS decision: 1 = sent via EMS (18%); 0 = not transported via EMS (82%)

trauma (Output) | Binary | Trauma triage decision: 1 = sent to trauma center (0.8%); 0 = not sent to trauma center (99.2%)
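Because most variables in Table 4-1 are stored as numeric codes, a small lookup step is typically needed before the records can be read or modeled. The sketch below decodes two of the coded variables using the code-to-label pairs listed in the table. The function name, the output field names, and the choice to flag code 99 as unknown are illustrative assumptions rather than part of the CODES specification.

```python
# Code-to-label pairs copied from Table 4-1.
SEX_LABELS = {1: "female", 2: "male", 99: "unknown"}
LIGHT_CODE_LABELS = {
    0: "Not Applicable",
    1: "Daylight",
    3: "Dark Lights On",
    4: "Dark No Lights",
    5: "Dawn",
    6: "Dusk",
    7: "Dark - Unknown Lighting",
    88: "Other",
    99: "Unknown",
}
UNKNOWN_CODE = 99  # Assumption: 99 is treated as "unknown" for both variables.


def decode_record(record):
    """Return a copy of `record` with readable labels added (hypothetical helper)."""
    decoded = dict(record)
    decoded["sex_label"] = SEX_LABELS.get(record["sex"], "unmapped code")
    decoded["light_label"] = LIGHT_CODE_LABELS.get(
        record["light_code"], "unmapped code"
    )
    # Flag records whose sex or light condition is coded as unknown.
    decoded["has_unknown"] = UNKNOWN_CODE in (record["sex"], record["light_code"])
    return decoded


example = decode_record({"sex": 1, "light_code": 3})
unknown_example = decode_record({"sex": 99, "light_code": 2})
```

Codes that do not appear in the table (such as `light_code` 2 in the second call) are labeled as unmapped instead of raising an error, which makes unexpected values easy to count.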

 
1) CODES data: The crash data are collected from the Crash Outcome Data Evaluation System

(CODES). The dataset includes crash scene information of car crashe