ABSTRACT

Title of Dissertation: Introducing Frameworks to Analyze Human Mobility Behavior with Advanced Computational Algorithms and Machine Learning Methods Using Mobile Device Location Data

Aref Darzi, Doctor of Philosophy, 2022

Dissertation directed by: Professor Lei Zhang, Department of Civil and Environmental Engineering

The emergence of mobile device location data (MDLD) provides new opportunities to analyze human mobility behaviors. The large penetration rate and the possibility of observing human mobility behaviors continuously are among the most important features of passively collected mobile device location data. However, to utilize MDLD in mobility behavior analysis, comprehensive computational algorithms need to be developed to carefully process the data. This research proposes novel sets of frameworks to extract mobility context from the raw MDLD. First, this study introduces a set of algorithms to construct the travel behavior of mobile device owners, along with the non-observable attributes of both trips and travelers, by extracting trips, identifying significant activity locations of the travelers such as their home and work locations, and imputing the travel mode. The proposed algorithms were tested against the state-of-practice and state-of-the-art algorithms developed in the literature and were shown to have superior performance compared to other methods. Next, this study further examines the usefulness of the proposed framework in providing near real-time insights on the evolution of human mobility behavior during the Coronavirus disease 2019 (COVID-19) pandemic. As a part of this study, a new metric has also been introduced to measure social distancing practices from the mobility perspective. Additional investigations are also conducted to understand the linkage between the outbreak of COVID-19 and the mobility behavior of the communities.
Lastly, this study seeks to develop a framework to investigate the evacuation behavior of individuals during a natural disaster and construct the evacuation evolution patterns and decisions based on the MDLD. This dissertation evaluates the importance of the historical mobility behavior of the device owners in their decision-making procedure during natural disasters using statistical discrete choice models.

INTRODUCING FRAMEWORKS TO ANALYZE HUMAN MOBILITY BEHAVIOR WITH ADVANCED COMPUTATIONAL ALGORITHMS AND MACHINE LEARNING METHODS USING MOBILE DEVICE LOCATION DATA

by

Aref Darzi

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2022

Advisory Committee:
Professor Lei Zhang, Chair
Professor Katharine Abraham, Dean's Representative
Professor John C. Haltiwanger
Professor Deb A. Niemeier
Professor Erkut Y. Ozbay

© Copyright by Aref Darzi 2022

Acknowledgments

I am so grateful to all who have supported me throughout this journey. I would like to express my deepest appreciation to my advisor and committee chair, Professor Lei Zhang, for his continuous support, vast knowledge, and inspiring advice throughout my Ph.D. study at the University of Maryland. Without his guidance and persistent help, this dissertation would not have been possible.

I would also like to thank my committee members, Professor Katharine Abraham, Professor John C. Haltiwanger, Professor Deb A. Niemeier, and Professor Erkut Y. Ozbay, for their encouragement, insightful comments, and valuable suggestions on my research. I have learned a lot during the past several years from all of you and I would not be where I am without you all. Thank you.

This dissertation work would not have been possible without the help and guidance of my colleagues, Dr. Sepehr Ghader, Dr. Chenfeng Xiong, Dr. Yixuan Pan, Mofeng Yang, Qianqian Sun, Aliakbar Kabiri, and Guangchen Zhao.
I am truly grateful to them, and to all of those with whom I have had the pleasure to work during the past several years at the University of Maryland. I would also like to use this opportunity to thank all my friends who have been like family to me, especially Shahrzad Saffari, who has been there to support me since the first day of my Ph.D. journey. I would also like to thank the Maryland Transportation Institute at the University of Maryland for their financial support.

I could not have done this without the unconditional love and support of my family. Special thanks to my parents Ali and Khadijeh, and my siblings Hadi, Hamed, and Maedeh. Thank you for always being there for me and showing me what is truly important in life.

Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  1.1. Overview
  1.2. Objectives
  1.3. Contributions
  1.4. Organizations
Chapter 2: Literature Review
  2.1. Evolution of Mobile Device Location Data
  2.2. Extracting Device- and Trip-level Information from MDLD
  2.3. Impact of Mobility Behavior during Pandemic
  2.4. Disaster Evacuation Behavior Analysis
Chapter 3: Data
  3.1. Data Cleaning and Preprocessing
  3.2. Data Summary
Chapter 4: Deducing Device- and Trip-Level Information
  4.1. Home and Work Location Identification
    4.1.1. Comparisons with Alternative Home Identification Algorithms
    4.1.2. Home and Work Location Identification Validation
  4.2. Tour and Trip Identification
    4.2.1. Home-based Tour Identification
    4.2.2. Trip Identification for Short-distance Tours
    4.2.3. Trip Identification for Long-distance Tours
  4.3. Trip Mode Detection
    4.3.1. Data Collection for Travel Mode Imputation
    4.3.2. Construction of Classification Features
    4.3.3. Model Structure
    4.3.4. Empirical Results
Chapter 5: MDLD in Action for Pandemic Studies
  5.1. Methodology
    5.1.1. Weighting
    5.1.2. Core Mobility Metrics
    5.1.3. Social Distancing Index
  5.2. Results
    5.2.1. The effectiveness of the Social Distancing Index (SDI)
    5.2.2. State-level Mobility Pattern Changes
    5.2.3. County-level Mobility Pattern Changes
  5.3. Summary and Discussion
Chapter 6: MDLD in Action for Disaster Evacuation
  6.1. Introduction
  6.2. Data
    6.2.1. Location Data
    6.2.2. Evacuation Zone Data
    6.2.3. Socio-Demographic Data
  6.3. Methodology
    6.3.1. Home Location Identification
    6.3.2. Evacuation Detection
    6.3.3. Historical Mobility Behavior Pattern
  6.4. Constructing the Evacuation Pattern
    6.4.1. Stay or Evacuate
    6.4.2. Departure and Reentry Date Distribution
    6.4.3. Destination Choice: Distance to Evacuation Destination
  6.5. Statistical Model
  6.6. Summary and Discussion
Chapter 7: Conclusions and Remarks for Future Work
  7.1. Summary of Contributions
  7.2. Future Directions
Bibliography

List of Tables

Table 1. Literature review on travel mode detection methods
Table 2. A synthetic sample of LBS data
Table 3. Geohash cell dimensions at the equator
Table 4. Descriptive statistics on the distances between the imputed home locations
Table 5. Trajectory features description
Table 6. Goodness of fit measures for different travel mode detection models
Table 7. Confusion matrix comparison of RF model and the wide and deep learning model
Table 8. List of core mobility metrics calculated to capture the COVID-19 impact on mobility
Table 9. Descriptive statistics for the core metrics
Table 10. Spearman's rank correlation coefficient between SDI and infection rate for the top five and bottom five states regarding the cumulative number of confirmed cases
Table 11. Spearman's rank correlation between SDI and infection rate for the top ten counties regarding the cumulative number of confirmed cases
Table 12. Evacuation decision based on the evacuation order received
Table 13. Data description and summary for evacuation choice model
Table 14. Logistic regression models' summary

List of Figures

Figure 1. Sources of mobile device location data
Figure 2. Device sampling rate for the month of February 2020 (a) at the county level, (b) at the state level
Figure 3. The density map of anonymized location data across the nation (brighter shades represent a higher density of sightings)
Figure 4. Calibration results for selecting the number of minimum observed hours
Figure 5. Sensitivity analysis in temporal similarity ratio using MDLD
Figure 6. County-level resident estimates from different methods and ACS
Figure 7. County-level 90th-percentile distances between the imputed home locations
Figure 8. County-level 95th-percentile distances between the imputed home locations
Figure 9. County-level 99th-percentile distances between the imputed home locations
Figure 10. County-level resident estimates comparison between MDLD and ACS
Figure 11. County-level normal commuter estimates from MDLD, ACS, and LODES
Figure 12. County-level commuting flow estimates from ACS and LODES
Figure 13. County-level commuting flow estimates from MDLD and LODES
Figure 14. County-level commuting flow estimates from MDLD and ACS
Figure 15. County-level commuting distance distribution
Figure 16. Tour identification and trip chaining demonstration
Figure 17. Recursive trip identification algorithm for short-distance tours
Figure 18. Trip identification framework for long-distance tour
Figure 19. The user interface of the smartphone GPS data survey app
Figure 20. Multimodal transportation network of the study area
Figure 21. The wide and deep learning framework
Figure 22. Temporal changes of state-level Social Distancing Index
Figure 23. Social Distancing Index heatmap for all states
Figure 24. Temporal changes of Social Distancing Index in the top five and bottom five states regarding the cumulative number of confirmed cases
Figure 25. Temporal changes of Social Distancing Index in the top ten counties according to the cumulative number of confirmed cases
Figure 26. Florida map by evacuation order and date during Hurricane Irma
Figure 27. Disaster evacuation analysis framework flowchart
Figure 28. Departure and reentry date distribution
Figure 29. Relationship between departure date and evacuation order date
Figure 30. Distribution of evacuation destination distance to the home locations
Figure 31. Median distance traveled to evacuation destination at county level
Figure 32. Evacuation duration distribution across different evacuation order groups
Figure 33. Average evacuation duration at the county level
Figure 34. Elevation impacts on evacuation decisions

Chapter 1: Introduction

1.1. Overview

Understanding human mobility patterns at both the individual and aggregate levels is an integral part of transportation studies (1, 2).
Analyzing people's mobility behavior provides planners and decision-makers with important inputs, such as how, when, and between which origins and destinations people travel. Traditionally, researchers have used statistical analysis and modeling frameworks to understand people's mobility patterns and forecast their future behavior, mostly based on travel surveys and questionnaires that offer a good depth of information but a limited sample size (3, 4). Obtaining such datasets is costly, and the cumbersome collection procedure prevents these traditional sources of transportation data from reflecting real-world observations in a timely manner (5). In addition to these shortcomings, the low sample penetration rate, the limited data collection period, and underreported trips are among the other issues of the traditional data sources that have hindered the progress of mobility behavior analysis to some extent.

With the emergence of new technologies, novel sources of data, including mobile device location data (MDLD), have become accessible to transportation researchers over the past two decades to supplement or substitute for travel surveys. Mobile device location data includes data from cell phone networks, GPS devices, and smartphones. Figure 1 shows the different types of technologies used as sources of MDLD.

Figure 1. Sources of mobile device location data

As shown in Figure 1, devices as ordinary as cell phones without location services technology, as well as dedicated GPS devices in vehicles, can contribute to MDLD collection. A typical MDLD record consists of several core elements, including an anonymized device ID (either temporary or persistent), the location coordinates of the device, and the timestamp of the event. In addition to these core attributes, location accuracy measurements, speed, and regional time zone offset are among other common features of the data.

Several key advantages of MDLD over the traditional data sources have attracted researchers' attention in recent years. The continuous, passive, and objective recording of device movements, the unprecedented population coverage, and the lower cost compared to traditional surveys are among the most notable features of MDLD. These advantages have encouraged researchers to employ MDLD as a complementary or even stand-alone data source for human mobility analysis, including estimation of aggregate-level traffic and travel patterns such as OD tables (6, 7), spatio-temporal human activity pattern analysis (8), monitoring, modeling, and management of human mobility behavior during extreme conditions such as natural disasters (9), and modeling human interactions in several contexts, including the spread of disease (10).

Despite all the merits that MDLD possesses, there are several limitations associated with passively collected data that need to be carefully handled in order to fully exploit this technology. First, MDLD provides no information about the device owner, in order to preserve the privacy of the data subjects. In addition to lacking socio-demographic and behavioral semantics for the devices, analyzing travel behavior from MDLD also requires dedicated data processing and methodological frameworks to extract mobility behavior from the rich spatio-temporal trajectory information of each device. The increasing interest in leveraging MDLD for different applications, together with the abovementioned limitations of this data, motivates this research.

1.2. Objectives

This study has three key objectives. The first objective is to develop a comprehensive framework to analyze and derive human mobility patterns from MDLD. To achieve this goal, a set of advanced computational algorithms and machine learning methods has been developed to add context to the MDLD and provide the information necessary for further investigating human mobility behaviors through these datasets.
Throughout the Coronavirus disease 2019 (COVID-19) pandemic, the importance of informed decision-making became increasingly apparent. Therefore, as a part of this study, a framework has been developed to provide near real-time information on the mobility patterns of communities. Based on the mobility behavior information, a new metric has been introduced to measure the social distancing practices in the communities and to provide more insights on how closely mobility behavior and the outbreak of COVID-19 are linked.

Lastly, this study seeks to develop a novel framework to investigate the evacuation behavior of individuals during a natural disaster. Taking advantage of the ubiquitous coverage of MDLD, this study develops a framework to construct the evacuation evolution patterns. In addition to constructing the evacuation decisions, the historical mobility behaviors of the device owners were analyzed to further investigate the determinants of the evacuation decision-making procedure based on the revealed mobility characteristics of people.

1.3. Contributions

The first contribution of this study is to introduce a set of new computational algorithms and machine learning methods to extract mobility behavior from MDLD. For this purpose, this study first introduces a set of data preprocessing methods to clean the raw MDLD. Next, to infer the significant activity locations of devices, a framework has been developed to detect the home and work locations. For the devices with an identified home location, a novel tour-based algorithm is introduced to identify both tours and trips from MDLD. To further investigate the characteristics of the trips, a machine learning method is introduced to detect the mode of travel.

The second contribution of this study is to employ the developed algorithms on a large-scale mobile device location dataset to provide near real-time insights during the COVID-19 pandemic.
Non-pharmaceutical interventions (NPIs) are considered among the most effective strategies during the COVID-19 pandemic, especially before the emergence of vaccines. Mobility restrictions play an important role in containing the spread of the virus, and therefore understanding how people react to control measures becomes increasingly important for decision-makers. To assess the mobility behavior of the communities, this study developed a framework to measure people's mobility and constructed a new metric, the social distancing index (SDI), which combines different aspects of mobility into a single metric that captures the essence of the mobility behaviors related to social distancing practices in different communities.

The third contribution of this dissertation is to build upon the developed algorithms to analyze individuals' evacuation behavior during the course of a natural disaster such as a hurricane. The complexity of human decision-making procedures and the lack of timely data in these situations make the management and planning of evacuation operations more challenging. To address this need, this study introduces a novel framework to construct the evacuation decisions of people, including whether they evacuate, the departure and reentry times of their evacuations, and their destination choices. Furthermore, this study investigates the determinants of evacuation decisions using statistical models.

1.4. Organizations

The second chapter is dedicated to a comprehensive literature review covering the evolution of state-of-the-art computational algorithms and imputation methods based on MDLD. Chapter 3 describes the features of the MDLD along with the preprocessing and data cleaning steps it requires. In Chapter 4, the developed computational algorithms and machine learning methods are introduced. First, the significant activity location identification is described, followed by the description of the tour-based trip identification algorithm.
At the end of this chapter, the details of the proposed travel mode imputation algorithm are presented. Chapter 5 shows the MDLD and the algorithms discussed in Chapter 4 in action for assessing human mobility during the COVID-19 pandemic. In this chapter, the development of the social distancing index is described. People's mobility behaviors are evaluated based on the SDI, and the relationships of government orders and the severity of the virus outbreak with people's behavior are assessed. In Chapter 6, a novel framework for constructing evacuation evolution patterns is introduced. The results from implementing the framework on the MDLD are summarized in this chapter. Further investigations on how the historical mobility behavior of individuals would impact their decisions toward evacuation are also conducted in the latter part of the chapter by developing two binomial logit choice models. Finally, the summary of conclusions and remarks for future work are presented in Chapter 7.

Chapter 2: Literature Review

This chapter presents a comprehensive literature review and practice scan covering the topics discussed in this dissertation. I group the previous research efforts into four subsections. First, I summarize the evolution of mobile device location data. In the second part, I review the efforts conducted to extract trip- and device-level information from different types of MDLD. Then, I present the studies that investigated the importance of mobility behavior in the outbreak of disease. Lastly, the chapter ends with a review of the studies utilizing MDLD for evacuation behavior analysis.

2.1. Evolution of Mobile Device Location Data

The earliest attempts to utilize MDLD in the transportation domain started at the end of the last century.
In the beginning, Global Positioning System (GPS) data loggers were used to collect longitudinal location data from survey respondents in order to enhance the quality of the travel diaries (11). The early generation of GPS data loggers required a steady electricity flow and was designed to be used only in vehicles, powered by vehicle batteries. The Lexington Area Travel Data was the first survey that utilized in-vehicle GPS technology and proved that the collected GPS data could successfully supplement the traditional approach of collecting manual input from the survey subjects. In-vehicle GPS data collection has been shown to significantly improve the spatiotemporal accuracy of travel records in the survey by capturing the origins and destinations of the trips, as well as their start and end times, through recording the vehicle location second by second while the vehicle is on (12-17). It has also been shown that in-vehicle GPS data could help to mitigate the issues of underreported trips and misreported trip mileage and travel time estimates (18).

The limitation of early GPS technology was that it could only capture the movement of vehicles. As GPS technology improved over the years, wearable and handheld GPS devices helped to record trips made by other modes of transportation by allowing the survey respondents to carry the device. Wearable GPS technology was widely used in travel surveys throughout the past decade (19-21). Even as travel surveys used both GPS technologies more commonly, several shortcomings remained unresolved, including the possibility that users may forget to carry the wearable GPS devices or may consider carrying the device a burden, and the fact that some devices do not provide an interface for verifying trip information.
As data collection through dedicated GPS devices gained attention in the transportation domain, several research studies investigated the means to extract travel information from GPS data systematically. Shen and Stopher (2014) reviewed the methods used for GPS data processing (22). They summarized the methodological efforts on GPS data processing for travel survey use cases into three categories: (1) trip/segment identification, (2) travel mode detection, and (3) trip purpose imputation.

In addition to the studies that focused on complementing or replacing travel surveys (23-25), GPS data has been utilized for other transportation applications as well. Schonfelder et al. (2002) investigated the feasibility of leveraging longitudinal GPS data to analyze travel behavior. The study used GPS data obtained from about 400 private and commercial vehicles over a period of two years (26). Papinski et al. (2009) explored the route choice decision-making process by comparing the planned route choices of 31 individuals in Ontario, Canada with the routes they actually took, as observed by a person-based GPS device (27). As in-vehicle GPS devices became more popular in everyday cars, some private-sector data vendors began aggregating such data to provide travel statistics such as travel time, travel speed, link volume estimates, and origin-destination patterns (28). Several scientific reports have assessed the validity of the travel metrics estimated based on these datasets (29, 30).

As a new generation of mobile devices, including mobile phones, smartphones, and tablets, has gained popularity in the past two decades, a new opportunity has arisen to investigate human mobility patterns in a more practical way.
The first generation of mobile phone location data was generated from the communications between cellphones and cellular towers (31), based on two different approaches: (1) call detail record (CDR) data, also called event-driven mobile phone data, which provides details of phone calls and messages, including the user IDs of both sender and receiver, the type of the telecommunication transaction, the duration of the transaction, the timestamp, and the cell tower ID(s); and (2) network-driven mobile phone data, which is mainly used by network carriers to monitor the loads on cell towers, or on a group of towers called a Location Area (LA), in order to optimize their services (32). In both approaches, the location information is recorded either based on the location of the tower, which makes the location accuracy dependent on the density of cellular towers, or more precisely using triangulation algorithms, which provide an accuracy of 200 to 300 meters on average (7).

Both types of network-based datasets have been used widely to study human mobility patterns in the past two decades. Gonzalez et al. (2008) employed two sets of CDR data to understand human mobility patterns at the individual level (1). Their study used CDR data composed of six months of records from 100,000 anonymous individuals selected randomly from a dataset of more than 6 million mobile phone users, along with a second dataset that recorded the location of 206 mobile phone users every two hours over an entire week. Further studies have continued the exploration of human mobility behaviors using similar datasets (33-38). CDR datasets have also been applied to other research domains such as social network analysis, residential location and population estimation, and predicting socioeconomic levels (39-41). Despite its high penetration rate, CDR data has both spatial and temporal limitations. The spatial accuracy is confined either by the cell towers' density in the network or by the accuracy of the triangulation methods. The temporal frequency of observations is also limited by the frequency of communication transactions such as calls and messages.

Location-based service (LBS) data is another source of MDLD. It collects spatial and temporal information when a mobile device application updates the device's location using the most accurate of the available sensors, such as the embedded GPS sensor, Bluetooth, Wi-Fi, or cell tower (42, 43). Compared to CDR data, LBS data possesses a higher location accuracy and therefore provides invaluable location information for analyzing individual-level mobility patterns (44). The technology has been used in various transportation-related applications recently. Resource Systems Group (RSG) has conducted a smartphone-enhanced travel survey using rMove, a mobile application developed by their team (45). AirSage developed a traffic platform based on LBS data which estimates traffic characteristics of vehicle movements, such as traffic flow, speed, and congestion, along with the sociodemographic information of road users (46).

In brief, the MDLD sources used in the transportation field differ in several aspects, including spatiotemporal coverage of the population and their mobility, data quality (e.g., spatial accuracy and location recording interval (LRI)), and ease of access to the data (47, 48). GPS data has the highest horizontal location accuracy (e.g., 10 meters) and the lowest LRI (usually 1 second), while its population coverage is usually very limited and thus cannot represent the mobility behaviors of the entire population. Cellular and LBS data have significantly higher spatio-temporal coverage compared to GPS data due to the large penetration rate of cellphones and smart mobile devices.
However, the data is limited to spatio-temporal attributes, and the LRI for both datasets is high and biased toward users who interact more with their devices.
2.2. Extracting Device- and Trip-level Information from MDLD
As MDLD has become more accessible to researchers, and along with new developments in the technology, many studies have investigated the extraction of trip information from raw MDLD. Gong et al. (2014) summarized the methodological attempts to derive personal trip information from GPS data (49). Their review covered four aspects of the data processing needed to extract reliable trip information: trip identification, trip mode imputation, trip purpose detection, and recognition of data errors that may influence the algorithms.
To accurately obtain the trip ends, the first algorithms developed were rule-based trip identification methods that relied mainly on designed rules and corresponding parameters based on domain knowledge. The rules consider the location data either point by point or as several consecutive points at a time, and determine whether the points are dynamic or stationary. The attributes used in the rule-based models are mostly dwell time, speed, and distance (50-60). More recently, supervised machine learning methods have also been used to supplement the rule-based models in classifying sightings as moving or static (61-63). Unsupervised learning methods such as spatiotemporal clustering algorithms have also been employed for trip end detection. Yao et al. used a spatiotemporal clustering method with three layers of optimization models to identify trip ends (64). With the emergence of LBS data, additional attempts have been made to identify trips. Wang et al. introduced the "Divide, Conquer and Integrate (DCI)" framework to extract trip ends from multi-sourced data to analyze mobility patterns (44).
In their proposed framework, they combined a rule-based algorithm with an incremental clustering method to handle the bi-modal nature of LBS data.
After trip identification, imputing the non-observable attributes is important in order to add context to the identified trips. Significant activity locations such as home and work, trip mode, and trip purpose are among the most important missing attributes.
Home and work location identification builds on activity location identification methods. In CDR datasets, because the location records mainly correspond to cell towers, the areas covered by observed cell towers that meet specific conditions are considered the significant activity locations. For datasets that provide location sightings, such as LBS data, the latitude and longitude of each sighting are recorded. Therefore, to analyze the significant activity locations, clustering algorithms have been employed to aggregate static sightings and to identify the home and work areas.
The algorithms developed to identify significant activity locations can be categorized into eight classes: threshold-based methods, supervised machine learning, distance-based clustering, model-based clustering, incremental clustering combined with k-means clustering, density-based clustering, bi-level modeling frameworks, and agglomerative clustering. Wolf et al. developed a spatial and temporal threshold-based method to detect moving and non-moving sightings by checking all pairs of consecutive points in GPS data (15). Yang et al. and Zhou et al. trained supervised machine learning models to detect static and moving sightings by constructing a feature set from their training datasets (63, 65). Ye et al. and Calabrese et al. investigated a distance-based clustering algorithm that detects significant stops as groups of consecutive location points in which the maximum distance between any pair of points does not exceed a distance threshold and the dwell time is not smaller than a temporal threshold (66, 67). Chen et al. explored the model-based clustering approach to detect significant stops using a Gaussian Mixture Model (68). Wong and Chen developed an incremental approach combined with the k-means clustering method to cluster sightings based on a distance threshold. After identifying clusters, they used a duration threshold in a later step to detect activity locations. The two thresholds were found by trial and error in their investigation (43). Unsupervised machine learning algorithms such as density-based clustering methods have also been investigated to identify activity locations. These algorithms require a minimum number of points and a spatial distance to form a cluster. These two parameters are usually selected via trial and error or from observations of the raw trajectories (69-72). Wang et al. introduced a bi-level modeling approach that divides the dataset into two subsets based on their quality. They applied the distance-based clustering algorithm to the high-quality subset and the incremental clustering approach to the low-quality subset. The two subsets were then integrated through their spatiotemporal relationship (44). The agglomerative clustering method has also been used to complement the previous methods. In this approach, the algorithm consolidates activity locations that are spatially close to each other but may be far apart in time (73).
Once the significant activity locations are identified, the activity type, such as home or work, should be imputed for each place. Behavior-based and context-based methods are among the most used approaches for activity type inference (42).
The behavior-based approach classifies the home and work locations based on the visiting frequency of the place, the dwell time at each activity location, and the time-of-day pattern observed at each location (7, 74). The context-based approach, on the other hand, uses features of the location, mainly the land use type and nearby points of interest (POI), to infer the activity types with predefined empirical rules (75-78).
The behavior-based approach has been the most widely used method to identify daily life centers such as home and work locations. To determine the daily life centers, Flamm and Kaufmann proposed the criterion that individuals spend at least 20 percent of their time at the location, based on their investigation of the Mobidrive dataset, which covers a six-week survey period (79). Calabrese et al. proposed grids of 500 × 500 meters (1640.42 × 1640.42 ft) to label the activity location. They considered the grid with the most night-time observations, in the period from 6 p.m. to 8 a.m., as the home location. The work location was similarly identified as the most frequently observed grid on weekdays between 8 a.m. and 10 a.m. They validated their results against the Census Transportation Planning Products (CTPP) (80). In addition to the data-driven approaches, supervised learning methods have also been considered for identifying activity locations. Isaacman et al. developed a feature set of five observable attributes and derived 15 factors by ranking the observable attributes and calculating their percentages. A logistic regression model was trained on these 15 factors using a labeled dataset collected from 18 volunteers (81).
After identifying trips, the mode of the trip is another important aspect of mobility behavior that needs to be imputed. Travel mode imputation can be categorized into two main approaches: the trip-based approach and the segment/point-based approach.
The trip-based approach starts from the already identified trips and detects a single travel mode for the entire trip, while in the segment/point-based approach the travel mode is imputed separately for each segment or point (48). The segments/points with the same travel mode are then merged to form a trip with a single mode. To distinguish the mode, both approaches have used similar features. Table 1 summarizes the feature sets previously used in travel mode imputation (82).
Table 1. Literature review on travel mode detection methods
| Author | LRI | Model* | Main Features | Modes | Acc. |
|---|---|---|---|---|---|
| Gong et al. 2012 | / | Rules | Speed, Acceleration, Transit Stations, Transit Network | Drive, Train, Bus, Walk, Bike, Static | 82.6% |
| Stenneth et al. 2011 | 30 s | RF | Speed, Acceleration, Heading change, Bus location, Transit Network | Drive, Bus, Train, Walk, Bike, Static | 93.7% |
| Bruunauer et al. 2013 | 1-10 s | MLP | Speed, Acceleration, Bendiness | Drive, Bus, Train, Walk, Bike | 92.0% |
| Xiao et al. 2015 | 1 s | BN | Speed, Acceleration, Trip Distance | Drive, Bus, Walk, Bike, E-Bike | 92.0% |
| Nitsche et al. 2014 | 1 s | DHMM | Speed, Acceleration, Direction | Drive, Bus, Motorcycle, Train, Tram, Subway, Walk, Bike | 65% - 95% |
| Dabiri and Heaslip 2018 | 1-5 s | CNN | Speed, Acceleration, Jerk, Bearing Rate | Drive, Bus, Train, Walk, Bike | 84.8% |
| Bachir et al. 2019 | / | BI | Road and Rail Trip Counts | Road, Rail | - |
| Vaughan et al. 2020 | / | DNN | Speed, Trip Distance, Land Use, Time of Day | Drive, Bus, Active (Walk, Bike) | 87% |
| Burkhard et al. 2020 | 1 s subsampled to 5 min | KNN, RF etc. | Speed, Public Transport Stops and Lines | Drive, Train, Tram, Bus, Walk, Bike | - |
| Breyer et al. 2021 | / | KNN etc. | Road and Train Route Geometry | Road, Train | 95.5% |
* RF: Random Forest; MLP: Multi-Layer Perceptron; BN: Bayesian Network; DHMM: Discrete Hidden Markov Model; CNN: Convolutional Neural Network; BI: Bayesian Inference; DNN: Deep Neural Network.
Based on the literature reviews conducted by Huang et al. and Burkhard et al. (47, 48), speed and acceleration are among the typical features used in mode imputation studies (48, 58, 83-91). In particular, when the location recording interval (LRI) is less than 10 seconds, speed variation and acceleration features are more important for differentiating between travel modes. On the other hand, when the LRI is relatively high (e.g. more than 30 seconds), additional features become more important for maintaining the same level of accuracy. Real-time transit information (83), multimodal transportation networks (48, 58, 83, 92), and socio-demographic information (88, 91) are among the additional features that have been investigated in past studies.
2.3. Impact of Mobility Behavior during Pandemic
As MDLD has gained popularity for studying human mobility behavior in recent years, this data source has proven to be a great asset for decision-makers amid the COVID-19 pandemic. The effect of mobility patterns and of non-pharmaceutical interventions such as social distancing on preventing virus spread has been well studied (93-95). Empirical analysis of airline travel revealed the significant influence of international air travel on the progress of influenza outbreaks, as well as the impact of domestic air travel on the evolution of disease spread across the United States (94). Later studies used more comprehensive mobility data to investigate the influence of mobility patterns and travel restrictions on containing epidemic spread (10, 95).
As one of the major non-pharmaceutical interventions, social distancing is considered an effective way to reduce COVID-19 infections, especially in the pre-vaccine period. Researchers have highlighted the important role of social distancing in disease prevention through modeling and simulation (96-99).
The simulation models assume a level of compliance based on a generated synthetic population (100), estimate contact patterns using survey data (101, 102), or collect people's behavioral reactions through dedicated surveys (103). Furthermore, artificial intelligence (AI) techniques, along with big data, have been widely applied to several aspects of managing the COVID-19 pandemic, such as early detection and diagnosis, treatment monitoring, contact tracing of individuals, and projection of cases and mortality (104, 105). The lack of timely contributions from real-world observations became apparent at the beginning of the pandemic as studies tried to model the evolution of the outbreak. Many companies such as Google, Apple, and Cuebiq started to produce valuable information about mobility and economic trends (106-108). These analyses mainly focus on a single indicator of mobility, such as distance traveled or visits to various business sectors.
2.4. Disaster Evacuation Behavior Analysis
A wide range of research studies has focused on various types of disasters. In this section, I mainly focus on evacuation behavior studies. Several studies have reviewed the literature on evacuation behavior (109), evacuation modeling (110), and common transportation practices during evacuation (111).
Many studies have focused on a specific disaster or set of disasters to analyze the important factors in evacuation behavior, evaluate disaster planning and preparation, or assess disaster management and logistics. Collier et al. (2019) studied major transportation and logistics issues and summarized lessons learned from two major hurricanes in the U.S., Hurricane Katrina and Hurricane Harvey. They provided recommendations for future hurricanes covering evacuation planning, information provision, infrastructure management, and disaster preparation (112).
Simulation models are widely used in disaster planning and management studies (113-119). Feng and Lin (2019) used a hurricane-prediction demand generation model in a fast agent-based modeling framework calibrated with traffic observations to study evacuation during Hurricane Irma (116).
Evacuation behavior studies have traditionally relied on surveys (120-124). These post-hurricane surveys are used to collect information on various evacuation decisions, i.e., whether to evacuate, the departure time of the evacuation, destination choice, the primary travel mode used for the evacuation, route choice, and reentry time (125, 126). For instance, Kontou et al. collected telephone survey data from commuters affected by Hurricane Sandy and estimated a hazard-based model to identify the parameters that affect the duration of commute behavior changes (124). Wong et al. collected an online survey from individuals impacted by Hurricane Irma and studied their evacuation behavior. They summarized descriptive statistics and developed statistical models for various decisions made during Hurricane Irma (120). Although these surveys are usually rich in recording evacuees' decisions and revealing their preferences during the disaster, they are costly, time-consuming, implemented for a small number of respondents, and not capable of providing real-time information.
With the increasing availability and popularity of big data, new approaches are now available for studying long-standing questions. Robinson et al. identified two main challenges in studying disaster evacuation: the first was the complexity of human behavior, and the second was data deficiency for traffic information and household decisions (127). Both issues can be resolved to some extent by utilizing MDLD.
MDLD does not provide detailed individual-level information, but with its significant sample size and proper data processing, it can reveal valuable information about many critical evacuation-related behaviors. Besides the larger sample size, MDLD has other advantages over traditional surveys. First, the phenomenon known as the observer effect (128), in which individuals may modify their behavior when being observed or studied, can be addressed by MDLD because the data is collected passively. Passively collected data capture the normal behavior of subjects, free of any study-related observer error. The second advantage relates to known survey design errors such as sampling error, measurement error, response error (129), and survey response biases (130). Even though MDLD may have its own biases (such as bias toward higher-income populations) and errors (such as inaccurate sightings), it records the actual behavior of subjects, not recalled or stated behavior. The third advantage is specific to disaster-related surveys. Being surveyed about traumatic events may be unwelcome to respondents, whereas passive data collection places no emotional burden on them.
Given these advantages, more recent studies have taken advantage of big data for evacuation behavior analysis. Social media data was among the first MDLD sources to be used in evacuation analysis. Kumar and Ukkusuri (2018) used geo-tagged tweets from New York City at the time of Hurricane Sandy to study the evacuation behavior of affected residents (131). Their study showed a strong relationship between social connectivity and the decision to evacuate. Roy and Hasan (2021) collected Twitter data related to Hurricane Irma and developed a Hidden Markov framework to model the dynamics of hurricane evacuation and infer evacuation decisions (132).
Wang and Taylor (2014) also used Twitter data to study the correlation between movement patterns under steady state and perturbed state during Hurricane Sandy (133). Compared to social-media-generated data, LBS data has a higher penetration rate and smaller demographic biases. However, the application of LBS data in evacuation studies remains very limited. Yabe et al. (2020) collected LBS data for five disastrous events (1.9 million devices in total) to study recovery patterns at the macroscopic population level and showed similarity in the recovery patterns of these events despite differences in population characteristics (134).
Chapter 3: Data
The emergence of mobile device location technologies such as cellphone, GPS, and LBS has made MDLD a prominent asset in various application areas, including human mobility behavior analysis. This section describes the methodology for assessing raw location data quality.
A typical MDLD record from LBS technology contains a timestamp, an anonymized device ID, the location of the device (latitude and longitude coordinates), and a measure of spatial accuracy. In some cases, additional information such as the device operating system (OS) and the time zone offset of the device's position is also provided. A synthetic sample of data is provided in Table 2 to demonstrate the raw data. Entries presented in Table 2 are modified to preserve privacy.
Table 2. A synthetic sample of LBS data
| Timestamp | Device ID | Device Type | Latitude | Longitude | Location Accuracy (m) | Time Zone Offset |
|---|---|---|---|---|---|---|
| 1504068337 | e07941996a2ffd303021914 | 1 | 28.4302 | -81.6065 | 5 | -14400 |
| 1504068342 | e07941996a2ffd303021914 | 1 | 28.4303 | -81.6053 | 25 | -14400 |
| 1504068351 | e07941996a2ffd303021914 | 1 | 28.4302 | -81.6042 | 5 | -14400 |
| 1504068360 | e07941996a2ffd303021914 | 1 | 28.4305 | -81.6046 | 100 | -14400 |
| 1505096982 | F258069021658ssd132548e | 0 | 28.4313 | -81.6037 | 5 | -14400 |
In some cases, because of privacy protection, the location information in the raw data may be reported in an aggregated or transformed form.
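As an illustration of the record layout in Table 2, the following minimal Python sketch parses one whitespace-separated row into a typed record. The class name `Sighting`, the function `parse_sighting`, and the field names are hypothetical. The sketch also assumes that the timestamp is Unix epoch seconds in UTC and that the time zone offset is expressed in seconds, neither of which is stated explicitly in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Sighting:
    """One LBS record with the columns of Table 2 (field names are illustrative)."""
    timestamp: int      # assumed: Unix epoch seconds, UTC
    device_id: str      # anonymized device identifier
    device_type: int    # device type code as given in the raw data
    latitude: float
    longitude: float
    accuracy_m: float   # reported spatial accuracy in meters
    tz_offset_s: int    # assumed: offset from UTC in seconds

    def local_time(self) -> datetime:
        """Timestamp converted to the device's local time."""
        tz = timezone(timedelta(seconds=self.tz_offset_s))
        return datetime.fromtimestamp(self.timestamp, tz)


def parse_sighting(line: str) -> Sighting:
    """Parse one whitespace-separated row laid out like Table 2."""
    ts, dev, dtype, lat, lon, acc, off = line.split()
    return Sighting(int(ts), dev, int(dtype), float(lat), float(lon),
                    float(acc), int(off))


row = parse_sighting(
    "1504068337 e07941996a2ffd303021914 1 28.4302 -81.6065 5 -14400"
)
print(row.local_time().isoformat())
```

Keeping the offset on the record makes it possible to evaluate time-of-day rules in local time rather than UTC.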
3.1. Data Cleaning and Preprocessing
Although mobile device location datasets are rich in spatio-temporal characteristics, certain treatments and data cleaning steps are needed before extracting any information from the data. Removing outliers, checking for potential consistency issues in the data (e.g. unreasonably high-speed records), and identifying and merging duplicate observations for the same device are among the state-of-practice methods for cleaning raw data and controlling its quality. The data cleaning approach proposed in this study first investigates the four well-known dimensions of the data quality assessment framework: consistency, accuracy, completeness, and timeliness (135).
To ensure the consistency of the data, semantic rules such as integrity constraints are defined and checked across the entire raw data. At this step, all data entries are evaluated to identify observations with invalid values. For example, the latitude and longitude of a location should fall within a reasonable range, so the integrity constraint removes all records with invalid entries. The other check identifies duplicate records, which reduces data redundancy and size and eases the computational process. Since a device cannot be present in more than one location at the same time, this procedure keeps, for each device and timestamp, only the data entry with the highest spatial accuracy.
Accuracy is another important dimension of data quality assessment, covering both syntactic and semantic accuracy. Semantic accuracy evaluates the closeness of a value to its real-world observation, while syntactic accuracy ensures the closeness of a value to the elements of its corresponding definition domain. In this application, a spatial accuracy of 50 meters indicates that the device should be within 50 meters of the reported location with a certain confidence level, for example, 95%.
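The two consistency checks described above (a coordinate range constraint and one record per device per timestamp) could be sketched as follows. This is a minimal illustration rather than the dissertation's implementation. The function name `clean_sightings` and the dictionary keys are hypothetical, and the sketch assumes that a smaller accuracy value in meters means a more accurate sighting.

```python
def clean_sightings(sightings):
    """Apply the coordinate range check and the duplicate check.

    `sightings` is an iterable of dicts with the (hypothetical) keys
    device_id, timestamp, lat, lon, accuracy_m.
    """
    best = {}
    for s in sightings:
        # Integrity constraint: drop records with out-of-range coordinates.
        if not (-90.0 <= s["lat"] <= 90.0 and -180.0 <= s["lon"] <= 180.0):
            continue
        # Duplicate check: one record per device and timestamp, keeping the
        # most accurate one (assumed: smallest accuracy radius in meters).
        key = (s["device_id"], s["timestamp"])
        kept = best.get(key)
        if kept is None or s["accuracy_m"] < kept["accuracy_m"]:
            best[key] = s
    return sorted(best.values(),
                  key=lambda s: (s["device_id"], s["timestamp"]))


raw = [
    {"device_id": "a", "timestamp": 10, "lat": 28.4302, "lon": -81.6065, "accuracy_m": 25},
    {"device_id": "a", "timestamp": 10, "lat": 28.4303, "lon": -81.6053, "accuracy_m": 5},
    {"device_id": "a", "timestamp": 11, "lat": 128.0, "lon": -81.6042, "accuracy_m": 5},
    {"device_id": "b", "timestamp": 10, "lat": 28.4313, "lon": -81.6037, "accuracy_m": 100},
]
cleaned = clean_sightings(raw)
```

In this toy input, the second record replaces the first because it is more accurate at the same timestamp, and the third is dropped because its latitude is out of range.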
Thus, entries with extremely poor spatial accuracy (i.e. a location accuracy attribute higher than 2 miles) are removed from the dataset based on the semantic accuracy rule.
The completeness dimension requires prior knowledge of the actual movement patterns and mobile device usage, which is not available in this application. Therefore, this dimension has not been incorporated into the data cleaning procedure. For the timeliness dimension, given the time-sensitive nature of the applications introduced in this dissertation, daily feeds of location data are incorporated into the data pool.
3.2. Data Summary
After the data cleaning and preprocessing checks, the cleaned data covers more than 270,000,000 Monthly Active Users (MAU) for February 2020, representing movement information across the nation. Figure 2 depicts the coverage of the raw sighting data at different geographical levels.
Figure 2. Device sampling rate for the month of February 2020 (a) at the county level, (b) at the state level
Figure 3 shows the heatmap of sighting density for the continental U.S. (136).
Figure 3. The density map of anonymized location data across the nation (brighter shades represent a higher density of sightings)
Chapter 4: Deducing Device- and Trip-Level Information
This section describes the methodological advances this dissertation proposes to enhance the extraction of device- and trip-level information from large-scale mobile device location data sources.
4.1. Home and Work Location Identification
Due to privacy protection, mobile device location datasets are generally anonymized and do not contain any personally identifiable information (PII). Therefore, researchers should develop home and work location identification algorithms to add context to the information extracted from the MDLD.
In this dissertation, a behavior-based method is proposed that evaluates the temporal patterns of the places observed for every device and ranks the frequently visited locations to identify the home and work locations at a monthly cadence. To efficiently process the tremendous amount of MDLD, the algorithm uses geohash, a public domain geocode system that encodes a geographic location into a short string of letters and digits, to aggregate latitudes and longitudes into candidate clusters for significant activity locations. Geohash cell dimensions vary with the latitude of the location. Table 3 summarizes geohash cell sizes at the equator.
Table 3. Geohash cell dimensions at the equator
| Geohash string length | Width | Height |
|---|---|---|
| 1 | 5,009.4 km | 4,992.6 km |
| 2 | 1,252.3 km | 624.1 km |
| 3 | 156.5 km | 156 km |
| 4 | 39.1 km | 19.5 km |
| 5 | 4.9 km | 4.9 km |
| 6 | 1.2 km | 609.4 m |
| 7 | 152.9 m | 152.4 m |
| 8 | 38.2 m | 19 m |
| 9 | 4.8 m | 4.8 m |
| 10 | 1.2 m | 59.5 cm |
| 11 | 14.9 cm | 14.9 cm |
| 12 | 3.7 cm | 1.9 cm |
Considering the location uncertainty of sightings and the activities conducted near the home location, the algorithm identifies the significant activity locations in a bi-level approach. First, home and work locations are identified at the level-6 geohash to minimize the effect of noise. Then, to derive a more precise representation of the home and work locations, the algorithm searches for the best candidate among the level-7 geohash cells within the identified level-6 geohash.
As suggested in the literature, people spend most of their time, especially nighttime, at home, and some fixed and regular hours during the daytime at the workplace. To determine the nighttime window, the time activity pattern from the American Time Use Survey (ATUS) was reviewed. According to the 2017, 2018, and 2019 ATUS, more than 80% of full-time and part-time workers who are observed to visit home at least once during the survey day stay at home during the 21:00-5:59 period.
Therefore, the nighttime window is defined as 21:00-5:59. The home location at geohash level 6 is identified through the following steps:
1) Keep candidates observed on at least $\max\{3,\ \text{[illegible]}(\text{number of observed days}/2) + 1\}$ days;
2) Keep candidates observed on average more than H (≥ 2) hours daily;
3) Sort the home candidates by observed number of days, average daily number of observed hours, and average number of hourly sightings;
4) Keep the 3 top-ranked home candidates and sort them by observed number of nights, average daily number of observed nighttime hours, and average number of hourly sightings during nighttime;
5) Select the top-ranked level-6 geohash as the home location; if a tie-breaker is needed, select based on step 3.
The first two rules ensure the minimum quality needed for keeping a device in our data pool. Once the home location has been identified at geohash level 6, the best level-7 geohash candidate is selected based on the following rules:
1) Filter observations for all level-7 geohashes within the identified level-6 home geohash;
2) Sort the level-7 geohash candidates by observed number of days, average daily number of observed hours, and average number of hourly sightings;
3) Keep the 3 top-ranked candidates;
4) Sort the home candidates (level-7 geohashes) by observed number of nights, average daily number of observed nighttime hours, and average number of hourly sightings during nighttime;
5) Select the top-ranked level-7 geohash as the home location; if a tie-breaker is needed, select based on step 2.
The objective of work location identification is to determine an individual's major work location that is not the same as their home location. Therefore, only level-6 geohashes other than one's home geohash are considered. In addition, the algorithm introduces a temporal similarity ratio on top of the attributes commonly used in behavior-based methods, such as the frequency of visits, dwell time, and regularity.
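The bi-level home identification rules listed earlier in this section can be sketched in Python as below. This is a simplified illustration under several assumptions, not the dissertation's implementation. `geohash_encode` follows the standard public geohash bisection scheme. The function names, the `(day, hour, lat, lon)` input layout, and the attribution of a nighttime hour to its calendar day are hypothetical. The day threshold of rule 1 is exposed as the parameter `min_days` with only its legible floor of 3 as the default, and "more than H hours" is read as a strict inequality.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
NIGHT_HOURS = set(range(21, 24)) | set(range(0, 6))  # the 21:00-5:59 window


def geohash_encode(lat, lon, precision):
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, x = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = x >= mid
        rng[0 if bit else 1] = mid          # shrink the interval around x
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:                      # 5 bits per base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def _sort_key(records):
    """(days observed, avg hours per observed day, avg sightings per hour)."""
    slots = set(records)
    days = {d for d, _ in slots}
    if not days:
        return (0, 0.0, 0.0)
    return (len(days), len(slots) / len(days), len(records) / len(slots))


def pick_home(cells, min_days=None, min_hours=None):
    """cells: dict cell -> list of (day, hour). Filter, keep top 3, rank by night."""
    all_day = {c: _sort_key(r) for c, r in cells.items()}
    night = {c: _sort_key([(d, h) for d, h in r if h in NIGHT_HOURS])
             for c, r in cells.items()}
    eligible = [c for c in cells
                if (min_days is None or all_day[c][0] >= min_days)
                and (min_hours is None or all_day[c][1] > min_hours)]
    top3 = sorted(eligible, key=lambda c: all_day[c], reverse=True)[:3]
    # Nighttime ranking, with the all-day ranking as the tie-breaker.
    ranked = sorted(top3, key=lambda c: (night[c], all_day[c]), reverse=True)
    return ranked[0] if ranked else None


def identify_home(sightings, min_days=3, min_hours=2):
    """sightings: iterable of (day, hour, lat, lon) for one device and month."""
    level6, level7 = defaultdict(list), defaultdict(list)
    for day, hour, lat, lon in sightings:
        gh7 = geohash_encode(lat, lon, 7)
        level6[gh7[:6]].append((day, hour))  # a level-7 cell nests in its level-6 prefix
        level7[gh7].append((day, hour))
    home6 = pick_home(level6, min_days, min_hours)
    if home6 is None:
        return None, None
    inside = {c: r for c, r in level7.items() if c.startswith(home6)}
    return home6, pick_home(inside)
```

Because a level-7 geohash string always starts with the string of its enclosing level-6 cell, the refinement step reduces to a prefix filter, which is one reason a grid-based geocode is cheap to process at scale.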
The motivation for the temporal similarity ratio is two-fold. First, because the algorithm adopts a geohash grid-based geocode system, rather than a spatial or spatio-temporal clustering of sightings, for computational efficiency, a neighboring geohash zone can record frequent observations when a device dwells near the border of a geohash zone. One or more of these neighboring geohashes ("twin zones") could become a competitive candidate for the actual workplace zone in terms of visiting frequency, duration, and regularity. Second, although a minimum commute distance may seem an intuitive alternative for addressing this issue, a universal minimum distance may exclude workplaces that are close to one's identified home location. Based on the assumption that one commutes from home to work and works for consecutive hours before arriving back home, the temporal similarity ratio imposes the condition that the home and work locations shall not be frequently observed during the same hours.
Hence, the temporal similarity ratio is defined as follows. For all the unique hours when a workplace candidate was observed during the month, i.e. $\mathcal{W}_i$ for candidate $i$, count the number of unique hours overlapping with all the unique hours when the imputed home location was observed, $\mathcal{H}$. The ratio between the overlapping hours and the total number of hours in $\mathcal{W}_i$ is then calculated. This ratio, referred to as the temporal similarity ratio $S$, measures the temporal similarity between home and workplace observations. The formula is given as follows.
$$S_i = \frac{|\mathcal{W}_i \cap \mathcal{H}|}{|\mathcal{W}_i|} \qquad (1)$$
In an ideal case where the daily location observations are complete for a device with a fixed workplace, the ratio should be 2/[illegible], considering the departure time of the commute, when the commute time is shorter than one hour, and zero when the commute time is longer than one hour.
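As a small worked illustration of Equation (1), the sketch below computes the ratio from two sets of observed hours. The function name `temporal_similarity_ratio` is hypothetical, and representing a "unique hour" as a `(day, hour)` pair is an assumption, since the text does not state whether hours are counted by date or by hour of day.

```python
def temporal_similarity_ratio(work_hours, home_hours):
    """Share of a work candidate's observed hours that are also home hours."""
    work_hours, home_hours = set(work_hours), set(home_hours)
    if not work_hours:
        raise ValueError("work candidate has no observed hours")
    return len(work_hours & home_hours) / len(work_hours)


# One day for one device: at the work candidate during hours 8-17, at home
# during hours 0-8 and 17-23. Hours 8 and 17 are the commute hours, in which
# both places are observed.
work = {(1, h) for h in range(8, 18)}
home = {(1, h) for h in list(range(0, 9)) + list(range(17, 24))}
ratio = temporal_similarity_ratio(work, home)  # 2 shared hours out of 10
```

A twin zone next to home would instead share most of its observed hours with the home location, which pushes its ratio toward 1.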
However, because complete location observations are not available throughout the month for most of the devices in MDLD, imposing a small temporal similarity ratio would lead to the exclusion of actual work locations. To address this, the algorithm is designed to favor work candidates with smaller temporal similarity ratios while imposing a maximum temporal similarity ratio threshold that excludes excessively large ratios, in order to distinguish between the actual work location and the twin zones of the home location. The algorithm identifies the level-6 geohash work location based on the following rules:
1) Keep candidates observed on at least $\max\{3,\ \text{[illegible]}(\text{number of observed workdays}/2) + 1\}$ workdays;
2) Keep candidates observed on average more than W (≥ 2) hours daily;
3) Sort the work candidates by observed number of workdays, average workday number of observed hours, and average workday number of hourly sightings;
4) Keep the three top-ranked candidates;
5) Calculate the temporal similarity ratio, S, following equation (1);
6) Sort the three work candidates (level-6 geohashes) by similarity ratio in ascending order;
7) Select the top-ranked level-6 geohash with a similarity ratio smaller than the maximum temporal similarity threshold as the work location.
Once the work location is selected at the level-6 geohash, the following rules are used to search, for a more precise representation of the work location, for the best candidate among all the level-7 geohashes within the identified level-6 geohash work location:
1) Start from all the level-7 geohashes within the level-6 geohash workplace;
2) Sort the level-7 geohash candidates by observed number of workdays, average workday number of observed hours, and average workday number of hourly sightings;
3) Select the top-ranked level-7 geohash as the work location.
Two major parameters need to be calibrated in the introduced algorithm: the minimum observed daily hours for home, H (≥ 2) hours, and for the workplace, W (≥ 2) hours. To calibrate the H parameter, the Pearson correlation between the county-level number of imputed residents and the population over 16 reported by the American Community Survey (ACS) (137) is calculated for different values of H (see the dark green line in Figure 4). For workplace calibration, the Pearson correlation between the county-level number of imputed commuters and the number of workers reported by the Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES) (138) is calculated for different combinations of H and W (see the black dotted line in Figure 4). Figure 4 implies that increasing the minimum observed hours for home and work decreases the Pearson correlation. Therefore, the combination of H = 2 and W = 2 is selected to yield the best performance in home and work location identification.
[Figure 4 plot: correlation test at the county level. Series: Pearson correlation between unweighted resident estimates and ACS population estimates; Pearson correlation between unweighted commuter estimates and LODES worker estimates. x-axis: minimum daily observed hour parameters for home (H) and work (W), each from 2 to 6. y-axis: Pearson correlation, 0.92 to 0.97.]
Figure 4. Calibration results for selecting the number of minimum observed hours.
In addition to the minimum observed hours, two reasons lead to selecting a maximum temporal similarity ratio of 0.6. First, in each visit the workplace should be observed for at least one specific hour that excludes the home location observations, besides the two observed hours shared with home during the two commute trips (taking into account short commute trips and the departure time of commutes).
Second, a sensitivity analysis of the maximum threshold was conducted using the county-level Pearson correlation between the imputed number of workers and the number of workers reported in LODES (see the dark green line in Figure 5) and the percentage of devices with an imputed workplace among devices with an identified home (white bars in Figure 5). Figure 5 shows that as the similarity ratio parameter increases, the Pearson correlation decreases, with a plateau between 0.2 and 0.6, while the number of devices with an imputed work location increases at a steady pace. Therefore, a similarity ratio of 0.6 balances the tradeoff between maintaining a high Pearson correlation and identifying a work location for as many devices as possible.

Figure 5. Sensitivity analysis in temporal similarity ratio using MDLD. (County-level correlation test; x-axis: similarity ratio, 0.05 to 1.00; left y-axis: Pearson correlation between unweighted commuter estimates and LODES worker estimates, 0.962 to 0.970; right y-axis: percentage of unweighted residents with primary workplace identified, 0% to 60%.)

4.1.1. Comparisons with Alternative Home Identification Algorithms

In addition to the proposed method, this study examines two alternative home identification algorithms and compares their performance using a mobile device location sample dataset. The first algorithm (referred to as the "nighttime method" hereafter) is a widely used state-of-the-practice method, which identifies the home location as the place with the highest number of observed hours from 6 p.m. to 7 a.m. (139). The second alternative is a conservative method (referred to as the "all-day method"). It first applies a strict filter to the home candidates: each level-7 geohash candidate must be observed on at least 14 days and for at least 60 distinct hours within the study month.
Then, it identifies the home location as the level-7 geohash with the highest number of observed hours. When a tie exists, the level-7 geohash with the most sightings is selected.

The results show that the nighttime method yields the most imputed residents, i.e., 74% of all the devices in the raw data, followed by the proposed method (12%) and the all-day method (8%). Next, the county-level Pearson correlations between the imputed residents and the ACS population are 0.966 for the nighttime method, 0.969 for the proposed method, and 0.962 for the all-day method, so the proposed method slightly outperforms the other two approaches. Moreover, the distances between the home locations imputed by the three methods are calculated and summarized in Table 4. Each column is based on the imputed home locations of the imputed residents shared by the two methods being compared. It can be observed that the discrepancy between the home locations starts at the 90th percentile for the nighttime to proposed comparison, and the 95th-percentile distances for all three cases are smaller than 1 mile. Although the nighttime method yields home locations similar to those of the all-day method for their shared imputed residents, the distances between its imputed home locations and the proposed method's home locations are the largest. By jointly considering the sample size reduction, the Pearson correlation with the ground truth population, and the differences in the imputed home locations, the proposed method yields the best overall results.

Table 4. Descriptive statistics on the distances between the imputed home locations

Measure (miles)   Nighttime to All-Day   Proposed to All-Day   Nighttime to Proposed
Mean              4.46                   1.90                  1.40
75%               0                      0                     0
90%               0.07                   0.00                  0.00
95%               0.85                   0.09                  0.29
99%               30.09                  17.66                 17.26
Max               5892.71                5098.57               4972.37

To examine the comparisons further, Figure 6 shows the scatter plot of the county-level resident estimates from each of the introduced algorithms against the ACS.
Figure 6. County-level resident estimates from different methods and ACS. (Scatter plot; x-axis: MDLD imputed residents (unweighted); y-axis: ACS population, in millions; series: nighttime, proposed, all-day.)

Figure 7, Figure 8, and Figure 9 show the 90th-percentile, 95th-percentile, and 99th-percentile distances, respectively, between the home locations imputed by each pair of the three introduced algorithms at the county level. Each figure first displays the entire three-dimensional scatter plot, followed by a zoomed-in plot. All distances are measured in miles.

Figure 7. County-level 90th-percentile distances between the imputed home locations: (a) all counties; (b) zoomed-in plot.

Figure 8. County-level 95th-percentile distances between the imputed home locations: (a) all counties; (b) zoomed-in plot.

Figure 9. County-level 99th-percentile distances between the imputed home locations: (a) all counties; (b) zoomed-in plot.

4.1.2. Home and Work Location Identification Validation

Since the mobile device location dataset used in this research does not contain any ground truth information on home and work locations, the identified daily life centers are validated against ground truth population and employment statistics. With the calibrated parameters, the MDLD sample devices are aggregated at the county level based on the imputed home locations for further analysis. The spatial distribution of the unweighted MDLD resident estimates is compared with that of the 2019 ACS 5-year population estimates (137) in Figure 10, which shows similar spatial distributions for MDLD and ACS, with a Pearson correlation of 0.970.

Figure 10. County-level resident estimates comparison between MDLD and ACS. (Scatter plot; x-axis: unweighted MDLD imputed residents; y-axis: ACS population, in millions.)

Similarly, the MDLD sample devices with both an imputed home and a fixed workplace are considered normal commuters and are aggregated at the county level based on the imputed home locations.
The ground truth data from the 2019 LODES estimates (138) and the 2019 ACS 5-year estimates (140) have been adjusted to 2020 estimates with a national-level population inflation factor of 1.005. The spatial distribution of the unweighted commuter estimates is then compared with the two ground truth datasets in Figure 11. Figure 11 shows similar spatial distributions of unweighted normal commuter estimates from MDLD and ACS, with a Pearson correlation of 0.969, and from MDLD and LODES, with a Pearson correlation of 0.967.

Figure 11. County-level normal commuter estimates from MDLD, ACS, and LODES. (Scatter plot; x-axis: MDLD normal commuters (unweighted); y-axis: ACS and LODES estimates, in millions.)

Next, the commuting flow estimates from MDLD are validated against the 2019 LODES (138) and the 2015 ACS 5-year commuting flow estimates (141). Following the spatial resolution of the ACS estimates, the MDLD and LODES estimates are aggregated at the county level from level-7 geohashes and census block groups, respectively. Due to differences in data collection and coverage, the two ground truth data sources can produce distinctive results for some queries (142). With respect to the commuting origin and destination (OD) pairs, the two data products have different home and work location definitions. In the ACS data, the work location is provided by the survey respondents as the specific work address during the previous week. In the LODES data, on the other hand, the work location is reported by employers and can be an administrative address instead of the actual worksite. Meanwhile, the residence location in the LODES data is based on a residence synthesizer and can be outdated if a worker moves during the year. As a result, there are 815,941 unique OD pairs from LODES and only 135,904 unique pairs from ACS. Figure 12 compares the commuting flow estimates for the shared OD pairs between LODES and ACS. It can be observed that the ACS data have higher estimates due to fewer unique pairs.
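The comparison of two flow tables on their shared OD pairs, as used in this validation, can be sketched as follows. This is only an illustration under stated assumptions: the flow tables are represented as dictionaries keyed by hypothetical (home county, work county) labels, the flow values are made up, and the helper names are not taken from the original study.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def shared_pair_correlation(flows_a, flows_b):
    """Restrict two OD flow tables to their shared OD pairs and correlate the flows."""
    shared = sorted(flows_a.keys() & flows_b.keys())
    r = pearson([flows_a[k] for k in shared], [flows_b[k] for k in shared])
    return shared, r


# Made-up county-to-county commuter counts (illustrative only).
source_a = {("A", "A"): 100, ("A", "B"): 20, ("B", "A"): 30, ("B", "B"): 80, ("C", "A"): 5}
source_b = {("A", "A"): 200, ("A", "B"): 40, ("B", "A"): 60, ("B", "B"): 160}

shared_pairs, r = shared_pair_correlation(source_a, source_b)
print(len(shared_pairs), round(r, 3))
```

Because the correlation is computed only over pairs present in both sources, a source with many more unique OD pairs contributes just its overlapping subset to the comparison.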
Figure 12. County-level commuting flow estimates from ACS and LODES. (Scatter plot; x-axis: ACS, in millions; y-axis: LODES, in millions.)

In Figure 13, the commuting flow estimates from MDLD are compared with the LODES estimates. From MDLD, there are only 120,458 unique OD pairs, which may be because the estimates cover a relatively short period of time (January 2020). In addition, the MDLD home and work locations are imputed from actual location observations, which makes them more similar to and consistent with the definitions in the ACS data. With these considerations, a similar trend is observed for the shared OD pairs, with a Pearson correlation of 0.951.

Figure 13. County-level commuting flow estimates from MDLD and LODES. (Scatter plot; x-axis: MDLD (unweighted); y-axis: LODES, in millions.)

Figure 14 further compares the unweighted commuting flow estimates from MDLD with the ACS estimates. It can be observed that the MDLD estimates share a closer pattern with the ACS estimates. The Pearson correlation is 0.965 for this comparison.

Figure 14. County-level commuting flow estimates from MDLD and ACS. (Scatter plot; x-axis: MDLD (unweighted); y-axis: ACS, in millions.)

In addition, the commuting distance distributions from the three sources are compared in Figure 15. For consistency, the commuting distance is calculated as the distance in miles between the centroids of the home and work counties. In general, the MDLD distribution has very similar patterns to the ACS estimates, and both have higher estimates for the shorter distance bands. This suggests that LODES observes more long-distance OD pairs and more long-distance commuters than ACS and MDLD, which can result from the aforementioned definitions of home and work locations.

Figure 15. County-level commuting distance distribution. (x-axis: commuting distance bands in miles, [0, 5), [5, 10), [10, 25), [25, 50), [50, 75), [75, 100), [100, 150), [150, 300), [300, Inf); y-axis: density; series: ACS, LODES, unweighted MDLD estimates; inset: zoomed view of the bands of 50 miles and above.)

In summary, the validation results demonstrate the reliable performance of the proposed home and work location identification algorithms.

4.2. Tour and Trip Identification

Trips are the unit of analysis for almost all transportation applications. Traditional data sources, such as travel surveys, record the details of trip information. Mobile device location datasets, on the other hand, do not directly provide trip information. Location sightings can be continuously recorded while a device moves, stops, or stays static. However, these changes in status are not recorded in the raw MDLD. As a result, researchers must rely on trip identification algorithms to extract trip information from the raw data.

While the literature review and practice scan reveal many methods to identify trips, a key issue complicates the trip identification process and affects the accuracy and credibility of the algorithms: the distinction between linked and unlinked trips is ignored. Existing trip identification methods can only identify unlinked trips but not linked trips. For instance, a single transit commute trip with more than five minutes of waiting at the origin and transfer transit stations would be identified as three unlinked trips by existing methods: (1) a walking trip from home to the origin transit station; (2) a transit trip from the origin transit station to the transfer station; and (3) another transit trip from the transfer station to the final destination. However, for the purpose of tracking individual mobility behavior, the tour and linked trip notions provide additional useful information to enhance the monitoring of individuals' mobility behavior. Additionally, being able to determine the tour and linked trip information provides a great opportunity to compare the statistics derived from MDLD with traditional travel surveys more accurately.
Also, the tour information can be utilized to improve current travel mode imputation algorithms that use MDLD. It should also be noted that the tour-based approach is necessary to identify the true origins and destinations of long-distance trips.

Figure 16 illustrates how the proposed tour-based algorithm can be used to link trips together. The four unlinked trips in Figure 16 (a), i.e., a driving trip from home to the metro station (O1 to D1), the first leg of a metro trip to the transfer point (O2 to D2), the second leg of the metro trip on another metro line (O3 to D3), and a walking trip to the work location (O4 to D4), form one linked trip from home to work in Figure 16 (b). The linked trip from home to work and an additional linked trip from work to home construct one complete home-based work tour in this case.

Figure 16. Tour identification and trip chaining demonstration: (a) multiple unlinked person trips; (b) one linked person home-to-work trip.

4.2.1. Home-based Tour Identification

The algorithm requires the devices' identified home locations as input. The home-based tour identification processes a device's locations for each "trip day", which runs from 4:00 a.m. to 3:59 a.m. the next day. All sightings between two at-home observations are considered a home-based tour. As long-distance trips demonstrate distinct spatio-temporal characteristics compared to short-distance trips, the tours are classified based on their distance feature. Long-distance tours are defined as tours in which a device is observed 50 miles or more away from its home location. To be consistent with the common practice in travel surveys, the device is assumed to start and end the trip day at home. In the next step, the sightings of each device are separated into two groups: sightings on short-distance tours and sightings on long-distance tours. Finally, short-distance tours go through a daily short-distance trip identification and long-distance tours go through a monthly long-distance trip identification.

4.2.2. Trip Identification for Short-distance Tours

It is possible that some sightings do not belong to any trip (i.e., stationary points). For each sighting within the same tour, a recursive algorithm based on a decision tree model is used to identify whether the sighting is stationary or moving. The decision tree considers six attributes, i.e., the great circle distance, time interval, and speed between the current sighting and the previous and next sightings. The decision tree has three hyper-parameters: a distance threshold of 300 meters, a time threshold of 5 minutes, and a speed threshold of 3 miles per hour (mph). The speed threshold is used to identify whether a sighting is recorded on the move, and the distance and time thresholds are used to identify trip ends.

The recursive algorithm checks every sighting to identify whether it starts a new trip or belongs to the same trip as the previous sighting (Figure 17). If the previous sighting is not on a trip (i.e., it is a stationary sighting), the current sighting starts a trip if its speed to the next sighting is faster than 3 mph. If the previous sighting is on a trip, the following rules are checked to identify whether the current sighting belongs to the same trip, stops the trip, or starts a new trip:

• If a sighting has a speed greater than 3 mph from the previous sighting, the sighting belongs to the same trip as its previous sighting.
• If a sighting has a speed slower than 3 mph from the previous sighting and is more than 300 meters away from the previous sighting, the sighting does not belong to the same trip as its previous sighting. If the speed to the next sighting is also slower than 3 mph, the current sighting simply terminates the trip; otherwise, it becomes the start of a new trip.
• If a sighting has a speed slower than 3 mph from the previous sighting and is within 300 meters of the previous sighting, the cumulative dwell time for all the consecutive sightings meeting these criteria is computed and checked: 1) if the cumulative dwell time is less than five minutes, the current sighting belongs to the same trip; 2) otherwise, it terminates the trip if the speed to the next sighting is slower than 3 mph, or starts a new trip if the speed to the next sighting is faster than 3 mph.

The algorithm may identify a local movement as a trip if the device moves within a stay location. To filter out such trips, all trips shorter than 300 meters are removed in a post-processing step.

Figure 17. Recursive trip identification algorithm for short-distance tours

4.2.3. Trip Identification for Long-distance Tours

Trip identification for long-distance tours follows a different procedure due to the different nature of long-distance trips. To start, all device sightings on long-distance tours for the entire month are filtered.

4.2.3.1. Stop and primary destination identification

A recursive trip identification, similar to that described in Section 4.2.2, is applied, but with a larger time threshold of 30 minutes instead of 5 minutes, meaning that a trip ends only if the device stays in a location for more than 30 minutes. In this step, all the trip ends are identified and named "secondary stops". Primary stops are then defined from the secondary stops. Primary stops on a long-distance tour are places where the device stays for a significant amount of time and/or from which the device makes local trips. To identify the primary stops, each secondary stop is checked against the following criteria:

• The duration of stay at the secondary stop is longer than two hours and, during the stay, the device exits and reenters the secondary stop
• The duration of stay at a location is longer than 24 hours
• The secondary stop is the home location

Furthermore, the primary destination of a tour is defined as the farthest stop that is located at least 50 miles away from the home location of the device. The primary destination is unique in each long-distance tour and is identified from the primary stops. If no primary stop fulfills the requirement, the primary destination is then identified from the secondary stops.

4.2.3.2. Subtour identification

A subtour is a segment of a long-distance tour that falls between two primary stops. Therefore, all sightings between two primary stops are considered to be on the same subtour.

4.2.3.3. Trip extraction

If a long-distance tour does not have a primary destination, or has a primary destination that is the same as the identified work location, the short-distance trip identification algorithm (with a time threshold of five minutes) is applied to all the sightings in the tour. If a tour has a primary destination different from the fixed work location, the long-distance trip identification algorithm with a time threshold of 30 minutes is applied to sightings between two different primary stops, and the short-distance recursive trip identification algorithm with a time threshold of 5 minutes is applied to sightings around the same primary stop (local trips around a primary stop on a long-distance tour). Finally, all the tour, subtour, and trip information is consolidated to provide a complete travel diary of a device. The complete framework of long-distance trip identification is presented in Figure 18.

Figure 18. Trip identification framework for long-distance tour

4.3. Trip Mode Detection

Following the trip identification algorithm, a framework has been proposed to impute the travel mode of the trips based on the characteristics of the trip (143). The major contribution of the proposed algorithm is to combine the advantages of a single-layer model and a deep neural network to accurately detect the travel mode of the trips.
4.3.1. Data Collection for Travel Mode Imputation

A ground truth dataset with true labels is required to train the proposed supervised learning algorithm. This study used smartphone GPS survey data collected from 300 Washington D.C. urban travelers through a smartphone application that records trips for each survey subject. The survey app functions are illustrated in Figure 19:

• GPS location tracking: the app automatically records users' location information. The recording frequency was automatically adjusted based on whether the user was moving or static in order to reduce battery consumption. Typically, the time interval between two location records was 30 seconds when users were moving and between 10 and 30 minutes when users were static, depending on the battery status.
• Opt-in trip information survey: the app periodically popped up survey questions to record the trip purposes and travel modes for the users' recorded trips. This information was verified by a follow-up travel diary survey and used as the ground-truth travel mode dataset with labels to train the mode detection model.
• Data uploading: to limit battery and cellular data usage, the app did not automatically upload data to the online database unless the device was plugged in and connected to a Wi-Fi network. Alternatively, the user could manually upload survey records by pressing the "Press to Upload" button.

Figure 19. The user interface of the smartphone GPS data survey app

A total of 1009 validated trips were specified with travel mode information. Of these 1009 trips, 19.3% were auto trips, 15.9% were bus trips, 52.9% were metro or rail trips, and 11.9% were walk/bike trips. Since the survey was targeted toward urbanized areas, a higher percentage of metro and bus trips were captured. This additional bus and rail evidence helps to enhance the understanding of their characteristics and improve the goodness-of-fit of the model for those travel modes.

4.3.2. Construction of Classification Features

Table 5 summarizes the trajectory features that are considered in this study. These features are selected to differentiate the modes as much as possible. For instance, the average speed can be used to distinguish the walk mode from other modes. The maximum speed further helps differentiate walk trips from auto or bus trips that encounter severe traffic congestion, which brings their average speed close to that of non-motorized trips. The overall data recording frequency can be used to identify metro trips, as other travel modes typically do not suffer from significant GPS disruptions.

Table 5. Trajectory features description

Variable                    Description
Trip distance               The sum of the distances between successive location points in the trip.
Trip time                   The difference between the timestamps of the trip start and the trip end.
OD Euclidean distance       The shortest Euclidean distance between the origin and destination of the trip.
Average speed               The trip distance divided by the trip time.
Max. instantaneous speed    The maximum value in the set of instantaneous speeds directly collected by the smartphone app during the trip.
Speed quantiles             The 5th, 25th, 50th, 75th, and 95th percentiles of speed, calculated for each trip.
Average data record         The number of data points recorded during the trip divided by the trip time.

In addition to these features, this study used the available metro, rail, and bus networks to construct additional features (Figure 20). Specifically, the average distances to the transportation networks were added as geographic features. For each location point in a trip trajectory, the nearest metro or rail line was first identified using the network shown in Figure 20.

Figure 20. Multimodal transportation network of the study area

The shortest Euclidean distance from each trajectory location point in the trip to that line is then calculated.
These distances are then averaged to measure the average adjacency of the trip to the metro and rail systems. Similarly, the average distance to the nearest bus line was calculated, which is deemed essential for improving the accuracy of the mode detection. To comprehensively assess the network effect, the rail network was extracted from the National Transportation Atlas Database (NTAD). The General Transit Feed Specification (GTFS) bus shapefiles have also been collected from 31 regional and local agencies and bus services to construct the bus network. The predictive power of adding these network features is assessed in Section 4.3.4.

4.3.3. Model Structure

This study proposes a mode detection algorithm based on a wide and deep learning approach, as illustrated in Figure 21 (143).

Figure 21. The wide and deep learning framework. (Input layer: trajectory features (trip time, trip distance, OD Euclidean distance, average speed, maximum speed, speed quantiles) and network features (average distance to the nearest Metro line; average distance to the nearest bus line). The inputs feed a generalized linear model and a deep neural network with hidden layers L11, L12, ..., L1m and L21, ..., L2n. Output layer: travel mode detection (walk, bus, rail, car, bike).)

A generalized linear model and a deep neural network are jointly trained based on the features constructed from the passively collected data gathered in the survey. Because of its structure, the model is capable of generalizing rules and memorizing specific exceptions at the same time, which leads to superior prediction accuracy compared to stand-alone generalized linear models and stand-alone deep neural network (DNN) models. To further examine the performance of the proposed model, benchmark ensemble models and Random Forest have also been trained for comparison purposes. All models were trained and fine-tuned using the TensorFlow platform in Python. Both trajectory features and network features are used in the Wide and Deep model.
These features are all continuous and were normalized to the range of [0,1]. Two hidden layers in the DNN are illustrated in Figure 21 with m neurons and n neurons, respectively. The number of layers and the number of neurons in each layer can be fine-tuned. In the empirical test of this study, three hidden layers have been used and different numbers of neurons were also tested. Denoting y as the label for travel mode, x as the vector of prediction features, beta as the vector of model parameters, and b as the unobservable heterogeneity, the wide component of the model is formulated as a generalized linear model. In this case, a multinomial logit model is considered: exp (????? + ??) ??(? = ?) = (2) ?? exp(? ? ??? + ??) Where Y is the prediction, xy is a vector of d features for mode y, ? is a d- dimensional vector of model parameters, and b is the bias. Then a three-layer DNN has been specified as the deep component. The variables were fed into the hidden layers of the DNN to perform the following computation in each hidden layer (144). ?(?+1) = ?(?(?) ? ?(?) + ?(?)) (3) 62 Where a, ?, and b denote the activations, DNN parameters, and heterogeneity at the l- -th layer respectively. f denotes the activation function, which defines the output of the neuron node given an input. RELU (rectified linear units) has been used as the activation function, f(z) = max (0, z). In practice, the RELU function works robust and has a better computational efficiency in comparison with the other activation functions (144) although it is not differentiable when z = 0. The combination of the generalized linear model and the DNN represents a model of wide and deep learning that can be jointly trained using the weighted sum of the log-odds as the objective function. The prediction function for the wide and deep learning model is: ??(? = ?) = ?(????? + ? (??) ? ?(??) + ?) (4) where Pr denotes the prediction of the joint model, ??? 
denotes the vector of parameters for the linear model component, and ?(??) denotes the finalized parameters on the final activations of the DNN component, labeled as ?(??). ?(?) is the sigmoid function. Back-propagation of the gradients was employed to jointly train the model. Gradients were defined from the mode detection to the generalized linear model and the DNN hidden layers based on the weighted sum of the log-odds from both models (144). A number of optimization algorithms were tested to reach the optimal level of training loss and reasonable training time at the same time, including AdaGrad (145), RMSProp (146), and Adam Optimization (147). RMSProp seems to yield the highest 63 goodness of fit with acceptable computational efficiency. The models reported in this study were trained within 20~60 seconds on a regular Macintosh machine. AdaGrad algorithm employs adaptive learning rates with a decay factor (145). The rates can adapt to different gradients, which makes the algorithm suitable for high- dimensional problems. However, the descent of AdaGrad can be too fast and the algorithm can get trapped in a local optimum. RMSProp and Adam algorithms address the issue by introducing an exponential decay of past gradients, so that the most recent gradient will have a higher influence on the gradient used in the current iteration. These adaptive optimization algorithms are all tested in this research to compare their performance on the mode detection application. Finally, with a Random Forest model or a Wide and Deep model trained, a 10-fold cross-validation was conducted to test the performance. To ensure randomness and reasonable stability of the results, a subset of the dataset was randomly sampled using 10 random seeds, and then each subset was partitioned into ten equal-sized subsamples. In each fold of the 10-fold validation, one subsample was retained as the hold-out test sample, and the model was trained using the remaining nine subsamples. 4.3.4. 
Empirical Results Several state-of-practice and state-of-art algorithms including ensemble models (AdaBoost and Bagging have been tested, Bagging is reported in this section because of its better performance), Random Forest, generalized linear model, and wide and deep neural model (various optimizers have been tested, with AdaGrad and RMSProp 64 reported) were trained and compared using the collected dataset. The prediction accuracy of 10-fold cross-validation has been used to measure the performance of the candidate models. For each round of the validation, 10 random seeds were used to ensure the stability of the validation results. Grid search and random search have been used to fine-tune the hyper-parameters in the candidate models. Table 6 summarizes the performance measures of the models. The first finding is that the addition of multimodal network features has significantly boosted the model performance. Both the ensemble model and Random Forest have shown improved model prediction accuracy after the inclusion of network features. From the 10-fold cross-validation with 10 random seeds, the Random Forest model can get 89.6% of the travel modes in the testing data accurately detected. Also, the benchmark Random Forest model outperforms the Generalized Linear model, suggesting that rule-based generalization using features such as the maximum speed or the distance to nearby transit stations could play a significant role in travel mode detection. 65 Table 6. 
Goodness of fit measures for different travel mode detection models

Model                                                                Total Loss   Average Loss   Average Accuracy
Generalized Linear Model                                             26.0         0.299          0.867
Ensemble (Bagging, without network features)                         104.4        1.060          0.755
Ensemble (Bagging, with network features)                            84.0         0.860          0.804
Random Forest (RF, without network features)                         52.2         0.600          0.808
Random Forest (RF, with network features)                            17.4         0.193          0.894
Wide and Deep Model (AdaGrad Optimizer, with network features)       6.7          0.076          0.957
Wide and Deep Model (RMSProp Optimizer, without network features)    17.2         0.197          0.921
Wide and Deep Model (RMSProp Optimizer, with network features)       4.0          0.045          0.976

The wide and deep model combines the advantages of the DNN and the Generalized Linear Model, and can boost the prediction accuracy to above 95%. With 400 neuron nodes in the first hidden layer and the default optimizer, AdaGrad, the average prediction accuracy of the model reaches 95.7%. Equivalently, the reduction of prediction errors achieved by using a joint Wide and Deep model is more than 50%. The best Wide and Deep model, with the RMSProp optimizer, reaches 97.6% prediction accuracy.

A deeper look at the confusion matrices (Table 7) offers more insight into the performance of the models. The sums of rows and columns may differ due to the random seeds used. In total, four models were compared: RF and Wide and Deep, each with and without network features. From the confusion matrix, the prediction accuracy for each mode can be evaluated separately. For instance, the first row of Table 7 shows that 195 car trips were reported in the testing dataset, of which 135 were classified correctly by the RF model without network features.

Table 7.
Confusion matrix comparison of RF model and the wide and deep learning model

In each panel, rows are the reported travel mode and columns are the travel mode detected in 10-fold cross-validation. The last cell of the precision row is the overall accuracy.

RF without network features
Reported \ Detected   Car     Metro   Bus     Walk    Recall
Car                   135     34      23      3       69.2%
Metro                 23      479     25      7       89.7%
Bus                   23      42      90      5       56.3%
Non-motorized         1       4       4       111     92.5%
Precision             74.2%   85.7%   63.4%   88.1%   80.8%

RF with network features
Reported \ Detected   Car     Metro   Bus     Walk    Recall
Car                   181     4       7       3       92.8%
Metro                 7       507     13      7       95.0%
Bus                   15      43      101     1       63.1%
Non-motorized         0       5       2       113     94.2%
Precision             89.2%   90.7%   82.1%   91.1%   89.4%

Wide-Deep, without network features
Reported \ Detected   Car     Metro   Bus     Walk    Recall
Car                   172     8       13      2       88.2%
Metro                 8       508     16      2       95.1%
Bus                   11      14      132     3       82.5%
Non-motorized         0       2       1       117     97.5%
Precision             90.1%   95.5%   81.5%   94.4%   92.1%

Wide-Deep, with network features
Reported \ Detected   Car     Metro   Bus     Walk    Recall
Car                   194     1       0       0       99.5%
Metro                 0       525     8       1       98.3%
Bus                   1       10      149     0       93.1%
Non-motorized         1       1       1       117     97.5%
Precision             99.0%   97.8%   94.3%   99.2%   97.6%

Adding the network features substantially increased both precision and recall. Overall, one of the benchmark models, Random Forest with network features, did a decent job in detecting car, Metro, and non-motorized modes. However, its detection of the bus mode still falls short (82.1% precision, 63.1% recall). Comparing the Random Forest with the Wide and Deep model, it is clear that the latter did extremely well in the detection of Metro and bus trips. Even without the network features, the Wide and Deep model reaches a level of accuracy similar to that of the RF model with the network features. The Wide-Deep model without network features achieves a recall of 82.5% for the bus mode, compared to 56.3% for the RF model without network features. Adding the network features to the Wide-Deep model raises both precision and recall for the bus mode above 93%.
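The precision, recall, and overall accuracy values in Table 7 follow directly from the cell counts. The sketch below recomputes them for the "Wide-Deep, with network features" panel; the function and variable names are illustrative and are not taken from the dissertation's code.

```python
# Rows are reported modes, columns are detected modes,
# both in the order: car, metro, bus, non-motorized.
MODES = ["car", "metro", "bus", "non-motorized"]

# Counts copied from the "Wide-Deep, with network features" panel of Table 7.
WIDE_DEEP_WITH_NETWORK = [
    [194, 1, 0, 0],
    [0, 525, 8, 1],
    [1, 10, 149, 0],
    [1, 1, 1, 117],
]


def per_class_metrics(matrix):
    """Return ({mode: recall}, {mode: precision}, overall accuracy)."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]  # trips reported per mode
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # trips detected per mode
    recall = {MODES[k]: matrix[k][k] / row_sums[k] for k in range(n)}
    precision = {MODES[k]: matrix[k][k] / col_sums[k] for k in range(n)}
    accuracy = sum(matrix[k][k] for k in range(n)) / sum(row_sums)
    return recall, precision, accuracy


recall, precision, accuracy = per_class_metrics(WIDE_DEEP_WITH_NETWORK)
```

Recall divides the diagonal cell by its row total (reported trips), while precision divides it by its column total (detected trips), which is why the two can differ noticeably for the bus mode. The sketch assumes every mode has at least one reported and one detected trip, so no division by zero is guarded against.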
It is worth noting that this study only conducted a standard grid search in combination with the optimizers. Further fine-tuning of the joint model could improve the accuracy and is a direction for future studies.

Chapter 5: MDLD in Action for Pandemic Studies

Since the first case of the novel coronavirus disease (COVID-19) was confirmed in Wuhan, China, social distancing has been promoted worldwide, including in the United States, as a major community mitigation strategy. However, our understanding remains limited of how people react to such control measures, as well as how people resume their normal behaviors when those orders are relaxed. This dissertation proposes a framework to quantify the impact of COVID-19 on mobility and provide insights to analyze human mobility behavior throughout the pandemic (136, 148).

5.1. Methodology

After cleaning the data, identifying the home and work locations, and extracting the trip information based on the methodologies described in Chapters 3 and 4, this study investigated the mobility behavior of communities throughout the COVID-19 pandemic. To fully leverage the near real-time mobility insights from the MDLD, two additional methodological steps were needed. The first is a weighting method that converts the sample movements observed in the MDLD to population-level statistics. The second is an index that summarizes different aspects of communities' mobility patterns into a single metric capturing the impact of COVID-19 on mobility.

5.1.1. Weighting

In spite of MDLD's high penetration rate among the population, statistics derived from the MDLD still need to be weighted to represent population-level statistics. The devices available in the dataset are a sample of all individuals in the population, so it is necessary to consider device-level weights.
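A device-level weight can be read as an expansion factor: the number of residents that each sampled device stands for. The sketch below is a minimal illustration, assuming the weight is the county population divided by the number of sampled devices whose imputed home falls in that county, which is the reading consistent with the text's own example of 100 devices in a county of 2,000 residents. The function name and the dictionary inputs are hypothetical.

```python
def device_weights(devices_per_county, population_per_county):
    """Return {county: weight}, with weight = population / sampled devices.

    Assumption: every device whose imputed home is in a county shares
    that county's weight.
    """
    weights = {}
    for county, n_devices in devices_per_county.items():
        if n_devices <= 0:
            continue  # no sampled devices, so no weight can be defined
        weights[county] = population_per_county[county] / n_devices
    return weights


# Example from the text: 100 devices observed in a county of 2,000 residents.
example = device_weights({"county_a": 100}, {"county_a": 2000})
```

With these inputs each device in `county_a` receives a weight of 20, so summing the weights over all sampled devices in the county recovers its population.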
In addition to the device-level weights, MDLD might only capture a sample of all trips made by the individuals in the data. Therefore, trip-level weights are also needed. As the goal of this study was to provide near real-time mobility statistics updates, a simple county-level device weighting was applied to obtain weights for devices. To derive device-level weights, the home county of each device was determined from its identified home location. The weight for each device was calculated as the population of the device's imputed home county divided by the number of devices observed in that county, so all devices residing in a county have the same device-level weight. For instance, if the sample includes 100 devices in a county with a population of 2,000, each device is assigned a weight of 20. The population of each county was obtained from the U.S. Census Bureau.

For the trip-level weights, the number of trips per person (trip rate) was calculated from the sample for each state during an average weekday in the first two weeks of February 2020, under the assumption that February travel behavior was not yet impacted by the COVID-19 pandemic. The trip rate was also calculated for each state from the most recent National Household Travel Survey, the 2017 NHTS. A state-level trip weight was then calculated by dividing the NHTS trip rate by the observed trip rate during the pre-pandemic period. These weights are used for the entire study period.

5.1.2. Core Mobility Metrics

After the extraction of population-level trips from MDLD was completed, all information was summarized into several core mobility metrics that are critical for a better understanding of the national mobility pattern before and during the pandemic. Table 8 shows the list of metrics calculated at the county, state, and national levels.

Table 8.
List of core mobility metrics calculated to capture the COVID-19 impact on mobility

Metric                   Description
% staying home           Percentage of residents staying at home (i.e., no trips more than one mile away from home).
#trips/person            Average number of trips taken per person.
% out-of-county trips    The percent of all trips taken that travel out of a county.
% out-of-state trips     The percent of all trips taken that travel out of a state.
miles traveled/person    Average person-miles traveled on all modes per person per day (car, train, bus, plane, bike, walk, etc.).
#work trips/person       Number of daily work trips per person (where a "work trip" is defined as going to or coming home from work).
#non-work trips/person   Number of daily non-work trips per person (e.g., grocery, restaurant, park, etc.).

5.1.3. Social Distancing Index

In addition to calculating the core mobility metrics, this dissertation explored the construction of a single index that captures mobility changes and portrays individual efforts in social distancing by combining the various measurements of human mobility. To properly design the structure of the Social Distancing Index (SDI), existing indices from various fields were reviewed. There are two main types of indices: category-based indices and score-based indices. Category-based indices describe the target quantity by categories. For example, the Pandemic Severity Index (PSI) classifies the case fatality ratio (CFR) of a disease into five categories (from one to five) (149), and the Modified Mercalli Intensity Scale evaluates the severity of an earthquake by categorizing it into twelve levels from I to XII (150). Score-based indices, on the other hand, usually define a score from zero to one hundred to differentiate the objects being compared and rank them in order. For example, the U.S. News state ranking creates a score that covers eight topics on people's needs in each state and assigns different weights to those topics based on survey data (151).
The Bloomberg Global Health Index is another score-based index; it ranks countries in terms of healthiness by giving them a score between zero and one hundred (152). In short, category-based indices are usually built upon a single variable, whereas score-based indices are more capable of integrating multiple metrics and are therefore more informative.

In this effort, SDI was designed as a score-based index, which gives a 0-100 score to each geographical area, e.g., a state or county, and measures to what extent area residents and visitors practice social distancing in terms of mobility. Zero indicates no social distancing and one hundred indicates perfect social distancing compared with the benchmark days before the COVID-19 outbreak. The benchmark values for the core metrics are computed using data from the weekdays (Monday to Friday) during the first two weeks of February. Thereafter, the changes in people's mobility patterns are captured by the percentage reduction of the corresponding metrics in Table 9 (noted as $X_2, \ldots, X_5$) as input. The absolute change in the percentage of residents staying home (noted as $X_1$) also serves as input. The percentage reductions are absolute values between 0 and 100%. Any increase is standardized as 0% in the calculation.

Table 9. Descriptive statistics for the core metrics

Index   Metric                                 Min     Max      Mean (SD)      Median
1       % staying home                         13.0    58.0     26.1 (7.6)     25.0
2       #work trips/person                     0.14    1.49     0.48 (0.18)    0.46
3       #non-work trips/person                 1.39    3.90     2.64 (0.37)    2.65
4       miles traveled/person                  15.6    113.4    52.3 (14.3)    52.1
5       Out-of-county trips (in thousands)     7       28845    5339 (5299)    3597

By jointly considering the travel behaviors of region residents and visitors, the equation for computing SDI is given as follows:

$$\mathrm{SDI} = \left[\beta_1 X_1 + 0.01 \times (100 - X_1) \times (\beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4)\right] \times (1 - \beta_5) + \beta_5 X_5 \tag{5}$$

where $\beta_1 = 1$ and $\beta_2 + \beta_3 + \beta_4 = 1$. The first part of the equation focuses on the resident level and the second part on out-of-county trips.
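Equation 5 can be sketched as a short function. The code below is illustrative only: the function names are hypothetical, the weights are passed in explicitly rather than hard-coded, and the baseline and current values in the usage lines are invented for demonstration. The weight values in the usage lines are the ones the text goes on to assign in Equation 6.

```python
def pct_reduction(baseline, current):
    """Percentage reduction from baseline, clipped to [0, 100].

    Any increase over the baseline is standardized as 0, as stated in the text.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    reduction = 100.0 * (baseline - current) / baseline
    return min(100.0, max(0.0, reduction))


def sdi(x1, x2, x3, x4, x5, b2, b3, b4, b5):
    """Equation 5 with beta_1 fixed at 1. All x inputs are on a 0-100 scale.

    x1: absolute change in % of residents staying home
    x2, x3, x4: % reduction in work trips, non-work trips, miles traveled
    x5: % reduction in out-of-county trips
    """
    if abs(b2 + b3 + b4 - 1.0) > 1e-9:
        raise ValueError("b2 + b3 + b4 must equal 1")
    # Residents staying home contribute x1 directly; the remaining
    # (100 - x1) percent contribute a weighted sum of their reductions.
    resident_part = x1 + 0.01 * (100.0 - x1) * (b2 * x2 + b3 * x3 + b4 * x4)
    return resident_part * (1.0 - b5) + b5 * x5


# Invented inputs for illustration only.
work_trip_reduction = pct_reduction(baseline=0.48, current=0.24)  # 50% fewer work trips
score = sdi(x1=20, x2=work_trip_reduction, x3=40, x4=60, x5=30,
            b2=0.25, b3=0.45, b4=0.3, b5=0.2)
```

Because every input lies between 0 and 100 and the weights within each level sum to one, the score also stays between 0 and 100.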
$\beta_5$ is thus the weight assigned to behavior changes regarding out-of-county trips. For the resident trips, the percentage of residents staying home is used to account for residents who do not make trips longer than 1 mile from home, so the weight is simply one ($\beta_1 = 1$). For people not staying home (travelers), whose percentage is $100 - X_1$, a weighted sum of the percentage reductions in the number of daily work trips, the number of daily non-work trips, and the average distance traveled per person is used. When individuals make more work and non-work trips and travel longer distances, they are considered to practice less social distancing. The weights for these variables sum to one ($\beta_2 + \beta_3 + \beta_4 = 1$) so that resident travelers are comparable to residents staying at home.

To assign appropriate weights to each variable, both actual observations and conceptual guidelines were consulted. First, the relative ratio between resident trips and out-of-county trips nationwide is about four to one. Hence, a weight of 0.2 was assigned to $\beta_5$. Second, it is widely observed that people have significantly reduced their travel distances, so the index should not give the large percentage reduction in distance traveled the same weight as the reduction in the number of trips. Meanwhile, the reductions in the number of trips are more informative with regard to people's reaction to the stay-at-home mandates. Thus, the reduction in the number of trips is considered about twice as important as that in distance traveled, and a weight of 0.3 was assigned to $\beta_4$. Moreover, government agencies have strongly encouraged people to reduce non-essential trips. Therefore, the index should factor in the reduction in non-essential trips, which is considered twice as important as the reduction in essential trips. Work trips are intuitively considered essential trips, while non-work trips can include both essential and non-essential trips.
Based on the 2017 National Household Travel Survey (NHTS) Travel Profile (153), the ratio between essential and non-essential non-work trips is approximately 1:2. Therefore, the relative ratio between the weights on the percentage reduction of work and non-work trips is 1:1.67. According to the constraint $\beta_2 + \beta_3 + \beta_4 = 1$, 0.25 and 0.45 were assigned to $\beta_2$ and $\beta_3$, respectively. The SDI is eventually computed as follows:

$$\mathrm{SDI} = \left[X_1 + 0.01 \times (100 - X_1) \times (0.25 X_2 + 0.45 X_3 + 0.3 X_4)\right] \times 0.8 + 0.2 X_5 \tag{6}$$

It should be noted that the weights are partially determined by certain assumptions. For example, the reduction of trips is considered more important than the reduction of travel distances when measuring the strength of social distancing. The sensitivity of SDI scores was evaluated by varying the relative weights between the trip and distance reduction estimates. Assigning a higher weight to the distance reduction estimate ($\beta_4$) led to larger absolute values and standard deviations of SDI scores; the largest were observed when $\beta_4 = 1$. Although the magnitude of SDI scores changed, both spatial and temporal trends generally stayed the same. Therefore, such changes in weight assignments should not yield inconsistent inferences when comparing social distancing practices between different regions and periods.

5.2. Results

To add more context to the observed mobility changes during the COVID-19 outbreak, the mobility metrics are integrated with COVID-19 case data (154).

5.2.1. The effectiveness of the Social Distancing Index (SDI)

The effectiveness and reasonableness of the proposed SDI were examined by reviewing its temporal change from February 2, 2020, to May 30, 2020, and its spatial variation by state for the entire nation (Figure 22).

Figure 22.
Temporal changes of state-level Social Distancing Index

The proposed SDI is sensitive to people's behavior changes and reflects the mobility changes accordingly. The SDI changes clearly indicate that people stay home more and travel less on weekends, especially on Sundays, and that people traveled less on Memorial Day (May 25, 2020) than on a normal Monday. During the study period, people practiced significantly more social distancing nationwide after President Trump declared a national emergency concerning the COVID-19 outbreak. The national emergency declaration immediately triggered people's responses on weekdays beginning March 16 and on the weekends of the following weeks: March 22, March 29, and April 5. In addition, the range of the index became wider after March 16, indicating that people in different states responded differently to the national emergency announcement. After the week of March 23, a general plateau was observed in social distancing practices. Beginning April 6, there was a tendency toward less social distancing in some states. One week later, a similar trend appeared across the entire nation. The possible reasons are twofold. First, people became less attentive to the outbreak as it persisted. Second, because of the widespread economic impacts of the pandemic, some people could no longer afford to maintain social distancing. As people relaxed their social distancing, there was no significant slowdown in the number of reported COVID-19 cases.

5.2.2. State-level Mobility Pattern Changes

Following the national emergency declaration, the mandatory stay-at-home orders issued by most states triggered a second wave of strengthened social distancing. This influence of government mandates on human behavior can also be seen when some states began reopening: states that chose to lift stay-at-home mandates early saw an acceleration in social distancing relaxation.
Figure 23 presents the SDI computed for all states for thirteen consecutive weeks from March 1 to May 30, 2020. Five stages are defined based on the general trend across all states: pre-pandemic (before March 13), behavior change (March 13 to March 22), government orders and holding steady (March 23 to April 12), quarantine fatigue (April 13 to April 26), and partial reopening and stay-at-home order lifting (April 27 until the end of the study period).

Figure 23. Social Distancing Index heatmap for all states

Figure 23 shows the level of SDI scores for all states during the study period. Each pixel in the graph indicates the level of social distancing for one specific state on a specific day, where blue stands for more social distancing practiced and red for less. The "X" marker indicates the start date of state-wide stay-at-home orders. The "O" marker indicates the order lifting date. The "I" marker indicates the start date of state-wide partial reopening if different from the order lifting date. The states are sorted in descending order by their SDI scores on the last weekday (May 29, 2020). The five regions practicing the most social distancing are the District of Columbia, Hawaii, New York, New Jersey, and Maryland, all of which issued stay-at-home orders. Meanwhile, the states practicing the least social distancing are Wyoming, North Dakota, South Dakota, Arkansas, and Montana, most of which did not issue stay-at-home mandates. One other consideration is that people on the East and West Coasts may have practiced more social distancing because they were exposed to the infection risk for a longer period and were aware of the higher infection risk that comes with higher population density.

In Figure 24, the top five and bottom five states by cumulative number of confirmed cases on May 30, 2020, were examined.
After the stay-at-home orders were issued, all ten states experienced an increase in SDI, but the bottom five states generally had lower SDI scores. This implies that the local severity of the COVID-19 outbreak played a significant role in people's decision-making. Although all ten states experienced a decrease in SDI after April 13, a sharp decline was observed following the partial reopening and/or stay-at-home order lifting in New York, Massachusetts, and Alaska. This implies that people in those states were willing to maintain more social distancing for a longer period, but the early reopening discouraged social distancing behavior. The influence of early reopening in Alaska appeared after two weeks, when the increase in confirmed cases accelerated. Similar impacts of reopening can be observed in California, Montana, Oregon, and West Virginia, where the low level of SDI and the increasing trend of confirmed cases raised concerns about a second local outbreak.

In Figure 24, the blue dots stand for SDI scores on weekdays and the orange dots for SDI scores on weekends. The red triangular dots stand for the daily cumulative number of confirmed COVID-19 cases. The grey line marks the start date of the state stay-at-home order. The green line marks the stay-at-home order lifting date and the green dashed line marks the date of state partial reopening.

Figure 24. Temporal changes of Social Distancing Index in the top five and bottom five states regarding the cumulative number of confirmed cases.

Spearman's rank correlation coefficient between the infection rates and the SDI scores for those ten states was also evaluated for the entire study period. Table 10 summarizes the results. Since the SDI scores on weekends are systematically higher than those on weekdays, only the weekday observations were used to compute the correlation coefficients.

Table 10.
Spearman's rank correlation coefficient between SDI and infection rate for the top five and bottom five states regarding the cumulative number of confirmed cases.

                  Infection Rate                           Infection Rate
Top five states   Cumulative   New     Bottom five states  Cumulative   New
New York          0.658        0.663   Hawaii              0.744        0.713
New Jersey        0.689        0.669   Montana             0.611        0.604
Illinois          0.573        0.582   Alaska              0.660        0.661
California        0.594        0.599   Oregon              0.619        0.594
Massachusetts     0.614        0.619   West Virginia       0.651        0.643

The cumulative infection rate is defined as the cumulative number of confirmed COVID-19 cases per thousand population, and the new infection rate as the number of new confirmed cases daily per thousand population. In Table 10, a stronger correlation was observed between SDI and new infection rate than that between SDI and cumulative infection rate, suggesting that people were paying close attention to the outbreak development and have been practicing less social distancing. The stronger correlation between SDI and new infection rates in Hawaii, New Jersey, Massachusetts, and New York implies that people in those states were more attentive during the pandemic compared to other states. Those states also have a flatter curve of the cumulative number of confirmed cases at the end of the study period.

5.2.3. County-level Mobility Pattern Changes

SDI is also informative at the county level. Figure 25 shows the temporal changes of SDI for the top ten counties by cumulative number of confirmed cases on May 30, 2020. The counties in New York practiced strict social distancing, which helped "flatten the curve" of cumulative confirmed cases. The high levels of SDI in Middlesex County, MA, Wayne County, MI, and Hudson County, NJ also slowed down the outbreak. However, a relaxation of social distancing was observed after the partial reopening and the expiration of stay-at-home orders.
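Spearman's rank correlation, as used for the state-level and county-level comparisons, is the Pearson correlation computed on the ranks of the two series. The sketch below is a dependency-free illustration with tied values given their average rank; the function names are hypothetical and the weekday series are invented, not taken from the study data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two series of equal length >= 2")
    return pearson(average_ranks(a), average_ranks(b))


# Invented weekday series: SDI scores and new cases per thousand population.
weekday_sdi = [20, 35, 50, 55]
new_infection_rate = [0.1, 0.4, 2.0, 9.0]
rho = spearman(weekday_sdi, new_infection_rate)
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship between SDI and infection rate, not just a linear one, which suits an infection curve that grows much faster than the index.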
In the meantime, Los Angeles County, CA, and Philadelphia County, PA were among the regions that needed to strengthen their social distancing practices, as their SDI scores were lower than those of other counties in similar circumstances and their confirmed cases were increasing rapidly.

Figure 25. Temporal changes of Social Distancing Index in the top ten counties according to the cumulative number of confirmed cases.

The correlation between the infection rates and the SDI scores was also evaluated for the top ten counties by cumulative number of confirmed cases. Table 11 summarizes the results. In general, stronger correlations between the infection rates and the SDI scores were observed in the counties with higher SDI scores. Moreover, the counties with smaller correlation coefficients between SDI and new infection rates tended to have an increasing trend in the cumulative number of confirmed cases at the end of the study period.

Table 11. Spearman's rank correlation between SDI and infection rate for the top ten counties regarding the cumulative number of confirmed cases

                          Infection Rate                                   Infection Rate
Top ten counties          Cumulative   New     Top ten counties            Cumulative   New
New York County, NY       0.734        0.746   Westchester County, NY      0.709        0.721
Cook County, IL           0.590        0.608   Philadelphia County, PA     0.695        0.655
Los Angeles County, CA    0.636        0.651   Middlesex County, MA        0.708        0.705
Nassau County, NY         0.706        0.715   Wayne County, MI            0.698        0.679
Suffolk County, NY        0.689        0.670   Hudson County, NJ           0.730        0.732

5.3. Summary and Discussion

During the COVID-19 pandemic, data-driven tools that can provide insight into human mobility behavior have been of paramount importance. This dissertation used real-world observations of human movements from MDLD to study the impact of non-pharmaceutical interventions.
By studying the travel behaviors of people across the United States, a score-based Social Distancing Index (SDI) was developed to capture people's actual social distancing behaviors. Monitoring the SDI patterns, both spatially and temporally, enables policymakers to evaluate the effectiveness of related policies and supports data-informed decision-making for public health. In addition, SDI boosts public and community awareness of the ongoing situation where people live. People can use insights from SDI to evaluate the potential risks in their neighborhoods.

As exploratory research, this study could be further improved in several directions. First, the basic mobility metrics could be generated with regional differences taken into account. Specifically, the current definition of the stay-at-home population may introduce some bias because individual behaviors differ between residents of rural and urban areas. For example, many people living in rural regions still must make long trips to shop for essential goods, while people in urban areas have a higher chance of obtaining essential items nearby (within 1 mile from home) and thus are more likely to be identified as staying at home. Second, adding more mobility metrics to the SDI could make the index more comprehensive. For instance, trip purposes could be inferred by integrating MDLD and point of interest (POI) data. Identifying where people visit could make it possible to distinguish between essential and non-essential trips, in addition to distinguishing between work and non-work trips. Third, variables measuring the relationship between human movements and disease transmission could be extremely valuable.
Although it may be difficult to retrieve details such as contact tracing information from MDLD, aggregated measurements can also be significant indicators, such as trips to and from heavily infected areas, which carry potential exposure and disease transmission, on top of the out-of-county trips that are currently included. Moreover, an expert survey on the weights assigned to the different variables in SDI may also contribute to a better construction of the index.

Another future research direction is to integrate SDI with existing epidemiological frameworks, such as compartmental models. A question of interest in these frameworks is how the input variables evolve during the course of the outbreak. Certain policies, such as mobility restrictions, can significantly reduce certain input variables like the reproduction factor of the disease. SDI can be employed in compartmental models to improve the prediction of these inputs.

Chapter 6: MDLD in Action for Disaster Evacuation

Understanding individuals' behavior during natural disasters is of paramount importance for local, state, and federal government agencies hoping to be prepared for these extreme situations. In this study, a novel framework is introduced to construct evacuation patterns and analyze individuals' decisions (155). Hurricane Irma and the state of Florida were selected as the case study for implementing the framework and testing the results.

6.1. Introduction

In September 2017, Hurricane Irma prompted officials to issue one of the largest evacuation orders in U.S. history. Over six million people were ordered to evacuate their residences due to Irma's landfall in Florida, Georgia, and South Carolina. Mandatory and voluntary evacuation orders were issued before the landfall of the storm, on both the Atlantic and Gulf coasts.
In the state of Florida alone, 84 deaths were reported, due either to direct effects of Hurricane Irma such as drowning or to indirect causes such as vehicle accidents during the evacuation. The immense scale of hurricanes, and the dependence of evacuation management on how people behave during these disasters, highlight the importance of studying people's evacuation patterns in such situations.

6.2. Data

6.2.1. Location Data

The primary dataset used in this study is the MDLD of anonymized devices from LBS data sources. Based on meteorological history, Irma developed from a tropical wave near Cape Verde on August 30 and quickly intensified into a Category 3 hurricane by August 31 due to the climate conditions. On September 4, the storm kept intensifying and became a Category 5 hurricane. Therefore, based on the timeline of Hurricane Irma's evolution, the month of August 2017 was chosen to identify the home locations of users within the state of Florida, under the assumption that users' behavior had not yet been impacted by the news of Hurricane Irma. To analyze mobility behavior and understand the evacuation pattern of Florida residents, the data from the entire month of September 2017 were analyzed.

6.2.2. Evacuation Zone Data

In addition to the location data, information on how the evacuation orders evolved was necessary to understand individuals' behavior. The Florida Division of Emergency Management provided the spatial polygons of evacuation zones for the counties with defined evacuation zones. However, no single source provided comprehensive details on evacuation orders by county and zone. The webpage of the Florida governor, Rick Scott, had one of the most complete records of the evacuation orders issued as of 9/9/2017. However, several counties, particularly in the north of Florida, issued evacuation orders on 9/10/2017.
Also, many counties upgraded evacuation orders from voluntary to mandatory on or after 9/9/2017. Therefore, data from several sources were compiled to provide a complete picture of the evacuation orders. The final Florida map by evacuation order and date during Hurricane Irma is shown in Figure 26 (156).

Besides the evacuation map, open-source parcel-level information for the entire state of Florida was obtained. The data were gathered by the Florida Department of Revenue, County Property Appraisers, and the University of Florida GeoPlan Center. This layer contains residential home type information that was used in the parameter selection process for the home location identification algorithm. Also, to measure the impact of living in low-lying residences on the evacuation decision, elevation information was obtained from the digital elevation model (DEM) provided by the University of Florida GeoPlan Center for the entire state of Florida.

Figure 26. Florida map by evacuation order and date during Hurricane Irma

6.2.3. Socio-Demographic Data

Socio-demographic information such as income, age, and race was gathered for statistical modeling purposes. To collect this information at the census tract level, the 2017 American Community Survey (ACS) 5-year estimates conducted by the United States Census Bureau were used.

6.3. Methodology

To construct the evacuation behavior pattern, three main steps were designed. The first step is to identify the home location of all devices. Next, a framework is proposed to determine which devices evacuated and to construct their evacuation behavior. Lastly, mobility metrics of devices are calculated to examine the relationship between the evacuation decision and the mobility behavior of the individuals. Figure 27 illustrates the framework structure.

Figure 27. Disaster evacuation analysis framework flowchart

6.3.1.
Home Location Identification

For this application, as the scope of the study was limited to devices within the state of Florida for only one month, a more computationally intensive home location identification algorithm was developed. To cluster the sightings of each device, the density-based spatial clustering of applications with noise (DBSCAN) approach was used. DBSCAN is a clustering algorithm relying on a density-based notion of clusters, designed to discover clusters of sightings regardless of their shapes (72). In addition to the more computationally intensive method, a longer nighttime window, from 7 pm to 7 am, was considered for the home location identification. Among all the clusters identified by the algorithm, the home location was defined as the center of the cluster ranked highest by dwell time and then by observation frequency.

6.3.2. Evacuation Detection

After filtering for the devices whose inferred home was located within the state of Florida, the sighting data of these devices for the entire month of September were extracted to study the evacuation pattern of Florida residents during Hurricane Irma. First, to ensure the persistency and accuracy of the home location identified in August, only devices that were observed at least once at their August home location during the month of September were kept for further analysis. This check removes devices without any information in September, along with devices that changed their home location or that were observed in Florida during August only while on a trip.

Next, the identified August home location of each device was intersected with the augmented shapefile to specify the corresponding county, evacuation zone, elevation information, and socio-demographic attributes of each device. The census-tract-level socio-demographic attributes were assigned to all devices residing in that census tract.
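The home location step of Section 6.3.1 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the DBSCAN parameters (`eps_m`, `min_pts`), the use of distinct hourly slots as a stand-in for dwell time, and all function names are assumptions made for illustration. The sketch keeps only sightings in the 7 pm to 7 am window, clusters them with a plain DBSCAN, and returns the center of the cluster ranked first by dwell time and then by number of sightings.

```python
import math
from collections import defaultdict
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def dbscan(points, eps_m, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
                 for i in range(n)]
    labels = [None] * n          # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1       # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster          # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                stack.extend(neighbors[j])
    return labels


def is_night(ts, start_hour=19, end_hour=7):
    """True for timestamps inside the 7 pm to 7 am window."""
    return ts.hour >= start_hour or ts.hour < end_hour


def infer_home(sightings, eps_m=100.0, min_pts=3):
    """sightings: list of (datetime, lat, lon). Returns (lat, lon) or None.

    eps_m and min_pts are hypothetical values, not taken from the source.
    """
    night = [s for s in sightings if is_night(s[0])]
    labels = dbscan([(lat, lon) for _, lat, lon in night], eps_m, min_pts)
    members = defaultdict(list)
    for label, s in zip(labels, night):
        if label != -1:
            members[label].append(s)
    if not members:
        return None

    def rank(cluster_points):
        # Assumed dwell-time proxy: number of distinct (date, hour) slots.
        dwell_slots = len({(ts.date(), ts.hour) for ts, _, _ in cluster_points})
        return (dwell_slots, len(cluster_points))

    best = max(members.values(), key=rank)
    return (sum(lat for _, lat, _ in best) / len(best),
            sum(lon for _, _, lon in best) / len(best))
```

The quadratic neighbor search is adequate for one device's nighttime sightings but would need a spatial index for larger inputs.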
The next step was to define evacuation based on the observed trajectories of each device. An evacuation identification method was developed based on the distance of the users' sightings to their inferred home location during the landfall of Hurricane Irma. For this purpose, the daily minimum distance between each device's sightings and its identified August home location was calculated for the entire month of September. A 1-mile threshold was selected as the evacuation criterion to determine whether each individual evacuated. Individuals who were not observed within a 1-mile radius of their home locations within the hurricane study period were considered to have evacuated their home location. The former Florida Governor, Rick Scott, declared a state of emergency on September 4, and within the next six days, 57 of the 67 counties issued evacuation orders. Eventually, Hurricane Irma made landfall on Cudjoe Key on September 10 as a Category 4 hurricane and exited Florida into Georgia on September 11, after weakening significantly. Thus, the period between September 4 and September 12 was chosen as the hurricane study period for determining the evacuation decision of the individuals.

6.3.3. Historical Mobility Behavior Pattern

In addition to constructing the evacuation pattern, this dissertation investigated the relationship between individuals' mobility behavior before the disaster and their evacuation decisions. In particular, two important mobility measures, the number of trips and the convex hull of each individual's sightings, were calculated daily for the entire month of August. The convex hull is defined as the smallest convex set that contains all the spatial sightings. The convex hull has been widely used in the literature for understanding human mobility behavior based on location trajectories (39, 157).

6.4.
Constructing the Evacuation Pattern

In addition to the evacuation decision, departure and reentry dates are of paramount importance in disaster evacuation management. Therefore, the minimum daily distance to home measure was used to investigate the distribution of the departure and reentry dates. For the individuals who evacuated, the latest day before the evacuation on which they were seen within the 1-mile radius of their identified home was chosen as their departure date. Similarly, the earliest day after the evacuation on which they were seen within the 1-mile radius of their identified home was selected as their reentry date. Estimating the departure and reentry dates provides the opportunity to further investigate the relationships between departure dates and other influential factors such as the evacuation order date.

Destination choice is another important decision component. While an increase in short-distance evacuations increases the demand for sheltering resources, it reduces the stress on the transportation network as well as the overall cost of the evacuation operation. In this study, the maximum of the minimum daily distances from the inferred home location was used as a proxy for the evacuation destination. Also, the impact of living in a low-lying residential area on individuals' evacuation decisions was empirically examined by controlling for the type of evacuation order received.

6.4.1. Stay or Evacuate

By applying the home location identification algorithm discussed in Section 6.3.1 to more than 6 billion observations for the devices that were observed in Florida during August, the home location of 1,050,472 devices was identified. Among this set of devices, 1,002,858 devices resided within the state of Florida. For these devices, 5,677,549,347 September sightings were extracted from the MDLD for further investigation.
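The evacuation rules of Sections 6.3.2 and 6.4 can be sketched from the daily minimum distances alone. The sketch below is one reading of those rules, not the dissertation's code: it assumes a device counts as an evacuee when at least one day in the September 4 to 12 study period has a daily minimum distance above the 1-mile threshold, it treats days without sightings as carrying no information, and it takes the destination proxy over the first away spell only. The function name and the input layout (a mapping from day of September to miles) are hypothetical.

```python
def evacuation_summary(daily_min_miles, study_days=range(4, 13), threshold=1.0):
    """Summarize one device's evacuation from its daily minimum distances.

    daily_min_miles: {day of September: minimum distance to home in miles}.
    Days without sightings are simply absent from the mapping.
    Returns None for devices that are not classified as evacuees.
    """
    # Assumed rule: evacuee if any observed study-period day is beyond the threshold.
    away_days = [d for d in study_days
                 if daily_min_miles.get(d, 0.0) > threshold]
    if not away_days:
        return None
    first_away = away_days[0]

    # Days on which the device was seen within the threshold of its home.
    home_days = sorted(d for d, m in daily_min_miles.items() if m <= threshold)
    before = [d for d in home_days if d < first_away]
    after = [d for d in home_days if d > first_away]
    departure = before[-1] if before else None   # latest home day before leaving
    reentry = after[0] if after else None        # earliest home day after leaving

    # Destination proxy: maximum of the daily minimum distances while away.
    spell = [m for d, m in daily_min_miles.items()
             if d >= first_away and (reentry is None or d < reentry)]
    return {"departure": departure, "reentry": reentry,
            "destination_miles": max(spell)}
```

A device seen at home through September 8, then 45, 120.5, and 118 miles away on September 9 to 11, and back home on September 12 would get a departure date of September 8, a reentry date of September 12, and a destination proxy of 120.5 miles.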
Persistency checks were then conducted to remove devices that were inactive during September and devices without any sightings in the vicinity of their identified home location. The final list includes 807,623 active devices. The minimum distance from the identified home location was calculated daily for all users. Then the proposed framework for evacuation identification was employed to determine the evacuation decision and the departure and reentry dates of the evacuees. A summary of the evacuation rate by evacuation order type is shown in Table 12.

Table 12. Evacuation decision based on the evacuation order received

                 No Evacuation Order    Voluntary Evacuation Order    Mandatory Evacuation Order    Entire State
                 Number    Ratio (%)    Number    Ratio (%)           Number    Ratio (%)           Number    Ratio (%)
Evacuated        187285    32.98        38524     33.68               72628     57.92               298437    36.9
Not Evacuated    380547    67.02        75868     66.32               52771     42.08               509186    63.1
Total            567832    100          114392    100                 125399    100                 807623    100

Based on the results summarized in Table 12, 57.92% of the individuals who received mandatory evacuation orders evacuated their homes, while this ratio was considerably lower for people who received a voluntary evacuation order or no evacuation order (33.68% and 32.98%, respectively). These results are in accordance with a telephone poll conducted on October 17, 2017, which showed that 57% of people followed the mandatory evacuation order and that, in general, 33% of Floridians evacuated their homes (158).

6.4.2. Departure and Reentry Date Distribution

Departure and reentry date choices are becoming increasingly important for emergency and transportation practitioners as well as state and government agencies. I estimated the departure and reentry date distributions by employing the method discussed in Section 6.4.
It should be acknowledged that this approach might lead to some inaccuracy in capturing the actual departure and reentry dates for devices that lost their connection to the network, either due to power outages or due to loss of cell network service during and after the hurricane landfall. However, comparing the results with a survey conducted for the same region shows a consistent pattern (120). A summary of the results is presented in Figure 28.

Based on the results, the majority of the evacuations occurred from September 8 to September 9, with September 9 being the peak at 26.27%. Although the majority of evacuations happened in the last three days before Irma's landfall, the results showed that a considerable number of individuals evacuated their homes 5 days or more in advance, with 7.04% of people evacuating on September 5 and 10.28% evacuating before September 5. This high rate of early evacuation might be due to the fact that some counties started to issue evacuation orders as early as September 5. Increased implementation of time-phased evacuation plans can be another reason for this observation. Finally, only 2.13% of the evacuees left their homes after September 10.

[Chart (a): departure date distribution; vertical axis: percent; horizontal axis: departure date, from "9/4 and before" to 11-Sep.]
[Chart (b): reentry date distribution; vertical axis: percent; horizontal axis: reentry date, from "9/10 and earlier" to "9/17 and later".]
Figure 28. Departure and reentry date distribution

On the other hand, the reentry date distribution was smoother than the departure date distribution, with a peak of 24.65% observed on September 11. This was expected since regions do not become livable all at once after a disaster. In addition, agencies do not provide return plans for the impacted areas. Therefore, people usually decide to re-enter their residence in a way that minimizes any impedance such as traffic.
Moreover, the results indicated that about 12.89% of the evacuees returned to their homes on September 10 or earlier. This behavior has also been observed in a survey, mainly due to the updates on the hurricane path. Individuals who evacuated earlier may have concluded that their residences were no longer at risk (120).

To examine the departure date distribution further, the effect of the evacuation order date on the departure date was investigated for all the regions. The majority of the individuals who received evacuation orders on September 6 departed their homes on September 7 and September 8, while individuals who received evacuation orders on September 7 mostly chose to leave their homes from September 7 to September 9. The same trend can be observed for the people who were ordered to evacuate their homes on September 8. Of these, 34.53% decided to leave their residences on the following day. As the landfall of the hurricane approached, the impact of the evacuation order date on individuals' actual departure date decisions diminished. The majority of evacuees who were ordered to evacuate on September 9 and September 10 had already left their residences before receiving the evacuation order. Figure 29 is color-coded by the evacuation order date.

[Chart: departure date based on the date of evacuation order; vertical axis: percent; horizontal axis: departure date, from "9/4/17 and earlier" to 9/11/2017; series: evacuation order dates 9/6/2017, 9/7/2017, 9/8/2017, 9/9/2017, and 9/10/2017.]
Figure 29. Relationship between departure date and evacuation order date

6.4.3. Destination Choice: Distance to Evacuation Destination

The overall distribution of distance to evacuation destination followed a similar trend among evacuees regardless of the evacuation orders. However, on average, evacuees who received mandatory evacuation orders sought farther locations. The trend is shown in Figure 30.
While about 43% of the evacuees who received voluntary or no evacuation orders chose a destination within a 20-mile radius of their residential locations, 35.47% of evacuees who received mandatory evacuation orders stayed within the 20-mile radius of their home. The distance distribution also suggests that evacuees tend to choose either a close evacuation destination within their neighborhood or travel farther away to reach a location they perceive as safe.

[Chart: distance distribution to evacuation destination; vertical axis: percent; horizontal axis: distance (less than 20 miles, between 20 to 50 miles, between 50 to 100 miles, between 100 to 500 miles, farther than 500 miles); series: no evacuation order, voluntary evacuation order, mandatory evacuation order.]
Figure 30. Distribution of evacuation destination distance to the home locations

To examine the trend of the evacuation distance further, the spatial distribution of the evacuation distance is illustrated in Figure 31. Evacuees living near the shores tend to travel to farther destinations. This observation is expected, as those individuals may perceive a higher risk compared to people living inland.

Figure 31. Median distance traveled to evacuation destination at county level

6.4.4. Evacuation Duration Distribution

In terms of evacuation duration, as shown in Figure 32, evacuees who received mandatory evacuation orders had a slightly longer evacuation duration. To better understand the spatial trend of the evacuation duration, the average evacuation duration at the county level is presented in Figure 33. People living in the southern part of Florida had a longer evacuation duration, which can be a result of more severe damage to property and infrastructure in those regions.

[Chart: evacuation duration distribution; vertical axis: percent; horizontal axis: duration (2 days, 3 days, 4 days, 5 days, between 5 to 10 days, more than 10 days); series: no evacuation order, voluntary evacuation order, mandatory evacuation order.]
Figure 32.
Evacuation duration distribution across different evacuation order groups

Figure 33. Average evacuation duration at the county level

6.4.5. Impact of Low-Lying Residential Area

The impact of low-lying residential areas on individuals' evacuation decisions has also been investigated. Since there is no strict definition of a low-lying area, three categories were introduced based on the elevation of the residential area: less than 10 meters, between 10 and 50 meters, and more than 50 meters. Also, to control for the effect of evacuation orders on individuals' decisions, the evacuation rates were computed separately for each evacuation order type. Evacuation rates for each group are presented in Figure 34. It can be seen that the elevation of residential areas has a strong association with people's decision to evacuate. Among people who had not received any evacuation order, 36.59% of those living in low-lying residential areas decided to leave their homes, while this rate was 28.43% for those in areas with elevation above 50 meters.

[Chart: impact of elevation on evacuation decision; vertical axis: percent; horizontal axis: evacuation group (no evacuation order, voluntary evacuation order, mandatory evacuation order); series: less than 10 meters, between 10 and 50 meters, more than 50 meters.]
Figure 34. Elevation impacts on evacuation decisions

6.5. Statistical Model

After extracting the evacuation behavior of individuals and comparing the results against existing polls and surveys, this study investigates the statistical linkage between the mobility patterns of individuals and their evacuation decisions. The evacuation decision has been well studied in the literature, and its importance and implications for agencies have been highlighted. Previous studies revealed the importance of socio-demographic variables such as age, income, and race, as well as evacuation orders and perceived worries and concerns, in evacuation decisions. In this dissertation, in addition to those metrics, the importance of individuals'
mobility behavior in their decision has also been examined. The individual-level mobility measures, including the daily number of trips and the convex hull of each active device, were calculated for the entire month of August. The mobility measures were incorporated into the logistic regression model to examine whether those measures are statistically significant in the evacuation decision choice model and whether they can improve the evacuation decision model's accuracy. Table 13 summarizes the list of variables considered for modeling purposes. To develop the statistical model, 3,937 devices were removed from the dataset due to missing socio-demographic attributes.

Table 13. Data description and summary for evacuation choice model

Categorical variables
Metric                 Definition                   Category                Count     Percentage
Evacuation Decision    Evacuation decision          0 = did not evacuate    507605    63.16
                                                    1 = evacuate            296081    36.84
Evacuation order       Evacuation order received    0 = none                565178    70.32
                                                    1 = voluntary           114038    14.19
                                                    2 = mandatory           124470    15.49

Continuous variables
Metric                         Definition                                                                Min     Median    Max        SD
Elevation                      Residential location elevation                                            -1      6         102        13.86
Median age                     Median age of the residential census tract                                11.9    41.4      83.3       9.71
Median income                  Median income of the residential census tract                             8804    54279     2500001    22951
Vehicle availability           Percentage of households with at least one vehicle in the census tract    28.4    96.1      100        5.82
Race - white                   Percentage of white population in the census tract                        0       0.83      1          0.17
Average number of trip         Average number of trips taken by the individual per day during August     1       5.5       51.4       3.82
Average of convex hull area    Average daily convex hull area of individuals during August               0       48.57     57274.8    510.31

As no evacuation was the base choice in the decision variable, a positive coefficient indicates that an increase in the variable's value increases the likelihood of evacuation, while a negative sign denotes a decrease in the likelihood of evacuation.
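The sign convention described above follows from the form of a binary logistic model, in which the probability of evacuating is the logistic function of a linear combination of the variables. The short sketch below illustrates this with made-up numbers: the intercept, coefficients, and variable values are hypothetical and are not the estimates reported in this chapter, and the function name is an assumption.

```python
import math


def evacuation_probability(intercept, coefficients, values):
    """Binary logit: P(evacuate) = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    utility = intercept + sum(coefficients[k] * values[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-utility))


# Hypothetical coefficients, chosen only to illustrate the sign convention.
coefs = {"evacuation_order": 0.4, "vehicle_availability": -0.02}
base = {"evacuation_order": 0, "vehicle_availability": 95.0}

p_base = evacuation_probability(0.5, coefs, base)
# Raising a variable with a positive coefficient raises the probability.
p_order = evacuation_probability(0.5, coefs, {**base, "evacuation_order": 2})
# Raising a variable with a negative coefficient lowers it.
p_vehicle = evacuation_probability(0.5, coefs, {**base, "vehicle_availability": 100.0})

# exp(coefficient) is the multiplicative change in the odds per unit increase.
odds_ratio_order = math.exp(coefs["evacuation_order"])
```

With "did not evacuate" as the base choice, the exponentiated coefficient can be read as an odds ratio: values above 1 correspond to positive coefficients and values below 1 to negative ones.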
The summary of the results is presented in Table 14.

Table 14. Logistic regression models' summary

Model#1: logistic model without mobility behavior metrics
Model#2: logistic model with mobility behavior metrics

Variable                       Model#1 estimated coefficient    Model#1 p-value    Model#2 estimated coefficient    Model#2 p-value
Intercept                      3.61E-01                         <0.001 ***         4.45E-01                         <0.001 ***
Evacuation order               4.06E-01                         <0.001 ***         4.08E-01                         <0.001 ***
Elevation                      -8.60E-05                        <0.001 ***         -8.55E-05                        <0.001 ***
Median age                     8.48E-03                         <0.001 ***         8.65E-03                         <0.001 ***
Median income                  3.62E-08                         0.766              2.68E-07                         0.028 *
Vehicle availability           -1.57E-02                        <0.001 ***         -1.88E-02                        <0.001 ***
Race - white                   2.59E-01                         <0.001 ***         2.44E-01                         <0.001 ***
Average number of trip         -                                -                  1.03E-02                         <0.001 ***
Average of convex hull area    -                                -                  4.28E-04                         <0.001 ***
Number of observation          803686                                              803686
Log Likelihood                 -516912.5 (df=7)                                    -513806.2 (df=9)
AIC                            1033839                                             1027630
McFadden R2                    0.025                                               0.031
Models Comparison              P-value (Chi) = <0.001 ***

As shown in Table 14, two logistic regression models were developed. Model#1 only includes socio-demographic information, the elevation of the residential location, and evacuation order attributes, while Model#2 uses mobility behavior metrics in addition to all variables in Model#1. In both models, the signs of the coefficients for common variables were consistent with previous studies, except for the vehicle availability metric. Higher vehicle availability was expected to increase the likelihood of evacuation, but in this model the coefficient was estimated to be negative. One possible reason for this observation is the low variation of this metric in the study region (the first quartile of vehicle availability was 92.6% and the median was 96.1%). Both mobility metrics turned out to be statistically significant in Model#2, and the overall accuracy of the model improved significantly. The estimated
The estimated 107 sign of the coefficients was positive which indicates that individuals with more trips per day and a larger mobility footprint are more likely to evacuate their residential location during a disaster. 6.6. Summary and Discussion The intensity and the frequency of weather-related disasters are expected to increase due to climate change, increase in sea surface temperature, and other related causes (159, 160). In order to be prepared, it is crucial for the state and federal government agencies to understand individuals? behavior before, during, and after a disaster. Most of the research in the literature studied individuals? behavior during these extreme events based on post-disaster surveys. In addition to the small sample size, these surveys are typically prone to several biases, such as observer effect bias and imperfect recall of the evolution of the evacuation process. This dissertation tried to extract information from MDLD to construct several aspects of evacuation patterns by analyzing anonymized individuals? traces. In this study, the evacuation behavior of 807,623 anonymized individuals was captured by employing the proposed framework on more than 11 billion location sightings. The study results showed that type of evacuation order has a strong impact on individuals? evacuation decisions. Results showed that 57.92% of individuals who received mandatory evacuation orders left their homes while this ratio was 32.98% and 35.68% for smartphone users who received no evacuation order and voluntary evacuation order, respectively. 108 Irma made its landfall in the mainland U.S. on September 10. The departure date and reentry date analysis conducted in this study demonstrated that the majority of the evacuees left their residences in the last three days leading to the hurricane's landfall, with the peak of evacuation observed on September 9 when 26.27% of evacuees departed their home. 
However, the return process was distributed more evenly among the days after the landfall. The effect of evacuation orders' dates on individuals' departure date decisions was also empirically examined. It was shown that late evacuation orders (ones issued on September 9 and September 10) did not have a strong influence on individuals' departure decisions, while for the regions that received evacuation orders earlier (from September 6 to September 8), an increase in the evacuation rate was observed the day after the evacuation order was issued. These findings highlight the importance of issuing evacuation orders at least two days before the hurricane's landfall.

The evacuation distance distribution revealed that individuals chose either to shelter in the vicinity of their residential area or to travel to farther destinations (more than 100 miles away from their home location). It has also been shown that the elevation of residential areas had a strong effect on individuals' evacuation decisions. People living in low-lying regions showed a higher evacuation rate than people living in mid- and high-elevation regions after controlling for the evacuation order type.

This study also showed that the observed mobility pattern of individuals can play a significant role in improving the accuracy of evacuation decision models. Having access to historical MDLD provides unique information to agencies and decision-makers, giving them a better understanding of the evacuation evolution in their region.

Although analyzing the behavior of smartphone users provides an opportunity to observe the actual behavior of millions of individuals during disasters, several limitations still exist. While the sample size of the MDLD is enormous, it should still be noted that this type of data has its own biases.
The other limitation is that post-disaster surveys usually provide a rich set of socio-demographic information and stated preferences of the individuals, while MDLD lacks any such information.

Chapter 7: Conclusions and Remarks for Future Work

Understanding people's mobility behavior, i.e., where, when, why, and how people travel, is of paramount importance for decision-making and policymaking regarding traffic management and operations, resource allocation, responding to natural disasters, and infrastructure planning. For decades, planners have been relying on two major data sources, i.e., travel survey data and traffic monitoring data (such as roadway traffic volumes, transit ridership information, etc.). The inherent issues and shortcomings of the two data sources, such as small sample size, the cumbersome procedure of obtaining such datasets, and inadequate coverage of travel modes, make the understanding of human mobility patterns costly and prone to known biases. With the emergence of mobile networks and positioning technologies, mobile device location data have drawn the attention of decision-makers and researchers due to their unique potential for analyzing human mobility behavior and understanding travel characteristics. This dissertation constructed a set of frameworks and developed novel algorithms to derive mobility metrics from nationwide MDLD. The remainder of this chapter begins by summarizing the research contributions and findings of this dissertation, followed by a discussion of future work directions.

7.1. Summary of Contributions

In Chapter 2, I first conducted a comprehensive literature review and practice scan regarding the evolution of mobile device location data and the related advancements in positioning technologies. Then I summarized the research efforts conducted to extract device- and trip-level information from the MDLD.
The literature review is followed by a presentation of studies that investigated the importance of human mobility behavior in two different study cases: the outbreak of disease and evacuation behavior analysis during natural disasters.

In Chapter 3, I introduced the mobile device location data utilized in this study and discussed the data cleaning and preprocessing steps required prior to extracting mobility information from MDLD. The chapter ended by providing a national-level data summary.

Chapter 4 discussed the methodological advancement in inferring device-level and trip-level information from MDLD. A computationally efficient home and work location identification algorithm was introduced in Section 4.1. The algorithm was compared with other state-of-practice algorithms and was shown to be both efficient and effective in identifying home and work locations at the national level. In the absence of individual-level information, the algorithm's outputs were examined against aggregate-level ground truth datasets, including ACS estimates and LODES data. Then a novel tour-based trip identification algorithm was introduced to overcome the shortcomings of the existing trip identification algorithms. The tour-based trip identification algorithm leverages the identified home and work locations of devices to form tours and enables researchers to differentiate between long-distance and short-distance tours and to link trips together with higher accuracy. The last section of this chapter proposed a new method to impute the travel mode of trips based on a feature set constructed from both trip trajectory information and the transportation networks' information for different modes. The empirical results from the proposed algorithm demonstrated its superior performance compared to other state-of-practice and state-of-art algorithms, especially for modes that are more difficult to differentiate, such as car and bus.
In Chapter 5, this dissertation developed a framework to quantify the impact of the COVID-19 pandemic on human mobility patterns. The framework was built upon the methodologies described in Chapter 4 along with two additional methodological steps (i.e., bi-level weighting and social distancing index construction) to portray a more complete picture of how the mobility patterns of communities evolved before and during the pandemic. The national-level, state-level, and county-level mobility pattern trends were investigated to demonstrate the effectiveness and usefulness of such timely data in providing insights to communities and decision-makers.

Chapter 6 extended the human mobility behavior analysis to extreme conditions such as natural disasters. In this chapter, different aspects of evacuation behavior during a natural disaster were studied, such as the evacuation decision, departure and reentry time, evacuation distance, evacuation duration, and the determinants of the evacuation decision. The proposed framework was applied to MDLD for the residents of Florida during the landfall of Hurricane Irma. The proposed framework successfully constructed the evacuation decisions and showed the significance of individuals' historical mobility behavior in their evacuation decisions.

7.2. Future Directions

The applications of MDLD in the transportation domain have grown exponentially since MDLD made its debut in the late 1990s. However, there is still room for improvement in the methodologies that are being used to infer human mobility information. I propose the following research directions for future studies:

(1) Preparing an accessible data sandbox with true device- and trip-level labels, with data privacy considerations, for the transportation research community. There is a lack of standard and reliable data for transportation researchers to test and develop their algorithms and report consistent accuracy measures for their proposed algorithms.
In other fields such as computer science, it is common practice to use standard datasets to develop and test the performance of different algorithms. This practice has led to significant progress in algorithm development as well as higher transparency in the methodologies.

(2) Human mobility pattern analysis during a pandemic. Chapter 5 only scratches the surface of how insights from human mobility patterns can be used during a pandemic. The core mobility metrics developed for this analysis could be further improved and tailored toward different communities. For instance, the current definition of the stay-at-home population could be modified to distinguish the inherent differences in individual mobility behavior between different living environments, such as densely urbanized areas versus rural areas. Further research efforts could also be conducted to integrate the mobility measures into existing epidemiological frameworks, such as compartment models, as important input variables.

(3) Applications of MDLD in disastrous events. This dissertation shows the feasibility of constructing the evacuation behavior of individuals during a hurricane. Further studies could be conducted to validate the MDLD-based results and explore the feasibility of providing real-time evacuation information. Improvements in individual-level socio-demographic imputation could also add more context to the MDLD-based outcomes and enable a more in-depth analysis of different evacuation behaviors.

(4) Investigating the impact of changes in the mobile device location data streams. The mobile device location data coverage and information collection methods change from time to time due to updates in privacy protection practices or changes in technology. A more comprehensive analysis of the impact of these changes should be conducted for a better understanding of the robustness of the derived mobility behavior analysis over time.
Bibliography

1. Gonzalez MC, Hidalgo CA, Barabasi A-L. Understanding individual human mobility patterns. Nature. 2008;453(7196):779-82.
2. Xu Y, Shaw S-L, Zhao Z, Yin L, Fang Z, Li Q. Understanding aggregate human mobility patterns using passive mobile phone location data: a home-based approach. Transportation. 2015;42(4):625-46.
3. Levinson D, Kumar A. Activity, travel, and the allocation of time. Journal of the American Planning Association. 1995;61(4):458-70.
4. McNally MG. The four-step model: Emerald Group Publishing Limited; 2007.
5. Wang Z, He SY, Leung Y. Applying mobile phone data to travel behaviour research: A literature review. Travel Behaviour and Society. 2018;11:141-55.
6. Calabrese F, Colonna M, Lovisolo P, Parata D, Ratti C. Real-time urban monitoring using cell phones: A case study in Rome. IEEE Transactions on Intelligent Transportation Systems. 2010;12(1):141-51.
7. Alexander L, Jiang S, Murga M, González MC. Origin–destination trips by purpose and time of day inferred from mobile phone data. Transportation Research Part C: Emerging Technologies. 2015;58:240-50.
8. Hasan S, Schneider CM, Ukkusuri SV, González MC. Spatiotemporal patterns of urban human mobility. Journal of Statistical Physics. 2013;151(1):304-18.
9. Yabe T, Ukkusuri SV. Effects of income inequality on evacuation, reentry and segregation after disasters. Transportation Research Part D: Transport and Environment. 2020:102260.
10. Frias-Martinez E, Williamson G, Frias-Martinez V, editors. An agent-based model of epidemic spread using human mobility and social network information. 2011 IEEE third international conference on privacy, security, risk and trust and 2011 IEEE third international conference on social computing; 2011: IEEE.
11. Battelle. Global Positioning Systems for personal travel surveys: Lexington area travel data collection test.
Final Report, Office of Highway Policy Information and Office of Technology Applications, Federal highway Administration, Batelle Transport Division, Columbus.; 1997. 12. Yalamanchili L, Pendyala RM, Prabaharan N, Chakravarthy P. Analysis of global positioning system-based data collection methods for capturing multistop trip- chaining behavior. Transportation Research Record. 1999;1660(1):58-65. 116 13. Wolf J. Using GPS data loggers to replace travel diaries in the collection of travel data: Georgia Institute of Technology; 2000. 14. Pearson D, editor Global Positioning System (GPS) and travel surveys: Results from the 1997 Austin household survey. Eighth Conference on the Application of Transportation Planning Methods, Corpus Christi, Texas; 2001. 15. Wolf J, Guensler R, Bachman W. Elimination of the travel diary: Experiment to derive trip purpose from global positioning system travel data. Transportation Research Record. 2001;1768(1):125-34. 16. Ojah M, Pearson D. Austin/San Antonio GPS-Enhanced Household Travel Survey,?. Texas Transportation Institute. 2008. 17. Wolf J, Lee M, editors. Synthesis of and statistics for recent GPS-enhanced travel surveys. Paper submitted to the Eighth Int Conf Survey Methods in Transport: Harmonization and Data Comparability, Annecy, France; 2008. 18. Wolf J, Oliveira M, Thompson M. Impact of underreporting on mileage and travel time estimates: Results from global positioning system-enhanced household travel survey. Transportation research record. 2003;1854(1):189-98. 19. Council TCM. 2010-2012 Minneapolis - St. Paul Travel Behavior Inventory. 2012. 20. Commission DVRP. 2012-2013 Delaware Valley Household Travel Survey. 2013. 21. Westat R. 2014 Southern Nevada Household Travel. Final Report.; 2015. 22. Shen L, Stopher PR. Review of GPS travel survey and GPS data-processing methods. Transport reviews. 2014;34(3):316-34. 23. Itsubo S, Hato E. 
Effectiveness of household travel survey using GPS- equipped cell phones and Web diary: Comparative study with paper-based travel survey. 2006. 24. Krygsman SC, Nel J. The use of global positioning devices in travel surveys-a developing country application. SATC 2009. 2009. 25. Stopher P, Wargelin L, editors. Conducting a household travel survey with GPS: reports on a pilot study. 12th World Conference on Transport Research; 2010. 26. Sch?nfelder S, Axhausen KW, Antille N, Bierlaire M. Exploring the potentials of automatically collected GPS data for travel behaviour analysis: A Swedish data source. Arbeitsberichte Verkehrs-und Raumplanung. 2002;124. 117 27. Papinski D, Scott DM, Doherty ST. Exploring the route choice decision- making process: A comparison of planned and observed routes obtained using person-based GPS. Transportation research part F: traffic psychology and behaviour. 2009;12(4):347-58. 28. INRIX Traffic, https://inrix.com/ 2021 29. Haghani A, Hamedi M, Sadabadi KF. I-95 Corridor coalition vehicle probe project: Validation of INRIX data. I-95 Corridor Coalition. 2009;9. 30. Schrank D, Eisele B, Lomax T. 2014 Urban mobility report: powered by Inrix Traffic Data. 2015. 31. Horak R. Telecommunications and data communications handbook: John Wiley & Sons; 2007. 32. Pinelli F, Di Lorenzo G, Calabrese F, editors. Comparing urban sensing applications using event and network-driven mobile phone location data. 2015 16th IEEE International Conference on Mobile Data Management; 2015: IEEE. 33. Kang C, Ma X, Tong D, Liu Y. Intra-urban human mobility patterns: An urban morphology perspective. Physica A: Statistical Mechanics and its Applications. 2012;391(4):1702-17. 34. Pappalardo L, Simini F, Rinzivillo S, Pedreschi D, Giannotti F, Barab?si A-L. Returners and explorers dichotomy in human mobility. Nature communications. 2015;6(1):1-8. 35. Song C, Qu Z, Blumm N, Barab?si A-L. Limits of predictability in human mobility. Science. 2010;327(5968):1018-21. 36. 
?olak S, Lima A, Gonz?lez MC. Understanding congested travel in urban areas. Nature communications. 2016;7(1):1-8. 37. Bachir D, Khodabandelou G, Gauthier V, El Yacoubi M, Puchinger J. Inferring dynamic origin-destination flows by transport mode using mobile phone data. Transportation Research Part C: Emerging Technologies. 2019;101:254-75. 38. Fekih M, Bellemans T, Smoreda Z, Bonnel P, Furno A, Galland S. A data- driven approach for origin?destination matrix construction from cellular network signalling data: a case study of Lyon region (France). Transportation. 2021;48(4):1671-702. 39. Williams NE, Thomas T, Dunbar M, Eagle N, Dobra A, editors. Measurement of human mobility using cell phone data: developing big data for demographic science. Population Association of America Annual Meeting; 2013: Citeseer. 118 40. Frias-Martinez V, Virseda J, Rubio A, Frias-Martinez E, editors. Towards large scale technology impact analyses: Automatic residential localization from mobile phone-call data. Proceedings of the 4th ACM/IEEE international conference on information and communication technologies and development; 2010. 41. Soto V, Frias-Martinez V, Virseda J, Frias-Martinez E, editors. Prediction of socioeconomic levels using cell phone records. International conference on user modeling, adaptation, and personalization; 2011: Springer. 42. Chen C, Ma J, Susilo Y, Liu Y, Wang M. The promises of big data and small data for travel behavior (aka human mobility) analysis. Transportation research part C: emerging technologies. 2016;68:285-99. 43. Wang F, Chen C. On data processing required to derive mobility patterns from passively-generated mobile phone data. Transportation Research Part C: Emerging Technologies. 2018;87:58-74. 44. Wang F, Wang J, Cao J, Chen C, Ban XJ. Extracting trips from multi-sourced data for mobility pattern analysis: An app-based data example. Transportation Research Part C: Emerging Technologies. 2019;105:183-202. 45. 
Flake L, Lee M, Hathaway K, Greene E. Use of smartphone panels for viable and cost-effective GPS data collection for small and medium planning agencies. Transportation Research Record. 2017;2643(1):160-5. 46. AirSage, https://www.airsage.com/ 2021 47. Huang H, Cheng Y, Weibel R. Transport mode detection based on mobile phone network data: A systematic review. Transportation Research Part C: Emerging Technologies. 2019;101:297-312. 48. Burkhard O, Becker H, Weibel R, Axhausen KW. On the requirements on spatial accuracy and sampling rate for transport mode detection in view of a shift to passive signalling data. Transportation Research Part C: Emerging Technologies. 2020;114:99-117. 49. Gong L, Morikawa T, Yamamoto T, Sato H. Deriving personal trip data from GPS data: A literature review on the existing methodologies. Procedia-Social and Behavioral Sciences. 2014;138:557-65. 50. Axhausen K, Schonfelder S, Wolf J, Oliveria M, Samaga U, editors. Eighty weeks of gps traces, approaches to enriching trip information. Transportation Research Board Annual Meeting; 2004: Citeseer. 51. Stopher PR, Jiang Q, FitzGerald C. Processing GPS data from travel surveys. 2nd international colloqium on the behavioural foundations of integrated land-use and transportation models: frameworks, models and applications, Toronto. 2005. 119 52. Tsui SYA, Shalaby AS. Enhanced system for link and mode identification for personal travel surveys based on global positioning systems. Transportation Research Record. 2006;1972(1):38-45. 53. McGowen P, McNally M, editors. Evaluating the potential to predict activity types from GPS and GIS data. Transportation Research Board 86th Annual Meeting; 2007: Citeseer. 54. Du J, Aultman-Hall L. Increasing the accuracy of trip rate information from passive multi-day GPS travel datasets: Automatic trip end identification issues. Transportation Research Part A: Policy and Practice. 2007;41(3):220-32. 55. Stopher P, FitzGerald C, Zhang J. 
Search for a global positioning system device to measure person travel. Transportation Research Part C: Emerging Technologies. 2008;16(3):350-69. 56. Schuessler N, Axhausen KW. Processing raw data from global positioning systems without additional information. Transportation Research Record. 2009;2105(1):28-36. 57. Bohte W, Maat K. Deriving and validating trip purposes and travel modes for multi-day GPS-based travel surveys: A large-scale application in the Netherlands. Transportation Research Part C: Emerging Technologies. 2009;17(3):285-97. 58. Gong H, Chen C, Bialostozky E, Lawson CT. A GPS/GIS method for travel mode detection in New York City. Computers, Environment and Urban Systems. 2012;36(2):131-9. 59. Safi H, Assemi B, Mesbah M, Ferreira L. Trip detection with smartphone- assisted collection of travel data. Transportation Research Record. 2016;2594(1):18- 26. 60. Patterson Z, Fitzsimmons K. Datamobile: Smartphone travel survey experiment. Transportation Research Record. 2016;2594(1):35-43. 61. Gong L, Sato H, Yamamoto T, Miwa T, Morikawa T. Identification of activity stop locations in GPS trajectories by density-based clustering method combined with support vector machines. Journal of Modern Transportation. 2015;23(3):202-13. 62. Gong L, Yamamoto T, Morikawa T. Identification of activity stop locations in GPS trajectories by DBSCAN-TE method combined with support vector machines. Transportation research procedia. 2018;32:146-54. 63. Zhou C, Jia H, Juan Z, Fu X, Xiao G. A data-driven method for trip ends identification using large-scale smartphone-based GPS tracking data. IEEE Transactions on Intelligent Transportation Systems. 2016;18(8):2096-110. 120 64. Yao Z, Zhou J, Jin PJ, Yang F. Trip End Identification based on Spatial- Temporal Clustering Algorithm using Smartphone GPS Data. 2019. 65. Yang X, Sun Z, Ban XJ, Holgu?n-Veras J. Urban freight delivery stop identification with GPS data. Transportation Research Record. 2014;2411(1):55-61. 66. 
Ye Y, Zheng Y, Chen Y, Feng J, Xie X, editors. Mining individual life pattern based on location history. 2009 tenth international conference on mobile data management: Systems, services and middleware; 2009: IEEE. 67. Calabrese F, Pereira FC, Di Lorenzo G, Liu L, Ratti C, editors. The geography of taste: analyzing cell-phone mobility and social events. International conference on pervasive computing; 2010: Springer. 68. Chen C, Bian L, Ma J. From sightings to activity locations: how well can we guess the locations visited from mobile phone sightings. Transp Res Part C. 2014;46(10):326-37. 69. Zhou C, Frankowski D, Ludford P, Shekhar S, Terveen L. Discovering personally meaningful places: An interactive clustering approach. ACM Transactions on Information Systems (TOIS). 2007;25(3):12-es. 70. Chen W, Ji M, Wang J. T-DBSCAN: A Spatiotemporal Density Clustering for GPS Trajectory Segmentation. International Journal of Online Engineering. 2014;10(6). 71. Yin M. Activity-Based Urban Mobility Modeling from Cellular Data: University of California, Berkeley; 2018. 72. Ester M, Kriegel H-P, Sander J, Xu X, editors. A density-based algorithm for discovering clusters in large spatial databases with noise. kdd; 1996. 73. Jiang S, Fiore GA, Yang Y, Ferreira Jr J, Frazzoli E, Gonz?lez MC, editors. A review of urban computing for mobile phone traces: current methods, challenges and opportunities. Proceedings of the 2nd ACM SIGKDD international workshop on Urban Computing; 2013. 74. Phithakkitnukoon S, Horanont T, Di Lorenzo G, Shibasaki R, Ratti C, editors. Activity-aware map: Identifying human daily activity pattern using mobile phone data. International workshop on human behavior understanding; 2010: Springer. 75. Xie K, Deng K, Zhou X, editors. From trajectories to activities: a spatio- temporal join approach. Proceedings of the 2009 International Workshop on Location Based Social Networks; 2009. 121 76. Huang L, Li Q, Yue Y, editors. 
Activity identification from GPS trajectories using spatial temporal POIs' attractiveness. Proceedings of the 2nd ACM SIGSPATIAL International Workshop on location based social networks; 2010. 77. Spinsanti L, Celli F, Renso C, editors. Where you stop is who you are: understanding people?s activities by places visited. the proceedings of Behaviour Monitoring and Interpretation (BMI) workshop; 2010. 78. Gong L, Liu X, Wu L, Liu Y. Inferring trip purposes and uncovering travel patterns from taxi trajectory data. Cartography and Geographic Information Science. 2016;43(2):103-14. 79. Flamm M, Kaufmann V. The concept of personal network of usual places as a tool for analysing human activity spaces: a quantitative exploration. Lausanne: EPFL. 2006:23. 80. Calabrese F, Di Lorenzo G, Liu L, Ratti C. Estimating Origin-Destination flows using opportunistically collected mobile phone location data from one million users in Boston Metropolitan Area. 2011. 81. Isaacman S, Becker R, C?ceres R, Kobourov S, Martonosi M, Rowland J, et al., editors. Identifying important places in people?s lives from cellular network data. International conference on pervasive computing; 2011: Springer. 82. Yang M, Pan Y, Darzi A, Ghader S, Xiong C, Zhang L. A data-driven travel mode share estimation framework based on mobile device location data. Transportation. 2021:1-45. 83. Stenneth L, Wolfson O, Yu PS, Xu B, editors. Transportation mode detection using mobile phones and GIS information. Proceedings of the 19th ACM SIGSPATIAL international conference on advances in geographic information systems; 2011. 84. Brunauer R, Hufnagl M, Rehrl K, Wagner A, editors. Motion pattern analysis enabling accurate travel mode detection from GPS data only. 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013); 2013: IEEE. 85. Nitsche P, Widhalm P, Breuss S, Br?ndle N, Maurer P. Supporting large-scale travel surveys with smartphones?A practical approach. 
Transportation Research Part C: Emerging Technologies. 2014;43:212-21. 86. Xiao G, Juan Z, Zhang C. Travel mode detection based on GPS track data and Bayesian networks. Computers, Environment and Urban Systems. 2015;54:14-22. 87. Shafique MA, Hato E. Travel mode detection with varying smartphone data collection frequencies. Sensors. 2016;16(5):716. 122 88. Wang B, Gao L, Juan Z. Travel mode detection using GPS data and socioeconomic attributes based on a random forest classifier. IEEE Transactions on Intelligent Transportation Systems. 2017;19(5):1547-58. 89. Dabiri S, Heaslip K. Inferring transportation modes from GPS trajectories using a convolutional neural network. Transportation research part C: emerging technologies. 2018;86:360-71. 90. Broach J, Dill J, McNeil NW. Travel mode imputation using GPS and accelerometer data from a multi-day travel survey. Journal of Transport Geography. 2019;78:194-204. 91. Vaughan J, Imani AF, Yusuf B, Miller EJ. Modelling cellphone trace travel mode with neural networks using transit smartcard and home interview survey data. European Journal of Transport and Infrastructure Research. 2020;20(4):269-85. 92. Breyer N, Gundleg?rd D, Rydergren C. Travel mode classification of intercity trips using cellular network data. Transportation Research Procedia. 2021;52:211-8. 93. Group WHOW. Nonpharmaceutical interventions for pandemic influenza, international measures. Emerging infectious diseases. 2006;12(1):81. 94. Brownstein JS, Wolfe CJ, Mandl KD. Empirical evidence for the effect of airline travel on inter-regional influenza spread in the United States. PLoS Med. 2006;3(10):e401. 95. Bajardi P, Poletto C, Ramasco JJ, Tizzoni M, Colizza V, Vespignani A. Human mobility networks, travel restrictions, and the global spread of 2009 H1N1 pandemic. PloS one. 2011;6(1):e16591. 96. Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. 
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID- 19) outbreak. Science. 2020;368(6489):395-400. 97. Kelso JK, Milne GJ, Kelly H. Simulation suggests that rapid activation of social distancing can arrest epidemic development due to a novel strain of influenza. BMC public health. 2009;9(1):1-10. 98. Greenstone M, Nigam V. Does social distancing matter? University of Chicago, Becker Friedman Institute for Economics Working Paper. 2020(2020-26). 99. Li D, Lv J, Botwin G, Braun J, Cao W, Li L, et al. Estimating the scale of COVID-19 epidemic in the United States: Simulations based on air traffic directly from Wuhan, China. MedRxiv. 2020. 123 100. Koo JR, Cook AR, Park M, Sun Y, Sun H, Lim JT, et al. Interventions to mitigate early spread of SARS-CoV-2 in Singapore: a modelling study. The Lancet Infectious Diseases. 2020;20(6):678-88. 101. Prem K, Liu Y, Russell TW, Kucharski AJ, Eggo RM, Davies N, et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. The Lancet Public Health. 2020;5(5):e261-e70. 102. Prem K, Cook AR, Jit M. Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLoS computational biology. 2017;13(9):e1005697. 103. Cowling BJ, Ali ST, Ng TW, Tsang TK, Li JC, Fong MW, et al. Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and influenza in Hong Kong: an observational study. The Lancet Public Health. 2020;5(5):e279-e88. 104. Bragazzi NL, Dai H, Damiani G, Behzadifar M, Martini M, Wu J. How big data and artificial intelligence can help better manage the COVID-19 pandemic. International journal of environmental research and public health. 2020;17(9):3176. 105. Vaishya R, Javaid M, Khan IH, Haleem A. Artificial Intelligence (AI) applications for COVID-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 2020;14(4):337-9. 106. Google. 
See how your community is moving around differently due to COVID-19. [Available from: https://www.google.com/covid19/mobility/.] 107. Apple. Mobility Trends Reports. 2021 [Available from: https://covid19.apple.com/mobility.] 108. Cuebiq. Mobility Insights. [Available from: https://www.cuebiq.com/visitation-insights-covid19/.] 109. Huang S-K, Lindell MK, Prater CS. Who leaves and who stays? A review and statistical meta-analysis of hurricane evacuation studies. Environment and Behavior. 2016;48(8):991-1029. 110. Murray-Tuite P, Wolshon B. Evacuation transportation modeling: An overview of research, development, and practice. Transportation Research Part C: Emerging Technologies. 2013;27:25-45. 111. Wolshon PB. Transportation's role in emergency evacuation and reentry: Transportation Research Board; 2009. 124 112. Collier J, Balakrishnan S, Zhang Z. From Hurricane Katrina to Hurricane Harvey: Actions, Issues, and Lessons Learned in Transportation and Logistics Efforts for Emergency Response. 2019. 113. Yin W, Murray-Tuite P, Ukkusuri SV, Gladwin H. An agent-based modeling system for travel demand simulation for hurricane evacuation. Transportation research part C: emerging technologies. 2014;42:44-59. 114. Brown C, White W, van Slyke C, Benson JD. Development of a strategic hurricane evacuation?dynamic traffic assignment model for the Houston, Texas, Region. Transportation research record. 2009;2137(1):46-53. 115. Wang H, Mostafizi A, Cramer LA, Cox D, Park H. An agent-based model of a multimodal near-field tsunami evacuation: Decision-making and life safety. Transportation Research Part C: Emerging Technologies. 2016;64:86-100. 116. Feng K, Lin N. Simulation of hurricane Irma evacuation process. 2019. 117. Mostafizi A, Wang H, Dong S. Understanding the multimodal evacuation behavior for a near-field tsunami. Transportation research record. 2019;2673(11):480- 92. 118. Robinson RM, Collins AJ, Jordan CA, Foytik P, Khattak AJ. 
Modeling the impact of traffic incidents during hurricane evacuations using a large scale microsimulation. International journal of disaster risk reduction. 2018;31:1159-65. 119. Zhang Z, Wolshon B, Herrera N, Parr S. Assessment of post-disaster reentry traffic in megaregions using agent-based simulation. Transportation research part D: transport and environment. 2019;73:307-17. 120. Wong S, Shaheen S, Walker J. Understanding evacuee behavior: a case study of hurricane Irma. 2018. 121. Wu H-C, Lindell MK, Prater CS. Logistics of hurricane evacuation in Hurricanes Katrina and Rita. Transportation research part F: traffic psychology and behaviour. 2012;15(4):445-61. 122. Liu S, Murray?Tuite P, Schweitzer L. Incorporating household gathering and mode decisions in large?scale no?notice evacuation modeling. Computer?Aided Civil and Infrastructure Engineering. 2014;29(2):107-22. 123. Yang H, Morgul EF, Ozbay K, Xie K. Modeling evacuation behavior under hurricane conditions. Transportation research record. 2016;2599(1):63-9. 124. Kontou E, Murray-Tuite P, Wernstedt K. Duration of commute travel changes in the aftermath of Hurricane Sandy using accelerated failure time modeling. Transportation Research Part A: Policy and Practice. 2017;100:170-81. 125 125. Hasan S, Ukkusuri S, Gladwin H, Murray-Tuite P. Behavioral model to understand household-level hurricane evacuation decision making. Journal of Transportation Engineering. 2011;137(5):341-8. 126. Smith SK, McCarty C. Fleeing the storm (s): An examination of evacuation behavior during Florida?s 2004 hurricane season. Demography. 2009;46(1):127-45. 127. Robinson RM, Foytik P, Jordan C. Review and analysis of user inputs to online evacuation modeling tool. 2017. 128. McCarney R, Warner J, Iliffe S, Van Haselen R, Griffin M, Fisher P. The Hawthorne Effect: a randomised, controlled trial. BMC medical research methodology. 2007;7(1):1-8. 129. Groves RM. Survey errors and survey costs: John Wiley & Sons; 2005. 130. Furnham A. 
Response bias, social desirability and dissimulation. Personality and individual differences. 1986;7(3):385-400. 131. Kumar D, Ukkusuri SV, editors. Utilizing geo-tagged tweets to understand evacuation dynamics during emergencies: A case study of Hurricane Sandy. Companion Proceedings of the The Web Conference 2018; 2018. 132. Roy KC, Hasan S. Modeling the dynamics of hurricane evacuation decisions from twitter data: an input output hidden markov modeling approach. Transportation research part C: emerging technologies. 2021;123:102976. 133. Wang Q, Taylor JE. Quantifying human mobility perturbation and resilience in Hurricane Sandy. PLoS one. 2014;9(11):e112608. 134. Yabe T, Tsubouchi K, Fujiwara N, Sekimoto Y, Ukkusuri SV. Understanding post-disaster population recovery patterns. Journal of the Royal Society Interface. 2020;17(163):20190532. 135. Batini C, Cappiello C, Francalanci C, Maurino A. Methodologies for data quality assessment and improvement. ACM computing surveys (CSUR). 2009;41(3):1-52. 136. Zhang L, Darzi A, Ghader S, Pack ML, Xiong C, Yang M, et al. Interactive covid-19 mobility impact and social distancing analysis platform. Transportation Research Record. 2020:03611981211043813. 137. Bureau. USC. 2019 American Community Survey (ACS) 5-Year Estimates Table DP05: ACS Demographic and Housing Estimates. [Available from: https://data.census.gov/cedsci/all?q=acs.] 126 138. Bureau USC. Longitudinal-Employer Household Dynamics (LEHD) Origin- Destination Employment Statistics Data (2019) [Available from: https://lehd.ces.census.gov/data/#lodes.] 139. SafeGraph. Home Algo v1 ?Monthly Batched? [Available from: https://docs.safegraph.com/docs/monthly-patterns#section-algorithms.] 140. Bureau. USC. 2019 American Community Survey (ACS) 5-Year Estimates Table DP03: Selected Economic Characteristics. August 2021. [Available from: https://data.census.gov/cedsci/all?q=acs.] 141. Bureau. USC. 2011-2015 5-Year American Community Survey (ACS) Commuting Flows, Table 1. 
Residence County to Workplace County Commuting Flows for the United States and Puerto Rico Sorted by Residence Geography. [Available from: https://www.census.gov/data/tables/2015/demo/metro- micro/commuting-flows-2015.html.] 142. Graham MR, Kutzbach MJ, McKenzie B. Design comparison of LODES and ACS commuting data products. 2014. 143. Xiong C, Darzi A, Pan Y, Ghader S, Zhang L. A Data-Driven Analytical Framework of Estimating Multimodal Travel Demand Patterns using Mobile Device Location Data. arXiv preprint arXiv:201204776. 2020. 144. G?ron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems: O'Reilly Media; 2019. 145. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research. 2011;12(7). 146. Tieleman T, Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning. 2012;4(2):26-31. 147. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:14126980. 2014. 148. Pan Y, Darzi A, Kabiri A, Zhao G, Luo W, Xiong C, et al. Quantifying human mobility behaviour changes during the COVID-19 outbreak in the United States. Scientific Reports. 2020;10(1):1-9. 149. Qualls N, Levitt A, Kanade N, Wright-Jegede N, Dopson S, Biggerstaff M, et al. Community mitigation guidelines to prevent pandemic influenza?United States, 2017. MMWR Recommendations and Reports. 2017;66(1):1. 127 150. Wood HO, Neumann F. Modified Mercalli intensity scale of 1931. Bulletin of the Seismological Society of America. 1931;21(4):277-83. 151. U.S. News & World Report. Best states 2021: how they were ranked. [Available from: https://www.usnews.com/news/best-states/articles/methodology.] 152. World Population Review. Healthiest Countries Population 2021. 
[Available from: https://worldpopulationreview.com/country-rankings/healthiest-countries.] 153. Federal Highway Administration. 2017 National Household Travel Survey Travel Profile: United States. (U.S. Department of Transportation, Washington, D.C.) [Available from: https://nhts.ornl.gov/assets/2017_USTravelProfile.pdf.] 154. Johns Hopkins University. COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. [Available from: https://coronavirus.jhu.edu/map.html.] 155. Darzi A, Frias-Martinez V, Ghader S, Younes H, Zhang L. Constructing Evacuation Evolution Patterns and Decisions Using Mobile Device Location Data: A Case Study of Hurricane Irma. arXiv preprint arXiv:210212600. 2021. 156. Younes H, Darzi A, Zhang L. How effective are evacuation orders? An analysis of decision making among vulnerable populations in Florida during hurricane Irma. Travel behaviour and society. 2021;25:144-52. 157. Cs?ji BC, Browet A, Traag VA, Delvenne J-C, Huens E, Van Dooren P, et al. Exploring the mobility of mobile phone users. Physica A: statistical mechanics and its applications. 2013;392(6):1459-73. 158. POLL M-DF. Hurricane Irma 2017 [Available from: https://media.news4jax.com/document_dev/2017/10/26/Mason- Dixon%20Hurricane%20poll_1509043928726_10861977_ver1.0.pdf.] 159. Webster PJ, Holland GJ, Curry JA, Chang H-R. Changes in tropical cyclone number, duration, and intensity in a warming environment. Science. 2005;309(5742):1844-6. 160. Elsner JB, Kossin JP, Jagger TH. The increasing intensity of the strongest tropical cyclones. Nature. 2008;455(7209):92-5. 128