ABSTRACT

Title of Dissertation: A NATIONAL TRAVEL DEMAND MODEL FOR THE U.S.: A PERSON-BASED MICROSIMULATION APPROACH

Yijing Lu, Doctor of Philosophy, 2017

Directed By: Lei Zhang, Professor, Department of Civil & Environmental Engineering

Understanding long-distance travel behavior and producing reliable forecasts of long-distance travel demand are critical in evaluating intercity or regional transportation improvements and infrastructure investment projects. As the nation and various states fund transportation infrastructure improvements to meet future long-distance passenger travel demand, it is imperative to develop effective and practical modeling methods for long-distance passenger travel analysis.

This dissertation proposes the first integrated activity-based travel demand model system for individuals’ quarterly/yearly long-distance (national) activities and travel in the U.S. at the Metropolitan Statistical Area (MSA)/Non-MSA level. The model system is built on a rigorous behavioral framework for long-distance travel planning and takes into account the specific attributes of long-distance travel, such as low frequency, long activity duration, and different sets of mode alternatives. The system includes three tiers: 1) the yearly long-distance activity pattern level, which estimates the number of different activities a person will choose during one year; 2) the tour level, which consists of tour destination choice, time of year choice, tour duration, and tour mode choice; and 3) the stop level, which estimates the intermediate stop frequency, purpose, and location. Because the decision-making processes differ across types of long-distance activities (business, personal business, and pleasure), two tour-level model structures were developed, one for long-distance business/personal business activities and the other for long-distance pleasure activities.

Econometric models are developed for the multiple model components, and estimation results are obtained from the 1995 American Travel Survey data, transportation origin-destination (OD) skim data, and economic/demographic data. Out-of-sample validation is performed for each model component, and system-wide model calibration is conducted using an optimization method prior to model implementation and future year policy analysis. The model system is implemented in a micro-simulation platform developed for this work, which simulates each individual’s yearly long-distance activities and travel in the U.S. using population data and the associated transportation OD skim and economic/demographic data as input. Travel demand in the year 2040 is forecast, and two further scenarios, a national-level fuel price increase and high speed rail operation, are analyzed with the calibrated long-distance travel demand model.

The contributions of the dissertation lie in the following three aspects: (1) the first national travel demand model employing a person-based microsimulation approach is developed for long-distance passenger travel analysis in the U.S.; (2) the person-based travel demand model makes it possible to analyze the travel demand effects of high speed rail in selected inter-regional corridors in the U.S. and of a national-level fuel price increase; and (3) a post-processing learning system that can estimate missing information, such as trip purpose, for passively collected long-distance travel survey data is proposed and tested.
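The three-tier structure described in the abstract can be read as a nested simulation loop over each person. The sketch below is only an illustration of that control flow under stated assumptions: every choice is a uniform random placeholder rather than one of the estimated econometric models, the mode and purpose labels are illustrative, and the names `simulate_person`, `Tour`, and `Stop` are hypothetical and not taken from the dissertation's platform. It also uses a single tour-level ordering for all purposes, whereas the dissertation develops two tour-level structures.

```python
import random
from dataclasses import dataclass, field

# Illustrative labels only; the dissertation's actual alternative sets may differ.
PURPOSES = ("business", "personal_business", "pleasure")
MODES = ("car", "air", "train")
QUARTERS = (1, 2, 3, 4)


@dataclass
class Stop:
    leg: str        # "outbound" or "inbound"
    purpose: str
    zone: int


@dataclass
class Tour:
    purpose: str
    destination: int
    quarter: int
    duration_days: int
    mode: str
    stops: list = field(default_factory=list)


def simulate_person(home_zone, zones, rng):
    """Simulate one person's yearly long-distance tours with placeholder draws."""
    candidates = [z for z in zones if z != home_zone]
    tours = []
    for purpose in PURPOSES:
        # Tier 1: yearly activity pattern (number of activities per purpose).
        n_activities = rng.choice((0, 0, 1, 2))
        for _ in range(n_activities):
            # Tier 2: tour-level choices (destination, time of year, duration, mode).
            tour = Tour(
                purpose=purpose,
                destination=rng.choice(candidates),
                quarter=rng.choice(QUARTERS),
                duration_days=rng.randint(1, 14),
                mode=rng.choice(MODES),
            )
            # Tier 3: intermediate stop frequency, purpose, and location per leg.
            for leg in ("outbound", "inbound"):
                for _ in range(rng.choice((0, 0, 1))):
                    tour.stops.append(
                        Stop(leg, rng.choice(PURPOSES), rng.choice(zones))
                    )
            tours.append(tour)
    return tours


if __name__ == "__main__":
    for tour in simulate_person(home_zone=1, zones=[1, 2, 3, 4, 5], rng=random.Random(42)):
        print(tour)
```

Passing the random number generator in explicitly keeps each simulated person reproducible, which is convenient when comparing scenario runs; in a real implementation each placeholder draw would be replaced by the corresponding estimated choice or duration model.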
A NATIONAL TRAVEL DEMAND MODEL FOR THE U.S.: A PERSON-BASED MICROSIMULATION APPROACH

By Yijing Lu

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2017

Advisory Committee:
Professor Lei Zhang, Chair
Professor William Rand
Professor Casey Dawkins
Professor Qingbin Cui
Professor Paul Schonfeld

© Copyright by Yijing Lu 2017

Acknowledgements

It would not have been possible for me to finish this research without the help and support of so many people. It is a great pleasure to thank all the people who contributed to the preparation and completion of this dissertation.

I am deeply indebted to my advisor, Dr. Lei Zhang, for his continuous support, inspiring advice and enthusiastic encouragement throughout the years of my Ph.D. study at the University of Maryland, College Park. I am grateful for the opportunity he gave me to work on this emerging and interesting topic. He is not only an ideal research advisor, but also a great mentor in my career and a good friend who cares about my personal life.

I would like to extend my thanks to the rest of my dissertation committee: Professor William Rand, Professor Casey Dawkins, Professor Qingbin Cui and Professor Paul Schonfeld, for their encouragement, insightful comments and valuable suggestions on my research. My sincere thanks also go to Dr. Carlos Carrion, for the knowledge and insight he has shared.

I thank all my colleagues in the Transportation Systems Research Lab at the University of Maryland. The ideas and suggestions they shared with me during our group meetings have greatly contributed to the completion of this research. I would also like to thank my fellow graduate students from other groups in the transportation program. I appreciate the opportunities to take classes, cooperate on projects and have discussions with them.
Finally, I would like to express my gratitude towards my family and dedicate this dissertation to them. They have been encouraging me, supporting me and loving me all the time. I am especially thankful to my husband, Bo Sun, for his understanding, love, support and help throughout the years of my Ph.D. study and my dissertation research.

Table of Contents

Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Abbreviations
List of Mathematical Symbols
Chapter 1: Introduction
  1.1 Background
  1.2 Objectives and Contributions
Chapter 2: Literature Review
  2.1 Long Distance Travel Demand Modelling
  2.2 Long Distance Travel Survey Data
Chapter 3: Data
  3.1 Zone System
  3.2 Travel Survey Data
  3.3 Transportation OD Skim and Economic/Demographic Data
  3.4 Public Use Microdata Sample Data
Chapter 4: Model System Analysis Framework
  4.1 Activity Pattern Level Model
  4.2 Tour Level Structure
    4.2.1 Travel Mode Choice Model
    4.2.2 Time of Year Choice
    4.2.3 Tour Duration Choice Model
    4.2.4 Travel Party Size Choice Model
    4.2.5 Tour Destination Choice Model
  4.3 Stop Level Structure
    4.3.1 Stop Frequency Choice Model
    4.3.2 Stop Purpose Choice Model
    4.3.3 Stop Location Choice Model
  4.4 National Travel Demand Model Flow and Key Assumptions
Chapter 5: Preliminary Base Year OD Estimations
Chapter 6: Model Calibration
Chapter 7: Future Year Policy Analysis
  7.1 Future Year Policy Scenarios
  7.2 Future Year Population Synthesis
  7.3 Future Year Scenario Results Analysis
    7.3.1 Scenario 1: Base Scenario
    7.3.2 Scenario 2: Fuel Price Increase
    7.3.3 Scenario 3: High Speed Rail
Chapter 8: Long Distance Travel Survey Instrument
  8.1 Methodology for Long Distance Trip Purpose Classification
    8.1.1 Decision Tree Learning
    8.1.2 Metalearner
  8.2 Data for Long Distance Trip Purpose Classification
  8.3 Trip Purpose Classification Results
Chapter 9: Conclusions and Future Research
  9.1 Conclusions
  9.2 Recommendations for Future Research
Bibliography

List of Figures

Figure 2-1: Categorization of Long Distance Travel Demand Analysis Methods
Figure 3-1: National Travel Demand Model Traffic Analysis Zone System
Figure 3-2: Percentages of Long Distance Tour Travel Modes
Figure 3-3: Percentages of Long Distance Trip Purposes
Figure 3-4: Travel Mode Usage by Trip Purpose
Figure 3-5: Trip Distribution by Time of Year
Figure 3-6: Trip Distribution by Purpose and Time of Year
Figure 3-7: The number of Inbound/Outbound Stop Distribution
Figure 3-8: Outbound/Inbound Stop Purpose Distribution
Figure 4-1: Long distance travel illustration
Figure 4-2: Activity-Based Long Distance Travel Demand Model System
Figure 4-3: Yearly Long Distance Activity Pattern Level
Figure 4-4: Tour Level Procedure and Model Components
Figure 4-5: Tour Mode Choice Validation for Business Purpose
Figure 4-6: Tour Mode Choice Validation for Pleasure Purpose
Figure 4-7: Tour Mode Choice Validation for Personal Business Purpose
Figure 4-8: Re-simulating Time of Year Choice Model
Figure 4-9: Time of Year Choice Validation for Business Purpose
Figure 4-10: Simple Time of Year Choice Validation for Pleasure Purpose
Figure 4-11: Full Time of Year Choice Validation for Pleasure Purpose
Figure 4-12: Observed Duration Distribution of Long Distance Pleasure Activities
Figure 4-13: Baseline Hazard Rate for Business and Personal Business Duration Model
Figure 4-14: Validation results for Business Duration Model
Figure 4-15: Validation results for Personal Business Duration Model
Figure 4-16: Travel Party Size Choice Validation for Business Purpose
Figure 4-17: Travel Party Size Choice Validation for Personal Business
Figure 4-18: Travel Party Size Choice Validation for Pleasure Purpose
Figure 4-19: Destination Choice Validation for Business Purpose
Figure 4-20: Destination Choice Validation for Pleasure Purpose
Figure 4-21: Destination Choice Validation for Personal Business Purpose
Figure 4-22: Stop Level Procedure and Model Components
Figure 4-23: Inbound Stop Frequency Model Validation
Figure 4-24: Outbound Stop Frequency Model Validation
Figure 4-25: Outbound Stop Purpose Model Validation
Figure 4-26: Inbound Stop Purpose Model Validation
Figure 4-27: LOS estimation for the first stop during outbound tour leg
Figure 4-28: LOS estimation for the jth stop during outbound tour leg
Figure 4-29: Outbound Stop Location Choice Validation
Figure 4-30: Inbound Stop location Choice Validation
Figure 5-1: MSA/Non-MSA Population
Figure 5-2: Trip Distribution by Travel Mode
Figure 5-3: Trip Distribution by Travel Mode and Time of Year
Figure 5-4: Trip Distribution by purpose
Figure 5-5: Trips Originate/Destinate at MSA/Non-MSA level
Figure 5-6: Yearly Car Trip Distribution by TAZ originating from Washington D.C.
Figure 5-7: Yearly Train Trip Distribution by TAZ Originating from Washington D.C.
Figure 5-8: Yearly Air Trip Distribution by TAZ Originating from Washington D.C.
Figure 5-9: Number of trips of different categories for different model run
Figure 7-1: Retail Gasoline Price Changes
Figure 7-2: Crude Oil Price Projections to 2040
Figure 7-3: Interface of PopGen
Figure 7-4: Comparison between Synthetic Population and Control Margins by Control Variable
Figure 7-5: MSA/Non-MSA Population in 2040
Figure 7-6: Gender distribution of synthetic 2040 population
Figure 7-7: Income group distribution of synthetic 2040 population
Figure 7-8: Age group distribution of synthetic 2040 population
Figure 7-9: Trip distribution by travel mode
Figure 7-10: Trip Distribution by Time of Year
Figure 7-11: Trip distribution by Travel Mode and Time of Year
Figure 7-12: Trip distribution by trip purpose
Figure 7-13: Trip Distribution by Income level and Travel Mode
Figure 7-14: Average number of trips/person during one year by income group and travel mode
Figure 7-15: Trip distribution by Gender and Travel Mode
Figure 7-16: Average number of trips/person during one year by gender and travel mode
Figure 7-17: Trip distribution by travel mode and age group
Figure 7-18: Average number of trips/person during one year by age group and travel mode
Figure 7-19: Trips Originate/Destinate at MSA/Non-MSA level
Figure 7-20: Comparison of trip distribution by travel mode and time of year
Figure 7-21: Comparison of trip distribution by travel mode
Figure 7-22: Comparison of trip distribution by purpose
Figure 7-23: Comparison of trip distribution by purpose and travel mode
Figure 7-24: Comparison of trip distribution by income group and travel mode
Figure 7-25: Comparison of trip distribution by gender and travel mode
Figure 7-26: Comparison of trip distribution by age group and travel mode
Figure 7-27: Comparison of miles by car per person during one year by income group
Figure 7-28: Comparison of miles/car trip by income group
Figure 7-29: Comparison of train trip distribution by time of year
Figure 7-30: Comparison of Train Trips by trip purpose
Figure 8-1: Trip Purpose Learning System

List of Tables

Table 3-1: Encoding Reported Trip Purposes
Table 3-2: Part of PUMA and MSA/Non-MSA Correspondence Table
Table 4-1: Comparison between long distance and short distance travel
Table 4-2: Trip Rate for Long Distance Business Travel
Table 4-3: Trip Rates for Long Distance Pleasure Travel
Table 4-4: Trip Rates for Long Distance Personal Business Travel
Table 4-5: Tour Mode Choice Model Estimation Results
Table 4-6: Full Time of Year Choice for Business Trip
Table 4-7: Simple Time of Year Choice for Pleasure Trip
Table 4-8: Full Time of Year Choice for Pleasure Trip
Table 4-9: Tour duration choice model estimation results
Table 4-10: Travel Party Size Choice Model Estimation for Business Tour
Table 4-11: Travel Party Size Choice Model Estimation for Personal Business Tour
Table 4-12: Travel Party Size Choice Model Estimation for Pleasure Tour
Table 4-13: Primary Destination Choice Model Estimation at Tour Level
Table 4-14: Stop frequency model estimation for tour inbound leg
Table 4-15: Stop frequency model estimation for tour outbound leg
Table 4-16: Purpose estimations for outbound stops
Table 4-17: Purpose estimations for inbound stops
Table 4-18: Stop location model estimation for tour outbound leg
Table 4-19: Stop location model estimation for tour inbound leg
Table 4-20: Long distance travel demand model input data
Table 4-21: Output of each model component
Table 4-22: Output of long distance travel demand model system for each person
Table 6-1: 60-Iteration Calibration Hyper-Parameter
Table 6-2: 60-Iteration Calibration Hyper-Parameter
Table 6-3: 60-Iteration Calibration Results
Table 7-1: Comparison of trips between Base Scenario and Fuel Price Increase Scenario
Table 7-2: Comparison of trips between High Speed Rail Scenario and Base Scenario
Table 7-3: Percentage of trip changes by travel mode between HSR scenario and Base scenario
Table 7-4: Trip Changes between TAZs by Trip purpose
Table 7-5: Changes of number of train trips by trip purpose between TAZs
Table 7-6: Changes of number of car trips by trip purpose between TAZs
Table 7-7: Changes of number of air trips by trip purpose between TAZs
Table 7-8: Comparison of trip shares by travel mode between HSR and Base scenario
Table 7-9: Comparison of percentage of trips and trip changes between HSR and Base
Table 8-1: Six Sets of Long Distance Trip Purpose Categorization Schemes
Table 8-2: Model Variables Used for Long Distance Trip Purpose Estimation
Table 8-3: Model 1 Results
Table 8-4: Model 2 Results
Table 8-5: Model 3 Results
Table 8-6: Model 4 Results
Table 8-7: Model 5 Results
Table 8-8: Model 6 Results
Table 8-9: Compared Model Developments
Table 8-10: Full Model Estimation Results
Table 8-11: Reduced Model Results
Table 8-12: Minimized Model Results

List of Abbreviations

ACS: American Community Survey
ATS: American Travel Survey
BTS: Bureau of Transportation Statistics
CEDDS: Complete Economic and Demographic Data Source
CPI: Consumer Price Index
CSTDM: California Statewide Travel Demand Model
DB1B: Airline Origin and Destination Survey
EIA: U.S. Energy Information Administration
FHWA: Federal Highway Administration
FLSWM: Florida statewide travel demand model
FP: Fuel price scenario
GDP: Gross Domestic Product
GSP: Gross State Product
GTC: Generalized Travel Cost
GUI: graphical user interface
HI: Household Interview
HSR: high speed rail
IIA: Independence of Irrelevant Alternatives
IPF: iterative proportional fitting
IPU: iterative proportional updating
ISTEA: Intermodal Surface Transportation Efficiency Act of 1991
KYSTM: Kentucky Statewide Travel Demand Model
LDM: Long Distance Model in the UK
LDPT: Long Distance Personal Travel Program
LOS: Level of Service
MCA: Multiple Classification Analysis
MDCEV: Multiple Discrete-Continuous Extreme Value
MPO: Metropolitan Planning Organization
MSA: Metropolitan Statistical Area
MSTM: Maryland Statewide Transportation Model
MTAS: Multimodal Travel Analysis System
NHTS: National Household Travel Survey
NMS: National Model System in the Netherlands
NRTF: national road traffic forecast
NRTS: National Rail Travel Survey
NTS: U.S. National Transportation Studies
NUMA: National Use Model Area
OD: Origin-Destination
PETRA: Danish activity-based travel demand model
PUMA: Public Use Microdata Areas
PUMS: Public Use Microdata Sample data
RITA: Research and Innovative Technology Administration
RV: Recreational Vehicle
SAMPERS: Sweden national travel demand model
SP: Stated Preference
SPSA: Simultaneous Perturbation Stochastic Approximation
SRS: Simple Random Sampling
SWIM2: Oregon Statewide Integrated Model
TAZ: Traffic Analysis Zone
TOY: Time of Year
TRANS-TOOLS: The most recent pan-European travel demand model
TSAM: Transportation Systems Analysis Model
TTC: Total Travel Cost
TTT: Total Travel Time

List of Mathematical Symbols

$d_{mn}$: the distance between county or city m in zone i and county or city n in zone j
$D_{ij}$: the distance between zone i and zone j
$U_{ij}$: the utility value of a person choosing travel mode i for long distance activity j between a specific OD
$tc_{ij.r_n}$: total travel cost of mode i for the jth long distance activity when travel cost falls into the range $r_n$
$tt_{ij}$: total travel time using mode i for the jth long distance activity
$\alpha_n$: the coefficients of total travel costs
$\beta$: the total travel time coefficient
$\varepsilon_{ij}$: error term capturing the factors that affect utility
$U_i$: the utility value of a person choosing to travel during time period i
$logsum_i$: mode choice logsum during time period i
$\alpha$: mode choice logsum coefficient
$X$: vector of the person’s characteristics
$B_i$: vector of coefficients on the person’s characteristics for time alternative i
$V_{ik}$: representative mode utility for the tour by mode k during time i
$f(t)$: the probability that a person will survive beyond the time period t
$S(t)$: discrete time survival function
$F(t)$: the failure function, giving the probability that the event has occurred by duration t
$h(t)$: hazard rate, the probability that an event occurs at time t given that one has survived to that time
$h_{it}$: the probability that an event occurs for individual i at time t, given that the individual has survived to that time
$i$ $(1, 2, \ldots, n)$: individual
$t$: the discrete time
$\alpha_t$: the baseline hazard function
$X_{it}$: the covariates or explanatory variables of individual i at time t
$\gamma$, $\theta$, and $\beta$: coefficients to be estimated in the tour duration model
$Cost(S_i, D)$: cost from stop i to tour destination
$Cost(O, D)$: cost from tour origin to tour destination
$Cost(S, D)$: cost from stop to tour destination
$Cost(S_i, S_j)$: cost from stop i to stop j
$Cost(S_j, D)$: cost from stop j to tour destination
$Cost(O, S)$: cost from tour origin to stop
$w$: a vector of weights indicating the modeler’s trust in the different observed outputs in calibration
$O_m$: a vector representing the observed outputs
$O_s$: a vector representing the simulated outputs
$\theta$: the vector of parameters from the model to be calibrated
$l$, $u$: vectors of lower bounds and upper bounds for the parameters
$F(Z; \theta)$: the link between the simulated outputs and the simulation-based model
$Z$: the inputs required to run the simulation-based model
$\|\cdot\|$: the Euclidean norm
$a$, $c$, $\alpha$, $\gamma$, and $A$: hyper-parameters in the SPSA algorithm
$Gain(S, a_i)$: the information gain of the attribute $a_i$ in the data set S
$Info(S)$: the information value of the data set S
$(L_j, V_i)$: the leaf node $L_j$ in subdivision $V_i$
$Info(L_j, V_i)$: the information value of leaf node $L_j$ in subdivision $V_i$ resulting from the data split on attribute $a_i$

Chapter 1: Introduction

1.1 Background

Growing interest in national transportation policies, ranging from strategic infrastructure investment to infrastructure operation and management with regard to efficiency, sustainability, and safety, has led researchers and decision makers to call for advanced, policy-sensitive analysis tools (Lundqvist & Mattsson, 2001). The growth of national travel also requires analysis tools that reach beyond the urban and regional level. Highway infrastructure investment, high-speed rail, and airport development all depend on national travel markets.
To ensure that the infrastructure meets the demand growth it is imperative to model and analyze the passenger travel behavior at the national level (FHWA, 2013). Americans travel a lot including inter-city, interregional and international travel. According to the 1995 American Travel Survey on the long-distance travel of persons in the U.S. (Bureau of Transpo rtation Statistics, BTS), the U.S households made over one billion national-level long-distance trips and 41 million international trips (Zhang, et.al, 2012). National long-distance trips in the U.S can be of various purposes including business, leisure, personal business, family or friend visit and so on. All of the long-distance activities could constitute the economic and recreational opportunities that would benefit both the person and the area where the long distance activities occur. Thus, it is essential for the U.S from the economic and social perspectives to have the capability to support high level personal long-distance travel, 2 which requires that we have sufficient data and accurate analysis tools to be able to understand the long-distance travel behavior and forecast the travel patterns in the future. Without the analysis tools we could risk making inefficient and costly investments in our transportation infrastructure and management. The needs for analyzing transportation capital expenditure decisions at the national level in the 1970s led to two U.S. National Transportation Studies (NTS) in 1972 and 1974 respectively (Weiner, 1976). These early national travel studies inventoried existing and planned U.S. transportation systems; and estimated future travel demand, system costs, performance, and broader impacts under alternative funding scenarios. With the completion of major investments on the Interstate Highway System, the development of national-level long-distance passenger travel analysis tools in the U.S. 
has been stagnant since the 1970s, though there has been continuing academic interest in improving the theory and methods for multimodal intercity passenger travel demand analysis, with a focus on mode choice (Lundqvist & Mattsson, 2001; Koppelman & Sethi, 2005; Bhat, 1995; Winston, 1985; Mannering, 1983). The lack of a capable long-distance passenger travel analysis tool in the U.S. is in sharp contrast with important emerging needs for analyzing various national transportation policies related to long-distance passenger travel. For national and regional long-distance travel, the air, train, bus, and auto modes compete with each other. Any infrastructure investment or operational and management improvement should therefore be evaluated with a capable national travel analysis tool rather than the region-level, corridor-level, or state-level models that are mainly used in this type of analysis. The Obama administration allocated $8 billion in the 2009 stimulus funds for high-speed passenger rail, hoping that the supertrains would operate throughout the American landscape as they do in Europe and Asia (Billitteri, 2013). U.S. federal and state planners have been prompted to provide high-speed rail services in selected major corridors. However, the lack of a capable national long-distance passenger travel analysis tool in the U.S. has hindered the ability of decision makers and politicians to systematically design and quantitatively evaluate high-speed rail from a broader perspective. Under these circumstances, it is desirable to quantitatively forecast high-speed rail demand and to systematically design and evaluate the operational effectiveness of the investment.
Besides high-speed rail, other national transportation investment strategies also need a long-distance travel analysis tool for quantitative analysis and evaluation, such as reconstructing and expanding the capacity of the Interstate Highway System and building the next-generation air transportation system. In addition to these multimodal capacity investment needs for long-distance passenger travel, there are also urgent needs to assess a variety of operational and management strategies at the national level, which could significantly improve transportation efficiency and productivity, support and stimulate economic growth, and produce positive social and environmental impacts. Examples include: (1) pricing, which could include congestion pricing on the Interstate and National Highway System, toll roads, air and rail fare increases, and fuel price increases (a fuel price change can affect the overall transportation system, although air fares and the cost of driving are more closely tied to the fuel price than rail fares are); (2) congestion management at airports; (3) separation of passenger vehicles and heavy trucks on highway facilities; (4) national transportation financing options such as fuel tax increases and mileage fees; and (5) substitution between long-distance travel and teleconferencing/telecommuting. In addition to enabling national-level infrastructure investment and operational analysis, a long-distance passenger travel demand model for the U.S. also has the following important benefits: (1) Analyze the impact of socio-demographic, economic, and transportation infrastructure changes on long-distance travel demand. People in the U.S. rely heavily on travel in their everyday lives.
How much people travel, when and where they go for different purposes, and which mode they take to reach their destination depend on various factors, including household socio-demographic characteristics, land use, and transportation infrastructure (Contrino & McGuckin, 2009). A change in any one factor can change people’s travel behavior, and different factors carry different weights in determining travel choices. How these factors influence travel demand at the urban or metropolitan level has already been studied and can be answered with a travel demand model at the corresponding geographic level. The same question arises for long-distance travel: how would changes in socio-demographic characteristics, the economy, and transportation infrastructure influence long-distance travel demand? Such a question cannot be addressed accurately without a long-distance travel demand model. (2) Anticipate the influence of energy (e.g. fuel price) and environmental factors (e.g. climate change and related regulations) on long-distance passenger travel. The impacts of fuel price changes on the long-distance travel behavior of the population are critical in developing region-level or nation-level transportation policies and operations that can abate negative effects and increase benefits. (3) Improve the long-distance passenger travel module in statewide and even some metropolitan travel demand models and provide an authoritative tool for multi-state transportation corridor analysis. After the Intermodal Surface Transportation Efficiency Act (ISTEA) was enacted in 1991, many state departments of transportation started to develop statewide travel demand models and use them as critical analysis tools in addressing legislative requirements in statewide planning.
However, statewide models are weak in representing external trips, which are usually generated with information from federal agencies and neighboring states rather than from available socioeconomic data (National Travel Demand Forecasting Model Phase I Final Scope). A national long-distance travel demand model can therefore provide external trips for statewide models in both the base year and future years. It can also generate the travel demand for multistate corridors based on available datasets and on standard, rigorous procedures, which minimizes duplicated effort across statewide models. (4) Support large-scale evacuation planning and operations due to natural disasters or targeted attacks; and (5) Enable micro-level analysis of the spread of pandemic diseases resulting from long-distance travel.

1.2 Objectives and Contributions

National long-distance passenger travel demand analysis has been an understudied area in transportation planning. The lack of multimodal long-distance origin-destination data has seriously limited planners’ ability to conduct quantitative analysis of operational effectiveness and infrastructure investment. As the nation and various states engage in funding transportation infrastructure improvements (interstate highway tolling/expansion, high-speed rail, and a next-generation passenger air transportation system relying more on smaller airports and aircraft) to meet future long-distance passenger travel demand, developing a national Multimodal Travel Analysis System (MTAS) and an American Long Distance Personal Travel Program (LDPT) have become priorities and the foundation for national travel analysis. Therefore, this dissertation aims to develop a person-based, activity-based national travel demand model for national travel analysis. All major behavioral dimensions of long-distance travel will be considered.
Compared to the traditional four-step approach, activity-based techniques offer several advantages: (1) It is easier to consider tours, multi-day and multi-stop trips, and intermodal access/egress transfers that are important for long-distance travel modeling; (2) Households and persons are the basic units of analysis, which enables detailed behavioral representations and interactions; and (3) It provides a rich framework in which travel is analyzed as a multi-day, monthly, quarterly, or yearly pattern of behavior, derived from activity participation. There are also significant differences between the long-distance trips considered in the proposed activity-based model and the daily/weekly trips in metropolitan/state-level tour/activity-based models developed in previous research. For instance, long-distance trips usually take days or weeks and may involve car, airplane, train, bus, or a combination of these modes. Households often first choose the time of travel for long-distance vacation trips, based on their time and money budgets, before selecting destinations and modes. The categorization of trip purposes is also different for long-distance trips. The cost of a long-distance trip is not just the travel disutility but also includes lodging, food, etc. The same applies to the total travel time for long-distance trips, which usually covers not only in-vehicle travel time but also access/egress time, transfer time, and lodging time. The much lower frequency of long-distance travel may also imply a different decision-making process. The dissertation contributes both to methodological development from the academic perspective and to real-world application from the industry perspective.
First, this dissertation represents the first attempt to develop an integrated activity-based travel demand model system for individuals’ quarterly/yearly long-distance or national activities and travel in the U.S. at the Metropolitan Statistical Area (MSA)/Non-MSA level, which is the highest geographic resolution available in the long-distance travel survey data. The model system is developed considering the specific attributes of long-distance travel, such as low frequency, long activity duration, long activity duration at intermediate stops on the tour legs, and different sets of mode alternatives. The model system therefore takes into account people’s long-distance travel not only at the tour level but also at the stop level. Three levels of choice are modeled. The first level is the activity pattern level, which determines the number of different types of activities a person will choose during one year; the second level is the tour level, which contains the choices of tour destination, time of year, tour duration, and tour mode; and the lowest level is the stop level, which covers the number, purpose, and location of each intermediate stop made during the inbound and outbound legs of the tour. National-level travel survey data are used to estimate the model components and provide the parameters for simulation. The model is implemented in our micro-simulation platform, which simulates each individual’s long-distance activities and travel in the U.S. over the course of one year. Second, high-speed rail (HSR) is expected to help alleviate heavy traffic loads in road and air corridors and improve inter-regional accessibility (Börjesson, 2012). However, the construction of HSR requires large investments and efforts.
In order to help decision makers and politicians evaluate and systematically design HSR, we selected northeast corridors to conduct quantitative analysis and forecast HSR demand based on our proposed person-based national travel demand model. The impact of a national-level fuel price increase on national travel demand is also analyzed to evaluate the sensitivity of the national travel demand model. Third, as the FHWA plans the next iteration of the long-distance travel survey in the U.S., advanced technologies such as GPS, smartphones, and social media are being explored as travel survey methods that provide more accurate temporal-spatial information on travel than traditional surveys. However, the passively collected travel data from these new survey methods do not provide important travel components such as trip purpose. Therefore, in this dissertation, a trip purpose learning system based on machine learning methods is proposed and tested with valid data. This dissertation represents a more advanced academic research endeavor for national passenger travel analysis. Its findings are expected to provide important insight and help guide federal and state agencies in making decisions on corridor-level, region-level, and nation-level infrastructure investment, design, and management, as well as to inform research on long-distance passenger travel demand.

Chapter 2: Literature Review

2.1 Long Distance Travel Demand Modelling

Long-distance travel analysis methods can be classified into four groups; see Figure 2-1 (Zhang et al., 2012). All the methods in the figure can be used to estimate multimodal OD matrices. However, they differ in their data requirements and in whether they capture travel behavior responses to policy scenarios (Zhang et al., 2012).
Figure 2-1: Categorization of Long Distance Travel Demand Analysis Methods (Source: Zhang, L., Southworth, F., Xiong, C., & Sonnenberg, A. (2012). Methodological Options and Data Sources for the Development of Long-Distance Passenger Travel Demand Models: A Comprehensive Review. Transport Reviews, Vol. 32, No. 4, pp. 399-433.)

In the middle of the 20th century, the U.S. started travel demand modeling at the urban and metropolitan level. Since the passage of the Intermodal Surface Transportation Efficiency Act (ISTEA) of 1991, an increasing number of states have developed statewide travel demand models in order to meet policy and legislative needs (Zhang et al., 2013). In the U.S., current long-distance travel demand models have been developed mainly as components of statewide models, and the majority of them are traditional or modified 4-step models, because the 4-step planning approach is easy to implement and requires less data than tour-based or activity-based travel demand models. The U.S. long-distance travel demand modeling efforts reviewed in our research include those in California, Ohio, Oregon, Maryland, Florida, Kentucky, Tennessee, and New Hampshire, among others. The California Statewide Travel Demand Model (CSTDM) is a tour-based travel demand model. It can forecast all types of travel, including long-distance trips made by California residents, as well as commercial vehicle travel. The travel modes considered vary across its sub-models. The CSTDM (ULTRANS & HBA, 2011) adopts statewide networks for roads and transit. Multiple data sources, including the 2012 California Household Travel Survey, 2010 United States Census data, zonal land use data, and employment and population data, were employed for model calibration and base year assignment (Bureau of Transportation Statistics, 2015).
The CSTDM was developed expressly to evaluate the proposed high-speed rail (HSR) system connecting Southern and Northern California (Cambridge Systematics et al., 2006). According to the forecast for 2040, when high-speed rail will be operating in the CSTDM study region, high-speed rail ridership from the San Francisco region to the Los Angeles region will be 7.1 million (Cambridge Systematics, 2016); the population of the study region is forecast to be 47.95 million in 2040, and the distance between the two regions is about 385 miles. By comparison, our high-speed rail scenario analysis in the northeast corridor, based on our national passenger travel demand model, forecasts a total of 2.6 million trips between Washington, D.C. and New York. That region is forecast to have a total population of 25 million in 2040, and the distance between Washington, D.C. and New York is about 226 miles. The Ohio statewide travel demand model uses a state-of-the-art tour-based modeling approach. It includes person travel for both short distances (less than 50 miles) and long distances (more than 50 miles). The long-distance travel models are developed based on long-distance travel survey data sponsored by the Ohio DOT, and can estimate the frequency and characteristics of long-distance travel for assignment on the transportation network (Erhardt et al., 2015). The Oregon Statewide Integrated Model (SWIM2) is a second-generation model developed from the first-generation statewide model (SWIM1) and the Eugene-Springfield UrbanSim model. SWIM2 is an integrated land use and transportation model covering the whole state of Oregon. The transportation part of SWIM2 includes tour-based models of personal travel and commercial travel, as well as a simple model of external truck travel (Parsons Brinckerhoff, 2010).
The Florida statewide travel demand model (FLSWM) employs traditional 4-step methods for passenger and freight travel in the state of Florida (Giaimo & Schiffer, 2005). The Maryland Statewide Transportation Model (MSTM) is a four-step travel demand model designed to generate link-level assignments for personal and freight travel. For passenger travel, the statewide-level model covers short-distance or urban person trips by residents of the study area, while the regional-level model includes a long-distance travel model for both residents and visitors making trips longer than 50 miles one-way. The freight travel module takes into account both short-distance and long-distance truck travel (University of Maryland & Parsons Brinckerhoff, 2011). The Kentucky Statewide Travel Demand Model (KYSTM) models long-distance trips (over 100 miles) in Kentucky and parts of the neighboring states. It adopts a modified 4-step travel demand model with the mode choice module removed. The model can be used to evaluate the traffic volumes and economic impacts of new corridors (Bostrom, 2006). The Tennessee Statewide Model was developed only for long-distance trips over 75 miles, and it also employs a 3-step travel demand model (with no mode choice model). Future year travel demand forecasting is conducted based on MPO forecasts of population and employment growth (Stammer, 2002). The New Hampshire Statewide Travel Demand Model was developed to address the needs of the New Hampshire Department of Transportation in the areas of statewide planning, congestion management, intermodal management, public transportation, air quality, financing policies, etc. It is a tour-based micro-simulation framework of sequential multinomial logit models (Sharma et al., 1999). According to Giaimo and Schiffer’s review of statewide travel demand modeling developments, most statewide travel demand models in the U.S. do not consider long-distance travel (Giaimo & Schiffer, 2005).
Among the statewide models that consider long-distance travel, many employ the Fratar approach to estimate future year travel demand from a base year OD table. Under this method, long-distance travel demand has little sensitivity to policy changes. Many researchers have worked on long-distance travel demand modeling. Koppelman (1989) developed a behavioral framework and model system for intercity travel, and the 1977 National Travel Survey data were used to estimate the model. Li (2004) examined the frequency and mode choice of intercity non-business trips with an intercity travel behavior system that uses a nested logit/continuous model. Forinash and Koppelman (1993), Bhat (1995), and Lee et al. (2004) studied intercity mode choice by exploring a set of discrete choice models. Baik et al. (2008) developed a national-level travel demand model that can predict annual county-to-county personal travel by commercial airline, air taxi, and automobile in the U.S. at 1-year intervals through 2030. The model, the Transportation Systems Analysis Model (TSAM), adopted the trip-based four-step travel demand modeling process, which includes trip generation, trip distribution, mode choice, and network assignment. An intercity trip in the system is defined as one with a one-way route distance longer than 100 miles, excluding commute travel. Unlike the traditional network assignment in four-step travel demand models, the network assignment in TSAM is developed for commercial airlines and air taxis. TSAM outputs annual county-to-county person round-trips by travel mode, trip purpose (business and non-business), and household income group. The network assignment then outputs the annual flights between all commercial and air taxi airports. Cambridge Systematics (2008) conducted a study in order to provide specifications for national travel demand forecasting model development.
They proposed a trip-based four-step demand model structure (not an operational model) that focuses on providing travel information for statewide models. Data sources including the network and zone system, demographic and employment data, freight data, and travel behavior data were assessed prior to preparing the input data. Nostrand et al. (2013) presented research on national long-distance travel demand modeling for leisure purposes only. An annual vacation destination and time choice model was developed, using the Multiple Discrete-Continuous Extreme Value (MDCEV) structure (Bhat, 2005), to estimate the destinations that a household would visit during a year and the time allocated to each vacation destination. The model mainly relied on the 1995 American Travel Survey data for analysis, and the nation was divided into a total of 210 zones. The output of the model can be used to construct a national-level OD table for leisure travel. The FHWA-funded project Foundational Knowledge to Support a Long-Distance Passenger Travel Demand Modeling Framework (Outwater et al., 2015), which was conducted at the same time as our work, developed a long-distance passenger travel demand model for the whole U.S. using a disaggregate, tour-based approach. The model components were estimated from several travel surveys spanning 1995 to 2012: the 1995 American Travel Survey, the long-distance component of the 2012 California Statewide Household Travel Survey, the 2003 Ohio Household Travel Survey, the 2010 Colorado Front Range Travel Survey, and the 2001 National Household Travel Survey (NHTS). They adopted a 4570-zone structure based on their own National Use Model Area (NUMA) zones. The long-distance passenger travel demand model mainly used the MDCEV approach. In the model framework, only travel information at the tour level is captured and modeled, including tour generation, scheduling, tour duration, travel party size, tour destination choice, and mode choice.
Travel information at the stop level is not captured. Meanwhile, in Europe, a great deal of attention has been paid to national travel demand modeling during the last two decades. In 1997, the UK employed the direct demand and elasticity analysis method in the national road traffic forecast (NRTF) model (Worsley & Harris, 2001) to predict national-level travel demand. Multiple direct demand models were developed and estimated for vehicle ownership/use, the level of service (LOS) of the transportation network, truck traffic, and traffic flows. Using the model elasticities, a hierarchical set of switching rules was defined to evaluate the full impact of policy scenarios on the road network. Later, in 2010, the UK Department for Transport developed a Long Distance Model (LDM) to predict long-distance passenger travel demand (trips over 50 miles) (Rohr et al., 2013; Burge et al., 2011). It can be used to examine policies including those affecting demand for high-speed rail (HSR) (Burge et al., 2010) and policies that can influence long-distance car, classic rail, and air demand. To develop the model, multiple data sources such as the 2002-06 National Travel Survey (NTS) data, the 2004-06 National Rail Travel Survey (NRTS), and 2009 Household Interview (HI) data were employed. A nested logit choice model was used for demand estimation, and network models were developed for highway, rail, and air. An Italian national travel demand model was developed and applied to different macroeconomic, transport supply, and HSR study and marketing scenarios (Ben-Akiva et al., 2010). The Italian national travel demand model system consists of three sub-models: a demand model that predicts future year OD matrices from the base year, a nested logit mode choice model that estimates the market shares of inter-urban travel modes, and an induced demand model that predicts the additional HSR demand resulting from improved HSR LOS.
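Since nested logit mode choice models recur in the studies reviewed above, a minimal Python sketch of how such a model turns utilities into mode shares may help. The nest structure, the mode names, the utility values, and the scale parameters below are all hypothetical illustrations; they are not taken from the UK LDM, the Italian model, or this dissertation.

```python
import math

def nested_logit_probs(utilities, nests, scales):
    """Two-level nested logit choice probabilities.

    utilities: dict mode -> systematic utility V_m (hypothetical values)
    nests:     dict nest -> list of modes in that nest
    scales:    dict nest -> nesting parameter lambda in (0, 1]
    """
    # Inclusive value (logsum) of each nest: log(sum_m exp(V_m / lambda)).
    logsums = {
        nest: math.log(sum(math.exp(utilities[m] / scales[nest]) for m in modes))
        for nest, modes in nests.items()
    }
    # Denominator of the upper-level (nest choice) probability.
    denom = sum(math.exp(scales[n] * logsums[n]) for n in nests)

    probs = {}
    for nest, modes in nests.items():
        lam = scales[nest]
        p_nest = math.exp(lam * logsums[nest]) / denom
        for m in modes:
            # Conditional probability of mode m given that its nest is chosen.
            p_within = math.exp(utilities[m] / lam) / math.exp(logsums[nest])
            probs[m] = p_nest * p_within
    return probs

# Hypothetical example: conventional rail and HSR share a "rail" nest.
example_utilities = {"car": -1.0, "air": -1.5, "rail": -1.2, "hsr": -1.0}
example_nests = {"road": ["car"], "air": ["air"], "rail": ["rail", "hsr"]}
example_scales = {"road": 1.0, "air": 1.0, "rail": 0.5}

if __name__ == "__main__":
    for mode, p in nested_logit_probs(example_utilities, example_nests, example_scales).items():
        print(f"{mode}: {p:.3f}")
```

When every nesting parameter equals 1 the expression collapses to the multinomial logit, which is a convenient sanity check; values below 1 make modes within the same nest closer substitutes for each other than for modes in other nests.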
The National Model System (NMS) in the Netherlands was developed in 1986 and has been updated since then (HCG & TOI, 1990; Gunn, 2001). It has served as a “prototype” disaggregate model in Europe and was built on a behaviorally oriented tour-based method (Gunn, 2001). The model system consists of a series of connected choice models, including license holding and car ownership models, tour frequency, tour mode and destination choice, and time-of-day. The NMS uses a total of 345 zones, and 1302 sub-zones are used in the mode and destination choice sub-module. The NMS is sensitive to a variety of socio-economic, land use, transportation system, and policy factors. Applications include rail demand prediction for railway options, the impact of raising fuel prices, the effect of introducing motorway signaling, and high-speed train demand analysis (Hofman, 2001). The Norwegian national model system followed the structure of the Dutch NMS, but with the objective of predicting emissions (CO2 and NOx) at the national level (HCG & TOI, 1990). Therefore, no detailed link loadings on a national network were needed (Gunn, 2001). Sweden started its national model development at the beginning of the 1980s and has improved it into the current version, SAMPERS, which belongs to the mainstream of trip-based four-step models (Zhang et al., 2012). The new model covers trips within Sweden and to neighboring countries in detail, and trips to and from other parts of Europe more coarsely (Widlert, 2001), which results in three different model systems: regional models, domestic long-distance models, and international models. All models in the three systems, such as car ownership, trip frequency, destination choice, mode choice, and departure time choice, adopted discrete choice logit models, except for an ordinary least squares trip frequency model for foreigners’ travel to Sweden (Beser & Algers, 2001).
The Danish PETRA model was developed as an activity-based approach to travel demand analysis (Fosgerau, 2002). In the model, a person’s daily travel is represented in terms of chains of tours instead of separate trips or tours. To reduce complexity, the observed chains were transformed into a simplified set of chain types. The model system first deals with the cohort effects on car ownership and license holding (Lundqvist & Mattsson, 2001). Then a mode and destination choice model (a nested logit model) is estimated for each tour in a chain. Finally, the choice of chain type is modeled, and the accessibility measured by the logsum from the mode and destination model is incorporated in the chain type choice model. Since there is no network assignment module in PETRA, no congestion analysis is considered. The model has been applied in a variety of analyses of the effects of different policy measures on people’s travel behavior. In terms of geography and the size of the travelling population, many of the European national travel modeling efforts are comparable to statewide model studies in the U.S. (Zhang et al., 2012). The pan-European models cover the countries forming the European Union, and are therefore to some extent closer to a national model study in the U.S. Pan-European travel demand model development started with the estimation of multimodal OD matrices (method 4 in Figure 2-1) without a behavioral framework in the MYSTIC project (Peter Davidson Consultancy, 2000) in the early 1990s, followed by the DATELINE project of 2004 (Brog et al., 2004; Davidson & Clarke, 2004). The MYSTIC project developed a heuristic harmonization procedure to merge various data sources from seven European countries to obtain pan-European OD matrices. The more recent DATELINE project refined the methodology of the MYSTIC project, and also used the data from a new pan-European long-distance travel survey for 16 countries.
Then aggregate methods and joint aggregate-disaggregate methods in pan-European models were employed in the 201-zone NUTS2 STREAM model (Williams, 2001) and the 1275-zone NUTS3 STEMM model (Gaudry, 2001), respectively. Finally, the most recent pan-European travel demand model is the disaggregate TRANS-TOOLS model, which integrates European transportation and economic models (Zhang et al., 2012; Burgess et al., 2006). Beyond the transportation field, national travel demand models are also used in other areas. Epstein et al. (2008) developed an agent-based microsimulation model for intercity travel in research on spatial-temporal epidemic dynamics. The model simulates individuals’ travel decisions on trip frequency and destination choice based on a zip-code-level OD system. Since travel demand analysis was not a focus of the study, no mode choice or assignment models were employed. Even so, this study demonstrated the benefit of long-distance models in fields other than transportation.

2.2 Long Distance Travel Survey Data

The most recent survey of long-distance passenger travel, the American Travel Survey, was conducted in 1995. The 2001 National Household Travel Survey has a long-distance travel component but with a relatively small sample. An up-to-date long-distance travel survey is needed to provide the latest travel data for the statewide and national travel demand models that state and federal governments rely on to meet their policy requirements and legislative development needs and to predict future travel demand (Cohen et al., 2008; Giaimo & Schiffer, 2005; Horowitz, 2006; Horowitz, 2008; Souleyrette et al., 1996). With the rapid development of technology, GPS, smartphones, social media, etc. have become new tools for researchers and governments to supplement or replace traditional survey methods in long-distance travel data collection.
However, these advanced technology-based methods cannot provide all the necessary long-distance trip information, such as travel mode and trip purpose. Thus, practical post-processing methods that can generate data on these missing travel characteristics are essential in a GPS/smartphone/social media-based survey. A number of travel researchers are exploring GPS-based travel survey methods and, as a result, have developed different methods to derive trip purpose from collected trip data, mainly in the area of regular intra-regional travel. Wolf et al. (2001a, 2001b) pioneered trip purpose derivation procedures based on a set of deterministic rules. Using data collected from a sample of 19 respondents in Atlanta, GA, who carried a GPS data logger and returned a completed paper trip diary, together with a detailed GIS database of land use, they found that mixed-use land use parcels such as shopping centers, office buildings, and strip malls make accurate trip purpose derivation difficult. Besides the GIS land use data, respondents’ socio-economic characteristics, such as household composition, possession of travel modes, and home and work addresses, can contribute to trip purpose derivation as well. The procedure was further developed in Europe by Schönfelder et al. (Axhausen et al., 2003; Schönfelder et al., 2002), who used a multi-stage hierarchical matching procedure that involved calculating a cluster center of stop ends. This was done by combining trip ends, identifying trips with obvious purposes, and establishing relationships between trip purposes and activity temporal information as well as the socio-demographics of the respondents. Stopher et al.
(Stopher et al., 2005; Stopher et al., 2008a; Stopher et al., 2008b) established a set of heuristic rules to derive trip purpose for 43 trips collected in Sydney, using parcel-level land use data and the geo-coded addresses of respondents’ workplace/school and their two most frequently used grocery stores. Bohte et al. (2009) developed a GPS-based travel data collection method utilizing GPS devices, GIS technology, and a web-based validation procedure. Then, based on the collected GPS data, they derived trip purposes with the heuristic rules they had developed. Chen et al. (2010) employed the same approach as Schönfelder et al. to cluster trip ends into activity locations. Based on the GIS data and respondents’ socio-demographic characteristics, deterministic rules were used to classify trip purposes for trips that occurred in low-density areas. For trips in high-density areas, trip purposes could not be deterministically identified, and a multinomial logit model was employed to calculate the probability that a trip served a particular purpose, with only four trip purpose categories considered. The derivation of trip purpose from GPS-based data was further explored with the help of artificial intelligence and machine learning. Griffin and Huang (2005) employed the decision tree method to derive trip purposes; the procedure was implemented in the C4.5 environment with 50 randomly generated trips that were simulated based on a series of assumptions. Alternatively, Deng and Ji (2010) used a set of attributes (attributes from GPS data, such as time stamps and spatial-temporal indices of trips; attributes from GIS data; and the socio-demographic and socio-economic characteristics of respondents) to construct a decision tree for travel mode and trip purpose derivation. The decision tree was implemented in the C5.0 environment with a homogeneous set of 226 GPS trips collected from 36 respondents in Shanghai. Lu et al.
(2013) explored the feasibility of automating trip purpose imputation using multiple machine learning methods (decision tree, Support Vector Machine, and Metalearner) with the help of geospatial location data, land use data, and the GPS-based survey conducted by the University of Minnesota. A heterogeneous sample of 2238 trip records with a 7-trip purpose categorization scheme was utilized.

Chapter 3: Data

The data employed in the dissertation consist of two parts: one for model estimation and the other for the base year model simulation. The main data source for model estimation is the travel survey data, supplemented by the transportation OD skim data and the economic/demographic information of traffic analysis zones (TAZs), which are also inputs to the model simulation. The other input for model simulation is the population data, which contain individual and household characteristics such as age, gender, employment status, household type, and household income.

3.1 Zone System

The traffic analysis zones in our national travel demand model system are the Metropolitan Statistical Areas (MSAs) and the non-MSAs, where a non-MSA is the remaining area of a state that does not belong to any MSA. Therefore, the number of non-MSAs is equal to the number of states in the U.S. Although a non-MSA usually covers a larger area than an MSA, and it would be desirable to divide non-MSAs into smaller zones or to use counties or cities as zones in the national travel demand model, the finest geographic resolution in the ATS data, on which we mainly relied for model estimation, is the MSA and non-MSA. The United States includes 3202 counties, and each MSA or non-MSA consists of multiple counties or cities; aggregating these counties or cities yields the MSAs and non-MSAs.

Figure 3-1 shows the traffic analysis zone (TAZ) system in the national travel demand model. There are a total of 380 zones (excluding Puerto Rico) covering the mainland of the United States, Alaska, and Hawaii.
In the 1995 ATS sample data, a total of 208 zones (161 MSAs and 47 non-MSAs) are used for model estimation.

Figure 3-1: National Travel Demand Model Traffic Analysis Zone System

3.2 Travel Survey Data

The 1995 American Travel Survey (ATS) is the primary source of the national travel survey data used in this research for model estimation. It is a nationwide long distance travel survey in the United States and was conducted by the Bureau of Transportation Statistics (BTS) between April 1995 and March 1996. The 1995 ATS collected detailed long distance travel (>100 miles) information and demographic information from more than 80,000 randomly selected households in the 50 states and the District of Columbia, and each household was interviewed every three months during the survey period. “One of the ATS’s main objectives is to comply with two requirements of the Intermodal Surface Transportation Efficiency Act of 1991 (ISTEA): (1) to provide information on the number of people carried in intermodal transportation by relevant classification, and (2) to provide information on patterns of movement of people carried in intermodal transportation by relevant classification in terms of origin and destination.” (Hwang & Rollow, 2000). Admittedly, the 1995 ATS is dated, but it is the most recent dataset on long distance travel over the course of one year in the U.S., and it includes information on stops during the tour legs. Although the 2001 National Household Travel Survey (NHTS) has a long-distance travel component, its sample is relatively small and it contains less long distance information than the 1995 ATS.

The 1995 ATS provides comprehensive information. Specifically, the demographic information gathered in the survey includes the characteristics of both the household (family type, household income, household size, etc.) and the household members (gender, age, employment status, race, etc.).
The detailed long distance travel information contains the origin and the destination of the trip, stops along the way to and from the destination, side trips originating at the destination, the means of transportation, the reason for the trip, the lodging type, the number of nights spent away from home, the travel party size, etc. All the location information was recorded at the regional, state, and metropolitan level. Such detailed information cannot be found in other U.S. national travel surveys and makes the 1995 ATS a useful source of long distance travel survey data.

The 1995 ATS database includes four data sets: household trips, household characteristics, personal trips, and personal characteristics, of which the latter three were utilized in this research. The personal trip data included 556,026 trip records for both domestic and international long distance trips. A total of 45374 long distance trips were made by people older than 18 years old, with dozens of travel modes recorded for the entire travel (inbound trip and outbound trip), and 438022 of the 45374 trips were domestic. In the dissertation, only adults’ domestic long distance trips are covered. Based on the characteristics of the detailed travel modes, 18 classes of travel modes were aggregated into 6 classes: car, air, bus, train, recreational vehicle (RV), and others. The percentages of the long distance tour travel modes in the sample are plotted in Figure 3-2, where Others refers to combinations of multiple travel modes and to other modes such as motorcycle and bicycle. More than 80% of the long distance activities were made by car for the entire tour, which is the highest share among the travel modes. The second most popular travel mode is air, which accounts for 15.6%.
The 11 activity types in the 1995 ATS were aggregated into three main activity types (business, personal business, and pleasure) based on activity similarities, in order to reduce the model computation complexity (Table 3-1). Figure 3-3 illustrates the percentages of the three aggregated trip purposes. Almost 60% of the long distance trips are made for pleasure, while 25.8% and 15.6% of the long distance activities are made for business and personal business, respectively.

Figure 3-2: Percentages of Long Distance Tour Travel Modes (bar chart; categories: Car-Car, Air-Air, Bus-Bus, RV-RV, Train-Train, Others)

| Reported Trip Purpose | Encoded Trip Purpose |
|---|---|
| Business | Business |
| Combined Business/Pleasure (B/P) | Business |
| Convention, Conference, or Seminar | Business |
| School-related activity | Personal Business |
| Visit relatives or friends | Pleasure |
| Rest or relaxation | Pleasure |
| Sightseeing, or to visit a historic/scenic attraction | Pleasure |
| Outdoor recreation | Pleasure |
| Entertainment | Pleasure |
| Shopping | Pleasure |
| Personal, family or medical | Personal Business |
| Others | Deleted |

Table 3-1: Encoding Reported Trip Purposes

Figure 3-3: Percentages of Long Distance Trip Purposes

When people make long distance trips for different purposes, they usually prefer different means of transportation. For example, people travelling for business are usually more likely to travel by air than people travelling for personal business or pleasure, who tend to travel by car. As expected, air trips were made mainly for business and personal business rather than for pleasure (Figure 3-4): 47% and 43% of air trips were for business and personal business, respectively, and only 10% were pleasure trips. Among people travelling by car, 61% made pleasure trips, and only 22% and 17% traveled for business and personal business, respectively.
Among people travelling by bus, 73% traveled for pleasure. Among all the long distance trips by train, pleasure trips had the largest share (53%) and personal business trips had the smallest share (only 8%).

Figure 3-4: Travel Mode Usage by Trip Purpose (stacked bars for Car, Air, Bus, and Train; series: Business, Personal Business, Pleasure)

The 1995 ATS data (Figure 3-5) show that the largest share (28%) of the long distance trips occurred during the third quarter (July to September), whereas the smallest share (20.6%) occurred during the fourth quarter (October to December). The third quarter was also the peak period for pleasure travel (up to 30.1%). Meanwhile, the largest proportion of the business trips (29.9%) occurred during the first quarter (January to March), and the share of business trips decreased from the first quarter to the fourth quarter. The second quarter (April to June) was the most popular period for long distance personal business travel (29.7%), followed by the third quarter.

Figure 3-5: Trip Distribution by Time of Year (x-axis: quarter of year, 1 to 4)

Figure 3-6: Trip Distribution by Purpose and Time of Year (x-axis: quarter of year, 1 to 4; series: Business, Pleasure, Personal Business)

When people make long distance trips, they may make stops during the half legs of the tour. The 1995 ATS data record the stop information in terms of ‘Number of Stops’ and ‘Reason for Each Stop’ for each long-distance trip. In this research, a stop is redefined as one that people make during the tour legs for a certain purpose (e.g. business, personal business, and pleasure).
Stops for rest and for transfers within the same travel mode or across modes are excluded from the study. Figure 3-7 illustrates the distribution of the number of stops in both the outbound and inbound legs of the long distance tour. Most people (above 95%) did not stop during their long distance travel, and only a few made one stop (0.2% or 0.1%) or four stops (0.3% or 0.3%) during either the inbound or the outbound trip of the long distance tour. The distribution of stop purposes shown in Figure 3-8 indicates that most people who stop do so for pleasure, during either the inbound or the outbound leg of the long distance tour. Among those who stop for pleasure, more choose to stop on the way back home than on the way to the primary tour destination. The smallest share of the stops made during either the inbound or the outbound leg was for personal business.

Figure 3-7: The Number of Inbound/Outbound Stop Distribution (bar chart; x-axis: number of stops, 0 to 4; series: Outbound Stop, Inbound Stop)

Figure 3-8: Outbound/Inbound Stop Purpose Distribution

3.3 Transportation OD Skim and Economic/Demographic Data

Given the percentages of the travel modes and the difficulty of accessing travel data for bus and RV, only three travel modes (car-car, air-air, and train-train) were considered for the entire tour of the long distance activities. The OD skim data, or level of service variables, primarily refer to the travel time and travel cost between each origin and destination pair via the different travel modes, and they can be expressed as functions of the distance between TAZs. An MSA or a non-MSA is usually made up of more than one county or city, so the distance between two TAZs can be estimated by averaging the distances between all the county-county (county-city and city-city) pairs across the two zones (Equation 3.1).
$$D_{ij} = \frac{\sum_{m,n} d_{mn}}{m \times n} \qquad (3.1)$$

where i and j refer to the zones (MSA or non-MSA); m and n indicate the number of counties or cities in zone i and zone j, respectively; and $d_{mn}$ is the distance between county or city m in zone i and county or city n in zone j. The Census Bureau provides the geographic information of each county/city in the U.S., which allows us to estimate the great circle distance (footnote 2) between each zone pair.

Auto travel time and cost were derived as a series of functions of the great circle distance of a zone pair, the average driving speed, the vehicle’s characteristics, etc. The cost of driving a vehicle usually includes the fuel cost, insurance, maintenance, and tire costs. Among these, only the fuel cost is an out-of-pocket expense for the trip, while the other costs are paid separately after the trip. Therefore, only the fuel cost is considered as the vehicle driving cost during long distance travel. Several assumptions are made in order to estimate the auto travel time and cost:
1) the average auto speed is 65 miles/hour;
2) the auto travel time/cost consists of two parts, driving and lodging;
3) people on business travel stop for an overnight stay every 9 hours, while people on personal business and pleasure travel stop every 13 hours;
4) the average auto fuel efficiency is 19.7 mpg (Grush, 1998) and the average retail fuel price in the U.S. in 1995 is $1.48/gallon (EIA, 2016);
5) the average lodging cost per person-night for business travel is $70, $90, and $110 for low-, medium-, and high-income travelers respectively, while the lodging cost for personal business and pleasure travel is $30, $50, and $70 respectively;
6) the travel party size is 1 person for business travel and 2 persons for personal business and pleasure trips, which is used to estimate the vehicle driving cost per person.
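The six auto assumptions above can be combined into a one-way level-of-service calculation. The following is a minimal sketch, not the dissertation's implementation: the constants are quoted from the text, but the function name, the rule that the number of overnight stays is the whole number of driving blocks completed, and the `overnight_hours` parameter (time added per overnight stay) are illustrative assumptions that the text does not specify.

```python
# Constants quoted from the assumptions listed in the text.
SPEED_MPH = 65.0
FUEL_MPG = 19.7
FUEL_PRICE_USD = 1.48  # per gallon, 1995
DRIVING_HOURS_PER_STAY = {"business": 9.0, "personal_business": 13.0, "pleasure": 13.0}
LODGING_USD = {
    "business": {"low": 70.0, "medium": 90.0, "high": 110.0},
    "personal_business": {"low": 30.0, "medium": 50.0, "high": 70.0},
    "pleasure": {"low": 30.0, "medium": 50.0, "high": 70.0},
}
PARTY_SIZE = {"business": 1, "personal_business": 2, "pleasure": 2}


def auto_level_of_service(distance_miles, purpose, income, overnight_hours=8.0):
    """One-way auto time (hours) and per-person cost (1995 dollars).

    ASSUMPTIONS (not stated in the source): nights are counted as the number
    of completed driving blocks, and each night adds `overnight_hours`.
    """
    driving_hours = distance_miles / SPEED_MPH
    nights = int(driving_hours // DRIVING_HOURS_PER_STAY[purpose])
    # Fuel is shared by the travel party; lodging is already per person-night.
    fuel_cost = distance_miles / FUEL_MPG * FUEL_PRICE_USD / PARTY_SIZE[purpose]
    lodging_cost = nights * LODGING_USD[purpose][income]
    return {
        "nights": nights,
        "time_hours": driving_hours + nights * overnight_hours,
        "cost_usd": fuel_cost + lodging_cost,
    }


if __name__ == "__main__":
    print(auto_level_of_service(650.0, "business", "medium"))
    print(auto_level_of_service(650.0, "pleasure", "medium"))
```

For a 650-mile leg, the business traveler drives 10 hours and therefore incurs one overnight stay, while the pleasure traveler (13-hour threshold) incurs none and pays half of the fuel cost.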
Footnote 2: The great circle distance is the shortest distance between two points on the surface of a sphere. Distance = 6371 × acos(sin(latitude1) × sin(latitude2) + cos(latitude1) × cos(latitude2) × cos(longitude2 − longitude1)), where 1 and 2 refer to point 1 and point 2 on the surface of the sphere, and 6371 is the mean radius of the Earth in kilometers.

Air fare and the number of layovers were collected from the Airline Origin and Destination Survey (DB1B) data provided by the Bureau of Transportation Statistics, Research and Innovative Technology Administration (RITA). DB1B is a 10% sample of airline tickets from reporting carriers; therefore, in order to obtain as large a sample as possible, the DB1B data from 1994 to 1996 were employed. The air travel time is made up of the access/egress time (time spent traveling to the airport and from the airport to the final destination), the flight time, and the transfer wait time between flights. The flight time is estimated from the great circle distance and the average flight speed, which is assumed to be 500 mph (Boeing, 2011). The average total access and egress time is assumed to be 2 hours for all air travel, and the average wait time per transfer is set to 1.5 hours. The number of layovers is obtained from the airport group variable in the DB1B data, which lists the airport codes of all the airports in the flight itinerary. Since each airport code has three characters and the airports in the airport group variable are separated by colons, the number of layovers can be obtained from the number of characters in the variable. Multiplying the average wait time per transfer by the number of layovers gives the total transfer wait time in the itinerary between TAZs. Air fare was taken from the DB1B data, and we eliminated the first/business class fares to reduce the travel cost variance and to reflect the fact that the majority of travelers choose economy class for their air travel.
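The great circle formula in footnote 2 and the air travel time components described above can be sketched as follows. This is an illustration under stated assumptions rather than the dissertation's code: the function names are hypothetical, coordinates are assumed to be given in degrees, the airport group string is assumed to describe a one-way itinerary, and layovers are counted by splitting on colons, which is equivalent to the character count described in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0        # radius used in footnote 2
KM_PER_MILE = 1.609344
FLIGHT_SPEED_MPH = 500.0        # assumed average flight speed (from the text)
ACCESS_EGRESS_HOURS = 2.0       # total access plus egress time (from the text)
WAIT_HOURS_PER_TRANSFER = 1.5   # average wait per transfer (from the text)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, as in footnote 2; inputs in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Clamp to guard against rounding slightly outside [-1, 1].
    c = max(-1.0, min(1.0, c))
    return EARTH_RADIUS_KM * math.acos(c)


def count_layovers(airport_group):
    """Layovers = airports in the itinerary minus origin and destination.

    ASSUMPTION: `airport_group` is a colon-separated, one-way list of
    three-letter airport codes, e.g. "DCA:ORD:LAX".
    """
    airports = [code for code in airport_group.split(":") if code]
    return max(len(airports) - 2, 0)


def air_travel_hours(distance_miles, layovers):
    """Access/egress time + flight time + transfer wait time."""
    flight_hours = distance_miles / FLIGHT_SPEED_MPH
    return ACCESS_EGRESS_HOURS + flight_hours + layovers * WAIT_HOURS_PER_TRANSFER


if __name__ == "__main__":
    km = great_circle_km(38.9, -77.0, 34.05, -118.25)  # illustrative coordinates
    miles = km / KM_PER_MILE
    print(round(miles), air_travel_hours(miles, count_layovers("DCA:ORD:LAX")))
```

Because the footnote formula uses a radius in kilometers, the result is converted to miles before it is divided by the 500 mph flight speed.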
An MSA or non-MSA may contain more than one airport; therefore, the air fare and time between TAZs are taken as the average over all the airport pairs in the corresponding zones.

It is hard to obtain train fare and time data for 1995 or neighboring years. Therefore, as a proxy, we collected train fares and travel times in August 2013 from Amtrak (the National Railroad Passenger Corporation). The Amtrak website provides access to station-to-station timetable and ticket information. Amtrak train tickets come in several classes, including saver, value, flexible, and premium. Generally, the saver class is the cheapest, the flexible or premium class is the most expensive, and the price of the value class lies in the middle of all the classes’ fares. When collecting the train fares for our study, we chose the value class price and then converted the fare to 1995 dollars using the Consumer Price Index (CPI). The travel time from Amtrak contains both the train travel time and the transfer waiting time from the origin station to the destination station. In economically developed regions, a TAZ may have multiple rail stations. In this case, the TAZ-to-TAZ train fare and time are obtained by aggregating the fare and time between all station pairs from the two zones.

Data for the zones’ attractiveness indexes in this research mainly include the total population, the employment by industry sector, and the number of households. These economic and demographic data for each MSA and non-MSA were obtained from the Complete Economic and Demographic Data Source (CEDDS) by Woods & Poole Economics. This database offers historical, current, and projected socioeconomic indicators (e.g. population, employment, households, etc.) for all the regions, states, statistical areas, and counties in the U.S.
3.4 Public Use Microdata Sample Data

The input data for the base year model is the 2010 1-year ACS Public Use Microdata Sample (PUMS), which represents about 1 percent of the total U.S. population, or approximately 1.3 million housing unit records and about 3 million person records (Census Bureau, 2008). Detailed person and household information is stored in person records and housing unit records in one PUMS file for each state. Each record has a unique identifier linking the people to the proper housing unit record. The geographic unit is the Public Use Microdata Area (PUMA), as defined in the 2000 Census PUMS, and each PUMA meets a minimum population threshold of 100,000. The housing unit record in the PUMS file contains detailed household information such as home ownership, real estate taxes, number of vehicles, number of persons in the household, household type, housing unit weight, presence and age of own children, PUMA code, state code, household income, MSAPMSA code, etc. The person record includes information such as age, gender, race, marital status, educational attainment, school enrollment, employment status, means of transportation to work, travel time to work, class of worker, personal income, person weight, etc. The weight in the PUMS file for each person and housing unit can be used to expand the sample to the relevant total.

Since each state consists of one or more PUMAs and some large metropolitan areas may be divided into several PUMAs, a PUMA could contain parts of multiple TAZs. The geographic equivalency between PUMAs and MSAs/non-MSAs needs to be identified before the data are employed. All the persons or housing units located in a PUMA containing mixed TAZs should be allocated to each TAZ according to the population percentage of the TAZs in that PUMA. Table 3-2 shows part of the PUMA and MSA/Non-MSA correspondence. POP in the table indicates the total population in the MSA/Non-MSA.
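The allocation rule described above can be sketched in a few lines. The rows reproduce the POP values of the Table 3-2 excerpt; the function names are hypothetical, and drawing a TAZ at random for each person in proportion to the population shares is one possible reading of "allocated according to the population percentage" (splitting record weights would be another), so it should be taken as an assumption.

```python
import random
from collections import defaultdict

# (state FIPS, PUMA, TAZ code, POP) taken from the Table 3-2 excerpt.
ROWS = [
    (1, 100, "2650", 285900),
    (1, 200, "3440", 315904),
    (1, 800, "1000", 231532),
    (1, 800, "0199", 141426),
]


def taz_shares(rows):
    """Population share of each TAZ within its (state, PUMA)."""
    puma_pop = defaultdict(int)
    for state, puma, _taz, pop in rows:
        puma_pop[(state, puma)] += pop
    shares = defaultdict(dict)
    for state, puma, taz, pop in rows:
        shares[(state, puma)][taz] = pop / puma_pop[(state, puma)]
    return shares


def assign_taz(state, puma, shares, rng):
    """Draw one TAZ for a person record (ASSUMED random allocation)."""
    options = shares[(state, puma)]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


if __name__ == "__main__":
    shares = taz_shares(ROWS)
    rng = random.Random(2010)  # fixed seed so the draw is repeatable
    draws = [assign_taz(1, 800, shares, rng) for _ in range(10000)]
    print(shares[(1, 800)], draws.count("1000") / len(draws))
```

For PUMA 800 the computed shares are about 0.62 for TAZ 1000 and 0.38 for TAZ 0199, matching the percentages reported in the table.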
The table shows that PUMA 800 in state 1 contains two TAZs (MSA/Non-MSAs), 1000 and 0199. According to each TAZ’s (MSA/Non-MSA) population percentage in PUMA 800, we assign 62% of the PUMA 800 population to TAZ 1000 and 38% to TAZ 0199.

| STFIPS | PUMA | CountyFIPS | MSA/NonMSA | POP | pumapop | percentage |
|---|---|---|---|---|---|---|
| 1 | 100 | 33 | 2650 | 285900 | 285900 | 100% |
| 1 | 200 | 89 | 3440 | 315904 | 315904 | 100% |
| 1 | 800 | 9 | 1000 | 231532 | 372958 | 62% |
| 1 | 800 | 127 | 0199 | 141426 | 372958 | 38% |

Table 3-2: Part of PUMA and MSA/Non-MSA Correspondence Table

Chapter 4: Model System Analysis Framework

A long distance trip in our model system is defined as one that is greater than or equal to 50 miles one way. For each long distance activity, there is only one tour destination, or primary destination, and during each leg of the tour there can be multiple intermediate stops. People can also travel or make side trips (stops) based at their primary destination (Figure 4-1). Due to the data limitation on side trips, our national travel demand model system does not cover side trips or side stops.

Figure 4-1: Long distance travel illustration

Compared to short distance or urban travel (less than 50 miles), long distance travel is characterized by longer travel distance, low frequency, and longer duration, usually measured in days (see Table 4-1). Typically, people take no long distance trips or only several during one year, and it is unlikely that people travel long distance every day. Commute trips longer than 50 miles one way are not taken into account in this research. Also, the trip purposes for long distance travel differ from those of short distance travel, which mainly include work/school, social/recreation, shopping, pickup/drop-off, meal, and errands/personal/family business. The long distance trip purposes are mainly business, pleasure, and personal business.
Compared to short distance travel, long distance travel is unlikely to be made by non-motorized travel modes such as walking and biking. Conversely, air, a main travel mode for long distance travel, is practically unavailable for short distance travel.

| | Long Distance Travel | Short Distance Travel |
|---|---|---|
| Distance | >= 50 miles | < 50 miles |
| Frequency | Low frequency, multiple trips per year | Daily trips, multiple trips per day |
| Activity Duration | Days | Hours, minutes |
| Trips Connection | Usually large time gap between multiple long distance trips or activities during one year | Tight connection between trips and activities |
| Travel Modes | Mainly car, air, train, bus | Mainly car, transit, non-motorized modes (bicycle, walk) |
| Travel Purposes | Business, pleasure, personal/family business | Work/school, social/recreation, shopping, pickup/drop-off, meal, errands/personal/family business |

Table 4-1: Comparison between long distance and short distance travel

The activity-based national travel demand model we developed generates the long distance passenger trips made by auto, air, and train in the U.S. over a one-year period. It can serve as a forecasting tool for long distance travel in the U.S. The model system is rooted in econometric models, including discrete choice models and duration models. These models are employed to maximize behavioral realism and model sensitivity to regional and national projects and policies. The model is implemented in a micro-simulation framework that simulates the long distance travel of each adult in the U.S. Since the finest spatial resolution in the 1995 ATS data is the metropolitan statistical area and non-metropolitan statistical area, we adopted MSAs and Non-MSAs as our traffic analysis zone system.
The model system consists of three tiers (see Figure 4-2): 1) the yearly long distance activity pattern level, which estimates the number of different types of activities a person will choose during one year; 2) the tour level model system, which contains the choices of tour destination, time of year, tour duration, and tour mode; 3) the stop level model system, which estimates the intermediate stop frequency and the purpose and location of each stop made during the inbound and outbound legs of the tour.

Figure 4-2: Activity-Based Long Distance Travel Demand Model System (three tiers: Yearly Activity Pattern, Tour Level Choice, Stop Level Choice)

4.1 Activity Pattern Level Model

The demand for long distance activities and travel can be considered as a choice among all the possible annual bundles of activities and travel. The model system adopts a timeframe of one year because long distance travel is infrequent and activity durations span days, weeks, and even months. Unlike regular urban-level activity schedules, long distance activities chosen within one year interact little with one another because long distance travel is much less frequent. As shown in Figure 4-3, the yearly long distance activity schedule can be presented as a set of different long distance activities per year, and all the long distance activities at this level are primary activities. The yearly long distance activity pattern can therefore be presented as {B-x, PB-y, P-z}, where B, PB, and P stand for the activity types of business, personal business, and pleasure respectively, and x, y, z are integers (x, y, z ≥ 0) giving the number of the corresponding activities during one year. In the model, the Multiple Classification Analysis (MCA) method, a widely used trip generation method in traditional four-step travel demand models, is employed to estimate the long distance trip rates by activity type. The long distance trip rates for each purpose are shown in Table 4-2, Table 4-3, and Table 4-4.

Figure 4-3: Yearly Long Distance Activity Pattern Level (the number of long distance activities is split into the numbers of LD business, LD personal business, and LD pleasure activities)

| Employment | Area | Low Income, Male | Low Income, Female | Medium Income, Male | Medium Income, Female | High Income, Male | High Income, Female |
|---|---|---|---|---|---|---|---|
| Employed | MSA | 1.478 | 0.407 | 2.051 | 0.980 | 2.860 | 1.789 |
| Employed | Non-MSA | 1.731 | 0.660 | 2.304 | 1.233 | 3.113 | 2.042 |
| Un-Employed | MSA | 0.433 | 0.000 | 1.006 | 0.000 | 1.814 | 0.743 |
| Un-Employed | Non-MSA | 0.685 | 0.000 | 1.258 | 0.187 | 2.067 | 0.996 |
| School | MSA | 0.293 | 0.000 | 0.866 | 0.000 | 1.675 | 0.604 |
| School | Non-MSA | 0.546 | 0.000 | 1.119 | 0.048 | 1.928 | 0.857 |

Table 4-2: Trip Rate for Long Distance Business Travel

| Household Type | Area | Low Income, Age1 | Low Income, Age2 | Low Income, Age3 | Medium Income, Age1 | Medium Income, Age2 | Medium Income, Age3 | High Income, Age1 | High Income, Age2 | High Income, Age3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Couple w/o Children | MSA | 2.223 | 2.567 | 2.564 | 2.766 | 3.110 | 3.108 | 3.267 | 3.610 | 3.608 |
| Couple w/o Children | Non-MSA | 2.521 | 2.864 | 2.862 | 3.064 | 3.408 | 3.405 | 3.564 | 3.908 | 3.905 |
| Couple w Children | MSA | 1.970 | 2.314 | 2.312 | 2.514 | 2.857 | 2.855 | 3.014 | 3.357 | 3.355 |
| Couple w Children | Non-MSA | 2.268 | 2.611 | 2.609 | 2.811 | 3.155 | 3.152 | 3.311 | 3.655 | 3.652 |
| Single | MSA | 2.070 | 2.414 | 2.412 | 2.614 | 2.957 | 2.955 | 3.114 | 3.457 | 3.455 |
| Single | Non-MSA | 2.368 | 2.711 | 2.709 | 2.911 | 3.255 | 3.252 | 3.411 | 3.755 | 3.752 |
| Non-Family | MSA | 2.106 | 2.450 | 2.447 | 2.649 | 2.993 | 2.990 | 3.149 | 3.493 | 3.491 |
| Non-Family | Non-MSA | 2.403 | 2.747 | 2.745 | 2.947 | 3.290 | 3.288 | 3.447 | 3.790 | 3.788 |

* Age1: age between 19 and 35; Age2: age between 36 and 55; Age3: age larger than 55

Table 4-3: Trip Rates for Long Distance Pleasure Travel

| Household Type | Area | Age1, Employed | Age1, Unemployed | Age1, School | Age2, Employed | Age2, Unemployed | Age2, School | Age3, Employed | Age3, Unemployed | Age3, School |
|---|---|---|---|---|---|---|---|---|---|---|
| Couple w/o Children | MSA | 0.385 | 0.643 | 0.755 | 0.627 | 0.884 | 0.997 | 0.725 | 0.982 | 1.095 |
| Couple w/o Children | Non-MSA | 0.866 | 1.124 | 1.236 | 1.108 | 1.365 | 1.478 | 1.206 | 1.463 | 1.576 |
| Couple w Children | MSA | 0.243 | 0.500 | 0.613 | 0.485 | 0.742 | 0.855 | 0.582 | 0.840 | 0.952 |
| Couple w Children | Non-MSA | 0.724 | 0.981 | 1.094 | 0.966 | 1.223 | 1.336 | 1.063 | 1.321 | 1.433 |
| Single | MSA | 0.146 | 0.403 | 0.516 | 0.387 | 0.645 | 0.757 | 0.485 | 0.743 | 0.855 |
| Single | Non-MSA | 0.627 | 0.884 | 0.997 | 0.868 | 1.126 | 1.238 | 0.966 | 1.224 | 1.336 |
| Non-Family | MSA | 0.000 | 0.219 | 0.331 | 0.203 | 0.460 | 0.573 | 0.301 | 0.558 | 0.671 |
| Non-Family | Non-MSA | 0.442 | 0.700 | 0.812 | 0.684 | 0.941 | 1.054 | 0.782 | 1.039 | 1.152 |

* Age1: age between 19 and 35; Age2: age between 36 and 55; Age3: age larger than 55

Table 4-4: Trip Rates for Long Distance Personal Business Travel

4.2 Tour Level Structure

Each long distance activity schedule has a primary tour and may have zero or more intermediate stops or side stops during the legs of the tour and at the destination. In our model system, the secondary tours or side stops based at the long distance primary destination are ignored because of the data limitation and because they belong to urban- or metropolitan-level travel. The tour level model system defines the characteristics of the primary tour of each long distance activity, such as the tour destination, time of year, tour duration, travel party size, and tour travel mode. When we develop and estimate each model component at the tour level, it is assumed that the outcomes of the upper-level models, as well as the household characteristics, person characteristics, and mobility attributes, are already known. Accordingly, a solid arrow in Figure 4-4 indicates that the output of the upper level can be used as an explanatory variable at the lower level, while a dashed arrow means that the expected utility of the lower-level models can affect the choices at the upper level. Meanwhile, the upper-level decisions on tour duration and destination constrain the travel mode choice at the lower level; for instance, if a person from Washington, D.C., has only one day to travel to California and back, it is unlikely that he/she will drive. In reality, the proportion of the long distance tour travel time in the total duration varies by person. Because the 1995 ATS data provide limited information on the activity duration at the primary destination, we simplified the temporal constraint and assumed that, for each person, the total tour travel time should not exceed half of the total tour duration.
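The simplified temporal constraint can be read as a mode availability filter applied before mode choice. The sketch below is illustrative only: the function name is hypothetical, the tour duration is assumed to be expressed in days of 24 hours, and the round-trip travel times are made-up placeholder values rather than figures from the skim data.

```python
def feasible_modes(round_trip_hours_by_mode, tour_duration_days):
    """Keep modes whose total tour travel time is at most half the tour duration.

    ASSUMPTION: duration is measured in 24-hour days; the source does not
    state the unit used when applying the constraint.
    """
    limit_hours = 0.5 * tour_duration_days * 24.0
    return [
        mode
        for mode, hours in round_trip_hours_by_mode.items()
        if hours <= limit_hours
    ]


if __name__ == "__main__":
    # Hypothetical round-trip times (hours) for a coast-to-coast tour.
    times = {"car": 82.0, "air": 15.0, "train": 150.0}
    for days in (1, 2, 7):
        print(days, feasible_modes(times, days))
```

With these placeholder times, a two-day tour leaves only air available, which mirrors the Washington, D.C. to California example in the text; a one-day tour leaves no mode, and a seven-day tour admits both car and air.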
When people decide to undertake a long distance activity, they usually have different priorities and decision procedures for different activity types. In this research, we made a set of assumptions about people’s decision making process for long distance travel at the tour level. For example, a long distance pleasure activity (a discretionary activity) requires people to consider their time availability before other decisions. When they have a period of time (days, weeks, or months) for pleasure, they decide sequentially when to spend it, where to go, whom to go with, and how to go. In contrast, people undertaking long distance business and personal business activities usually give priority to the decisions on the activity location and the time (including time of year and duration), followed by travel party size and tour mode choice. Therefore, two different tour level structures are proposed, one for business/personal business and one for pleasure (Figure 4-4). According to the direction of the dashed lines in the figure, both the time of year models and the destination models should include the expected utility variable from the mode choice model (mode choice logsum).

Figure 4-4: Tour Level Procedure and Model Components (two panels: Business & Personal Business; Pleasure. Components: Destination Choice, Time of Year, Tour Duration, Travel Party Size, Tour Mode Choice)

All the model components are estimated mainly from the 1995 ATS data, and most of the models are discrete choice models (multinomial logit models), except for the tour duration model. In our research, we use 80 percent of the data sample for model estimation and the remaining 20 percent to validate the estimated models.

4.2.1 Travel Mode Choice Model

Three travel modes are modeled at the tour level, i.e.
{(car, car), (air, air), (train, train)}, and no combination of different travel modes is considered due to the small sample size in the ATS data (see Figure 3-2). A multinomial logit model is employed for the travel mode choice, and the piecewise linear utility function (Ben-Akiva & Lerman, 1985) shown in Equation (4.1) is adopted.

$$U_{ij} = \alpha_1 \cdot tc_{ij.r_1} + \alpha_2 \cdot tc_{ij.r_2} + \dots + \alpha_n \cdot tc_{ij.r_n} + \beta \cdot tt_{ij} + \varepsilon_{ij} \qquad (4.1)$$

$U_{ij}$: the utility of a person choosing travel mode i for long distance activity j between a specific OD pair;
i: one of the three tour level travel modes, {(car, car), (air, air), (train, train)};
j: one of the three long distance activities (business, personal business, and pleasure);
$tc_{ij.r_1}, tc_{ij.r_2}, \dots, tc_{ij.r_n}$: total travel cost of mode i for the jth long distance activity when the travel cost falls into the ranges $r_1, r_2, \dots, r_n$;
$tt_{ij}$: total travel time using mode i for the jth long distance activity;
$\alpha_1, \alpha_2, \dots, \alpha_n$: the coefficients of total travel cost for the different travel cost ranges;
$\beta$: the total travel time coefficient;
$\varepsilon_{ij}$: error term capturing the factors that affect utility but are not observable by the researcher.

Stated preference data have been suggested for mode choice in order to study high-speed rail, air travel behavior (Hess et al. 2007), and other travel behavior involving choice situations not yet revealed in the market. However, for forecasting purposes, there is a gap between interviewees’ stated preferences/responses and their actual preferences/responses (Wardman, 1988), and the resulting model error makes such data unsuitable for the analysis (Daly and Rohr, 1998).

Table 4-5 presents the estimation results for the tour mode choice. All the variables have the expected signs and are significant at the 95% confidence level. Because the magnitude of the cost coefficient decreases in the higher cost ranges while the time coefficient stays constant, the estimates imply that the value of time increases as the cost of the chosen travel mode increases.
In general, people taking long distance pleasure or personal business trips have a larger value of time (VOT) than people taking business trips. This can be explained by the fact that people on a business trip can be reimbursed for both their travel and accommodation expenses, while they pay for pleasure and personal business travel out of pocket.

| Variables | Business | Pleasure | Personal Business |
|---|---|---|---|
| TTC (<= $188) | -0.0325 | -0.0095 | -0.0127 |
| TTC (> $188 & <= $332) | -0.0093 | | |
| TTC (> $332 & <= $476) | -0.0066 | | |
| TTC (> $476 & <= $620) | -0.0037 | | |
| TTC (> $620) | -0.0028 | | |
| TTC (> $188 & <= $312) | | -0.0043 | -0.0057 |
| TTC (> $312 & <= $436) | | -0.0009 | -0.0040 |
| TTC (> $436) | | -0.0003 | |
| TTC (> $436 & <= $560) | | | -0.0028 |
| TTC (> $560) | | | -0.0011 |
| TTT | -0.0356 | -0.0590 | -0.0328 |
| Constant-Air | -0.4400 | -2.9500 | -1.4900 |
| Constant-Train | -2.9300 | -3.5600 | -3.7500 |
| Rho-Square | 0.536 | 0.754 | 0.682 |

* TTC: Total Travel Cost; TTT: Total Travel Time. Bold and italic: variables are significant at the 95% confidence level.
Table 4-5: Tour Mode Choice Model Estimation Results

Figure 4-5, Figure 4-6, and Figure 4-7 compare the observed and estimated aggregate tour mode shares for business, pleasure, and personal business purposes. The mode choice models reproduce the pattern of the mode market shares, but with errors. The pleasure mode choice model performs worse than the business and personal business models. The air mode for pleasure purposes is underestimated by almost 40%; this large percentage error is partly caused by the small sample of air pleasure travel in the validation data set.
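The mode choice calculation behind Equation (4.1) and Table 4-5 can be sketched in a few lines. This is an illustrative sketch rather than the dissertation's implementation. It assumes the continuous segment form of a piecewise linear cost function (each coefficient applies to the part of the cost that falls inside its range), and it assumes that the single-valued cost rows of the flattened Table 4-5 belong to the Business column. The level-of-service values in the usage example are hypothetical.

```python
import math

# Cost breakpoints (dollars) and per-range cost coefficients for business
# tours, as read from Table 4-5. ASSUMPTION: the rows that carry a single
# value in the flattened table belong to the Business column.
BREAKS = [188.0, 332.0, 476.0, 620.0]
ALPHAS = [-0.0325, -0.0093, -0.0066, -0.0037, -0.0028]
BETA_TIME = -0.0356
CONSTANTS = {"car": 0.0, "air": -0.4400, "train": -2.9300}


def cost_segments(cost, breaks=BREAKS):
    """Split a total cost into the amount that falls inside each cost range.

    ASSUMPTION: continuous piecewise linear form. If each coefficient were
    meant to apply to the whole cost within its range, this would differ.
    """
    edges = [0.0] + list(breaks) + [math.inf]
    return [max(0.0, min(cost, hi) - lo) for lo, hi in zip(edges, edges[1:])]


def systematic_utility(mode, cost, time):
    """Deterministic part of Equation (4.1) for one mode."""
    cost_part = sum(a * s for a, s in zip(ALPHAS, cost_segments(cost)))
    return CONSTANTS[mode] + cost_part + BETA_TIME * time


def mnl_probabilities(utilities):
    """Multinomial logit probabilities, shifted by the max for stability."""
    vmax = max(utilities.values())
    expv = {m: math.exp(v - vmax) for m, v in utilities.items()}
    total = sum(expv.values())
    return {m: e / total for m, e in expv.items()}


# Hypothetical OD pair: (total cost in dollars, total time in the model's
# time unit) for each mode. These numbers are made up for illustration.
los = {"car": (120.0, 10.0), "air": (400.0, 3.0), "train": (200.0, 12.0)}
shares = mnl_probabilities(
    {m: systematic_utility(m, c, t) for m, (c, t) in los.items()}
)
```

Summing these individual probabilities over the validation sample gives estimated mode counts of the kind that are compared with the observed counts in the validation figures.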
Figure 4-5: Tour Mode Choice Validation for Business Purpose
[Bar chart: number of observations by travel mode (Car, Air, Train), observed vs. estimated.]

Figure 4-6: Tour Mode Choice Validation for Pleasure Purpose
[Bar chart: number of observations by travel mode (Car, Air, Train), observed vs. estimated.]

Figure 4-7: Tour Mode Choice Validation for Personal Business Purpose
[Bar chart: number of observations by travel mode (Car, Air, Train), observed vs. estimated.]

4.2.2 Time of Year Choice

Because the finest temporal resolution in the ATS is the quarter, the proposed model system operates at a resolution of three months, or one quarter. The three-month increments begin in January and end in December, giving four quarters in total. In the ATS sample data, few tours depart from home and return home in different quarters. Consequently, we adopted a choice set of only four alternatives {(Q1, Q1), (Q2, Q2), (Q3, Q3), (Q4, Q4)} for each person's decision on what time of the year to depart and to return, where Q1, ..., Q4 refer to Quarter 1 to Quarter 4. A multinomial logit model is adopted for time of year choice. The model uses person and zonal characteristics, most of which are generic across the four time alternatives. Since transportation network level-of-service (LOS) attributes vary by time period, especially air fares, which fluctuate widely across seasons, these variables are specified as alternative-specific for the four time alternatives.
The general form of the time of year (TOY) model utility is given in Equation (4.2):

$$U_i = \alpha \cdot logsum_i + B_i \cdot X + \varepsilon_i, \qquad logsum_i = \ln \sum_k e^{V_{ik}} \tag{4.2}$$

where
$U_i$: the utility of a person choosing to travel during time period i, where i = 1, 2, 3, 4 refers to (Q1, Q1), (Q2, Q2), (Q3, Q3), (Q4, Q4);
$logsum_i$: the mode choice logsum during time period i, which represents the total ease of travel between two TAZs across all available travel modes during time period i;
$\alpha$: the mode choice logsum coefficient;
$X$: vector of the person's characteristics;
$B_i$: vector of coefficients of the person's characteristics for time alternative i;
$V_{ik}$: representative utility of the tour by mode k during time period i.

At the tour level for long distance pleasure activities, a simplified time of year choice model that only takes into account the person's characteristics is first developed and applied while the pleasure destination is not yet known. The time period assigned by this simplified model serves as a known input to the destination choice model. Once the destination is chosen, a full time of year model that includes the mode choice logsum is re-run to choose the final time period (Figure 4-8).

[Flowchart of pleasure tour components: Tour Duration, Simple Time of Year, Destination Choice, Full Time of Year, Travel Party Size, Tour Mode Choice.]
Figure 4-8: Re-simulating Time of Year Choice Model

The following three tables (Table 4-6 to Table 4-8) present the estimation results for time of year choice for long distance business and pleasure trips. The fourth quarter (Q4, Q4) is set as the base alternative for all the TOY models. Coefficients in bold and italic format are significant at the 95% confidence level. The mode choice logsum variables in the full TOY models imply that people tend to take their long distance trips in the quarter with greater accessibility as measured by the mode choice logsum.
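Equation (4.2) can be illustrated with a short sketch. The functions below compute the mode choice logsum for each quarter and the resulting time of year probabilities. The function names, the mode utilities per quarter, and the values standing in for $B_i \cdot X$ are hypothetical; the logsum coefficient 0.147 is the pleasure value reported in Table 4-8.

```python
import math


def logsum(mode_utilities):
    """ln(sum_k exp(V_k)), using the max-shift trick for numerical stability."""
    vmax = max(mode_utilities)
    return vmax + math.log(sum(math.exp(v - vmax) for v in mode_utilities))


def toy_probabilities(logsums, alpha, b_dot_x):
    """MNL over the four quarters with V_i = alpha * logsum_i + B_i . X."""
    v = [alpha * ls + bx for ls, bx in zip(logsums, b_dot_x)]
    vmax = max(v)
    expv = [math.exp(x - vmax) for x in v]
    total = sum(expv)
    return [e / total for e in expv]


# Hypothetical representative mode utilities (car, air, train) per quarter.
mode_v_by_quarter = [
    [-1.2, -2.0, -4.1],   # Q1
    [-1.2, -1.8, -4.1],   # Q2
    [-1.2, -2.4, -4.1],   # Q3
    [-1.2, -2.1, -4.1],   # Q4
]
quarter_logsums = [logsum(v) for v in mode_v_by_quarter]

# Hypothetical values of B_i . X for one person; Q4 is the base alternative,
# so its person-specific term is zero.
person_terms = [-0.9, -0.5, -0.2, 0.0]
probs = toy_probabilities(quarter_logsums, alpha=0.147, b_dot_x=person_terms)
```

Because the logsum is never smaller than the largest single mode utility, a quarter in which any one mode becomes more attractive (for example, through lower air fares) receives a higher logsum and, with a positive $\alpha$, a higher choice probability.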
The mode choice logsum coefficient in the pleasure full TOY model is also larger than that in the business full TOY model, which means that people are more sensitive to modal accessibility when they take long distance pleasure trips. This can be explained by the fact that people care little about their travel expenses on business trips because business travel is reimbursed. Because the mode choice logsum coefficient in the personal business full time of year model has an unexpected sign and is insignificant, that model is not used. Instead, the observed time of year distribution for personal business trips in the survey data (1995 ATS), {(Q1,Q1): (Q2,Q2): (Q3,Q3): (Q4,Q4)} = (0.228: 0.297: 0.278: 0.197), is used to assign a quarter to each person's personal business trip by Monte Carlo simulation. As noted, personal business trips are usually school-related activities and personal, family or medical activities (Table 3-1), so it is reasonable that people travelling for these activities are not sensitive to the time of year. The out-of-sample validation results (Figure 4-9, Figure 4-10, and Figure 4-11) show good agreement, with very small differences between the observed and the estimated time of year distributions.
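The Monte Carlo assignment of a quarter to each personal business trip can be sketched as a weighted random draw. This is a minimal illustration using the observed shares quoted above; the function name and the fixed seed are assumptions made for reproducibility, not part of the source.

```python
import random
from collections import Counter

QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
# Observed 1995 ATS time of year shares for personal business trips.
SHARES = [0.228, 0.297, 0.278, 0.197]


def assign_quarters(n_trips, seed=0):
    """Draw one quarter per trip in proportion to the observed shares."""
    rng = random.Random(seed)          # seeded only so the sketch is repeatable
    return rng.choices(QUARTERS, weights=SHARES, k=n_trips)


draws = assign_quarters(100_000)
simulated_shares = {q: c / len(draws) for q, c in Counter(draws).items()}
```

With a large number of simulated trips, the simulated shares converge to the observed distribution, so the aggregate time of year pattern of the survey is preserved even though no behavioral model is applied.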
| | (Q1,Q1) | (Q2,Q2) | (Q3,Q3) | (Q4,Q4) |
|---|---|---|---|---|
| Household Income | 8.02e-06 | -1.39e-06 | 3.61e-06 | 0.000 |
| Employed | 0.586 | 0.245 | -0.385 | 0.000 |
| Couple with Child | 0.376 | 0.290 | 0.002 | 0.000 |
| Age | 0.03 | 0.037 | 0.0003 | 0.000 |
| Constant | -2.330 | -2.678 | -1.527 | 0.000 |

Mode choice logsum coefficient (α in Equation (4.2)): 0.05
Pseudo R-Square: 0.05
Table 4-6: Full Time of Year Choice for Business Trip

Figure 4-9: Time of Year Choice Validation for Business Purpose
[Bar chart: number of observations by quarter (Q1,Q1 to Q4,Q4), observed vs. estimated.]

| | (Q1,Q1) | (Q2,Q2) | (Q3,Q3) |
|---|---|---|---|
| Household Income | 8.67e-06 | 4.02e-06 | 8.09e-07 |
| Age | 0.010 | 0.010 | 0.006 |
| Couple w/o Child | 0.312 | 0.468 | 0.42 |
| Couple w Child | 0.539 | 0.842 | 0.825 |
| Single | 0.137 | 0.28 | 0.124 |
| Employed | 0.308 | 0.458 | 0.236 |
| Unemployed | 0.062 | 0.374 | 0.019 |
| Constant | 0.119 | -0.39 | -0.005 |

Pseudo R-Square: 0.01
Table 4-7: Simple Time of Year Choice for Pleasure Trip

Figure 4-10: Simple Time of Year Choice Validation for Pleasure Purpose
[Bar chart: number of observations by quarter (Q1,Q1 to Q4,Q4), observed vs. estimated.]

| | (Q1,Q1) | (Q2,Q2) | (Q3,Q3) | (Q4,Q4) |
|---|---|---|---|---|
| Household Income | 8.76e-06 | 4.38e-06 | 7.20e-08 | 0.000 |
| Employed | 0.376 | 0.155 | 0.154 | 0.000 |
| Male | -0.256 | -0.066 | -0.357 | 0.000 |
| Couple w Child | 0.223 | 0.544 | 0.538 | 0.000 |
| Age | 0.016 | 0.01 | 0.006 | 0.000 |
| Constant | -1.33 | -0.8945 | -0.43 | 0.000 |

Mode choice logsum coefficient (α in Equation (4.2)): 0.147
Pseudo R-Square: 0.01
* Coefficients in bold and italic are significant at 95% confidence level; coefficients in italic are significant at 90% confidence level
Table 4-8: Full Time of Year Choice for Pleasure Trip

Figure 4-11: Full Time of Year Choice Validation for Pleasure Purpose
[Bar chart: number of observations by quarter (Q1,Q1 to Q4,Q4), observed vs. estimated.]

4.2.3 Tour Duration Choice Model

Unlike in urban- or metropolitan-level travel demand model systems, the duration of a long distance trip is measured in days away from the origin and covers the whole period from leaving the origin to returning to it. The tour duration is modelled because it could affect the travel distance and the travel mode that people choose when they plan long distance travel.
A hazard duration model (survival analysis), which analyzes the time until an event occurs, is employed for the tour duration model. Because the long distance travel duration is recorded in days, the discrete time survival analysis method is used for tour duration choice (Gokovali, 2007). In the discrete time survival analysis for tour duration, we consider each long distance tour as a subject, and all the subjects are uncensored within one calendar year. The longest duration of a long distance tour is set to 31 days, and time is measured in days. The survival time T is a discrete random variable with probabilities:

$$f(t) = \Pr(T = t) \tag{4.3}$$

where t represents the time interval. The discrete time survival function, which describes the chance that a person survives beyond the time period t in question without experiencing the event, is given by Equation (4.4), while the failure function, which gives the probability that the event has occurred by duration t, is given by Equation (4.5):

$$S(t) = \Pr(T > t) = \sum_{n=t+1}^{\infty} f(n) \tag{4.4}$$

$$F(t) = 1 - S(t) \tag{4.5}$$

The hazard rate, which represents the probability that the event occurs at time t given that one has survived up to that time, is:

$$h(t) = \Pr(T = t \mid T \ge t) = \frac{f(t)}{S(t-1)} \tag{4.6}$$

Given the hazard rate, the discrete time survival function can also be written as Equation (4.7):

$$S(t) = (1 - h_1)(1 - h_2) \cdots (1 - h_{t-1})(1 - h_t) \tag{4.7}$$

The probability that the event occurs during the time interval t is:

$$\Pr(t - 1 < T \le t) = F(t) - F(t-1) = S(t-1) - S(t) \tag{4.8}$$

Two link functions, the logistic regression function and the complementary log-log function, can be used to fit discrete-time hazard models (Allison, 1982; Jenkins, 2000), and we adopted the logistic regression function for the hazard rate in our analysis.
$$\log\left(\frac{h_{it}}{1 - h_{it}}\right) = \alpha_t + \beta \cdot X_{it} \tag{4.9}$$

where $h_{it}$ is the hazard rate, i.e. the probability that the event occurs given that individual i has survived up to time t; i (1, 2, ..., n) refers to the individual; t, taking positive integer values, refers to the discrete time; $\alpha_t$ is the baseline hazard function; $\beta$ is the coefficient vector of the covariates; and $X_{it}$ is the vector of covariates or explanatory variables of individual i at time t.

In the duration model, the explanatory variables or covariates are known features of the long distance tour and person and household characteristics. These attributes are unlikely to change during the period of the long distance tour, so the covariates are assumed to be time independent. Multiple baseline hazard functions, including log(time) and a polynomial in time, were tested. They show little difference in the estimated coefficients (for the same covariates) and in the out-of-sample validation results. In our model, the polynomial function of time is employed as the baseline hazard function. The hazard rate function can therefore be represented as:

$$\log\left(\frac{h_{it}}{1 - h_{it}}\right) = \gamma t + \theta t^2 + \beta \cdot X_{it} \tag{4.10}$$

$$h_{it} = \frac{1}{1 + \exp\left(-\gamma t - \theta t^2 - \beta \cdot X_{it}\right)} \tag{4.11}$$

where $\gamma$, $\theta$, and $\beta$ are coefficients to be estimated.

Table 4-9 presents the tour duration model estimation results for long distance business and personal business activities. The duration of long distance pleasure activities is instead drawn from the observed distribution of pleasure durations using Monte Carlo simulation, because the duration model for pleasure trips has a very small Pseudo R2 and its validation results show a pattern that differs considerably from the observed duration distribution. The observed duration distribution of long distance pleasure trips is shown in Figure 4-12; the most common duration of an entire long distance pleasure activity is 2 days.
Figure 4-12: Observed Duration Distribution of Long Distance Pleasure Activities
[Bar chart: share of tours (0% to 25%) by duration in days (1 to 31).]

According to the duration model estimation results, people whose tour primary destination is in an MSA zone have a lower hazard rate for both business and personal business purposes. For long distance business trips, people in couple-family households with children have a higher hazard rate than people in single-family households and non-family households. For long distance personal business trips, people in couple-family households with children and in non-family households have a higher hazard rate than people in single-family households. Compared with the low income group, the medium and high income groups have lower hazard rates, and the hazard rate of the high income group is lower still than that of the medium income group. The hazard rate also increases with the person's age and decreases if the person is unemployed. People travelling in the second, third, and fourth quarters have a lower hazard rate than people travelling in the first quarter, with the largest decrease in the third quarter. The baseline hazard rates for business and personal business are shown in Figure 4-13.
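The step from estimated hazard rates to a simulated tour duration, which relies on Equations (4.7), (4.8) and (4.11), can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the research: the function names are hypothetical, the example uses the business coefficients on t, t squared and the constant from Table 4-9 with all other covariates set to zero, and any probability mass remaining after day 31 is folded into day 31 so that the distribution sums to one (the source only states that the longest duration is 31 days).

```python
import math
import random

MAX_DAYS = 31


def hazard(t, gamma, theta, bx):
    """Equation (4.11): logit hazard with a quadratic baseline in t."""
    return 1.0 / (1.0 + math.exp(-(gamma * t + theta * t * t + bx)))


def duration_pmf(gamma, theta, bx, max_days=MAX_DAYS):
    """Pr(T = t) for t = 1..max_days, built from Equations (4.7) and (4.8)."""
    pmf = []
    survival = 1.0                      # holds S(t-1); S(0) = 1
    for t in range(1, max_days + 1):
        h = hazard(t, gamma, theta, bx)
        pmf.append(survival * h)        # S(t-1) - S(t) = S(t-1) * h_t
        survival *= 1.0 - h             # S(t) = S(t-1) * (1 - h_t)
    # ASSUMPTION: leftover mass beyond max_days is assigned to the last day.
    pmf[-1] += survival
    return pmf


def draw_duration(pmf, rng):
    """Monte Carlo draw of a duration in days from the pmf."""
    days = list(range(1, len(pmf) + 1))
    return rng.choices(days, weights=pmf, k=1)[0]


# Business coefficients from Table 4-9: time interval, squared time interval,
# and constant. Other covariates are set to zero for this hypothetical tour.
pmf = duration_pmf(gamma=-0.084, theta=0.002, bx=-0.854)
rng = random.Random(42)                 # seeded only for repeatability
simulated_days = [draw_duration(pmf, rng) for _ in range(1000)]
```

Covariates with negative coefficients lower the hazard in every interval, which shifts probability mass toward longer durations; this is why a lower hazard rate corresponds to a longer expected tour.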
| | Business | Personal Business |
|---|---|---|
| Destination in MSA | -0.182 | -0.096 |
| Couple w Children | 0.086 | 0.149 |
| Single Family | -0.075 | -0.148 |
| Non-Family Household | -0.169 | 0.204 |
| Household Size | 0.028 | 0.030 |
| Medium Income | -0.045 | -0.082 |
| High Income | -0.203 | -0.279 |
| Unemployed | -0.132 | -0.194 |
| Student | -0.052 | -0.669 |
| Tour Departure in Quarter 2 | -0.196 | -0.062 |
| Tour Departure in Quarter 3 | -0.326 | -0.128 |
| Tour Departure in Quarter 4 | -0.141 | -0.084 |
| Age | 0.006 | 0.003 |
| Time interval (t) | -0.084 | -0.188 |
| Squared time interval (t^2) | 0.002 | 0.005 |
| Constant | -0.854 | -0.096 |
| Pseudo R2 | 0.02 | 0.05 |

* Coefficients in bold and italic are significant at 95% confidence level; coefficients in italic are significant at 90% confidence level
Table 4-9: Tour duration choice model estimation results

Figure 4-13: Baseline Hazard Rate for Business and Personal Business Duration Model
[Line chart: hazard rate (vertical axis, -2 to 0) by days (1 to 31) for Business and Personal Business.]

Based on the model estimation results, we simulated the duration of each long distance tour from Equations (4.7), (4.8) and (4.11) using the Monte Carlo simulation method. The observed and estimated duration distributions are compared in Figure 4-14 and Figure 4-15 for business and personal business purposes respectively. Except for the duration of 2 days, the estimated duration distributions follow the same pattern as the observed distributions for both purposes.

Figure 4-14: Validation results for Business Duration Model
[Bar chart: number of observations by duration in days (1 to 31), observed vs. estimated.]

Figure 4-15: Validation results for Personal Business Duration Model
[Bar chart: number of observations by duration in days (1 to 31), observed vs. estimated.]

4.2.4 Travel Party Size Choice Model

The travel party size choice is modeled for each long distance tour and determines how many persons participate in the tour. It is assumed that no one joins or leaves the travel party during the long distance travel.
The model is a multinomial logit model, and each person has a choice set of four alternatives (travelling alone, or in a party of 2 persons, 3 persons, or 4 or more persons) for all three long distance activities (business, personal business and pleasure). Travelling alone is set as the base alternative in all three models. The explanatory variables mainly include person and household characteristics. According to the tour-level model structure, the tour destination is already known when people decide the travel party size for business and personal business travel. Therefore, zonal attributes can be used in the travel party size choice models for long distance business and personal business tours. The estimation results are shown in Table 4-10 to Table 4-12. Coefficients in both bold and italic font are significant at the 95% confidence level, while those only in italic font are significant at the 90% confidence level. The results imply that people on long distance business travel prefer travelling alone rather than in a party if the destination is a metropolitan statistical area. Compared with high-income people, low- and medium-income people tend to travel in a party, and low-income people prefer larger parties. Female travelers tend to travel with companions on long distance business trips. Older people are more likely to travel in a two- or three-person party on their business trips.
When people travel for long distance personal business and pleasure activities, they are more likely to travel alone if they live in a single-family household. Conversely, they tend to travel in a larger party if they live in a family with a spouse and children. The same pattern as in the business tour can be observed for female, low-income and medium-income people on personal business and pleasure trips. The out-of-sample validation results (Figure 4-16, Figure 4-17, and Figure 4-18) show that the travel party size choice models for all three purposes estimate the number of persons on the tour with small errors.

| | 2 Persons | 3 Persons | 4 or More Persons |
|---|---|---|---|
| Destination in MSA | -0.187 | -0.327 | -0.248 |
| Single Family | -1.207 | -0.847 | -0.187 |
| Couple with Children | -0.474 | 0.050 | 0.027 |
| Household Size | -0.003 | -0.051 | 0.261 |
| Low Income level | 0.660 | 1.063 | 1.161 |
| Medium Income level | 0.347 | 0.632 | 0.606 |
| Age | 0.020 | 0.007 | -0.007 |
| Female | 0.908 | 0.984 | 1.046 |
| Constant | -1.221 | -2.052 | -2.288 |

Pseudo R2: 0.055
* Coefficients in bold and italic are significant at 95% confidence level; coefficients in italic are significant at 90% confidence level
Table 4-10: Travel Party Size Choice Model Estimation for Business Tour

Figure 4-16: Travel Party Size Choice Validation for Business Purpose
[Bar chart: number of observations by travel party size (1, 2, 3, 4), observed vs. estimated.]

| | 2 Persons | 3 Persons | 4 or More Persons |
|---|---|---|---|
| Destination in MSA | -0.086 | 0.0004 | -0.238 |
| Single Family | -2.095 | -1.385 | -0.911 |
| HH with Children | 0.010 | 0.619 | 0.842 |
| Household Size | -0.174 | -0.017 | 0.270 |
| Low Income level | 0.382 | 0.699 | 0.771 |
| Medium Income level | 0.268 | 0.335 | 0.394 |
| Age | 0.022 | 0.012 | 0.008 |
| Female | 0.283 | 0.384 | 0.402 |
| Constant | 0.315 | -0.988 | -1.519 |

Pseudo R2: 0.075
* Coefficients in bold and italic are significant at 95% confidence level; coefficients in italic are significant at 90% confidence level
Table 4-11: Travel Party Size Choice Model Estimation for Personal Business Tour

Figure 4-17: Travel Party Size Choice Validation for Personal Business
[Bar chart: number of observations by travel party size (1, 2, 3, 4), observed vs. estimated.]

| | 2 Persons | 3 Persons | 4 or More Persons |
|---|---|---|---|
| Single Family | -2.505 | -1.754 | -0.921 |
| HH with Children | 0.088 | 1.588 | 1.370 |
| Household Size | -0.265 | -0.184 | 0.395 |
| Low Income level | 0.119 | 0.199 | 0.467 |
| Medium Income level | 0.168 | 0.256 | 0.394 |
| Age | 0.023 | 0.013 | 0.016 |
| Female | 0.023 | 0.098 | 0.118 |
| Constant | 1.058 | -0.196 | -1.682 |

Pseudo R2: 0.133
* Coefficients in bold and italic are significant at 95% confidence level; coefficients in italic are significant at 90% confidence level
Table 4-12: Travel Party Size Choice Model Estimation for Pleasure Tour

Figure 4-18: Travel Party Size Choice Validation for Pleasure Purpose
[Bar chart: number of observations by travel party size (1, 2, 3, 4), observed vs. estimated.]

4.2.5 Tour Destination Choice Model

The destination choice model determines the location of the long distance tour's primary destination. It works at the zonal level: each person is assigned a TAZ as his/her primary destination according to the multinomial logit destination choice model. The 1995 ATS sample data contains a total of 208 TAZs, which means that each person faces a universal choice set of 208 alternatives. To reduce the estimation time and complexity, simple random sampling (SRS) of alternatives is used, which is permitted by the independence of irrelevant alternatives (IIA) property of the multinomial logit model (Nerella & Bhat, 2004; Lemp & Kockelman, 2012). Consequently, each person has a destination choice subset of 10 alternatives, of which one is the person's chosen zone and nine are randomly selected from the remaining 207 zones.
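The construction of the sampled destination choice set can be sketched as below. This is a minimal illustration: the function name and the zone identifiers 1 to 208 are hypothetical stand-ins, and only the sampling step is shown, not the estimation.

```python
import random


def sample_choice_set(chosen_zone, all_zones, n_sampled=9, rng=None):
    """Return the chosen zone plus n_sampled zones drawn without replacement
    (simple random sampling) from the remaining zones."""
    rng = rng or random.Random()
    others = [z for z in all_zones if z != chosen_zone]
    return [chosen_zone] + rng.sample(others, n_sampled)


# Hypothetical zone IDs standing in for the 208 TAZs in the 1995 ATS sample.
zones = list(range(1, 209))
choice_set = sample_choice_set(chosen_zone=57, all_zones=zones,
                               rng=random.Random(7))
```

Because every non-chosen zone has the same probability of being drawn, no alternative-specific sampling correction is needed in the utility when a multinomial logit model is estimated on these 10-alternative sets; with unequal sampling probabilities, a correction term would normally be added.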
Because people tend to take their long distance pleasure and business trips to large cities, tourist attractions or vacation locations, we added several dummy variables to the long distance business and pleasure destination choice models to capture people's preference for certain places. The zonal attractiveness variables in the destination choice models are mainly the zonal total employment, the number of households, and a dummy variable indicating a metropolitan statistical area. As shown in Figure 4-4, the destination is determined first for long distance business and personal business travel, when the travel time period is still unknown. Therefore, in the destination choice models for business and personal business, the mode choice logsum is calculated with the average travel time and travel cost of each travel mode across the four seasons. Table 4-13 presents the estimation results for destination choice for all three long distance activities. All the variables in the table are significant at the 95% confidence level. The coefficients of the mode choice logsum are positive in all three models, implying that people tend to travel long distance to places with high accessibility. The larger value in the business model indicates that people are more sensitive to accessibility on long distance business travel. The distance to destination variable shows that people are less likely to travel long distance to a place farther from home. The squared and cubed distance variables are included to allow for a nonlinear effect of distance. Regardless of travel purpose, people prefer locations with a large number of jobs, which implies plentiful resources. People are unlikely to choose a zone as their long distance travel destination if it is a metropolitan statistical area with a large number of households.
As expected, people prefer choosing the large cities (Las Vegas) or tourist attractions (Florida) for their long distance business and pleasure travel.

| | Business | Personal Business | Pleasure |
|---|---|---|---|
| Mode choice logsum | 0.999 | 0.609 | 0.424 |
| Distance to destination | -0.003 | -0.004 | -0.003 |
| Squared Distance to destination (1000 mi) | 0.001 | 0.001 | 0.001 |
| Destination in MSA | -0.695 | -0.965 | -1.256 |
| No. of Employment | 0.002 | 0.001 | 0.002 |
| No. of Households | -0.003 | -0.001 | -0.002 |
| Destination in Las Vegas | -2.179 | -- | -1.822 |
| Destination in Florida | -- | -- | 1.974 |

Cubed Distance to destination (10^5 mi): -0.00001
* Coefficients in bold and italic are significant at 95% confidence level
Table 4-13: Primary Destination Choice Model Estimation at Tour Level

Figure 4-19, Figure 4-20, and Figure 4-21 show the validation results for the three long distance primary travel purposes, covering only the 206 zones that exist in the data set used for destination model estimation. The figures show that most destinations are estimated with small differences from the observed values, except for a few with large errors due to their relatively small sample size. Among the three purposes, the destination model for pleasure has the largest overall error.
Figure 4-19: Destination Choice Validation for Business Purpose
[Chart: number of observations by destination zone, estimated vs. observed.]

Figure 4-20: Destination Choice Validation for Pleasure Purpose
[Chart: number of observations by destination zone, estimated vs. observed.]

Figure 4-21: Destination Choice Validation for Personal Business Purpose
[Chart: number of observations by destination zone, estimated vs. observed.]

4.3 Stop Level Structure

After people have decided on their travel to the main destination, they plan their trips on the way to and from the destination based on the remaining time. It is assumed that people follow the same logic in determining their stops or trips on the tour legs regardless of the main activity type. Consequently, the same stop-level model structure is applied to all three tour-level activity types (business, pleasure, personal business) (Figure 4-22). The stop-level structure generates the intermediate stops that people make on the inbound and outbound legs of the long distance tour. A stop during the tour is defined as one made for a specific purpose such as business, personal business or pleasure. Stops for rest, or for transfer within one travel mode or across travel modes, are not analyzed and are excluded from the data set.
At the stop level, information about each long distance tour, such as the tour duration, travel mode, travel party size, and tour origin and destination, is already known.

[Flowchart: Stop Frequency, Stop Purpose, Stop Location.]
Figure 4-22: Stop Level Procedure and Model Components

4.3.1 Stop Frequency Choice Model

The stop frequency model, at the top tier of the stop-level structure, determines the number of intermediate stops people make on the way to and from the tour destination. In each direction, a maximum of 4 stops can be made, which results in a maximum of 5 trips on each tour leg. The stop frequency choice model for each half tour leg is a multinomial logit model, and each person faces a choice set of 5 alternatives (0, 1, 2, 3, 4 stops) for each tour leg. Making zero stops is set as the base alternative in both models. The models mainly use long distance tour characteristics as explanatory variables, such as tour duration, tour mode, the activity type of the long distance tour, time of year, and the distance between the tour origin and destination. As expected, people are more likely to make stops on inbound and outbound trips if the travel distance between the origin and the primary destination is large or if they have plenty of time for their long distance travel. When the long distance travel is for pleasure, people tend to make stops on both the inbound and outbound trips, most likely one stop. People travelling by car prefer stopping once or twice when they head home (inbound trip). On the way to the tour primary destination, they are more likely to make stops, especially 2 stops. The out-of-sample validation results in Figure 4-23 and Figure 4-24 show that the stop frequency models perform well in estimating the number of stops on both the inbound and the outbound leg of the tour.
| Number of stops | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| OD Distance | 0.001 | 0.0005 | 0.0002 | 0.001 |
| Tour Duration | 0.025 | 0.017 | 0.015 | 0.031 |
| Travel Party | 0.034 | 0.0002 | -0.043 | -0.025 |
| Car mode | 1.044 | 2.309 | -3.090 | -1.483 |
| Business Tour | 0.640 | 0.201 | 0.365 | 0.599 |
| Pleasure Tour | 1.014 | 0.177 | 0.185 | 0.570 |
| Travel in Quarter 2 | -0.659 | 0.060 | 1.420 | 0.393 |
| Travel in Quarter 3 | -0.237 | 0.110 | 1.298 | 0.463 |
| Travel in Quarter 4 | -0.011 | -0.398 | 1.582 | 0.268 |
| Constant | -9.078 | -6.444 | -3.344 | -6.100 |

Pseudo R-Square: 0.16
Table 4-14: Stop frequency model estimation for tour inbound leg

Figure 4-23: Inbound Stop Frequency Model Validation
[Bar chart: number of observations by number of stops (0 to 4), observed vs. estimated.]

| Number of stops | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| OD Distance | 0.001 | 0.001 | 0.001 | 0.001 |
| Tour Duration | 0.023 | 0.017 | 0.020 | 0.022 |
| Travel Party | -0.003 | -0.004 | 0.013 | 0.005 |
| Car mode | 0.975 | 3.283 | 0.185 | 0.983 |
| Business Tour | -0.527 | 0.208 | 0.704 | 0.298 |
| Pleasure Tour | 0.727 | 0.434 | 0.611 | 0.628 |
| Travel in Quarter 2 | -0.368 | 0.133 | 0.004 | 0.176 |
| Travel in Quarter 3 | 0.045 | -0.001 | 0.094 | 0.563 |
| Travel in Quarter 4 | -1.162 | -0.096 | -0.405 | -0.691 |
| Constant | -8.481 | -7.512 | -6.326 | -8.052 |

Pseudo R-Square: 0.05
Table 4-15: Stop frequency model estimation for tour outbound leg

Figure 4-24: Outbound Stop Frequency Model Validation
[Bar chart: number of observations by number of stops (0 to 4), observed vs. estimated.]

4.3.2 Stop Purpose Choice Model

Once the number of stops a person makes during the long distance tour is known, the purpose of each stop is determined through the stop purpose choice model (the middle-level model component in Figure 4-22). The stop purpose categories are the same as the tour-level activity types: business, personal business and pleasure. The model is developed for each half tour as a multinomial logit model, with the pleasure purpose as the base alternative. Both stop-level and tour-level characteristics that are already known can be used as explanatory variables in the stop purpose choice model for each tour leg, such as the sequence of the stop, the primary activity type of the long distance tour, the travel party size, and the tour travel mode.
The estimation results for the stop purpose choice models (Table 4-16 and Table 4-17) indicate that people are less likely to make a business stop as their second, third, or fourth stop on the way to the primary destination. People are also unlikely to arrange a business stop on either the inbound or the outbound trip when the primary activity of the long distance travel is pleasure or personal business. However, a stop is more likely to be a personal business stop on either half tour if the primary activity is personal business. People travelling by car or air tend to make a business stop on the way back home. Figure 4-25 and Figure 4-26 illustrate the good performance of the stop purpose models in estimating the purpose of each stop on each half leg of the long distance tour.

| | Business | Personal Business (PB) |
|---|---|---|
| Second Stop | -0.462 | -0.061 |
| Third Stop | -0.395 | -0.352 |
| Fourth Stop | -0.589 | 0.048 |
| Pleasure Tour | -3.904 | 1.259 |
| Personal Business Tour | -2.697 | 3.662 |
| Travel Party | -0.015 | -0.123 |
| Car mode | -0.067 | -0.547 |
| Air mode | 1.227 | 0.007 |
| Constant | 0.666 | -3.939 |

Pseudo R-Square: 0.32
Table 4-16: Purpose estimations for outbound stops

Figure 4-25: Outbound Stop Purpose Model Validation
[Bar chart: number of observations by stop purpose (Business, Pleasure, Personal Business), observed vs. estimated.]

| | Business | Personal Business (PB) |
|---|---|---|
| Second Stop | -0.061 | -0.083 |
| Third Stop | -0.019 | -0.111 |
| Fourth Stop | 0.561 | 1.719 |
| Pleasure Tour | -1.735 | 1.193 |
| Personal Business Tour | -1.319 | 2.972 |
| Travel Party | -0.01 | 0.067 |
| Car mode | 0.824 | 0.645 |
| Air mode | 0.872 | -0.273 |
| Constant | -1.920 | -5.953 |

Pseudo R-Square: 0.1
Table 4-17: Purpose estimations for inbound stops

Figure 4-26: Inbound Stop Purpose Model Validation
[Bar chart: number of observations by stop purpose (Business, Pleasure, Personal Business), observed vs. estimated.]

4.3.3 Stop Location Choice Model

At the lowest tier of the
stop-level structure, the location of each stop is estimated with a method similar to that used for the primary destination choice at the tour level. Before the stop location is chosen, the number and sequence of the stops on each half tour, the stop purposes, and the tour origin and destination are known. Since we assume that people use only one of the three modes (air, car, train) for the entire tour, with no transfer between modes, the travel mode for each trip on each half leg remains the one estimated at the tour level. In the stop location choice, the distance between the stop origin and the stop location must be larger than 50 miles; short distance travel from the stop origin is not modelled. The same methodology as in the tour destination choice is employed. However, unlike in the tour-level primary destination choice, the impedance of travel to an intermediate stop in the stop location choice model should measure the additional impedance between the tour origin or stop origin and the tour primary destination if it is an outbound trip (Bowman & Bradley, 2006). The main variables in the stop location choice model are the out-of-direction or detour generalized travel cost and the detour travel distance. For example, the level of service (LOS) variables for the first stop on the way to the tour primary destination are based on the additional impedance between the tour origin and the tour destination (Figure 4-27), and the LOS for each following stop is based on the additional impedance between the previous stop and the tour destination (Figure 4-28). Accordingly, the tour origin is the stop origin for the stop in the situation of Figure 4-27, and stop i is the stop origin of stop j in the situation of Figure 4-28.
The same method applies to stops in the inbound direction, but in the opposite way, since the anchor point is the tour origin instead of the tour primary destination.

(Diagram: Tour Origin (O), Stop (S), Tour Destination (D), with links Cost (O, S), Cost (S, D) and Cost (O, D))
Detour travel cost = Cost (O, S) + Cost (S, D) - Cost (O, D)
Figure 4-27: LOS estimation for the first stop during outbound tour leg

(Diagram: Tour Origin, Stop i, Stop j, Tour Destination (D), with links Cost (Si, Sj), Cost (Sj, D) and Cost (Si, D))
Detour travel cost for Stop j = Cost (Si, Sj) + Cost (Sj, D) - Cost (Si, D)
Figure 4-28: LOS estimation for the jth stop during outbound tour leg

The detour generalized travel cost combines the detour travel cost and travel time components according to the value of time obtained from the tour-level mode choice model. Table 4-18 and Table 4-19 provide the estimation results for all the coefficients in the stop location choice models for both outbound and inbound trips. The results imply that people are less likely to make a stop at a location with a large detour travel distance on the way to or from the primary destination. Furthermore, people prefer to stop at a non-MSA location with a large number of jobs, no matter whether they are on the inbound or the outbound trip. When the distance between the stop origin and the tour destination is less than 150 miles, people are unlikely to stop at a place with a high detour generalized travel cost when they head to the tour destination. If the distance between the stop origin and the tour destination is at least 150 miles and less than 550 miles, people are still unlikely to stop at a location with a high detour generalized travel cost, but they are less sensitive to it than when the distance is below 150 miles. When the distance between the stop origin and the tour destination is 550 miles or more, the detour generalized travel cost has an insignificant impact on travelers' stop location choice.
The positive coefficients could be caused by the small sample size of the outbound stops for location choice. On the way back from the tour destination, people are less likely to stop at a location where the detour generalized travel cost is high when the distance between the stop origin and the tour origin is shorter than 550 miles. However, people travelling by car do not seem to care about the detour generalized travel cost when the distance between the stop origin and the tour origin is larger than 550 miles, and this impact is insignificant.

Figure 4-29 and Figure 4-30 show the with-out-sample validation results for the two models. From the figures, we can observe that the size of the validation sample is very small for each zone, with the largest number of trips being less than 30. Based on these few validation records, we can say that both models estimate the stop location well for a number of zones. Furthermore, the outbound location choice model performs better than the inbound location model, with a smaller number of trips under- or over-estimated.

Explanatory Variables                      Coefficients
Detour Travel Distance                       -0.00811
Detour GTC (DSo-D < 150 miles)               -0.10292
Detour GTC (150 <= DSo-D < 550)              -0.00016
Detour GTC (DSo-D >= 550)                     0.00158
Detour GTC (DSo-D >= 550, Car mode)          -0.00152
No. of Employment                             0.00170
No. of Households                            -0.00158
The zone is not MSA                           1.82495
R-Square                                      0.09
* GTC: Generalized Travel Cost; DSo-D: Distance from stop origin to tour destination
Coefficients in bold and italic are significant at 95% confidence level;
Coefficients in italic are significant at 90% confidence level
Table 4-18: Stop location model estimation for tour outbound leg

Figure 4-29: Outbound Stop Location Choice Validation (observed vs. estimated number of observations by zone)

Explanatory Variables                      Coefficients
Detour Travel Distance                       -0.0081
Detour GTC (DSo-D < 150 miles)               -0.0361
Detour GTC (150 <= DSo-D < 550)              -0.0007
Detour GTC (DSo-D >= 550, Car mode)           0.0010
Detour GTC (DSo-D >= 550)                    -0.0001
No. of Employment                             0.0006
No. of Households                            -0.0002
The zone is not MSA                           1.7805
R-Square                                      0.04
* GTC: Generalized Travel Cost; DSo-o: Distance from stop origin to tour origin
Coefficients in bold and italic are significant at 95% confidence level;
Coefficients in italic are significant at 90% confidence level
Table 4-19: Stop location model estimation for tour inbound leg

Figure 4-30: Inbound Stop Location Choice Validation (observed vs. estimated number of observations by zone)

The developed national passenger travel demand model has a series of analytic tools to ensure travel behavior realism and model sensitivity. All the model components in the model system employ discrete choice modelling, specifically the multinomial logit model, except the yearly long distance activity model and the tour duration model. Most of the model components adopt the discrete choice methodology because their dependent variables are limited and discrete, representing the person's decisions or choices when planning his/her long distance travel. Another reason we chose the multinomial logit model over other methodologies is that it gave better estimation results in the with-out-sample validation. For example, we tried the negative binomial model and the Poisson model to estimate the stop frequency during each leg of the tour (stop frequency model). However, the with-out-sample validation results of these models showed poor performance in matching the observed data.
On the contrary, the with-out-sample validation results of the stop frequency model using the multinomial logit methodology show good performance in matching the observed data (Figure 4-23 and Figure 4-24). For the same reason we chose the multinomial logit model over the ordered logit model in estimating the travel party size. The methodology used to estimate the number of long distance activities during the course of one year is Multiple Classification Analysis (MCA), which is mostly used in trip generation in the traditional four-step travel demand model. We chose this method over other methodologies (e.g. multinomial logit model and count models) because some of the explanatory variables (e.g. income and travel cost) that we took into account in the multinomial logit model or count models did not have the right sign, and the model performance was poor in terms of validation. Therefore, we chose a less advanced methodology, the MCA method, to generate the long distance trip rate. The discrete time survival analysis, or discrete hazard duration model, was chosen because we would like to model the time that people spend on their entire long distance tour, and this time is measured in days, which is discrete. Most importantly, the discrete time survival analysis method shows good performance in matching the observed data in the with-out-sample validation.

4.4 National Travel Demand Model Flow and Key Assumptions

The model system assumes that people usually plan their long distance activities for a year months or even one year in advance. At the end of the previous year, people first plan how many long distance activities they will have during the coming year.
For example, a person who lives in the Washington D.C area decides to have three long distance activities during the year: one for business, one for pleasure, and one for personal business. Then he/she has to make plans for the three long distance activities. With regard to the business travel, according to his/her company's schedule and his/her own work schedule, he/she decides to go to California for one week in the first quarter, travelling alone by air. As the one-week schedule is tight and he/she has to go back to work, he/she does not want to make any stops on the way to or from California. Meanwhile, he/she makes plans for the long distance pleasure activity. After checking his/her work schedule and accrued leave, he/she finds that he/she could take two weeks in July or August (the third quarter) for a long distance pleasure activity. He/she decides to make the long distance pleasure travel with his/her spouse. As they have plenty of time (two weeks), they decide to go to Miami, Florida by car. On the way from Washington D.C to Miami, he/she plans to stop in Charlotte, North Carolina to visit his/her sister for two days, and on the way back from Miami to Washington D.C, he/she decides to stop in Orlando for pleasure. Then, he/she plans the long distance personal business activity for the year. Having been notified one year earlier, he/she knows that he/she needs to go to Chicago to attend his/her best friend's wedding in May this year. The wedding is scheduled on a Saturday, and he/she plans to take three days to make this long distance personal business tour with his/her spouse. As the time is limited, he/she decides to go to Chicago by air, and plans no stops on the way to or from Chicago.
Based on the structure of the model system and how the model works, we can summarize the information and data needed to simulate each person's long distance activities and travel (Table 4-20), the output of each model component (Table 4-21), and the long distance travel information generated or simulated by the model system for each person (Table 4-22). In Table 4-22, the trip is part of the long distance tour. If there are no stops (Stops_In = 0 and Stops_Out = 0) during the tour legs, there are only two trips for the long distance tour, one in each direction. If there are, for instance, 2 stops during the outbound leg and 2 stops during the inbound leg of the tour (Stops_In = 2 and Stops_Out = 2), there will be a total of 6 trips during the entire tour, with 3 trips in each direction. If the trip is the first trip during the outbound leg of the tour, the Trip_Origin is the Tour_Origin. If the trip is the first trip during the inbound leg of the tour, the Trip_Origin is the Tour_Dest. If the trip is the last trip during the outbound leg of the tour, the Trip_Dest is the Tour_Dest. If the trip is the last trip during the inbound leg of the tour, the Trip_Dest is the Tour_Origin. If the trip is not the last trip during the outbound leg or the inbound leg of the tour, the Trip_Dest is the Stop_Location.
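The trip rules above can be sketched in a few lines of Python. This is only an illustrative reading of the rules, not the dissertation's implementation; the function name build_trips, the "Leg" field and the TAZ numbers in the example are hypothetical.

```python
def build_trips(tour_origin, tour_dest, stops_out, stops_in):
    """Expand one tour into its ordered trips.

    stops_out / stops_in are the stop locations (TAZ ids) in visiting order
    on the outbound and inbound legs; either list may be empty.
    """
    trips = []
    legs = (
        ("outbound", tour_origin, tour_dest, stops_out),
        ("inbound", tour_dest, tour_origin, stops_in),
    )
    for leg, start, end, stops in legs:
        points = [start, *stops, end]
        # Consecutive points on the leg form the trips of that leg.
        for origin, dest in zip(points, points[1:]):
            trips.append({
                "Trip_ID": len(trips) + 1,
                "Leg": leg,
                "Trip_Origin": origin,
                "Trip_Dest": dest,
            })
    return trips


# Hypothetical TAZ ids: home zone 1, primary destination 50.
no_stop_tour = build_trips(1, 50, [], [])
two_stop_tour = build_trips(1, 50, [10, 20], [30, 40])
print(len(no_stop_tour), len(two_stop_tour))
```

A leg with n stops yields n + 1 trips, so the two examples reproduce the counts given in the text (two trips and six trips).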
Input categories / Input variables     Description

Person Attributes
  PersonID                             Person ID
  State                                State FIPS code
  MSAPMSA                              MSAPMSA where the person lives, 4-digit code
  PUMA                                 PUMA where the person lives, from PUMS data
  TAZ                                  TAZ where the person lives, integer value from 1-380
  NMSA                                 Whether the TAZ is MSA or Non-MSA, 0 or 1
  PAGE                                 Person age, integer value
  INCLVL                               Household income level: low, middle, and high income level
  Gender                               Person's gender
  EmpStatus                            Person's employment status: Employed, Unemployed, School
  HHtype                               Household type: Couple w/o Children, Couple w Children, Single, Non-Family
  HHsize                               Household size

Transportation OD Skim
  TT_car                               Car travel time
  TC_car_Business/PB                   Car travel cost for business and personal business purpose
  TC_car_Pleasure                      Car travel cost for pleasure purpose
  TT_air                               Air travel time
  TC_air                               Air travel cost
  TC_air_1                             Air travel cost in quarter 1
  TC_air_2                             Air travel cost in quarter 2
  TC_air_3                             Air travel cost in quarter 3
  TC_air_4                             Air travel cost in quarter 4
  TT_train                             Train travel time
  TC_train                             Train travel cost

TAZ Economic and Demographic
  # of Households                      Number of households in TAZ
  # of Population                      Population in TAZ
  # of Employment                      Employment in TAZ

Table 4-20: Long distance travel demand model input data

Model Component                    Output                 Output Description
Yearly activity pattern model      Business               Person's total number of long distance business tours per year
                                   PB                     Person's total number of long distance personal business tours per year
                                   Pleasure               Person's total number of long distance pleasure tours per year
Tour destination choice model      Tour_Dest              The primary tour destination TAZ of each tour
Tour duration model                Tour_Duration          The tour duration in days
Time of year choice model          Tour_TOY               Time of year the tour is made: Q1, Q2, Q3, Q4
Travel party size choice model     Traparty               Travel party size during the tour
Travel mode choice model           TravelMode             Tour travel mode
Stop frequency model               Stops_In, Stops_Out    Number of stops during the inbound and outbound legs of the tour
Stop purpose model                 Stop_Purpose           Purpose of each stop
Stop location model                Stop_Location          Destination TAZ of each stop

Table 4-21: Output of each model component

Model Output       Description
Person ID          Person ID
N_Business         Person's total number of long distance business tours per year
N_PB               Person's total number of long distance personal business tours per year
N_Pleasure         Person's total number of long distance pleasure tours per year
Tour ID            Tour ID
Tour_Purpose       Purpose of the tour: business, personal business, pleasure
Tour_Origin        Tour origin: home TAZ
Tour_Dest          Tour primary destination TAZ
Tour_TOY           Time of year the tour is made: Q1, Q2, Q3, Q4
TravelMode         Travel mode: car, air, train
TraParty           Travel party size during the tour
Tour_Duration      Tour duration in days
Stops_In           Number of stops made during inbound leg of tour
Stops_Out          Number of stops made during outbound leg of tour
Trip ID            Trip ID during the tour
Trip_Purpose       Trip purpose: business, personal business, pleasure
Trip_Origin        Trip origin TAZ
Trip_Dest          Trip destination TAZ

Table 4-22: Output of long distance travel demand model system for each person

From the modelling perspective, a model is usually considered to reflect travel behavior most realistically if it takes into account all the necessary data and factors. However, such a model could be too complex and have too many parameters that cannot be accurately measured using the available data. It is also likely that, as the model becomes more complicated, it may not be able to generate stable and reasonable results. Our national travel demand model is developed and estimated with all the available information in the 1995 ATS data, and to some extent it is simplified with a series of assumptions:

- People's long distance activities during one year do not interact, which means that one long distance activity and its travel will not affect the person's other long distance activities and travel during that year.
- A person's long distance activity, i.e. the entire long distance tour, occurs within the same quarter.
- The duration of the entire long distance tour is modelled, but the durations of the activities at the primary destination and at the stops are not modelled due to data limitations.
- The travel mode stays the same through the entire long distance tour; no travel mode transfer is considered. Only car, air and train are taken into account as the travel mode alternatives for long distance travel.
- A stop is defined as one that people make for a certain purpose (business, pleasure, or personal business) and at which they stay for one day or more. Stops made for a rest or for a transfer within the same travel mode or across different travel modes are not included in the model.
- The maximum number of stops people can make during a tour leg is assumed to be four.
- As there is no data in the model estimation that allows us to model time constraints, we assume that, when simulating travel mode choice at the tour level, the one-way travel time should not be larger than half of the tour duration.
- At the stop level, the distance between the chosen stop location and the tour origin should be less than the distance between the tour origin and the tour destination. The distance between the stop origin and the stop destination should also be larger than or equal to 50 miles.
- During the entire long distance tour, the travel party size stays the same. This means that no one joins or leaves the travel party during the long distance travel.
- Side stops made from the primary tour destination are not considered in the model system and are not estimated by it.
- At the tour level, people have different decision procedures and logic to determine their travel to the primary destination for different purposes. At the stop level, people have the same logic and decision process to determine their stops on the way from/to the primary destination.
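Several of the assumptions above act as feasibility filters during simulation. The sketch below shows one way they could be written in Python. It is not the dissertation's code: the function names are hypothetical, and the sketch assumes one-way travel times are given in hours and the tour duration in days, which the text does not specify.

```python
MAX_STOPS_PER_LEG = 4   # maximum number of stops on one tour leg
MIN_TRIP_MILES = 50     # minimum distance between stop origin and stop location


def feasible_modes(one_way_hours, tour_duration_days):
    """Modes whose one-way travel time does not exceed half of the tour duration.

    Assumption: travel times are in hours and the duration is in days.
    """
    half_tour_hours = tour_duration_days * 24 / 2
    return [mode for mode, hours in one_way_hours.items() if hours <= half_tour_hours]


def stop_is_feasible(dist_origin_to_stop, dist_origin_to_dest, dist_stop_origin_to_stop):
    """Stop-level distance rules (all distances in miles)."""
    closer_than_destination = dist_origin_to_stop < dist_origin_to_dest
    long_enough = dist_stop_origin_to_stop >= MIN_TRIP_MILES
    return closer_than_destination and long_enough


# Hypothetical one-way travel times for one OD pair.
times = {"car": 40, "air": 6, "train": 50}
print(feasible_modes(times, 3))   # half of a 3-day tour is 36 hours
print(stop_is_feasible(400, 900, 60))
```

Filtering the choice set before drawing an alternative keeps the estimated choice models unchanged while preventing infeasible combinations such as a car tour whose driving time exceeds the tour itself.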
Chapter 5: Preliminary Base Year OD Estimations

The long distance activity-based model system has a series of analytic tools to ensure maximum travel behavior realism and model sensitivity. All the model components at the tour level and the stop level adopt discrete choice forms (multinomial logit model), except for the tour duration component, which employs the hazard duration methodology. A microsimulation-based framework, which simulates each person's long distance travel decisions, is developed based on the model system. The Monte Carlo simulation method is used in the framework to achieve an unbiased selection of alternatives when simulating each decision.

The year 2010 was chosen as the base year. The 2010 PUMS data provides the population information. The 1 percent 2010 PUMS sample was expanded to the whole nation's population according to the person weight in the PUMS file. Each individual is assigned a TAZ (MSA/Non-MSA) based on the correspondence file between MSA/Non-MSA and PUMA. In this way, the population generated by expanding the sample according to the weights is statistically representative of the true distribution across MSA/Non-MSA zones. The population in each MSA/Non-MSA is shown in Figure 5-1.

The base year transportation OD skim data and the economic/demographic data are collected and processed using the same methods discussed in Chapter 3. In the auto skim data estimation, the average auto fuel efficiency is 21 mpg (Bureau of Transportation Statistics, 2015) and the average retail fuel price is $2.56/gallon in the U.S. in 2010 (EIA). The average unit lodging costs for different income groups are correspondingly converted to 2010 dollars based on the CPI. The other assumptions (e.g. travel speed and rest hours) in the auto OD skim data estimation remain the same. The 2010 air skim data is obtained from the 2010 DB1B data.
The train cost for each OD is taken from the collected 2013 Amtrak train data and converted to 2010 dollars, and the travel time by train for each OD is assumed unchanged from 2010 to 2013. The 2010 county-level economic/demographic data is obtained from CEDDS and further aggregated to the MSA/Non-MSA level.

Figure 5-1: MSA/Non-MSA Population

In the simulation, we made several assumptions to add as many time constraints as we can, since there is no data in the model estimation that allows us to model time constraints. When the travel mode choice at the tour level is simulated, the one-way travel time should not be larger than half of the tour duration. This assumption constrains the travel mode choice for certain OD pairs given the tour duration. Meanwhile, at the stop level, the distance between the chosen stop location and the tour origin should be less than the distance between the tour origin and the tour destination. In our model the tour destination is defined as the location furthest from the tour origin, so a stop that is farther from the tour origin than the tour destination would conflict with this definition.

The microsimulation tool is developed in Java. Given all the input data, including the 2010 PUMS population data, the transportation OD skim data, and the economic/demographic data, our micro-simulator outputs the long distance activity patterns and travel information of each person in the U.S. Based on the simulation results (integrating the generated tour-level and stop-level travel information), we can obtain a total of 48 national-level trip OD tables (4 quarters * 3 travel modes * 4 purposes; the 4 purposes are business, personal business, pleasure, and back to origin).
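To make the 48-table bookkeeping concrete, the following Python sketch accumulates simulated trips into one OD table per (quarter, mode, purpose) combination and then aggregates them into mode shares. It is an illustration only: the record field names, the function names and the sample trips are hypothetical, and the original simulator was written in Java.

```python
from collections import defaultdict
from itertools import product

QUARTERS = ("Q1", "Q2", "Q3", "Q4")
MODES = ("car", "air", "train")
PURPOSES = ("business", "personal business", "pleasure", "back to origin")


def build_od_tables(trips):
    """One sparse OD table (origin, dest) -> trip count per quarter/mode/purpose."""
    tables = {key: defaultdict(int) for key in product(QUARTERS, MODES, PURPOSES)}
    for trip in trips:
        key = (trip["quarter"], trip["mode"], trip["purpose"])
        tables[key][trip["origin"], trip["dest"]] += 1
    return tables


def mode_shares(tables):
    """Share of all trips by travel mode, summed over quarters and purposes."""
    totals = defaultdict(int)
    for (_quarter, mode, _purpose), cells in tables.items():
        totals[mode] += sum(cells.values())
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {mode: totals[mode] / grand_total for mode in MODES}


# Four hypothetical simulated trips between TAZ 1 and TAZ 50.
sample_trips = [
    {"quarter": "Q1", "mode": "car", "purpose": "business", "origin": 1, "dest": 50},
    {"quarter": "Q1", "mode": "car", "purpose": "back to origin", "origin": 50, "dest": 1},
    {"quarter": "Q3", "mode": "car", "purpose": "pleasure", "origin": 1, "dest": 50},
    {"quarter": "Q3", "mode": "air", "purpose": "pleasure", "origin": 1, "dest": 50},
]
od_tables = build_od_tables(sample_trips)
shares = mode_shares(od_tables)
print(len(od_tables), shares)
```

Keeping each table sparse avoids allocating a full zone-by-zone matrix for combinations that contain few trips, such as train travel.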
Aggregating the cell trips in each of the 48 OD tables gives the national trip distribution by travel mode, trip purpose and time of year separately, for example the trip distribution by travel mode (Figure 5-2). The aggregate results imply that car travel accounts for more than 80% of trips among the three travel modes, almost 20% of trips are made by air, and a very small portion of people (less than 1%) choose train for their long distance travel. The very small sample of train travel in the model estimation data (1995 ATS) could bias the estimated coefficients for train travel. Figure 5-3 shows the trip distribution by travel mode and time of year. It indicates that a large number of people prefer taking their long distance trips during the first three months (from January to March) regardless of the travel mode, and few people choose the fourth quarter (from October to December) to travel. This trip distribution pattern by time of year differs from both our expectation and the survey distribution. In the survey, as expected, trips made in the third quarter (from July to September) account for the largest group. One reason could be that the model coefficients used to simulate travel in 2010 are estimated using the 1995 ATS data, which is 15 years old. It would be desirable to estimate the model using survey data closer to 2010. However, the most recent long distance travel survey with detailed travel information is the 1995 ATS. In the model calibration procedure, we will try to resolve this discrepancy. Summing all the trips, including back to home/origin trips, yields a total of over 3.59 billion trips during a year, which implies that a person takes an average of around 11 long distance trips during one year.
Figure 5-2: Trip Distribution by Travel Mode (share of trips by Car, Air, Train)

Figure 5-3: Trip Distribution by Travel Mode and Time of Year (number of trips in 100,000)

Figure 5-4: Trip Distribution by purpose (share of trips by Business, Pleasure, PB)

Based on the OD tables, we can obtain the long distance trips from/to each MSA/Non-MSA zone (see Figure 5-5) for the nation. Note that the trips from/to each zone plotted in the map are different from the trip generation or attraction terms in the traditional 4-step model. The trips are parts of the long distance tour. If a person has one tour and one stop during each half leg, then this person will have 4 trips. Therefore, the number of trips generated from each zone should equal the number of trips destined to the zone for the model year. It is observed from Figure 5-5 that the trip distribution has a similar pattern to the population distribution. Generally speaking, the MSA zones (except several large cities such as New York, Los Angeles, Chicago, Seattle, and Washington D.C) have a smaller number of trips due to their smaller size in terms of geography and population. The zones with larger population (in the northeast, the Pacific west, Texas, and some large cities on the west coast) usually have a larger number of trips during one year, while the zones in the states of Montana, North Dakota, South Dakota, and Wyoming produce a relatively small number of trips, which can be explained by the small population and low GDP of these states. As an illustrative example, Figure 5-6, Figure 5-7, and Figure 5-8 show the trip distribution by TAZ and travel mode for trips from the Washington D.C metropolitan area. Figure 5-6 implies that the car trips from D.C are usually concentrated around D.C, and the farther a zone is from D.C, the fewer people travel to it by car.
Therefore, most of the car trips from D.C occur in the east-central and eastern parts of the U.S around the D.C area. Since the train network is not as extensive as the car or air network, the trips taken by train should be distributed in the zones with train stations. The distribution of train stations reflects the distribution of Amtrak rail stations, as the train data are from Amtrak. As expected, the train trips from D.C are mainly located on the east coast and centered around D.C. In general, those trips are shorter than the air trips, and their distribution range is much smaller than that of the air trips (see Figure 5-7). The most significant advantage of air travel is that it is the fastest mode among all the transportation means, especially when people travel longer distances. Moreover, in the U.S people can travel by air to most places in the nation due to the high density of airlines and airports. Consequently, trips by air departing from D.C are observed all over the U.S. (Figure 5-8). A large number of air trips to the non-MSA zone in Virginia and to Richmond are observed, both because the DB1B data contains the air fare between D.C and the two zones and because the Non-MSA zone in Virginia contains multiple counties. If we could divide the Non-MSA zone into smaller parts, we should observe that the areas near D.C have no air trips, and the areas far from D.C have a lot of air trips. If we compare the number of trips from D.C to the Non-MSA zone in Virginia and to Richmond by travel mode, we can see that far more people choose car than air.
Figure 5-5: Trips Originate/Destinate at MSA/Non-MSA level

Figure 5-6: Yearly Car Trip Distribution by TAZ Originating from Washington D.C

Figure 5-7: Yearly Train Trip Distribution by TAZ Originating from Washington D.C

Figure 5-8: Yearly Air Trip Distribution by TAZ Originating from Washington D.C

The national travel demand model is a person-based microsimulation model, and it employs the Monte Carlo simulation method to obtain an unbiased selection of alternatives when simulating each decision. Therefore, there is random noise in the model results. To test the noise level, or stability, of the microsimulation-based national travel demand model, we need to run the model multiple times and analyze the results of the model runs. As it usually takes four days to complete one full run of the national travel demand model, we conducted only 5 more full runs to evaluate the model's stability. Together with the base year model simulation results, we have a total of 6 runs of model results to compare.

Figure 5-9: Number of trips of different categories for different model run

Figure 5-9 shows the comparison of the total number of trips and the number of trips by travel mode, trip purpose and time of year for the 6 model runs. The total number of trips ranges from 3.18 billion to 3.7 billion, and four model runs generate trips at the level of 3.58 billion. The number of generated car trips ranges from 2.7 billion to 3.13 billion, and 3 model runs generated a total of around 3.02 billion car trips. The number of air trips ranges from 0.47 billion to 0.68 billion, and four model runs generated air trips at a similar level, around 0.5 billion. The number of train trips ranges from 12 million to 25 million, and four model runs generated train trips at the level of around 25 million.
From the trip comparison for different purposes, we can observe that the model generates quite stable results in terms of the number of trips by purpose. With regard to the number of trips by time of year, 4 model runs (Run_2, Run_3, Run_4 and Run_5) generated a similar number of trips in the different quarters. Therefore, with more runs of the national travel demand model, we could expect that most of the model runs will generate a similar total number of trips, around 3.58 billion, with around 3.02 billion car trips, around 0.5 billion air trips and around 25 million train trips, and a similar number of trips in the different quarters.

Chapter 6: Model Calibration

Travel demand model calibration is essential to accurately model people's travel. Model calibration is the process of adjusting the model parameter values until the simulated or estimated travel results closely match the observed travel for the base year. In general, the system-wide calibration of the parameters of a simulation-based model is understood as finding the values of the parameters that minimize the error between observed outputs and simulated outputs. The observed outputs usually refer to trusted external data sets containing aggregate measures matching the base conditions (e.g. OD tables or traffic counts for the base year). Thus, we want the simulation-based model to replicate these base conditions. The degree of trust in each measurement may be represented as a weight in the calibration process. The simulated output refers to the aggregation of the simulation results into measures that are equivalent to the observed output's measures.
The calibration process is formulated as a constrained minimization problem as follows,

\[
\min_{\theta} \; w \, \lVert O_m - O_s(\theta) \rVert^{2}
\quad \text{s.t.} \quad O_s = F(Z; \theta), \qquad l \le \theta \le u
\]

where $w$ is a vector of weights indicating the trust of the modeler in the different observed outputs (in our calibration, $w$ is set to 1); $O_m$ is a vector representing the observed outputs and $O_s$ is a vector representing the simulated outputs; $\theta$ is the vector of model parameters to be calibrated, and $l$ and $u$ are vectors of lower and upper bounds for the parameters; $F(Z; \theta)$ represents the link between the simulated outputs and the simulation-based model, and $Z$ are the inputs required to run the simulation-based model. Lastly, $\lVert \cdot \rVert$ represents the Euclidean norm.

The algorithm selected to solve this constrained minimization problem is Simultaneous Perturbation Stochastic Approximation (SPSA) (Spall, 2003). This algorithm has the following advantages: 1) it accounts for the simulation error in the simulation-based model output; 2) it can be applied in a stochastic gradient setting or a gradient-free setting; 3) it requires only two function evaluations per iteration regardless of the length of the vector of parameters. In essence, the SPSA algorithm works by perturbing the components of the vector of parameters and computing the difference of the objective function of the constrained minimization problem between these perturbations. More detailed information can be found in Spall's work (Spall, 2003). One example of calibrating models using a variation of SPSA is the work of Antoniou et al. (2015). The implementation steps of the algorithm are briefly described as follows:

Step 0 Initialization and coefficient selection
In this step, the SPSA algorithm is set up with the initial values for the vector of parameters and the values for the vector of hyper-parameters. These hyper-parameters belong exclusively to the SPSA algorithm.

Step 1 Generation of simultaneous perturbation vector
A perturbation vector is generated using Monte Carlo simulation. Each component of this perturbation vector is drawn from a symmetric Bernoulli distribution centered at zero (values of +1 or -1).

Step 2 Objective function evaluations
Evaluate the objective function twice using the perturbation vector, once on each side of the current vector of parameters.

Step 3 Gradient approximation
Compute the gradient approximation using the perturbation vector and the two evaluations of the objective function from step 2.

Step 4 Update vector of parameters
Update the values of the vector of parameters by gradient descent using the approximated gradient from step 3.

Step 5 Iteration or termination
Return to step 1 to continue iterating, or terminate if there is negligible change between iterations in the objective function and/or in the values of the vector of parameters.

For the passenger long distance model, we initially calibrated the alternative specific constants of the time of year choice model and the travel mode choice model using the Airline Origin and Destination Survey (DB1B) data. This survey is a 10% sample of airline tickets from reporting carriers collected by the Office of Airline Information of the Bureau of Transportation Statistics. The data includes origin, destination and other itinerary details of passengers transported. This database is used to determine air traffic patterns, air carrier market shares and passenger flows. The model is calibrated on the base year model. For the purpose of this calibration effort, the airline OD data is summarized to the following 4 values for each quarter:

- Number of flights departing from Maryland
- Number of flights landing in Maryland
- Number of flights departing from all other states
- Number of flights landing in all other states

Consequently, the airline OD data was summarized into 16 variables. The model outputs were also summarized to capture these 16 variables.
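The reduction to 16 calibration targets (4 quarters times 4 values) can be sketched as follows in Python. The record layout (quarter, origin state, destination state, count), the key labels and the function name are assumptions made for illustration; they do not come from the DB1B file format or from the dissertation's code.

```python
CATEGORIES = ("dep_focus", "arr_focus", "dep_other", "arr_other")


def summarize_air_od(records, focus_state="MD"):
    """Collapse air OD records into 4 values per quarter (16 values in total).

    Each record is (quarter, origin_state, dest_state, count), with quarter in 1..4.
    "focus" refers to the focus state (Maryland here); "other" to all other states.
    """
    summary = {(q, c): 0 for q in (1, 2, 3, 4) for c in CATEGORIES}
    for quarter, origin_state, dest_state, count in records:
        dep_key = "dep_focus" if origin_state == focus_state else "dep_other"
        arr_key = "arr_focus" if dest_state == focus_state else "arr_other"
        summary[quarter, dep_key] += count
        summary[quarter, arr_key] += count
    return summary


# Hypothetical records, for illustration only.
records = [
    (1, "MD", "CA", 100),
    (1, "VA", "MD", 40),
    (3, "NY", "FL", 10),
]
targets = summarize_air_od(records)
print(len(targets), targets[1, "dep_focus"], targets[1, "arr_focus"])
```

Applying the same function to the simulated air trips gives a vector that is directly comparable, element by element, with the vector built from the observed data.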
The sum of the squared differences between the model simulation and the airline OD values for these 16 variables was used as the objective function of the calibration. SPSA seeks to minimize the objective function by changing model parameters, so a subset of the model parameters must be selected as the SPSA inputs to be adjusted. The mode choice and time of year choice models were selected to be calibrated against the airline OD data, and the alternative specific constants of these models were selected as the variables to be calibrated. The following variables were selected for calibration:

-Business trips time of year model alternative specific constant 1
-Business trips time of year model alternative specific constant 2
-Business trips time of year model alternative specific constant 3
-Pleasure trips simple time of year model alternative specific constant 1
-Pleasure trips simple time of year model alternative specific constant 2
-Pleasure trips simple time of year model alternative specific constant 3
-Pleasure trips full time of year model alternative specific constant 1
-Pleasure trips full time of year model alternative specific constant 2
-Pleasure trips full time of year model alternative specific constant 3
-Business trips mode choice model alternative specific constant 1
-Business trips mode choice model alternative specific constant 2
-Pleasure trips mode choice model alternative specific constant 1
-Pleasure trips mode choice model alternative specific constant 2
-Personal Business trips mode choice model alternative specific constant 1
-Personal Business trips mode choice model alternative specific constant 2

Another important aspect of the calibration is the selection of hyper-parameters for the SPSA algorithm. SPSA includes 5 hyper-parameters: a, c, α, γ, and A. More information about these parameters can be found in Spall's work (Spall, 2003).
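Steps 0 to 5 and the role of the five hyper-parameters can be illustrated with a short sketch. It is written in Python for illustration and is not the implementation used in this work. It assumes the standard SPSA gain sequences a_k = a/(A+k+1)^α and c_k = c/(k+1)^γ, a fixed iteration count as the stopping rule, and simple clipping to enforce the bounds l ≤ θ ≤ u. The function name `spsa_minimize`, the toy quadratic objective, and the toy values of a and A are assumptions made for the example.

```python
import random


def spsa_minimize(loss, theta0, lower, upper, a, c, A, alpha, gamma,
                  iterations, seed=0):
    """Minimize `loss` with SPSA, keeping theta inside [lower, upper]."""
    rng = random.Random(seed)
    theta = list(theta0)                      # Step 0: initial parameters
    for k in range(iterations):               # Step 5: fixed iteration budget
        a_k = a / (A + k + 1) ** alpha        # step-size gain
        c_k = c / (k + 1) ** gamma            # perturbation-size gain
        # Step 1: every component is +1 or -1 with equal probability.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        # Step 2: two objective evaluations, regardless of len(theta).
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Step 3: simultaneous-perturbation gradient approximation.
        grad = [diff / (2.0 * c_k * d) for d in delta]
        # Step 4: gradient-descent update, clipped to the bounds.
        theta = [min(max(t - a_k * g, lo), hi)
                 for t, g, lo, hi in zip(theta, grad, lower, upper)]
    return theta


# Toy objective standing in for the simulation-based sum of squared differences.
target = [1.0, -2.0, 0.5]


def toy_loss(theta):
    return sum((t - s) ** 2 for t, s in zip(theta, target))


start = [0.0, 0.0, 0.0]
calibrated = spsa_minimize(toy_loss, start, lower=[-5.0] * 3, upper=[5.0] * 3,
                           a=0.1, c=0.1, A=10.0, alpha=0.6, gamma=0.1,
                           iterations=200)
```

In the actual calibration each call to the objective requires a full run of the microsimulation, which is why the two-evaluations-per-iteration property matters more than the cost of the update itself.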
These parameters were selected based on the suggestions in the optimization literature, using the variance of the objective function and the average of the gradients. We conducted several rounds of calibration and finally chose the parameters calibrated with 60 iterations, as they show better performance in the trip distribution by time of year and in the other trip distributions. The SPSA algorithm was coded in Java to find the calibrated variables. The initial values of the calibration variables were set to their estimated values from the estimation process. The 60-iteration calibration results are shown below.

Hyper-Parameter    Value
A                  100
a                  1.3724531551126038E-18
c                  0.1
α                  0.6
γ                  0.1
Table 6-1: 60-Iteration Calibration Hyper-Parameter

Number of iterations                60
Initial objective function value    5.3036402075560817E18
Final objective function value      4.692863010673278E18
Table 6-2: 60-Iteration Calibration Hyper-Parameter

Variable                      Calibrated Value    Initial Value
toyBusiness_ASC1              -2.199              -1.469
toyBusiness_ASC2              -1.695              -0.744
toyBusiness_ASC3              -0.427              -0.370
toySimplePleasure_ASC1        -2.017              -1.228
toySimplePleasure_ASC2        -1.239              -1.285
toySimplePleasure_ASC3        -0.384              -0.541
toyFullPleasure_ASC1          -1.415              -1.120
toyFullPleasure_ASC2          -1.049              -0.653
toyFullPleasure_ASC3          -1.011              -0.403
mc_BUSINESS_Air               -1.622              -0.44
mc_BUSINESS_Train             -2.350              -2.93
mc_PLEASURE_Air               -3.960              -2.95
mc_PLEASURE_Train             -2.676              -3.56
mc_PERSONAL_BUSINESS_Air      -1.626              -1.49
mc_PERSONAL_BUSINESS_Train    -4.110              -3.75
Table 6-3: 60-Iteration Calibration Results

Chapter 7: Future Year Policy Analysis

7.1 Future Year Policy Scenarios

The year 2040 is selected as the horizon year for the analysis. Two categories of transportation policies are analyzed for 2040 with the calibrated model: fuel price changes and high speed rail. In recent years, the crude oil price has driven large fluctuations in the market fuel price (Figure 7-1) (EIA, 2016), which is an important component of transportation cost.
Since January 2003, the retail motor gasoline price has risen dramatically, from an average of $1.5 to more than $3. Travelers have responded to this 100 percent increase in the gasoline price in different ways. They have adjusted their travel behavior and driving habits, and have even switched to more fuel-efficient vehicle types (CBO, 2008). The CBO study, which evaluated the effects of the fuel price increase in a metropolitan area where transit is available, shows that every 50-cent increase in the retail fuel price is associated with a 0.7 percent decrease in the number of freeway trips, and that the number of transit trips increases accordingly. As the travel distance increases, fuel becomes a dominant cost that people take into account when driving.

Figure 7-1: Retail Gasoline Price Changes

Meanwhile, between 2002 and 2013, the jet fuel price increased more than fourfold, from $0.72 to $2.98 per gallon, and the general aviation gasoline price increased more than threefold, from $1.29 to $3.93 per gallon, in nominal terms (GAO, 2014). The increase in fuel price affects aviation activity, including both scheduled and non-scheduled air service. In order to mitigate the financial effect of the fuel price increase, commercial airlines may raise fees, including ticket prices, checked bag fees, and charges for other services, and some commercial airlines have taken a number of other steps such as restraining domestic seat capacity growth, reconfiguring their fleets, and operating flights and ground services more efficiently (GAO, 2014). All of these could affect the comfort of an individual's travel experience and the airline's level of service, which in turn will affect travelers' mode choice when planning long distance travel. Compared to airlines and driving, fuel accounts for a smaller percentage of the total operating cost in railways (Tipping et al., 2015).
As the fuel price decreases, rail, as a more energy-efficient travel mode, will lose some of its cost advantages (USDOT, 2008). Therefore, in our study, we do not reflect the fuel price changes through the railway level of service variables (e.g. rail fare) in the fuel price scenario.

In our study, the impact of the fuel price increase on people's long distance travel is analyzed. According to the crude oil projections by EIA (Figure 7-2), the crude oil price is projected to rise to around 1.75 times its 2012 price. Therefore, for simplicity, we assume that the retail fuel price increases by the same factor, to 1.75 times the 2012 retail fuel price. It is complicated to identify the quantitative relationship between fuel price changes and air fare changes. We do know, however, that the longer the flight distance, the more fuel the airplane consumes for the same number of passengers. Based on this, we simplified the relationship by assuming that the air fare increases by $5 if the flight distance is less than 500 miles, by $10 if the flight distance is between 500 miles and 1000 miles, and by $30 if the distance is larger than 1000 miles.

Figure 7-2: Crude Oil Price Projections to 2040

As mentioned before, high speed rail (HSR) is expected to help alleviate the heavy traffic load in road and air corridors and to improve inter-regional accessibility. U.S. federal and state planners are prompted to provide high speed rail services along selected major corridors.
The developed person-based national travel model gives us the ability to quantitatively forecast the high speed rail demand and, to some extent, evaluate the operational effectiveness of the investment. In this research, part of the northeast corridor is selected to forecast the high speed rail demand and evaluate its impact on the long distance travel market. Ideally the analysis would be based on stated preference (SP) data, since high speed rail does not yet exist along the northeast corridor and travelers have no experience with it. For simplicity, we represent high speed rail by improving the speed of the current rail service and adjusting the corresponding fare.

To be specific, the travel time and cost changes in the scenarios for the future year analysis are described as follows.

Base Scenario: Future Year Base Scenario
In this scenario, the air and train OD skim data for 2040 are unchanged. For the car OD skim data, the vehicle fuel economy is assumed to improve to 30 MPG, while the fuel price and other information for car travel remain the same as in 2010. The higher MPG reduces the cost of car travel. This scenario is the basis of the other two scenarios, which means that the transportation OD skim changes in the following scenarios are made relative to the future year base scenario.

Scenario 1: Fuel Price Increase
As discussed before, fuel cost makes up a smaller proportion of the total operating cost for rail than for car and air. Therefore, in this scenario, the train OD skim data stay the same as in the base year, while the car and air travel costs are changed. To be specific, the retail fuel price for car travel is increased to 1.75 times the base year fuel price, which gives $4.48/gallon. The vehicle fuel economy is set to 30 MPG, which is 9 MPG higher than in the base year.
As both the fuel price and the vehicle fuel economy increase, it is hard to tell how the travel cost changes without calculating it for a specific OD pair. Regarding air travel, the air fare is increased by $5 if the OD flight distance is less than 500 miles, by $10 if the distance is between 500 miles and 1000 miles, and by $30 if the distance is larger than 1000 miles.

Scenario 2: High Speed Rail
In this scenario, we improve the travel speed of the existing rail line as a proxy for high speed rail. The Washington D.C. to New York section of the northeast corridor is chosen to analyze the travel demand changes along the corridor and their effect on the effectiveness of the investment. According to Amtrak's planning and construction projections for high speed rail in the northeast corridor, the travel time between New York and Washington D.C., including a stop in Philadelphia, will be reduced to 96 minutes by 2040 (Amtrak, 2010). Therefore, in this scenario, we adopt the travel time projected by Amtrak, which is 96 minutes. Based on this, we assume one hour of travel time between New York and Philadelphia, and 0.6 hours of travel time between Washington D.C. and Philadelphia. As high speed rail provides a higher level of service, with a travel time much shorter than that of regular train travel, the travel cost of high speed rail should be increased to reflect the improved service. A 30% increase in travel cost for high speed rail is proposed and used.

In all three scenarios, monetary values are expressed in base year dollars, as we have no information about the change in CPI between 2010 and 2040.

Besides the transportation OD skim data, the economic and demographic data for each MSA and non-MSA in 2040 are used to reflect the future year changes in economy and population. These data are obtained from the Complete Economic and Demographic Data Source (CEDDS) by Woods & Poole Economics.
As mentioned before, this database also offers projected socioeconomic indicators (e.g. population, employment, households, etc.) for all the regions, states, statistical areas and counties in the U.S.

7.2 Future Year Population Synthesis

The developed person-based microsimulation national travel demand model recognizes the individual as the decision maker, and travel demand is derived from each individual's desire to take part in spatially located activities. When implementing such a model, the initial step is to generate the U.S. population, in particular each individual's socio-demographic characteristics. Children under 18 years old are excluded, as they are not considered decision makers in our study. Population synthesis is a procedure that expands a sample drawn from a population to the full size of the population, such that the synthesized population is representative of the actual population at various aggregate levels (Ryan et al., 2009; Lim & Cargett, 2013). The main idea of population synthesis is to combine census sample data (both household and person) with available up-to-date aggregate distribution or marginal data (Beckman et al., 1996). There are many population synthesizers, either standalone software packages or components of microsimulation activity-based travel demand models, most of which are based on the iterative proportional fitting (IPF) procedure (Bowman, 2004). IPF estimates a distribution of the control variables so that the number of individuals in given categories matches the corresponding margins while the correlation structure of the seed is maintained (Axhausen and Müller, 2010). For more details about the IPF procedure, please refer to Deming and Stephan (1940), who first introduced the IPF method.
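The IPF idea can be illustrated on a small two-way table. The following is a minimal Python sketch, not the procedure as implemented in any particular population synthesizer: the seed counts and margins are hypothetical, and the function name `ipf` is an assumption made for the example. It alternately rescales rows and columns of the seed until both sets of margins are matched.

```python
def ipf(seed, row_margins, col_margins, iterations=50):
    """Scale a 2-D seed table until its row and column sums match the margins."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Fit rows: rescale each row to its target total.
        for i, target in enumerate(row_margins):
            factor = target / sum(table[i])
            table[i] = [cell * factor for cell in table[i]]
        # Fit columns: rescale each column to its target total.
        for j, target in enumerate(col_margins):
            factor = target / sum(row[j] for row in table)
            for row in table:
                row[j] *= factor
    return table


# Hypothetical 2x2 seed (e.g. two age groups by two genders) and target margins.
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_margins=[40.0, 60.0], col_margins=[35.0, 65.0])
row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(2)]
# Cross-product (odds) ratio of the fitted table.
odds_ratio = (fitted[0][0] * fitted[1][1]) / (fitted[0][1] * fitted[1][0])
```

Because every step only multiplies whole rows or whole columns by a constant, the cross-product ratio of the seed (here 2/3) is unchanged in the fitted table, which is one concrete sense in which the correlation structure of the seed is maintained.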
Among the various population synthesizers such as PopSynWin, PopGen, ILUTE (Salvini & Miller, 2005), FSUMTS (Srinivasan & Ma, 2009; Srinivasan et al., 2008), CEMDAP (Guo & Bhat, 2007), etc., we chose PopGen (Ye et al., 2009), a standalone open source software package developed by Arizona State University, to generate the future year population for the whole U.S. using distributions of the household and person variables of interest and a sample of household data. PopGen also uses the standard IPF procedure to draw households from the provided sample data so as to match the marginal distributions of the control variables. It adopts the iterative proportional updating (IPU) algorithm to estimate household weights: based on the household composition and the provided marginals, PopGen develops weights for the households in the sample data. It incorporates a heuristic approach to generate synthetic populations while matching both the household-level and person-level characteristics of interest.

PopGen is a Python-based software package with a user-friendly graphical user interface (GUI) (Figure 7-3). Through the wizard-based project setup procedure, users can choose the region at different geographic levels in the U.S. and provide their own input data or use the default inputs for population synthesis. Once the inputs are provided, PopGen imports the data into a MySQL database and works from there.

A total of 5 input files are required before conducting the population synthesis in PopGen. They are:

1. Household sample file, providing the regional distribution of household characteristics,
2. Person sample file, providing the regional distribution of person attributes,
3. Household marginal file, providing the marginal distribution of household attributes at a specific geographic level,
4. Person marginal file, providing the marginal distribution of person attributes at a specific geographic level,
5. Correspondence file between the different geographies used in the sample files and the marginal files.
Figure 7-3: Interface of PopGen

Each TAZ (MSA/Non-MSA) consists of multiple counties or cities, so once the population of each county is known, the population of each TAZ can be obtained. For a county whose cities belong to different MSAs, we allocate the county population to the corresponding TAZs according to the population ratios in that county. Therefore, the geographic level we choose for the population synthesis is the county. The household sample file and the person sample file are prepared from the 2013 ACS PUMS one-year data, which is the latest ACS PUMS data using the same PUMA system as PopGen. The household marginal file and the person marginal file were prepared from the Woods & Poole Economics projections of the number of households by income, the number of persons by age, and the number of persons by gender in 2040. Specifically, in the household marginal file, the control variable household income is divided into 4 categories: income group 1 with income less than $30k, income group 2 with income greater than or equal to $30k and less than $75k, income group 3 with income greater than or equal to $75k and less than $150k, and income group 4 with income greater than or equal to $150k. In the person marginal file, we use person age and gender as the control variables, and person age is divided into 5 categories: [0, 20), [20, 35), [35, 55), [55, 75), and [75, upper limit in the sample]. For the geographic correspondence file, we used the default one in PopGen.

We conducted the population synthesis for each state, and randomly selected one state to evaluate how well the synthetic population and households match the provided control margins. The comparison between the synthetic population and the provided totals by control variable (Figure 7-4) shows that PopGen generates a future population that matches the control totals for the different variables very well.
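The control-variable categories and the county-to-TAZ allocation described in this section can be written down compactly. The sketch below is illustrative Python, not part of PopGen: the function names, the 1-based group numbering, and the example county shares are assumptions made for the example, while the break points are the ones stated in the text.

```python
from bisect import bisect_right

INCOME_BREAKS = [30_000, 75_000, 150_000]   # lower edges of income groups 2, 3, 4
AGE_BREAKS = [20, 35, 55, 75]               # lower edges of age groups 2 to 5


def income_group(household_income):
    """Return 1-4 for the household income control variable."""
    return bisect_right(INCOME_BREAKS, household_income) + 1


def age_group(age):
    """Return 1-5 for the person age control variable."""
    return bisect_right(AGE_BREAKS, age) + 1


def allocate_county(county_population, taz_shares):
    """Split a county total across TAZs in proportion to the given shares."""
    total = sum(taz_shares.values())
    return {taz: county_population * share / total
            for taz, share in taz_shares.items()}


# Hypothetical county of 1000 persons split 3:1 between two TAZs.
split = allocate_county(1000, {"TAZ_A": 3, "TAZ_B": 1})
```

Using `bisect_right` makes each lower edge inclusive, which matches the "greater than or equal to" wording of the income categories and the half-open age intervals.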
The synthetic population in 2040 for each MSA and Non-MSA is shown in Figure 7-5.

Figure 7-4: Comparison between Synthetic Population and Control Margins by Control Variable

Figure 7-5: MSA/Non-MSA Population in 2040

Figure 7-6, Figure 7-7, and Figure 7-8 show the synthetic population distribution by gender, income group and age group. The female population has the larger share of the 2040 synthetic population, above 50.5%, while the male population has a share of less than 49.5%. The income group distribution indicates that the high-income group (household income above $75,000) has the largest share of the 2040 synthetic population, while the low-income group (household income less than $30,000) has the smallest. The middle-income group (household income between $30,000 and $75,000) is smaller than the high-income group but larger than the low-income group. Also, as expected, the middle-age group (36 to 60 years old) has the largest population, while the old-age group (above 60 years old) has the smallest, which is close to but slightly smaller than the population of the young-age group (18 to 35 years old).

Figure 7-6: Gender distribution of synthetic 2040 population

Figure 7-7: Income group distribution of synthetic 2040 population

Figure 7-8: Age group distribution of synthetic 2040 population

7.3 Future Year Scenario Results Analysis

Compared to the base year model, the only difference in the future year model is the calibrated model coefficients shown in Table 6-3.
All the assumptions in the model simulation are kept the same as in the base year model simulation. For example, the one-way travel time should not exceed half of the tour duration, and at the stop level the distance between the chosen stop location and the tour origin should be less than the distance between the tour origin and the tour destination. Using the calibrated model parameters, the synthetic population in 2040, and the proposed OD skim data, we obtain the long distance travel patterns and trip information in the future year for the different scenarios. The total number of long distance tours by purpose, travel mode, and time of year can be obtained. Because in the national travel demand model the number of long distance activities (tours) is determined by the long distance trip rate (see Table 4-2, Table 4-3, and Table 4-4) and has no sensitivity to policies and scenarios, we analyze and compare the results of the different scenarios for the future year at the level of trips.

7.3.1 Scenario 1: Base Scenario

Given the trip OD tables by purpose, time of year, and travel mode, we obtain the total number of trips by time of year, travel mode, and purpose. Aggregating all the trips from the trip tables gives a total of 5.12 billion trips, which means that a person would make an average of 12.5 long distance trips of more than 50 miles one-way during one year. Figure 7-9 shows the trip distribution by travel mode at the national level. As expected, the car is the most used mode for long distance travel, due to the large coverage of the highway network and the flexibility of car travel, while the train is the least used mode, due to the limited distribution of rail stations and rail tracks. Figure 7-10 presents the trip distribution by time of year.
Using the 60-iteration calibrated parameters, the model predicts a trip distribution by time of year ((Q1,Q1): 28%, (Q2,Q2): 23%, (Q3,Q3): 26%, (Q4,Q4): 24%) that is roughly on track with the distribution in the 1995 ATS data ((Q1,Q1): 23.9%, (Q2,Q2): 27.5%, (Q3,Q3): 28%, (Q4,Q4): 20.6%), although the simulation places the largest share of trips in the first quarter rather than the third quarter. The trip distributions by time of year differ slightly across travel modes (Figure 7-11). The car mode presents a pattern similar to the distribution across all three modes, with most people choosing to travel in the first quarter, followed by the third quarter. The second quarter is the least popular quarter for long distance travel by car. Most people travelling by air make their long distance trips in the third quarter, followed by the first quarter. The second quarter is also the least popular quarter for air travelers. The train trip distribution by time of year shows the same pattern as the distribution for trips by air. With more iterations of calibration, the simulated time of year distribution is expected to move closer to the observed 1995 distribution. As 60 iterations of calibration take more than one week, further calibration is left to future work in the group. In this dissertation, we continue to use the 60-iteration calibrated parameters.

Figure 7-9: Trip distribution by travel mode

Figure 7-10: Trip Distribution by Time of Year

Figure 7-11: Trip distribution by Travel Mode and Time of Year

Figure 7-12: Trip distribution by trip purpose

Figure 7-13 and Figure 7-14 show the trip distribution and the average number of trips per person by income group and travel mode at the national level.
As expected, the high-income group (household income above $75,000) generates the largest share of trips regardless of travel mode, and higher income people usually travel more during one year than lower income people. Among the three travel modes, the car is the most popular means of long distance travel for every income group. During one year, the high-income group generates a total of almost 2.75 billion car trips, 0.5 billion air trips, and 23 million train trips. On average, a person in the high-income group makes 15 long distance trips by car, 2.5 long distance trips by air and 0.1 trips by train. The lower a person's income, the fewer long distance trips by car, air and train he/she makes during one year.

Figure 7-15 and Figure 7-16 show the trip distribution and the average number of trips per person by gender and travel mode. Although females make up the larger share of the whole population (above 50.5%), they do not make as many long distance trips as males. The male population generates a larger number of long distance trips than the female population regardless of travel mode. A male person makes an average of more than 18 long distance trips per year (almost 15 car trips, 3 air trips, and 0.13 train trips), while a female person travels less, with an average of only 12.5 long distance trips per year (12.5 car trips, 1.9 air trips and 0.1 train trips).

Figure 7-17 and Figure 7-18 show the trip distribution and the average number of trips generated per year by age group and travel mode. The middle-age group (36 to 60 years old) makes the largest contribution to long distance trip generation for all three travel modes.
It is reasonable that people in the middle-age group make many long distance trips, as most of them have jobs and a steady income (which leads to business trips) and have families (which leads to more pleasure trips). The young-age group (18 to 35 years old) and the old-age group (above 60 years old) make fewer long distance trips per year than the middle-age group for all three travel modes, and the old-age group generates the fewest long distance trips. As expected, a middle-age person travels most frequently, with an average of 17.3 trips per year (14.5 car trips, 2.7 air trips, and 0.13 train trips). A young-age person makes an average of 16.3 long distance trips (13.7 car trips, 2.5 air trips, and 0.12 train trips), and an old-age person makes long distance trips the least frequently, with an average of 14.7 trips (12.5 car trips, 2.1 air trips, and 0.1 train trips).

Figure 7-13: Trip Distribution by Income level and Travel Mode

Figure 7-14: Average number of trips/person during one year by income group and travel mode

Figure 7-15: Trip distribution by Gender and Travel Mode

Figure 7-16: Average number of trips/person during one year by gender and travel mode

Figure 7-17: Trip distribution by travel mode and age group

Figure 7-18: Average number of trips/person during one year by age group and travel mode

Disaggregating the national trips to each TAZ gives the number of trips (all three travel modes) at the TAZ (MSA/Non-MSA) level, see Figure
7-19. As the figure shows, except for some large metropolitan areas such as New York, Washington D.C., Seattle, Los Angeles, Chicago, and San Diego, most MSAs have fewer trips than Non-MSAs due to their smaller geographic size and population. The pattern that TAZs with smaller populations usually have fewer trips than TAZs with larger populations is similar to the 2010 base year results. The zones with large populations and high GDP (in east coast cities, Illinois, Ohio, Texas, and some large west coast cities) usually have a larger number of trips during one year, while the zones in the states of Montana, North Dakota, South Dakota, and Wyoming produce a relatively small number of trips, which can be explained by the small population and low GDP of these states.

Figure 7-19: Trip Origins/Destinations at MSA/Non-MSA level

7.3.2 Scenario 2: Fuel Price Increase

This scenario assumes that the fuel price will increase, which affects the car driving cost and the air fare. Details of this scenario are described in section 7.1. The driving cost increases by $0.064/mile, which means that a long distance trip by car costs at least $3.2 more (for 50 miles), and the longer the distance traveled, the more people pay in driving cost. The air fare increases by $0.03/mile, which is very small compared to the relatively high base air fare. As the number of long distance activities is determined by the pre-calculated trip rate for each purpose, the number of long distance tours by purpose for each person is not affected by the fuel price increase and is the same as in the 2040 base scenario.
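The per-mile driving cost change quoted above can be reproduced from the scenario assumptions in section 7.1. The sketch below is a hedged Python illustration that assumes the change in driving cost comes only from fuel (price per gallon divided by miles per gallon) and that the comparison is against the future year base scenario, where fuel economy is already 30 MPG. The function and constant names are assumptions made for the example.

```python
SCENARIO_FUEL_PRICE = 4.48                    # $/gallon under the fuel price increase
BASE_FUEL_PRICE = SCENARIO_FUEL_PRICE / 1.75  # $/gallon implied by the 1.75x factor
FUTURE_MPG = 30.0                             # fuel economy assumed for 2040


def fuel_cost_per_mile(price_per_gallon, mpg):
    """Fuel cost of driving one mile, in dollars."""
    return price_per_gallon / mpg


def air_fare_increase(distance_miles):
    """Flat air fare increment by flight-distance band, in dollars."""
    if distance_miles < 500:
        return 5.0
    if distance_miles <= 1000:
        return 10.0
    return 30.0


# Extra driving cost relative to the future year base scenario.
extra_per_mile = (fuel_cost_per_mile(SCENARIO_FUEL_PRICE, FUTURE_MPG)
                  - fuel_cost_per_mile(BASE_FUEL_PRICE, FUTURE_MPG))
# Extra cost for the shortest long distance trip considered (50 miles one-way).
extra_for_50_miles = 50 * extra_per_mile
```

Under these assumptions the difference works out to (4.48 − 2.56) / 30 = $0.064 per mile, or $3.20 for a 50-mile trip, in line with the figures stated in the text.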
People's choices at the tour level and the stop level are affected by the fuel price change. The model component that affects the number of long distance trips is the stop frequency model, which decides how many stops a person makes during the inbound and outbound long distance tour legs. The fuel price change affects the number of stops indirectly, through the variables (e.g. Car Mode, Time of Year, and Tour Duration) that are outputs of the higher-level model components. Summing all the trips from the trip tables by time of year, purpose, and travel mode gives a total of around 5.17 billion long distance trips per year, which is a 0.4% increase over the base scenario. As the number of long distance tours for each person during one year does not change from the base scenario to the fuel price increase scenario, the number of trips is affected only by the simulation results of the stop frequency model. In the stop frequency model (Table 4-14 and Table 4-15), traveling by car has a larger effect on making one or two stops than on making three or four stops. Therefore, the probability of choosing three or four stops increases when the car mode is not chosen, which directly increases the number of trips by air and by train, and could increase the total number of trips (all three travel modes).

Figure 7-20 compares the trip distribution by travel mode and time of year between the base scenario (Base) and the fuel price scenario (FP). For each of the three travel modes, the trip distribution by time of year has a pattern similar to that of the base scenario. For car travel in the fuel price scenario, the share of trips decreases proportionately in all time periods, while the percentages of air trips and train trips increase proportionately in the four time periods.
Consequently, the share of car trips among the total trips at the national level decreases by almost 5% due to the increased driving cost, while the shares of air and train trips among the total trips increase by 4% and 0.3% (Figure 7-21). The fuel price increase affects people's choice of travel mode, while it has little impact on their choice of trip purpose. Therefore, the percentage of trips for the different trip purposes does not change much (see Figure 7-22). Combining trip purpose and travel mode gives the trip distribution by purpose and travel mode (Figure 7-23). The total number of car trips decreases under the fuel price scenario; most of the decrease is in trips for business purposes, followed by personal business purposes. The number of pleasure trips by car increases slightly, by only 0.59%.

In the stop purpose choice model, the estimation results indicate that travel mode has a significant impact on people's stop purpose. People do not prefer driving for business and personal business trips, which reduces the probability of stopping for business and personal business during the outbound leg of a long distance tour. Therefore, as the driving cost increases, people limit their stops if they travel by car, and most of these stops are for business and personal business purposes. For the inbound tour leg, the stop purpose model estimation results show that people traveling by car are unlikely to stop for business and personal business purposes. Therefore, as the number of people traveling by car decreases due to the fuel price increase, the number of people making stops for business and personal business during the inbound tour leg decreases as well. According to the model estimation results, it is therefore possible for the number of pleasure trips to show a slight, insignificant increase (0.59%).
Figure 7-20: Comparison of trip distribution by travel mode and time of year (series: FP, Base)

Figure 7-21: Comparison of trip distribution by travel mode (Car, Air, Train; series: FP, Base)

Figure 7-22: Comparison of trip distribution by purpose (Business, Pleasure, PB; series: FP, Base)

Figure 7-23: Comparison of trip distribution by purpose and travel mode (vertical axis: number of trips, in thousands; series: FP, Base)

Table 7-1 presents the numeric comparison of trip changes by time of year, travel mode, and trip purpose between the base scenario and the fuel price increase scenario at the national level. The total number of trips increases slightly, by 0.4%, under the fuel price increase scenario compared to the base scenario. The number of trips by time of year and by trip purpose changes little: the number of trips by time of year changes by less than 1%, and the number of business trips and personal business trips changes by only around 0.1%, while the number of pleasure trips increases by 1.4%. The significant changes occur in the number of trips by travel mode, because the fuel price change directly influences the mode choice of people making long distance trips. The number of trips by car decreases by 5% under the fuel price increase scenario compared to the base scenario, while the number of trips by air and by train increases by 28% and 42% respectively. Fuel cost is an important component of driving cost, and it becomes more significant as travel distance increases. A fuel price increase therefore raises the driving cost of long distance travel, which leads fewer people to choose to travel by car.
Compared to the change in fuel cost for driving and the relatively high air fare, the assumed increase in air fare due to the fuel price increase is insignificant. Therefore, it is plausible that people turn to air instead of car for long distance travel. Meanwhile, the cost of travel by train does not change at all, which could lead to a larger percentage increase in the number of trips by train.

| Category | Base: # of trips | Base: % of trips | Fuel Price Increase: # of trips | Fuel Price Increase: % of trips | % difference (FP - Base)/Base |
|---|---|---|---|---|---|
| (Q1,Q1) | 1.43E+09 | 27.81% | 1.43E+09 | 27.68% | -0.10% |
| (Q2,Q2) | 1.18E+09 | 22.85% | 1.18E+09 | 22.82% | 0.26% |
| (Q3,Q3) | 1.32E+09 | 25.54% | 1.32E+09 | 25.62% | 0.72% |
| (Q4,Q4) | 1.23E+09 | 23.79% | 1.23E+09 | 23.88% | 0.76% |
| Car | 4.35E+09 | 84.38% | 4.14E+09 | 79.99% | -5% |
| Air | 7.68E+08 | 14.91% | 9.83E+08 | 19.00% | 28% |
| Train | 36786395 | 0.71% | 52312913 | 1.01% | 42% |
| Business | 9.21E+08 | 32.65% | 9.2E+08 | 32.85% | 0.1% |
| Pleasure | 1.46E+09 | 51.88% | 1.44E+09 | 51.54% | 1.4% |
| PB | 4.37E+08 | 15.47% | 4.37E+08 | 15.61% | -0.1% |
| Total trips | 5151015931 | 100% | 5171431217 | 100% | 0.4% |

Table 7-1: Comparison of trips between Base Scenario and Fuel Price Increase Scenario

The following figures show the distributional impacts of the fuel price increase on the number of trips by income group, age group, and gender. Figure 7-24 shows that, taken together, people in the high-income group (Inc3) reduce more car trips and generate more air and train trips than the other two groups when the fuel price increases. On average, a high-income person also cuts slightly more car trips and makes slightly more air and train trips during one year than a low-income or medium-income person. Meanwhile, the male population is more sensitive to the fuel price change than the female population. Under the fuel price increase, the male population limits its car trips more than the female population (Figure 7-25) and shifts more trips to air. The male population also generates more train trips than the female population.
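The percentage differences in Table 7-1 follow the definition in its last column, (FP - Base)/Base. As a minimal sketch, the two rows that the table reports as exact integers (Train and Total trips) can be used to reproduce the reported 42% and 0.4%; the function names are illustrative and are not part of the model system. The sketch also separates a relative change in trip counts from a change in mode share, which is measured in percentage points.

```python
# Trip counts copied from Table 7-1 (the two rows reported as exact integers).
BASE = {"Train": 36_786_395, "Total": 5_151_015_931}
FUEL_PRICE = {"Train": 52_312_913, "Total": 5_171_431_217}


def pct_difference(scenario: float, base: float) -> float:
    """Relative change in percent: (scenario - base) / base * 100."""
    return (scenario - base) / base * 100.0


def share_point_change(scenario_share: float, base_share: float) -> float:
    """Change in a share of total trips, in percentage points."""
    return scenario_share - base_share


train_change = pct_difference(FUEL_PRICE["Train"], BASE["Train"])
total_change = pct_difference(FUEL_PRICE["Total"], BASE["Total"])
# Car share of all trips: 84.38% (Base) versus 79.99% (FP) in Table 7-1.
car_share_change = share_point_change(79.99, 84.38)

print(f"Train trips: {train_change:.1f}%")
print(f"Total trips: {total_change:.2f}%")
print(f"Car share:   {car_share_change:.2f} percentage points")
```

The distinction matters when reading the table: the number of car trips falls by about 5% relative to the base scenario, while the car share of all trips falls by about 4.4 percentage points.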
The distributional impact of the fuel price increase by age group (Figure 7-26) shows that the middle-age group is more sensitive to the fuel price increase than the young-age and old-age groups. Taken together, people in the middle-age group reduce car trips and increase air and train trips to a larger extent than the other two groups. The young-age group is generally less sensitive than the middle-age group but more sensitive than the old-age group. Figure 7-26 shows that the old-age group also reduces its car trips and increases its air and train trips when the fuel price increases, but to the smallest degree among the three age groups. According to the base scenario analysis, males, the high-income group, and the middle-age group are the populations that make the most long distance trips per year. It is therefore reasonable that, when the fuel price increases, the total number of trips made by each of these three groups decreases or increases by more than that of their peer groups.
Figure 7-24: Comparison of trip distribution by income group and travel mode (categories: Inc1, Inc2, Inc3 by air, car, train; series: FP, Base)

Figure 7-25: Comparison of trip distribution by gender and travel mode (vertical axis: number of trips, in thousands; series: FP, Base)

Figure 7-26: Comparison of trip distribution by age group and travel mode (vertical axis: number of trips, in thousands; series: FP, Base)

Figure 7-27: Comparison of miles by car per person during one year by income group (vertical axis: miles per person during one year; series: FP, Base)

Figure 7-28: Comparison of miles/car trip by income group (vertical axis: miles per trip; series: FP, Base)

Figure 7-27 and Figure 7-28 show the impact of the fuel price increase on car driving by income group. When the fuel price increases, not only does the total number of car trips decrease for all three income groups, but the average miles driven per person during one year also shrinks for all groups. Specifically, a person in the middle-income group reduces car travel by an average of 185.78 miles per year, a person in the high-income group by an average of 176.71 miles, and a person in the low-income group by an average of only 111.46 miles. A high-income person still has the largest average driving miles per year under the fuel price increase scenario. Although high-income persons generate more car trips than low-income and middle-income persons, their travel distance per trip is shorter (Figure 7-28). Low-income persons have the longest travel distance per trip. As travel distance grows, high-income persons choose air instead of car, while low-income persons are less sensitive to driving distance because of their low income level and the high air fare.
This makes them more tolerant of long driving distances than high-income persons, and it also means that low-income persons are more sensitive to the driving cost per trip. Consequently, as the fuel price and the driving cost increase, low-income persons travel less than before, reducing their miles per trip by more (an average of 5.48 miles/trip) than middle-income persons (4.06 miles/trip) and high-income persons (1.05 miles/trip).

7.3.3 Scenario 3: High Speed Rail

In this scenario, the train travel time between New York and Washington D.C., including a stop in Philadelphia, is reduced to 96 minutes: the travel time between New York and Philadelphia is 1 hour, and the travel time between Washington D.C. and Philadelphia is 0.6 hours. Because high speed rail (HSR) provides a higher level of service, with travel time much lower than that of regular train service, a 30% increase in travel cost is also assumed for HSR to reflect the improved service. All input data (OD skim data and TAZ economic/demographic data) are therefore the same as in the 2040 base scenario, except that the train travel time and cost are altered between New York and Washington D.C., between New York and Philadelphia, and between Washington D.C. and Philadelphia. Because the number of long distance activities each person makes is not sensitive to travel time and cost, the number of long distance activities by purpose is the same as in the 2040 base scenario for each person. The tour-level and stop-level choices can be affected by the changes. With the reduced HSR travel time, more people are expected to turn to train instead of car and air. However, because the train network covers only a limited area and people prefer car and air, train remains a minor travel mode in the U.S.
Since the train travel time reduction occurs in only one corridor, we do not expect significant changes in the number of car trips and air trips at the national level. Table 7-2 summarizes and compares the number of trips in different categories between the High Speed Rail (HSR) scenario and the Base scenario at the national level. The total number of trips changes little under HSR, increasing by only 0.004%. Comparing the number of trips and the trip distributions by time of year and purpose for the two scenarios shows that the travel time reduction of the northeast corridor train service has little impact on the number of trips (and the trip distribution) by time of year and purpose at the national level. The significant change is in the number of train trips. Compared to the base scenario, the travel time reduction of the northeast corridor train service increases the total number of train trips by 5.79%, even though HSR operates on only three links of the northeast corridor. The assumed HSR operates between Washington D.C., New York, and Philadelphia. These three MSAs are major tourist destinations with large populations, and they are hubs of the train network that connect to other train lines. People would therefore consider the train as their long distance travel mode if part of their long distance trip falls on the links between Washington D.C., New York, and Philadelphia, which increases the number of train trips at the national level.
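The scenario inputs described above amount to overwriting a few entries of the train OD skim while leaving everything else at its 2040 base value. The following is a minimal sketch of that idea, not the dissertation's implementation: the dictionary layout, the zone labels, and all base-scenario times and costs are hypothetical placeholders, and applying the change in both directions of each pair is an assumption. Only the HSR travel times (96 minutes, 1 hour, 0.6 hours) and the 30% cost increase come from the scenario description.

```python
from copy import deepcopy

DC, NYC, PHL = "Washington DC", "New York", "Philadelphia"

# Hypothetical base-scenario train skim (placeholder values, NOT model inputs).
base_train_skim = {
    (NYC, DC): {"time_h": 3.4, "cost": 100.0},
    (DC, NYC): {"time_h": 3.4, "cost": 100.0},
    (NYC, PHL): {"time_h": 1.5, "cost": 50.0},
    (DC, PHL): {"time_h": 2.0, "cost": 60.0},
    (NYC, "Boston"): {"time_h": 4.0, "cost": 90.0},  # outside the HSR links
}

# HSR travel times from the scenario description, in hours.
HSR_TIME_H = {(NYC, DC): 96 / 60, (NYC, PHL): 1.0, (DC, PHL): 0.6}
HSR_COST_FACTOR = 1.30  # 30% travel cost increase assumed for HSR


def build_hsr_skim(base):
    """Return a copy of the base skim with HSR time and cost on the three links."""
    skim = deepcopy(base)
    for (a, b), time_h in HSR_TIME_H.items():
        for pair in ((a, b), (b, a)):  # assumption: same change in both directions
            if pair in skim:
                skim[pair]["time_h"] = time_h
                skim[pair]["cost"] = base[pair]["cost"] * HSR_COST_FACTOR
    return skim


hsr_train_skim = build_hsr_skim(base_train_skim)

# The two segment times add up to the end-to-end time: 1.0 h + 0.6 h = 96 min.
segments_h = HSR_TIME_H[(NYC, PHL)] + HSR_TIME_H[(DC, PHL)]
print(f"New York to Washington D.C. via Philadelphia: {segments_h * 60:.0f} minutes")
```

Working on a copy keeps the base scenario intact, so both scenarios can be simulated from the same inputs and compared afterwards.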
| Category | HSR: # of trips | HSR: % of trips | Base: # of trips | Base: % of trips | % difference (HSR - Base)/Base |
|---|---|---|---|---|---|
| (Q1,Q1) | 1.43E+09 | 27.81% | 1.43E+09 | 27.81% | 0.00% |
| (Q2,Q2) | 1.18E+09 | 22.86% | 1.18E+09 | 22.85% | 0.00% |
| (Q3,Q3) | 1.32E+09 | 25.54% | 1.32E+09 | 25.54% | 0.00% |
| (Q4,Q4) | 1.23E+09 | 23.79% | 1.23E+09 | 23.79% | 0.01% |
| Car | 4.34E+09 | 84.34% | 4.35E+09 | 84.38% | -0.04% |
| Air | 7.68E+08 | 14.91% | 7.68E+08 | 14.91% | -0.01% |
| Train | 38918060 | 0.76% | 36786395 | 0.71% | 5.79% |
| Business | 9.2E+08 | 32.85% | 9.2E+08 | 32.85% | 0.00% |
| Pleasure | 1.44E+09 | 51.54% | 1.44E+09 | 51.54% | 0.01% |
| PB | 4.37E+08 | 15.61% | 4.37E+08 | 15.61% | 0.00% |
| Total trips | 5151212120 | 100% | 5151015931 | 100% | 0.004% |

Table 7-2: Comparison of trips between High Speed Rail Scenario and Base Scenario

Because little change is observed between the HSR scenario and the Base scenario in the trip distribution by time of year and trip purpose for the car and air modes, only the train trip distributions are compared by time of year and trip purpose. The number of train trips increases in all four quarters, but by different percentages (Figure 7-29). Train trips in Quarter 1 increase the most, by 6.58%, which makes Quarter 1 the most popular time period for train travel under the HSR scenario, whereas Quarter 3 is the most popular in the Base scenario. The number of train trips in Quarter 2 increases by 6.14%, but Quarter 2 is still the least popular time for train travel under the HSR scenario. The number of train trips in Quarter 3 and Quarter 4 increases by 4.95% and 5.56% respectively. Figure 7-30 compares train trips by trip purpose between the Base scenario and the HSR scenario. The operation of HSR in the northeast corridor boosts train travel and increases the total number of train trips. Among the three trip purposes (business, personal business, and pleasure), pleasure travel is affected the most by HSR, with the largest increase in the number of trips.
Business travel is the second most affected purpose, and HSR has little impact on personal business travel.

Figure 7-29: Comparison of train trip distribution by time of year (Q1 to Q4; vertical axis: number of train trips; series: HSR, Base)

Figure 7-30: Comparison of Train Trips by trip purpose (Business, Pleasure, PB; vertical axis: number of train trips; series: HSR, Base)

To analyze the trip changes by travel mode at the TAZ level under the HSR scenario, Table 7-3 shows the percentage change in trips by travel mode for the TAZs in which the number of train trips increases. The table shows that the increase in train trips is concentrated along the northeast corridor and in the TAZs that are connected with or close to Washington D.C., Philadelphia, and New York. The largest percentage increases in the number of train trips occur in the MSAs on the line between Washington D.C. and New York (Philadelphia and Trenton) and in the Washington D.C. and New York MSAs themselves. Philadelphia has the largest percentage increase in the number of train trips (43.76%), due to its location between Washington D.C. and New York and its connectivity to multiple rail lines. Train trips from/to the New York MSA and Washington D.C. increase by 38.64% and 22.86% respectively. In contrast, the number of car trips and air trips of these TAZs changes little, decreasing by less than 0.5%. In general, the number of trips by car decreases more than the number of trips by air, which indicates that car travel is the main competitor of train travel.
| TAZ Name | Car | Air | Train |
|---|---|---|---|
| Philadelphia, PA-NJ MSA | -0.46% | -0.08% | 43.76% |
| New York, NY MSA | -0.47% | -0.11% | 38.64% |
| Trenton, NJ MSA | -0.04% | 0.00% | 36.17% |
| Washington, DC-MD-VA-WV MSA | -0.28% | -0.07% | 22.86% |
| Lancaster, PA MSA | -0.03% | -0.19% | 20.88% |
| Wilmington-Newark, DE-MD MSA | -0.05% | 0.00% | 15.31% |
| MD Non-MSA | -0.12% | -0.15% | 14.15% |
| Baltimore, MD MSA | -0.01% | -0.03% | 13.25% |
| Bridgeport, CT MSA | -0.05% | -0.01% | 11.45% |
| Harrisburg-Lebanon-Carlisle, PA MSA | -0.05% | -0.12% | 9.62% |
| Atlantic-Cape May, NJ MSA | -0.08% | 0.00% | 8.79% |
| PA Non-MSA | -0.01% | -0.038% | 2.91% |

Note: All values are percentage changes, (HSR - Base)/Base.

Table 7-3: Percentage of trip changes by travel mode between HSR scenario and Base scenario

Table 7-4, Table 7-5, Table 7-6 and Table 7-7 show the trip changes among Washington D.C., New York, and Philadelphia by trip purpose and travel mode. Table 7-4 shows that the total number of trips between the TAZs (including business, personal business, pleasure, and return-to-home trips) decreases to varying degrees, while the number of business, personal business and pleasure trips between the TAZs increases, except for trips from the Philadelphia MSA to the Washington D.C. MSA. For trips from T361 to T258, for example, this means that although business, personal business and pleasure trips from Washington D.C. (T361) to New York (T258) increase, fewer people whose tours originate in New York choose Washington D.C. as their last stop before returning home under the HSR scenario.
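The decomposition described above can be made explicit. Since the total in Table 7-4 is stated to include business, personal business, pleasure, and return-to-home trips, the change in return-to-home trips is implied by the residual. The sketch below uses the values reported in Table 7-4; the function name and the residual interpretation are illustrative and assume that these four purposes make up the total exactly.

```python
# (total, business, personal business, pleasure) trip changes from Table 7-4.
TABLE_7_4 = {
    "T361-T258": (-52522, 8109, 1889, 33156),
    "T361-T276": (-14865, 24881, 1068, 5598),
    "T258-T361": (-57107, 9551, 1311, 42756),
    "T258-T276": (-18437, 21808, 1888, 10314),
    "T276-T258": (-15197, 13921, 2085, 7967),
    "T276-T361": (-25096, -10658, -319, -2087),
}


def implied_return_home_change(total, business, pb, pleasure):
    """Residual change attributed to return-to-home trips (assumed decomposition)."""
    return total - (business + pb + pleasure)


residuals = {
    pair: implied_return_home_change(*row) for pair, row in TABLE_7_4.items()
}

for pair, change in residuals.items():
    print(f"{pair}: implied return-to-home change = {change}")
```

Under this assumption the residual is negative for every pair, which is consistent with the reading that fewer tours end with a return-home leg between these TAZs under the HSR scenario.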
TAZTAZ Total Business PB Pleasure T361-T258 -52522 8109 1889 33156 T361-T276 -14865 24881 1068 5598 T258-T361 -57107 9551 1311 42756 T258-T276 -18437 21808 1888 10314 T276-T258 -15197 13921 2085 7967 T276-T361 -25096 -10658 -319 -2087 Note: T361- TAZ number of Washington, DC-MD-VA-WV MSA; T258- TAZ number of New York, NY MSA; T276- TAZ number of Philadelphia, PA-NJ MSA; Changes of trips=number of Trips under HSR scenario – number of Trips under Base scenario 145 Table 7-4: Trip Changes between TAZs by Trip purpose Table 7-5, Table 7-6 and Table 7-7 show the number of trips changes among the three TAZs by trip purpose and travel mode. When the HSR is operated along the northeast corridor between Washington D.C and New York in terms of travel time increase in existing rail lines, more people will choose HSR to travel between the three TAZs while fewer people will choose car and air to travel. And the changes in number of trips between TAZs are consistent with the changes of trip from/to the TAZs. Although the total number of trips (Table 7-5) is decreased between the TAZs (Washington D.C, New York, and Philadelphia), the total number of trips by train is increased (Table 7-5). The number of business, personal business and pleasure trips by train is also increased to different degrees. It is observed that the increased train trips between the TAZs are mainly for business and pleasure purposes, and only the smallest share of the increase train trips is for personal business. And a larger number of train trips are increased between Washington D.C and New York, due to the fact that the Washington D.C and New York are two large cities attracting a large number of trips going in and out. Between the pair of Washington D.C and New York and the pair of Washington D.C and Philadelphia, more people choose to travel by train for pleasure purpose, as it is shown that the largest number of train trips is increased for pleasure purpose (179712 trips and 248059 trips). 
Between New York and Philadelphia, more of the increased train trips are for business than for pleasure. The percentage change in the number of train trips is also high, which is due to the small number of train trips between New York and Philadelphia in the base scenario.

The total number of car trips between the TAZs decreases much more than the number of air trips between the TAZs (Table 7-6 and Table 7-7), which again indicates that train and car mainly compete with each other between these TAZs, while air does not have many advantages over train and car. The number of car trips (Table 7-6) between Washington D.C. and New York decreases the most, and the largest share of the decrease is for the pleasure purpose. The change in car trips between New York and Philadelphia follows a pattern similar to the change between New York and Washington D.C., but to a different degree. The largest share of the decrease in car trips between New York and Philadelphia is for the pleasure purpose, followed by the business purpose. Although pleasure trips decrease more than business trips for both pairs (Washington D.C. and New York, and New York and Philadelphia), the difference between the decrease in pleasure trips and the decrease in business trips is larger for Washington D.C. and New York than for New York and Philadelphia. The number of car trips between Washington D.C. and Philadelphia increases slightly for the business purpose and for personal business (the latter only in the direction from Washington D.C. to Philadelphia). This increase is small in both absolute value and percentage terms. Although the number of car trips between the TAZs decreases considerably in absolute value (above 40,000 trips), the percentage change is no more than 2%, because the number of car trips between the TAZs in the base scenario is large.
Very few car trips change for the personal business purpose.

Compared to trips made by car and train, trips made by air between the TAZs are the least affected in terms of absolute changes (Table 7-7). One reason is that air is not a popular travel mode between these TAZs even before HSR operates. Although HSR reduces the train travel time considerably, it still cannot attract many trips from air, by which only a small number of people travel between these TAZs. Among the TAZ pairs, Washington D.C. and New York has the largest number of air trips, and New York and Philadelphia has the smallest. Few people choose to travel by air between New York and Philadelphia for any purpose, even in the base scenario. Therefore, even though only a few trips change (fewer than 5 trips), the percentage change is not small (up to 7.1%) because of the small base value. Air trips between New York and Washington D.C. decrease by around 1% (above 4000 trips at the most) for almost all purposes, except for the personal business purpose on the route from New York to Washington D.C., which increases by 0.2% (203 trips). As the travel time of the train is reduced, the air route from Philadelphia to Washington D.C. loses the largest number of trips for the business purpose (22972 air trips), which is a decrease of about 13% from the base scenario. In the opposite direction (from Washington D.C. to Philadelphia), however, air trips for the business (1578 trips) and personal business (77 trips) purposes increase by less than 1%.
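The small-base effect noted above can be illustrated by backing out the implied base-scenario trips from a reported change and its percentage change, using the definition Percentage of Change = Change of trips / number of trips under Base scenario. This is a rough sketch: because the reported percentages are rounded to one decimal place, the implied base values are approximate, and the function name is illustrative.

```python
def implied_base_trips(change: float, pct_change: float) -> float:
    """Back out base-scenario trips from change = HSR - Base and
    pct_change = change / Base * 100 (both as reported in the tables)."""
    return change / (pct_change / 100.0)


# (change in trips, percentage change) pairs as reported in Table 7-5 and Table 7-7.
train_business_phl_to_nyc = implied_base_trips(51489, 1030.6)  # T276 to T258, train
train_pleasure_nyc_to_dc = implied_base_trips(248059, 226.4)   # T258 to T361, train
air_pleasure_nyc_to_phl = implied_base_trips(-3, -7.1)         # T258 to T276, air

print(f"Train, business, Philadelphia to New York: ~{train_business_phl_to_nyc:,.0f} base trips")
print(f"Train, pleasure, New York to Washington D.C.: ~{train_pleasure_nyc_to_dc:,.0f} base trips")
print(f"Air, pleasure, New York to Philadelphia: ~{air_pleasure_nyc_to_phl:,.0f} base trips")
```

An increase of about 51 thousand business train trips on a base of roughly 5 thousand yields a change above 1000%, while a change of only 3 air trips on a base of about 40 trips already yields 7.1%.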
Looking at the national travel demand model structure and the simulation flow, the number of long distance activities per year for each person is not sensitive to either of the two policy scenarios in our research, which means that the number of long distance activities by purpose stays the same in all three scenarios. The travel mode is determined before the stop frequency and stop location choices and is held fixed for the entire tour. Therefore, when HSR operates, more people choose to travel by train when they travel around the northeast corridor. Once people choose to travel by train for their long distance tours, they are more likely to make stops, for all purposes, in the TAZs that HSR connects. Since the main competitor of train in the northeast corridor is car travel, more people choose to travel by train instead of car, which decreases the number of car trips. People are also generally less likely to stop in the three TAZs if they travel by car.
TAZTAZ Total Business PB Pleasure Changes of Train Trips T361T258 367731 37357 2495 179712 T361T276 151171 10446 801 74760 T258T361 358812 45753 2693 248059 T258T276 139599 56758 3767 50108 T276T258 140480 51489 4371 51308 T276T361 140548 11985 354 108534 Percentage of Change in Train Trips T361T258 109.2% 49.5% 34.9% 199.7% T361T276 58.2% 35.0% 13.8% 81.5% T258T361 107.1% 50.4% 36.2% 226.4% T258T276 146.3% 784.5% 108.8% 89.7% T276T258 158.1% 1030.6% 131.5% 106.8% T276T361 55.3% 31.8% 5.0% 89.1% Note: Changes of trips=number of Trips under HSR scenario – number of Trips under Base scenario Percentage of Change=Change of trips/number of trips under Base scenario Table 7-5: Changes of number of train trips by trip purpose between TAZs TAZTAZ Total Business PB Pleasure Changes of Car Trips 149 T361T258 -409744 -26987 -92 -144338 T361T276 -164172 12849 190 -67164 T258T361 -405003 -33939 -1585 -200790 T258T276 -157976 -34949 -1879 -39791 T276T258 -155677 -37570 -2285 -43340 T276T361 -163108 329 -612 -108782 Percentage of Change in Car Trips T361T258 -2.0% -0.7% 0.0% -3.9% T361T276 -1.2% 0.4% 0.0% -2.9% T258T361 -2.0% -0.7% -0.1% -4.1% T258T276 -1.2% -1.1% -0.2% -1.4% T276T258 -1.1% -1.2% -0.2% -1.3% T276T361 -1.2% 0.0% -0.1% -3.2% Table 7-6: Changes of number of car trips by trip purpose between TAZs TAZTAZ Total Business PB Pleasure Changes of Air Trips T361T258 -10509 -2261 -514 -2218 T361T276 -1864 1586 77 -1998 T258T361 -10916 -2263 203 -4513 T258T276 -60 -1 0 -3 T276T258 0 2 -1 -1 T276T361 -2536 -22972 -61 -1839 Percentage of Change in Car Trips T361T258 -0.8% -0.6% -0.3% -0.8% T361T276 -0.2% 0.8% 0.1% -0.8% T258T361 -0.9% -0.6% 0.2% -1.5% T258T276 -6.4% -1.8% 0.0% -7.1% T276T258 0.0% 5.4% -4.5% -4.2% T276T361 -0.4% -13.0% -0.1% -1.0% Table 7-7: Changes of number of air trips by trip purpose between TAZs Table 7-8 presents the share of the trips by travel mode between TAZs under two scenarios (HSR and Base). 
For a given purpose and TAZ pair, the shares of the three travel modes sum to the total (100%) of the trips for that purpose between the TAZ pair. For example, from Washington D.C. to New York (T361-T258), the business train trip share (2.6%), the business air trip share (9.41%), and the business car trip share (87.98%) sum to the total percentage of business trips from Washington D.C. to New York. Overall, the table shows that although HSR reduces the train travel time between the three TAZs, car is still the most popular travel mode. The share of train trips between the TAZs increases to varying degrees for all purposes under the HSR scenario. The share of air trips between the TAZs decreases slightly, by no more than 0.5%, for all purposes, except that the share of air trips from New York to Washington D.C. for personal business increases by 0.01%. The share of car trips also decreases for almost all TAZ pairs, except for the business purpose from Philadelphia to Washington D.C., where it increases by 0.24%. For some pairs, the share of train trips increases enough to exceed the share of air trips, while most pairs keep the pattern of the base scenario. For example, under the base scenario, 3.3% of pleasure trips from Philadelphia to Washington D.C. are made by train and 4.79% are made by air. When the train travel time between the two TAZs is reduced, more people choose to travel by train, and more pleasure trips are made by train (6.24%) than by air (4.74%). For the other pairs, the ordering of the train and air shares under the HSR scenario is the same as under the Base scenario.
That is, although the share of train trips increases compared to the base scenario, it remains smaller (larger) than the share of air trips under the HSR scenario if it is smaller (larger) under the base scenario.

The Share of Train Trips

| TAZ pair | HSR: Total | HSR: Business | HSR: PB | HSR: Pleasure | Base: Total | Base: Business | Base: PB | Base: Pleasure |
|---|---|---|---|---|---|---|---|---|
| T361-T258 | 3.26% | 2.60% | 0.53% | 6.61% | 1.55% | 1.74% | 0.39% | 2.23% |
| T361-T276 | 2.72% | 1.22% | 0.81% | 6.17% | 1.72% | 0.91% | 0.71% | 3.41% |
| T258-T361 | 3.24% | 2.66% | 0.53% | 6.70% | 1.56% | 1.77% | 0.39% | 2.07% |
| T258-T276 | 1.71% | 1.94% | 0.74% | 3.71% | 0.69% | 0.22% | 0.36% | 1.96% |
| T276-T258 | 1.66% | 1.84% | 0.54% | 2.96% | 0.64% | 0.16% | 0.23% | 1.44% |
| T276-T361 | 2.70% | 1.15% | 0.63% | 6.24% | 1.73% | 0.87% | 0.60% | 3.30% |

The Share of Air Trips

| TAZ pair | HSR: Total | HSR: Business | HSR: PB | HSR: Pleasure | Base: Total | Base: Business | Base: PB | Base: Pleasure |
|---|---|---|---|---|---|---|---|---|
| T361-T258 | 6.12% | 9.41% | 8.94% | 6.68% | 6.15% | 9.48% | 8.98% | 6.79% |
| T361-T276 | 4.92% | 6.16% | 6.36% | 9.12% | 4.93% | 6.16% | 6.36% | 9.21% |
| T258-T361 | 5.72% | 6.98% | 6.48% | 5.69% | 5.76% | 7.04% | 6.47% | 5.82% |
| T258-T276 | 0.01% | 0.00% | 0.00% | 0.00% | 0.01% | 0.00% | 0.00% | 0.00% |
| T276-T258 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| T276-T361 | 3.87% | 3.56% | 5.73% | 4.74% | 3.88% | 4.08% | 5.74% | 4.79% |

The Share of Car Trips

| TAZ pair | HSR: Total | HSR: Business | HSR: PB | HSR: Pleasure | Base: Total | Base: Business | Base: PB | Base: Pleasure |
|---|---|---|---|---|---|---|---|---|
| T361-T258 | 90.62% | 87.98% | 90.53% | 86.71% | 92.29% | 88.77% | 90.63% | 90.99% |
| T361-T276 | 92.36% | 92.62% | 92.83% | 84.71% | 93.36% | 92.93% | 92.93% | 87.38% |
| T258-T361 | 91.04% | 90.36% | 92.99% | 87.61% | 92.68% | 91.19% | 93.14% | 92.11% |
| T258-T276 | 98.28% | 98.06% | 99.25% | 96.29% | 99.30% | 99.78% | 99.64% | 98.04% |
| T276-T258 | 98.34% | 98.16% | 99.46% | 97.04% | 99.36% | 99.84% | 99.76% | 98.56% |
| T276-T361 | 93.43% | 95.29% | 93.64% | 89.02% | 94.39% | 95.04% | 93.67% | 91.92% |

Table 7-8: Comparison of trip shares by travel mode between HSR and Base scenario

Table 7-9 shows the distributional impact of HSR on the number of trips at the national level by income group, gender, age group and travel mode.
Although the number of car trips and air trips decreases to different degrees, the shares of air and car trips for the different categories (income, gender, and age) either do not change (air and, for some groups, car) or decrease slightly (car). The share of car trips decreases by less than 0.05% for every category. For example, the share of air trips for all income groups and the share of car trips for the low-income and middle-income groups do not change, while the share of car trips for the high-income group decreases by 0.03%. The number of train trips made by the high-income group increases the most, and the number of train trips made by the low-income group increases the least. However, the share of train trips made by the low-income and middle-income groups changes by the same amount, 0.01%, from the base scenario to the HSR scenario, while the share of the high-income group increases by 0.03%. Among train trips, the high-income group still has the largest share under the HSR scenario (0.48%). Although the female group gains more train trips than the male group, the shares of train trips made by females and males increase by the same percentage (0.02%) under the HSR scenario. The male group still takes up the larger share of train trips under both the base scenario and the HSR scenario. The middle-age group (Age2) has the largest share of trips for all three travel modes, and it also has the largest increase in the number of train trips among the three age groups. The number of train trips made by the old-age group increases the least. Although the number of train trips made by the middle-age group increases more than that of the young-age group, the shares of train trips for the two groups increase by the same percentage (0.02%). Because HSR operates only as a travel time reduction on the existing northeast corridor rail lines, it has a very small impact on the number of trips by income, gender and age group at the national level.
Car is still the most popular travel mode among all the groups. Although the number and share of train trips increase, train still has the smallest share among the three travel modes at the national level.

| Category | Base: % of trips | HSR: % of trips | Change in # of trips (HSR trips - Base trips) |
|---|---|---|---|
| Inc1, air | 0.84% | 0.84% | -5795 |
| Inc1, car | 5.59% | 5.59% | -87273 |
| Inc1, train | 0.04% | 0.05% | 103475 |
| Inc2, air | 4.48% | 4.48% | -21296 |
| Inc2, car | 25.74% | 25.74% | -393541 |
| Inc2, train | 0.22% | 0.23% | 465063 |
| Inc3, air | 9.59% | 9.59% | -72507 |
| Inc3, car | 53.04% | 53.01% | -1355064 |
| Inc3, train | 0.45% | 0.48% | 1563127 |
| Total | 100% | 100% | - |
| Male, air | 8.86% | 8.86% | -61495 |
| Male, car | 44.71% | 44.69% | -902927 |
| Male, train | 0.39% | 0.41% | 1058576 |
| Female, air | 6.05% | 6.05% | -38103 |
| Female, car | 39.66% | 39.64% | -932951 |
| Female, train | 0.32% | 0.34% | 1073089 |
| Total | 100% | 100% | - |
| Age1, air | 4.47% | 4.47% | -39611 |
| Age1, car | 25.08% | 25.07% | -593362 |
| Age1, train | 0.21% | 0.23% | 689245 |
| Age2, air | 6.68% | 6.68% | -37891 |
| Age2, car | 36.55% | 36.54% | -784857 |
| Age2, train | 0.31% | 0.33% | 917319 |
| Age3, air | 3.76% | 3.76% | -22096 |
| Age3, car | 22.75% | 22.74% | -457659 |
| Age3, train | 0.19% | 0.20% | 525101 |
| Total | 100% | 100% | - |

Table 7-9: Comparison of percentage of trips and trip changes between HSR and Base

Chapter 8: Long Distance Travel Survey Instrument

The ability to collect and analyze trip data plays a critical role in the success of travel demand modeling at both the statewide and national levels. The most recent sources of long distance passenger trip data in the U.S. are the 1995 American Travel Survey (ATS), conducted by the Bureau of Transportation Statistics (BTS), and the 2001/2009 National Household Travel Survey (NHTS). However, the ATS data are more than 20 years old and have a limited sample size, which limits their applicability for future long distance passenger travel analysis in the U.S.
The Federal Highway Administration (FHWA) has planned the development of a new set of travel surveys: the next round of the National Household Travel Survey (NHTS) and the next iteration of a long distance travel survey of the U.S. household population. This effort advances current survey research by identifying novel, innovative and cost-effective methods to capture data and improve estimates in future FHWA long distance household travel studies. With new long distance survey data, we can expect more applications in future long distance passenger travel analysis and can carry out the proposed national travel demand modeling research. The techniques and methods used in travel surveys have evolved over the past decades. Prior to 1970, most national surveys of the U.S. population were conducted through in-person, interviewer-administered methods. Self-administered surveys followed, with widespread adoption of telephone-based survey methods in the 1980s and 1990s. These traditional household-level long distance travel surveys are capable of collecting most of the information required for travel analysis and modeling. However, collecting certain travel-related data can place a large burden on respondents at a relatively high cost, and the resulting reporting and measurement errors can decrease overall data reliability. In addition, the low frequency of long distance trips for most households makes it difficult and costly to acquire a sufficiently large sample of long distance trips. Consequently, researchers are turning to advanced travel survey methods using GPS technology, smartphones, social media, etc., to capture the temporal-spatial information on travel more accurately than traditional surveys.
The FHWA-funded project “Design of a completely new approach for a national household travel survey instrument” designed, developed, and tested new technology and applications for collecting survey data that could improve data quality and response rates while reducing respondent burden and bias. In the project, the long distance travel study continued to feature a probability sample design as its core component. However, the affordability of a probability-based survey is questionable, and the sample size attainable under such high-priced designs would be unable to achieve the target precision objectives required for the long distance travel study. Therefore, a non-probabilistic survey component was adopted in the project as an inexpensive supplement to the probabilistic core sample (Battelle Memorial Institute et al. 2013). The non-probability sample could provide a large sample size and improve the survey data quality. In the non-probabilistic survey, respondent burden can be reduced by implementing passive data collection using technologies such as GPS, smartphones, and social media to detect the presence of a long distance trip and collect information on these trips in real time.

However, non-probabilistic survey methods based on new technologies cannot provide all the necessary long distance trip information, such as travel mode, trip purpose, and travel time, which are important components in national travel demand analysis. Therefore, practical post-processing methods that can generate data on these missing travel characteristics are needed to supplement the data passively collected from GPS-, smartphone-, and social media-based surveys. Compared to travel mode and travel time, which can be easily estimated from the collected spatial-temporal information, long distance trip purpose requires more effort to estimate.
In this chapter, we propose post-processing methods (based on machine learning techniques) that can estimate trip purpose for non-probabilistic long distance travel surveys that rely on the new technologies. Available datasets, including travel survey data and other supplementary data, are employed to test and validate the methods. In addition, the research aims to provide a support tool for long distance travel data collection and the sound methodology needed for post-processing GPS-, smartphone-, and social media-based travel survey data. We also consider alternative trip purpose categorization schemes and the effects of different attributes on trip purpose imputation for long distance travel. Model performance under these alternative schemes and with different attributes is tested in order to provide comprehensive information for the design of future long distance travel surveys.

8.1 Methodology for Long Distance Trip Purpose Classification

The trip purpose detection system consists of four parts: model inputs, learning process, model output, and validation (Figure 8-1). Model inputs include travelers’ trip spatial-temporal data, land use data, and individuals’ socio-demographic and economic attributes provided by a travel recall survey, if the travel survey includes a recall component. The learning process employs machine learning methods and implements trip purpose detection algorithms. Once trip purposes are derived and output by the model that implements these machine learning methods, the validation component evaluates the classifier performance and the reliability of the results.

Figure 8-1: Trip Purpose Learning System

In the learning process part of the trip purpose detection system, multiple machine learning methods (e.g. decision tree learning and meta-learning) are employed and tested for trip purpose imputation. The purpose of testing is to identify the classifier that achieves the best performance.
Furthermore, alternative trip purpose categorization schemes for long distance travel have been developed and tested step by step for six different sets of trip purpose classifiers, or “models,” from binary-class to multi-class (Table 8-1). The following sections detail these machine learning methods.

[Figure 8-1 box labels: GPS Geospatial Data; Trip Information; GIS Land Use Data; Individual Characteristics; Trip Purpose Estimation; Machine Learning Methods; Derived Trip Purpose; Travel Recall Survey; Reported Trip Purpose; Validation. Stage labels: Input; Learning Process; Output.]

Decoded trip purpose by model (number of categories in parentheses):

| Reported Trip Purpose | Model 1 (2) | Model 2 (3) | Model 3 (4) | Model 4 (3) | Model 5 (2) | Model 6 (3) |
|---|---|---|---|---|---|---|
| Business | Business | Business | Business | Business | Business | Business |
| Combined Business/Pleasure (B/P) | Business | Business | Business | Combined B/P | Non-Business | Pleasure |
| Convention, Conference, or Seminar | Business | Business | Business | Business | Business | Business |
| School-related activity | Non-Business | Personal Business | Personal Business | Non-Business | Non-Business | Personal Business |
| Visit relatives or friends | Non-Business | Pleasure | Social Visit | Non-Business | Non-Business | Pleasure |
| Rest or relaxation | Non-Business | Pleasure | Leisure | Non-Business | Non-Business | Pleasure |
| Sightseeing, or to visit a historic/scenic attraction | Non-Business | Pleasure | Leisure | Non-Business | Non-Business | Pleasure |
| Outdoor recreation | Non-Business | Pleasure | Leisure | Non-Business | Non-Business | Pleasure |
| Entertainment | Non-Business | Pleasure | Leisure | Non-Business | Non-Business | Pleasure |
| Shopping | Non-Business | Pleasure | Leisure | Non-Business | Non-Business | Pleasure |
| Personal, family or medical | Non-Business | Personal Business | Personal Business | Non-Business | Non-Business | Personal Business |
| Other | Non-Business | Deleted | Deleted | Non-Business | Non-Business | Deleted |

Table 8-1: Six Sets of Long Distance Trip Purpose Categorization Schemes

8.1.1 Decision Tree Learning

Decision tree learning involves using a series of input attributes to construct a decision tree for
classifying trip purposes (Witten & Frank, 2005). These attributes include trip characteristics derived from the add-on trip location data, GIS-based land use type, and the individual’s socio-demographic attributes. A widely used decision tree algorithm in practice is C4.5, introduced by J. Ross Quinlan in 1993. The C4.5 algorithm uses information gain to split each node: at each node, it chooses the attribute that produces the purest daughter nodes. The information value is a measure of purity. The daughter nodes in the sub-tree are split following the same procedure until all the instances at a node share the same classification.

Given a training data set $S$ and an attribute set $A = (a_1, a_2, \ldots, a_n)$, the $n$ attributes create branches and partition the training data set $S$ of trip information into $n$ different subdivision sets $(V_1, V_2, \ldots, V_n)$. The number of leaf nodes ($L$) in subdivision $V_i$, denoted $v$, varies by the split attribute. The information gain of each attribute in the attribute set $A$ is calculated, and the attribute with the largest information gain is chosen to split on. The information gain is given in Formula (8.1).

$$\mathrm{Gain}(S, a_i) = \mathrm{Info}(S) - \mathrm{Average}\left[\mathrm{Info}(L_1, V_i), \mathrm{Info}(L_2, V_i), \ldots, \mathrm{Info}(L_v, V_i)\right] \tag{8.1}$$

where $\mathrm{Gain}(S, a_i)$ represents the information gain of the attribute $a_i$ in the data set $S$; $\mathrm{Info}(S)$ refers to the information value of the data set $S$; $(L_j, V_i)$ represents the leaf node $L_j$ in subdivision $V_i$; and $\mathrm{Info}(L_j, V_i)$ is the information value of leaf node $L_j$ in subdivision $V_i$ resulting from the data split on attribute $a_i$. The term $\mathrm{Average}[\mathrm{Info}(L_1, V_i), \mathrm{Info}(L_2, V_i), \ldots, \mathrm{Info}(L_v, V_i)]$ on the right-hand side of the formula is an average weighted by the number of instances at each leaf node. It represents the amount of information expected to be necessary to determine the class of a new instance, given the tree structure.
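As a minimal sketch of Formula (8.1), the Python snippet below computes the information gain of a single attribute. It assumes that Info(·) is the Shannon entropy of the class labels (the usual choice in C4.5-style trees; the formula itself is not spelled out above) and that the average is weighted by the number of instances in each branch. The function names and the toy trip records are hypothetical and are not taken from the ATS data.

```python
import math
from collections import Counter, defaultdict


def info(labels):
    """Shannon entropy (in bits) of a list of class labels (assumed form of Info)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(records, attribute, target):
    """Info(S) minus the instance-weighted average Info of the branches."""
    all_labels = [r[target] for r in records]
    branches = defaultdict(list)
    for r in records:
        branches[r[attribute]].append(r[target])
    weighted = sum(len(b) / len(records) * info(b) for b in branches.values())
    return info(all_labels) - weighted


# Hypothetical toy records; attribute names follow Table 8-2 for readability only.
trips = [
    {"Weekend": "yes", "TrParty": "group", "Purpose": "non-business"},
    {"Weekend": "yes", "TrParty": "group", "Purpose": "non-business"},
    {"Weekend": "no", "TrParty": "alone", "Purpose": "business"},
    {"Weekend": "no", "TrParty": "group", "Purpose": "business"},
]

gains = {a: information_gain(trips, a, "Purpose") for a in ("Weekend", "TrParty")}
best_attribute = max(gains, key=gains.get)  # attribute chosen for the split
```

In this toy data set, splitting on Weekend separates the two purposes completely, so it has the larger gain and would be selected for the first split.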
Therefore, the information gain of each attribute in attribute set A based on data set S can be generated, and the attribute with the largest information gain is selected to split on. Using this basic framework, the data are split recursively on the attributes in set A so that the information gain is maximized at each node of the tree, until all the instances at each leaf node have only one classification.

Pruning a decision tree is a technique that reduces the size of the tree by cutting off nodes that have little power in instance classification. Pruning can improve the computational efficiency and accuracy of a decision tree model, and it reduces the complexity of the tree, which helps avoid over-fitting the data set. The pruning methods applied to the trip purpose decision tree in this research are post-pruning and on-line pruning. Post-pruning, a bottom-up pruning strategy, is executed on a fully built decision tree. The relative frequencies of leaf nodes are calculated and compared, and any leaf node with a dominant classification makes its parent node a candidate for pruning. Afterwards, the error estimates of the replacement node and the old parent node are compared to evaluate whether the pruning is advantageous. On-line pruning differs from post-pruning in its timing: it is carried out while the decision tree is being built. When a split is made at a node, as described above, several child leaf nodes are generated. If a child leaf node contains fewer than a minimum number of instances, the parent node and its child leaf nodes are collapsed into a single node. The pruning process continues until the entire tree is complete.

The method used to estimate the error rate associated with the decision tree learning technique is the 10-fold cross-validation approach.
The full sample is randomly divided into 10 parts, where each part has the same proportion of classes. Each part is held out in turn, and the learning algorithm is trained on the remaining nine parts, so that the error rate on the held-out part can be calculated. The learning procedure is thus repeated 10 times with different training sets. The overall error rate is equal to the average of the 10 error rates.

8.1.2 Metalearner

Meta-learning implies learning from learned knowledge. Here, learning occurs from the classifiers produced by the inducers and from the classifications of these classifiers on training data. The main objective of meta-learning is to run a number of base learning processes on a number of data subsets, and to integrate the knowledge obtained from the separately learned classifiers through an extra level of learning to boost the overall estimation accuracy. Ensemble methods, which are typically employed for classification and combine the results of multiple base classifiers, are one type of meta-learning method. Bagging, or Bootstrap Aggregating, is the specific ensemble method used in the trip purpose detection system. It builds data subsets by bootstrap sampling, trains multiple classifiers on these data subsets, and predicts (tests) by majority vote for classification. When the training data set has large variance and the base classifier is over-fitted, the bagging method works well by decreasing the variance without changing the bias. However, if the base classifier is under-fitting, bagging will not improve the predictive accuracy much.

8.2 Data for Long Distance Trip Purpose Classification

Trip data from the 1995 ATS were used to derive trip purpose imputation. More information about the data can be found in Section 3.1. Information on all trips is employed to help derive the trip purpose.
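As a rough illustration of the evaluation and ensemble procedures described in Sections 8.1.1 and 8.1.2, the sketch below builds stratified folds, trains a bagged ensemble by bootstrap sampling, and predicts by majority vote. It uses only the Python standard library. The one-level decision stump is a stand-in for the C4.5 trees used in this research, and all function names, the number of bagged models, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Deal instance indices into k folds so each fold keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_stump(X, y):
    """Stand-in base learner: best single-feature threshold rule on the training sample."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            correct = sum(l == left_lab for l in left) + sum(l == right_lab for l in right)
            if best is None or correct > best[0]:
                best = (correct, f, t, left_lab, right_lab)
    _, f, t, left_lab, right_lab = best
    return lambda row: left_lab if row[f] <= t else right_lab


def train_bagging(X, y, n_models=15, seed=0):
    """Train n_models stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(train_stump([X[i] for i in sample], [y[i] for i in sample]))
    return lambda row: Counter(m(row) for m in models).most_common(1)[0][0]


def cross_validated_error(X, y, k=10, seed=0):
    """Average of the k held-out error rates."""
    errors = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = train_bagging([X[i] for i in train_idx], [y[i] for i in train_idx], seed=seed)
        errors.append(sum(model(X[i]) != y[i] for i in held_out) / len(held_out))
    return sum(errors) / len(errors)


# Hypothetical toy data: (nights at destination, travel party size).
X = [(nights, party) for nights in range(1, 11) for party in range(1, 5)]
y = ["business" if nights <= 4 else "non-business" for nights, _ in X]

folds = stratified_folds(y, k=10)
cv_error = cross_validated_error(X, y, k=10)
```

Each fold receives one or two of the 16 toy business records, which mirrors the requirement that every part has the same proportion of classes.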
In addition to the primary long distance trip characteristics, additional information is provided on stops on the way to the destination, stops on the way back from the destination, and side trips at the destination. This information includes the stop location at the metropolitan area and state levels, the travel mode used to get to the stop, the reason for the stop, the number of nights at the stop, and the lodging type at the stop.

Because long distance travel covers a wide area, land use data at the national level are required. Sources of land use data include land use type and intensity at the state, zone, parcel, block, and building levels from local, metropolitan, and state planning agencies, as well as graphic/digital land use information and other geospatial information. Because no geo-coded address information is available for trips in the 1995 ATS, land use data at a more aggregate level are adequate and suitable, provided that the destination state or metropolitan area is known. This research used the NOAA Coastal Assessment and Data Synthesis System as a source of land use data at the national level. It provides the area and the corresponding percentage of different land use types by state. A total of 39 land use types can be combined into 10 land use classes. Depending on the particular objective of the research, if the 10 land use classes prove excessive, they are further aggregated into three land use covers: urban, agriculture, and nature.

In order to better derive long distance trip purpose, supplementary data such as travel and tourism statistics and Gross State Product (GSP) data were collected and employed. It is hypothesized that people who travel to states with a larger travel and tourism population are more likely to travel for pleasure and visiting. Similarly, states with a higher GSP tend to have more enterprises and better accessibility, leading to a higher likelihood of attracting business trips.
The travel and tourism data were obtained from the U.S. Census Bureau. They include yearly recreation visits to national parks and state parks by state. The Gross State Product data for 1995 were obtained for each state from the Bureau of Economic Analysis. To derive trip purposes for long distance travel, model input variables were proposed in four categories: trip-related variables, respondent characteristics, land use attributes, and other supplementary data. The model variables are listed in Table 8-2.

| Variable Name | Description |
|---|---|
| HHIncome | Household income |
| Age | Respondent's age |
| Race | Respondent's race |
| EducAttainment | Respondent's education level |
| Activity | Activity of respondent |
| TrParty | Travel party size |
| TrPrHousePercent | Percentage of adult household members in travel party |
| TrPrTyCh | Children under 18 years in the travel party |
| Weekend | Whether it is a weekend trip |
| NiteDest | Number of nights at destination |
| LodgDest | Lodging type at destination |
| TransportOriginDest | Principal transportation from origin to destination |
| InternationalDestFlag | U.S. or international destination flag |
| StopsTo | Number of stops to destination |
| SideTrps | Number of side trips |
| Sex | Respondent's gender |
| Side1state | Whether side trip 1 destination is in the same state as the main trip |
| SidetripDest1Lodgn | Lodging type at side trip 1 destination |
| SidetripDest1Reasn | Trip purpose of side trip 1 |
| SidetripDest1Transportation | Transportation mode to side trip 1 destination |
| DestRegion | Region in which the destination state falls |
| Tourism | National Park recreation visits by state |
| GSP | Gross State Product |
| Urban | Percentage of urban land use cover by state |
| Agriculture | Percentage of agriculture land use cover by state |
| Nature | Percentage of natural land use cover by state |

Table 8-2: Model Variables Used for Long Distance Trip Purpose Estimation

8.3 Trip Purpose Classification Results

Using the 1995 ATS data, models from binary class to multi-class were developed to estimate trip purpose and to provide methodologically sound support for the design of a GPS-, social media-, and smartphone-based long distance travel survey. The six different classifiers, or “models,” defined in Table 8-1 were tested, and the classifier yielding the highest level of classification accuracy was noted.

Model 1. First, a binary classification model was developed in which long distance trip purpose was classified as either business or non-business. The results of the classifier with the highest classification accuracy are shown in Table 8-3. The classification results are encouraging: non-business trips are correctly identified with 96.1% accuracy, while the accuracy level for business trips is 70.1%. The results also imply that non-business trips are over-predicted: more than twice as many business trips are wrongly classified as non-business trips as the reverse. Overall, Model 1 successfully estimates trip purpose for 90.31% of all long distance trips in the 1995 ATS.
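The accuracy measures quoted for Model 1 can be reproduced from the classification counts reported in Table 8-3. The following sketch shows the arithmetic: the TP rate of a class is the share of its actual trips that are classified correctly, and the overall accuracy is the share of all trips on the diagonal. The function name and the dictionary layout are illustrative assumptions.

```python
def tp_rates_and_accuracy(matrix):
    """matrix[actual][classified] holds trip counts.

    Returns the per-class TP rates and the overall accuracy.
    """
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    tp_rates = {c: matrix[c][c] / sum(matrix[c].values()) for c in matrix}
    return tp_rates, correct / total


# Counts as reported in Table 8-3 (outer key: actual purpose, inner key: classified purpose).
model_1 = {
    "non-business": {"non-business": 415473, "business": 16950},
    "business": {"non-business": 36932, "business": 86671},
}

tp_rates, accuracy = tp_rates_and_accuracy(model_1)

# Business trips classified as non-business, relative to the reverse error.
misclassification_ratio = (
    model_1["business"]["non-business"] / model_1["non-business"]["business"]
)
```

The ratio of the two off-diagonal counts is a little above 2, which corresponds to the over-prediction of non-business trips noted for Model 1.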
| Actual Purpose | Classified as Non-business | Classified as Business | TP Rate |
|---|---|---|---|
| Non-business | 415,473 | 16,950 | 96.1% |
| Business | 36,932 | 86,671 | 70.1% |

Overall Accuracy: 90.31%; Number of Leaves: 27,473; Size of the tree: 35,643

Table 8-3: Model 1 Results

Model 2. Trip purpose imputation models that considered more than two categories were also tested. The majority of long distance travel models involve only three trip purpose categories, usually along the lines of business, pleasure (leisure/vacation), and other personal purposes. In Model 2, trips involving a combination of business and pleasure were regarded as business trips, while non-business trips were split into pleasure and personal business trips. The best classifier results for these three trip purposes are presented in Table 8-4. An overall predictive accuracy of 81.87% was achieved, with pleasure trips reaching the highest accuracy of 91.5% and personal business trips the lowest accuracy of 51.7%. About 41% of the personal business trips (35,955 of 87,517) are wrongly classified as pleasure trips, leading to the under-prediction of personal business trips. Moreover, as the number of trip purpose categories increases from two to three (i.e., from Model 1 to Model 2), the decision tree grows much larger.

| Actual Purpose | Classified as Pleasure | Classified as Business | Classified as Personal Business | TP Rate |
|---|---|---|---|---|
| Pleasure | 315,520 | 17,032 | 12,328 | 91.5% |
| Business | 25,656 | 94,419 | 3,528 | 76.4% |
| Personal Business | 35,955 | 6,286 | 45,276 | 51.7% |

Overall Accuracy: 81.87%; Number of Leaves: 91,241; Size of the tree: 108,145

Table 8-4: Model 2 Results

Model 3. A model with four trip purpose categories was tested to examine the impact of having more than three trip purposes in the trip purpose classification. In Model 3, the decoding structure for business and personal business trips remains the same as that in Model 2, while the pleasure trips are further split into leisure and social visit trips (Table 8-1). Table 8-5 shows the results of the four-trip-purpose imputation.
The overall accuracy decreased from 81.87% (Model 2) to 76.98%. Compared to having only one pleasure trip category, breaking pleasure trips out into leisure and social visit categories appears to reduce the predictive accuracy for pleasure trips. Meanwhile, the lowest classification performance is still seen with personal business trips.

| Actual Purpose | Classified as Leisure | Classified as Social Visit | Classified as Business | Classified as Personal Business | TP Rate |
|---|---|---|---|---|---|
| Leisure | 136,656 | 14,359 | 13,395 | 8,665 | 79% |
| Social Visit | 13,871 | 144,403 | 6,519 | 7,012 | 84.1% |
| Business | 14,544 | 7,430 | 97,597 | 4,032 | 79% |
| Personal Business | 14,894 | 16,231 | 7,051 | 49,341 | 56.4% |

Overall Accuracy: 76.98%; Number of Leaves: 137,660; Size of the tree: 163,883

Table 8-5: Model 3 Results

Model 4. Another three-trip-purpose scheme was tested, which involved business, non-business, and combined business and pleasure (B/P) trip categories. The classification results are shown in Table 8-6. Overall accuracy was 90.22%. Among all trips, the combined B/P trips have the weakest predictive power, with only 30.50% accuracy, primarily because about 60% of these trips are classified as non-business trips, resulting in the under-prediction of combined B/P trips.

| Actual Purpose | Classified as Non-business | Classified as Business | Classified as Combined B/P | TP Rate |
|---|---|---|---|---|
| Non-business | 417,072 | 14,288 | 1,063 | 96.50% |
| Business | 28,740 | 80,249 | 397 | 73.40% |
| Combined B/P | 8,557 | 1,330 | 4,330 | 30.50% |

Overall Accuracy: 90.22%; Number of Leaves: 30,342; Size of the tree: 38,672

Table 8-6: Model 4 Results

Models 5 and 6. Two additional models, which are slight alterations of Models 1 and 2, were developed and evaluated to assess the extent to which changing the decoded classification of combined B/P trips improves classification performance. In both Models 1 and 2, combined B/P trips were classified as business trips. In Model 5, combined B/P trips were classified as non-business trips, while in Model 6, combined B/P trips were classified as pleasure trips.
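The recoding that distinguishes the six schemes can be written as a simple lookup from the reported trip purpose to the decoded category. The sketch below transcribes Table 8-1; the variable and function names are hypothetical, and `None` is used here (as an assumption about representation) for records that a scheme deletes.

```python
B, NB = "Business", "Non-Business"
PB, PL = "Personal Business", "Pleasure"
LE, SV, BP = "Leisure", "Social Visit", "Combined B/P"

# Reported purpose -> decoded purpose under Models 1..6 (transcribed from Table 8-1).
RECODE = {
    "Business": (B, B, B, B, B, B),
    "Combined Business/Pleasure": (B, B, B, BP, NB, PL),
    "Convention, Conference, or Seminar": (B, B, B, B, B, B),
    "School-related activity": (NB, PB, PB, NB, NB, PB),
    "Visit relatives or friends": (NB, PL, SV, NB, NB, PL),
    "Rest or relaxation": (NB, PL, LE, NB, NB, PL),
    "Sightseeing, or to visit a historic/scenic attraction": (NB, PL, LE, NB, NB, PL),
    "Outdoor recreation": (NB, PL, LE, NB, NB, PL),
    "Entertainment": (NB, PL, LE, NB, NB, PL),
    "Shopping": (NB, PL, LE, NB, NB, PL),
    "Personal, family or medical": (NB, PB, PB, NB, NB, PB),
    "Other": (NB, None, None, NB, NB, None),  # None: record deleted under this scheme
}


def decode(reported_purpose, model):
    """Return the decoded purpose of a reported purpose under Model 1..6."""
    return RECODE[reported_purpose][model - 1]


# Number of distinct categories each scheme produces (deleted records excluded).
n_categories = [
    len({codes[m] for codes in RECODE.values() if codes[m] is not None})
    for m in range(6)
]
```

For example, `decode("Combined Business/Pleasure", 1)` gives "Business", whereas the same reported purpose is decoded as "Non-Business" under Model 5 and as "Pleasure" under Model 6.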
Due to the uncertainty of the pleasure part in the combined B/P trip, it was deemed risky to define the combined B/P trip as either a leisure trip or a social visit trip for the four-trip-purpose scheme. Therefore, a similar alteration of Model 3 was not considered.

Model 5 represented an alternative binary classification of business versus non-business trips. The results of this model are presented in Table 8-7. An overall accuracy of 91.86% is achieved, which is about 1.5 percentage points higher than the classification accuracy observed for Model 1.

| Actual Purpose | Classified as Non-business | Classified as Business | TP Rate |
|---|---|---|---|
| Non-business | 432,079 | 14,561 | 96.7% |
| Business | 30,693 | 78,693 | 71.9% |

Overall Accuracy: 91.86%; Number of Leaves: 24,109; Size of the tree: 30,120

Table 8-7: Model 5 Results

A reconstructed business, pleasure, and personal business trip scheme was used to learn the three-trip-purpose classification (Model 6). The results (Table 8-8) indicate that the predictive accuracy increases slightly relative to Model 2 (from 81.87% to 82.82%) when the combined B/P trips are coded as pleasure trips.

| Actual Purpose | Classified as Pleasure | Classified as Business | Classified as Personal Business | TP Rate |
|---|---|---|---|---|
| Pleasure | 332,435 | 15,029 | 11,633 | 92.6% |
| Business | 22,158 | 84,496 | 2,732 | 77.2% |
| Personal Business | 38,696 | 5,249 | 43,572 | 49.8% |

Overall Accuracy: 82.82%; Number of Leaves: 86,176; Size of the tree: 100,786

Table 8-8: Model 6 Results

Travel surveys using advanced passive tracking devices such as GPS units and smartphones cannot record information that requires the respondents’ interaction. With such tracking devices, the data are simply collected; there is no further reference to the user. Under this circumstance, the trip purpose can be derived only from the passively collected spatial-temporal data, or from the passively collected data combined with the respondent’s socio-economic information.
In order to provide comprehensive information for future travel survey designs, and to evaluate the effect of different categories of travel information on trip purpose derivation, multiple binary classifiers of business versus non-business are developed and re-estimated (Table 8-9). To overcome the bias towards one class caused by the imbalanced data set, we augmented the sample size of the underrepresented class (business trips) by duplicating the training examples.

Full Model. The binary classifier was learned with all kinds of information fed into the model. The results are presented in Table 8-10. The overall classification accuracy of the model reaches 94.57%, which is 2.71 percentage points higher than that of the model based on the original sample size (91.86%).

| Variable Sets | Full Model | Reduced Model | Minimized Model |
|---|---|---|---|
| Passively collected spatial-temporal data | √ | √ | √ |
| Respondents’ characteristics and other supplementary variables | √ | √ | |
| Information based on respondents’ answers in travel survey | √ | | |

Table 8-9: Compared Model Developments

| Actual Purpose | Classified as Business | Classified as Non-business | TP Rate |
|---|---|---|---|
| Business | 427302 | 10242 | 97.7% |
| Non-business | 37790 | 408850 | 91.5% |

Overall Accuracy: 94.57%; Number of Leaves: 79949; Size of the tree: 99695

Table 8-10: Full Model Estimation Results

Reduced Model. Without the respondents’ interaction in the travel survey, the trip purpose can be derived from the passively collected spatial-temporal data together with the respondent’s socio-economic characteristics and other supplementary information. The results of the reduced model are shown in Table 8-11. The overall classification accuracy decreases by more than 3 percentage points from the full model to the reduced model, and the size of the decision tree grows.

Minimized Model. The final model is developed based on passively collected spatial-temporal data only. The attributes used in the model consist of two parts.
One part includes the attributes that can be used directly in the model after some simple post-processing of the passively collected data, such as the number of stops, the number of side trips, and the travel start time. The other part includes attributes, such as travel mode and lodging type at the destination, that need to be derived or estimated using certain algorithms or established models. Table 8-12 shows the results of the minimized model. The overall classification accuracy (71.72%) is more than 20 percentage points lower than that of the full model. Meanwhile, the tree size decreases dramatically.

| Actual Purpose | Classified as Business | Classified as Non-business | TP Rate |
|---|---|---|---|
| Business | 415048 | 22496 | 94.9% |
| Non-business | 55855 | 390785 | 87.5% |

Overall Accuracy: 91.14%; Number of Leaves: 92855; Size of the tree: 116914

Table 8-11: Reduced Model Results

| Actual Purpose | Classified as Business | Classified as Non-business | TP Rate |
|---|---|---|---|
| Business | 356522 | 81022 | 81.5% |
| Non-business | 169014 | 277626 | 62.2% |

Overall Accuracy: 71.72%; Number of Leaves: 1766; Size of the tree: 2466

Table 8-12: Minimized Model Results

Estimation results show that, in general, as the number of categories increases, the accuracy of a trip purpose imputation scheme tends to deteriorate, and the decision tree tends to become more complex. We found that non-business trips or pleasure trips achieve satisfactory results, with higher classification accuracy than business trips. Moreover, based on the comparison of the results of the different trip purpose categorizations, it is more appropriate to decode the reported combined business and pleasure trips as non-business trips for binary classification and as pleasure trips for three-class classification.
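The class-balancing step applied before estimating the full, reduced, and minimized models (duplicating business-trip training examples) can be sketched as follows. The replication factor of 4 is an inference from the row totals of Tables 8-7 and 8-10, not a value stated in the text; the function name and record layout are hypothetical.

```python
def oversample_by_duplication(records, label_key, minority_label, factor):
    """Return a new list in which every minority-class record appears `factor` times."""
    balanced = []
    for record in records:
        copies = factor if record[label_key] == minority_label else 1
        balanced.extend([record] * copies)
    return balanced


# Hypothetical toy sample: 2 business trips and 8 non-business trips.
trips = [{"Purpose": "business"}] * 2 + [{"Purpose": "non-business"}] * 8
balanced = oversample_by_duplication(trips, "Purpose", "business", factor=4)
n_business = sum(r["Purpose"] == "business" for r in balanced)

# Business row totals implied by the reported tables.
business_original = 30693 + 78693    # Table 8-7 (original sample)
business_augmented = 427302 + 10242  # Table 8-10 (augmented sample)
implied_factor = business_augmented / business_original
```

With this factor, the augmented business class (437,544 records) is close in size to the non-business class (446,640 records), which is consistent with the stated aim of removing the bias towards one class.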
Unsatisfactory results can be seen for business trips and personal business trips. These can be explained by the reporting errors that tend to be inevitable in traditional travel surveys, as well as by the characteristics that personal business trips share with pleasure trips (e.g., travel party, travel mode, lodging type at destination, duration). More information about respondents’ travel at the destination at the urban level, together with detailed land use data, would help to better distinguish business trips and personal business trips from other types of trips based on high-quality travel survey data.

Three additional models were developed and re-estimated to examine the role of different kinds of information in long distance trip purpose imputation, and to assist future long distance travel survey designs. The full model considers all available information, including passively collected spatial-temporal data, respondents’ socio-economic and demographic characteristics, and other supplementary data such as land use attributes. The reduced model does not require information that can only be collected through travel surveys (e.g., travel party size), but still uses respondents’ socio-economic and demographic characteristics and GPS or other location data. The minimized model is based only on passively collected spatial-temporal data. As expected, the full model reaches the highest accuracy (94.57%) based on the augmented sample size. The classification performance decreases gradually as more variable sets are excluded. Accuracy drops by more than 3 percentage points in the reduced model, which excludes the information that requires respondents’ interaction in the travel survey. When only the passively collected data are used to estimate trip purpose, the overall classification accuracy is more than 20 percentage points lower than that of the full model.
Consequently, it can be concluded that the travel information collected through respondents’ interaction affects the long distance trip purpose estimation only to a small degree, whereas the respondents’ characteristics and other supplementary information help improve the trip purpose classification accuracy significantly.

Chapter 9: Conclusions and Future Research

9.1 Conclusions

The need to support a high level of personal long-distance national travel requires accurate analysis tools for understanding long-distance travel behavior and forecasting future travel patterns. However, national long-distance passenger travel demand analysis has been an understudied area in transportation planning.

This research represents a more advanced academic effort in national passenger travel analysis. It aims to provide important insight and to help guide federal and state agencies in decisions on corridor-level, region-level, and nation-level infrastructure investment, design, and management, as well as to support research on long-distance passenger travel demand. The developed national travel demand model exhibits the system logic and concept, statistically supports its basic structure, and provides OD estimations based on the model simulation.

The research represents the first attempt to develop an integrated activity-based travel demand model system for individuals’ quarterly/yearly long distance or national activities and travel in the U.S. at the Metropolitan Statistical Area (MSA)/Non-MSA level, which is the finest geographic resolution in the long distance travel survey data. The model system is developed considering the specific attributes of long distance travel, such as low frequency, long activity duration, long activity duration at intermediate stops on the tour legs, different sets of mode alternatives, etc.
Therefore, the model system takes into account people’s long distance travel not only at the tour level, but also at the stop level. Three levels of choice are modeled. The first level is the activity pattern level, which generates the number of different types of activities a person will choose during one year; the second level is the tour level, which contains choices of tour destination, time of year, tour duration, and tour mode; and the lowest level is the stop level model system, which includes the number, the purpose, and the location of each intermediate stop made during the inbound and outbound legs of the tour.

Nationwide long distance travel data over the course of one year are used to estimate the parameters of the national travel demand model system. Data from multiple sources (e.g. Census Bureau, Airline Origin and Destination Survey, Amtrak, EIA, Boeing, CEDDS, etc.) are collected and used to obtain the TAZ economic and demographic data and the transportation OD skim data for car, air, and train travel. Each model component of the model system is validated through the with-out-sample validation method.

The model is implemented in the developed microsimulation platform, which simulates each individual’s yearly long distance activities and travel in the U.S. The 2010 PUMS data are expanded according to the person weight in order to generate the total population data in the base year. Based on the base year passenger long-distance model, we calibrated the alternative specific constants of the time of year choice model and the travel mode choice model using the Airline Origin and Destination Survey (DB1B), which is the only observed data set we have for calibration.

The calibrated model is employed to predict the future year long distance travel demand based on the synthetic future year population; we call this the base scenario. The future year population is generated using PopGen at the county level for the future year of 2040.
Multiple scenarios in the future year (including a fuel price increase and high speed rail) are then analyzed and compared with the base scenario. The comparison results are consistent with expectations. People are sensitive to fuel price changes when driving for long distance travel. As the fuel price increases, people make fewer car trips or reduce the travel distance of each car trip. Although high speed rail (HSR) is operated along the selected regional corridor, represented in the model through train travel time, and more people turn to the train for their long distance travel, the car is still the most popular of the three travel modes, with the largest share of trips along the regional corridor. After HSR begins operating along the regional corridor, the main change in trip distribution occurs along the corridor. The trip distributions at the national level are hardly affected.

9.2 Recommendations for Future Research

In our national travel demand model, the high-level module generates the number of long distance activities by purpose based on parameters called long distance trip rates. The long distance trip rates for each purpose were estimated using the Multiple Classification Analysis (MCA) method based on the 1995 ATS data. The values of the parameters are determined only by the person’s socioeconomic and demographic characteristics. This model feature makes the number of long distance activities for each purpose insensitive to any transportation-related policy. For instance, in our future year scenario analysis, each person’s long distance activities for the three purposes do not change with the scenario, because the socioeconomic and demographic characteristics of the person or household do not change.
In reality, the number of long distance activities a person makes for each purpose could be affected not only by socioeconomic and demographic attributes but also by the level of service (LOS) of the transportation network. Because of data limitations, we simplified the theory of how people choose the number of long distance activities by purpose during one year. In the future, with sufficient data, the prediction of the yearly long distance activity pattern could be improved by using a more advanced methodology (e.g. a multinomial logit model) and by incorporating transportation LOS attributes.

The national travel demand model is developed based on the 1995 American Travel Survey (ATS) data, which provide detailed information about people's national long distance travel during one year. The main problems with these data are that they are more than 20 years old and have a limited sample size, and that people's travel behavior and travel patterns may have changed over those 20 years. When the next long distance travel survey of the U.S. household population becomes available, the new survey data set can be combined with the proposed methodology for imputing missing information to improve the national travel demand model, so that it better reflects people's decisions in long distance travel.

For the passenger long distance model, we conducted only 60 iterations of calibration because of time limitations, and as a result the trip distribution by time of year has not yet reached the expected pattern. With more iterations, the trip distribution by time of year should approach the expected pattern. In the current calibration, we calibrate the alternative specific constants of the time of year choice model and the travel mode choice model using the Airline Origin and Destination Survey (DB1B), which is the only data set available for calibration.
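To illustrate what calibrating alternative specific constants involves, the following sketch applies a common textbook adjustment: each constant is shifted by the log ratio of the observed share to the predicted share. This is a substitute technique chosen for brevity, not the optimization method used in this dissertation, and the utilities, target shares, and function names are hypothetical.

```python
import math
from typing import Dict


def mnl_shares(utilities: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logit shares, with the maximum subtracted for stability."""
    u_max = max(utilities.values())
    exp_u = {k: math.exp(v - u_max) for k, v in utilities.items()}
    total = sum(exp_u.values())
    return {k: e / total for k, e in exp_u.items()}


def calibrate_ascs(base_utils: Dict[str, float],
                   targets: Dict[str, float],
                   iterations: int = 10) -> Dict[str, float]:
    """Shift each constant by ln(observed share / predicted share)."""
    ascs = {k: 0.0 for k in base_utils}
    for _ in range(iterations):
        shares = mnl_shares({k: base_utils[k] + ascs[k] for k in base_utils})
        for k in ascs:
            ascs[k] += math.log(targets[k] / shares[k])
    # Constants are identified only up to a shift, so report them relative to Q1.
    ref = ascs["Q1"]
    return {k: v - ref for k, v in ascs.items()}


# Hypothetical systematic utilities and observed quarterly shares.
base_utils = {"Q1": 0.0, "Q2": 0.2, "Q3": 0.5, "Q4": 0.1}
targets = {"Q1": 0.20, "Q2": 0.25, "Q3": 0.30, "Q4": 0.25}
ascs = calibrate_ascs(base_utils, targets)
calibrated = mnl_shares({k: base_utils[k] + ascs[k] for k in base_utils})
```

For a single aggregate choice such as this one, the update reproduces the target shares after one pass. In a microsimulation, predicted shares come from simulating a heterogeneous population, so repeated passes and a stopping rule are needed.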
The airline OD data are aggregated to the state level, with Maryland as one zone and the remaining states combined as another, for each quarter (which yields 16 variables) in order to reduce the calibration time. With a more powerful computer, the airline OD data could be used at the TAZ level for calibration. Also, because OD data for car and train travel are lacking, only airline OD data are used in our calibration. In the future, when OD data for car and train travel become available, the passenger national travel model could be calibrated against OD data for all travel modes.

The national travel demand model can simulate the travel pattern and travel behavior of each person in the U.S. over the course of one year. As a result, the simulation can use a large amount of RAM, and the results require substantial storage. Currently, writing each person's detailed trip and tour information for one year of long distance activities into a database (e.g. MySQL) would take more time and require considerably more storage and computing resources. Therefore, we output aggregated and summarized travel data, which means that the output of each model component and the detailed trip/tour information of each person are not accessible. With distributed computing or other big data techniques and more storage, the simulation program of the national travel demand model could be improved and optimized to reduce the running time and to store the detailed trip/tour information of each person in the U.S.

Bibliography

Allison, Paul D., 1982. Discrete-time methods for the analysis of event histories. Sociological Methodology 13.1: 61-98.
Amtrak, 2010. A Vision for High-Speed Rail in the Northeast Corridor. https://www.amtrak.com/ccurl/214/393/A-Vision-for-High-Speed-Rail-in-the-Northeast-Corridor.pdf
Antoniou, C., Azevedo, C. L., Lu, L., Pereira, F., and Ben-Akiva, M., 2015. W-SPSA in practice: Approximation of weight matrices and calibration of traffic simulation models. Transportation Research Part C: Emerging Technologies, 59, 129-146.
Axhausen, K. and Müller, K., 2010. Population synthesis for microsimulation: State of the art. Technical Report August. Swiss Federal Institute of Technology Zurich.
Axhausen, KW., Schonfelder, S., Wolf, J., Oliveira, M., Samaga, U., 2003. 80 weeks of GPS-traces: approaches to enriching the trip information. Transportation Research Record, 1870, 46-54.
Baik, H., Trani, A. A., Hinze, N., Swingle, H., Ashiabor, S., & Seshadri, A., 2008. Forecasting Model for Air Taxi, Commercial Airline, and Automobile Demand in the United States. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2052, pp. 9-20.
Battelle Memorial Institute, The Urban Institute, and University of Maryland, 2013. Design of a completely new approach for a national household travel survey instrument. Project report for Federal Highway Administration. Contract No. DTFH61-11-C-00039.
Beckman, R. J., K. A. Baggerly and M. D. McKay, 1996. Creating synthetic baseline populations, Transportation Research Part A: Policy and Practice, 30 (6) 415–429.
Ben-Akiva, M., Cascetta, E., Coppola, P., Papola, A., & Velardi, V., 2010. High speed rail demand forecasting: Italian case study. In European Transport Conference, 2010.
Ben-Akiva, M.E. and Lerman, S.R., 1985. Discrete choice analysis: theory and application to travel demand (Vol. 9). MIT Press.
Beser, M. and Algers, S., 2001. SAMPERS – The New Swedish National Travel Demand Forecasting Tool. In Lundqvist, L., & Mattsson, L.G. (Eds.), National Transport Models: Recent Developments and Prospects, Springer, New York.
Bhat, C., 2005. A Multiple Discrete-Continuous Extreme Value Model: Formulation and Application to Discretionary Time-Use Decisions. Transportation Research Part B, Vol. 39, No. 8, pp. 679-707.
Bhat, C.R., 1995. A Heteroscedastic Extreme Value Model of Intercity Travel Mode Choice. Transportation Research Part B, Vol. 29, No. 6, pp. 471-483.
Billitteri, Thomas J. High-Speed Trains: Does the United States Need Supertrains? [Online] library.cqpress.com. Retrieved on December 26, 2013.
Boeing (Producer), 2011. 757 Commercial Transport History. Retrieved from http://www.boeing.com/history/boeing/757.html
Bohte, W., Maat, K., 2009. Deriving and validating trip destinations and modes for multi-day GPS-based travel surveys: a large-scale application in the Netherlands. Transportation Research Part C 17, 285–297.
Börjesson, M., 2012. Forecasting demand for high speed rail. http://vti.diva-portal.org/smash/get/diva2:669361/FULLTEXT01.pdf
Bostrom, R., 2006. Kentucky Statewide Travel Model (KYSTM). Presentation. Combined Kentucky-Tennessee Model Users Group Meeting, Bowling Green KY. October 26, 2006.
Bowman, J.L., 2004. A comparison of population synthesizers used in microsimulation models of activity and travel demand. Unpublished working paper. http://jbowman.net/papers/2004.Bowman.Comparison_of_PopSyns.pdf
Bowman, J. L., and M. A. Bradley, 2006. Activity-based travel forecasting model for SACOG: Technical Memo Number 5: Intermediate Stop Location Models. Available at http://jbowman.net
Brog, W., Erl, E., & Schulze, B., 2004. DATELINE: Concept and Methodology. Paper presented at the 2004 European Transport Conference.
Bureau of Transportation Statistics, 2015. http://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/national_transportation_statistics/html/table_04_23.html
Burge, P., Kim, C. W., & Rohr, C., 2011. Modelling Demand for Long-Distance Travel in Great Britain: Stated preference surveys to support the modelling of demand for high-speed rail.
Burge, P., Rohr, C. and C. Kim, 2010. Who might travel by high-speed rail? Modeling Choices for long-distance travelers in the UK, European Transport Conference, Glasgow.
Burgess, A., Snelder, M., Martino, A., Fiorello, D., Bröcker, J., Schneekloth, N. & Rudzikaite, L., 2006. TRANS-TOOLS (TOOLS for Transport forecasting ANd Scenario testing) Deliverable 1. Funded by 6th Framework RTD Programme. TNO Inro, Delft, Netherlands.
Cambridge Systematics, 2016. California High-Speed Rail Draft 2016 Business Plan: Ridership and Revenue Forecasting. Prepared for Parsons Brinckerhoff and California High-Speed Rail Authority. https://www.hsr.ca.gov/docs/about/business_plans/DRAFT_2016_Business_Plan_Ridersihp_Revenue_Forecast.pdf
Cambridge Systematics and Mark Bradley Research and Consulting, 2006. Bay Area/California High-Speed Rail Ridership and Revenue Forecasting Study: Interregional Model System Development. Final Report. Prepared for Metropolitan Transportation Commission and the California High-Speed Rail Authority.
Cambridge Systematics, 2008. National travel demand forecasting model phase I final scope. NCHRP Project (2008): 08-36. Retrieved from http://www.camsys.com/pubs/NCHRP08-36-70.pdf
Census Bureau, 2008. Public Use Microdata Sample, 2000 Census of Population and Housing: Technical Documentation. https://www.census.gov/prod/cen2000/doc/pums.pdf
Chen, C., Gong, H., Lawson, C., Bialostozky, E., 2010. Evaluating the feasibility of a passive travel survey collection in a complex urban environment: Lessons learned from the New York City case study. Transportation Research Part A 44, 830–840.
Cohen, H., Horowitz, A., & Pendyala, R., 2008. Forecasting statewide freight toolkit. Washington, DC: Transportation Research Board.
Congressional Budget Office (CBO), 2008. Effects of Gasoline Prices on Driving Behavior and Vehicle Markets. https://www.cbo.gov/sites/default/files/110th-congress-2007-2008/reports/01-14-gasolineprices.pdf
Contrino, H., & McGuckin, N., 2009. Demographics Matter: Travel Demand, Options and Characteristics Among Minority Populations. Public Works Management & Policy, 13(4), 361-368.
Daly, A.J., Rohr, C., 1998. Forecasting Demand for New Travel Alternatives. In: T. Gärling, T. Laitila, K. Westin (eds.) Theoretical Foundation for Travel Choice Modeling, Pergamon.
Davidson, P., & Clarke, P., 2004. Preparation of OD Matrices for DATELINE. Paper presented at the 2004 European Transport Conference.
Deming, W. E. and F. F. Stephan, 1940. On the least squares adjustment of a sampled frequency table when the expected marginal totals are known, Annals of Mathematical Statistics, 11 (4) 427–444.
Deng, Z., Ji, M., 2010. Deriving Rules for Trip Purpose Identification from GPS Travel Survey Data and Land Use Data: A Machine Learning Approach. Traffic and Transportation Studies, pp. 768-777.
Energy Information Administration (EIA). http://www.eia.gov/. Accessed in 2016.
Epstein, J. M., Parker, J., Cummings, D., & Hammond, R. A., 2008. Coupled Contagion Dynamics of Fear and Disease: Mathematical and Computational Explorations. PLoS One, Vol. 3, No. 12, p. e3955.
Erhardt, G., Freedman, J., Stryker, A., Fujioka, H., & Anderson, R., 2015. Ohio long-distance travel model. Transportation Research Record: Journal of the Transportation Research Board.
Federal Highway Administration, 2013. Understanding Long-Distance Traveler Behavior - Supporting a Long-Distance Passenger Travel Demand Model. FHWA-HRT-13-095.
Forinash, C. V., and F. S. Koppelman, 1993. Application and Interpretation of Nested Logit Models of Intercity Mode Choice. In Transportation Research Record 1413, TRB, National Research Council, Washington, D.C., pp. 98–106.
Fosgerau, M., 2002. PETRA - An Activity-based Approach to Travel Demand Analysis. In National Transport Models (pp. 134-145). Springer Berlin Heidelberg.
GAO (United States Government Accountability Office), 2014. Impact of Fuel Price Increases on the Aviation Industry.
Gaudry, M., 2001. Test of Nonlinearity, Modal Captivity and Spatial Competition Within the STEMM Multicountry Applications for Passengers. In Lundqvist, L., & Mattsson, L.G. (Eds.), National Transport Models: Recent Developments and Prospects, Springer, New York.
Giaimo, G. T., & Schiffer, R., 2005. Statewide Travel Demand Modeling: A Peer Exchange, Longboat Key, Florida, September 23-24, 2004. Transportation Research E-Circular, (E-C075).
Giaimo, G.T., & Schiffer, R. (Eds.), 2005, August. Statewide travel demand modeling: A peer exchange. Transportation Research Circular, #E-C075.
Gokovali, U., Bahar, O. and Kozak, M., 2007. Determinants of length of stay: A practical use of survival analysis. Tourism Management, 28(3), pp. 736-746.
Griffin, T., Huang, Y., 2005. A Decision Tree Based Classification Model to Automate Trip Purpose Derivation. In the Proceedings of the 18th International Conference on Computer Applications in Industry and Engineering, Honolulu, Hawaii.
Grush, W., 1998. Usage and Vehicle Miles of Travel (VMT) per Capita. Highway Information Quarterly, 5(4).
Gunn, H.F., 2001a. An Overview of European National Models. In Lundqvist, L., & Mattsson, L.G. (Eds.), National Transport Models: Recent Developments and Prospects, Springer, New York.
Guo, J. Y. and C. R. Bhat, 2007. Population synthesis for microsimulating travel behavior, Transportation Research Record, 2014 (12) 92–101.
HCG and TOI, 1990. A Model System to Predict Fuel Use and Emissions from Private Travel in Norway from 1985 to 2025. Report to the Norwegian Ministry of Transport. Hague Consulting Group.
Hess, S., Adler, T. & Polak, J.W., 2007. Modelling airport and airline choice behavior with the use of stated preference survey data, Transportation Research Part E, 43, pp. 221-233.
Hofman, F., 2001. Application Areas for the Dutch National Model. In Lundqvist, L., & Mattsson, L.G. (Eds.), National Transport Models: Recent Developments and Prospects, Springer, New York.
Horowitz, A.J., 2006. Statewide travel forecasting models (NCHRP Synthesis #358). Transportation Research Board.
Horowitz, A.J., 2008. White paper: Statewide travel demand forecasting. Requested by AASHTO and presented at the conference on meeting federal surface transportation requirements in statewide and metropolitan transportation planning.
Hwang, H.L. & Rollow, J., 2000. Data Processing Procedures and Methodology for Estimating Trip Distances for the 1995 American Travel Survey (ATS). http://cta.ornl.gov/cta/Publications/Reports/ORNL_TM_2000_141.pdf
Koppelman, F. S., 1989. Multidimensional model system for intercity travel choice behavior. Transportation Research Record, (1241).
Koppelman, F.S., & Sethi, V., 2005. Incorporating Variance and Covariance Heterogeneity in the Generalized Nested Logit model: An Application to Modeling Long Distance Travel Choice Behavior. Transportation Research Part B, Vol. 39, No. 9, pp. 825-853.
Lee, J. H., K.-S. Chon, and C. Park, 2004. Accommodating Heterogeneity and Heteroscedasticity in Intercity Travel Mode Choice Model: Formulation and Application to Honam, South Korea, High-Speed Rail Demand Analysis. In Transportation Research Record: Journal of the Transportation Research Board, No. 1898, Transportation Research Board of the National Academies, Washington, D.C., pp. 69–78.
Lemp, J.D. and Kockelman, K.M., 2012. Strategic sampling for large choice sets in estimation and application. Transportation Research Part A: Policy and Practice, 46(3), pp. 602-613.
Li, G., 2004. Intercity Travel Demand: A Utility Consistent Simultaneous Trip Generation and Mode Choice Model. Doctoral dissertation. Interdisciplinary Program in Transportation, New Jersey Institute of Technology, Newark.
Lim, P.P. and Gargett, D., 2013, October. Population Synthesis for Travel Demand Forecasting. In Australasian Transport Research Forum (ATRF), 36th, 2013, Brisbane, Queensland, Australia.
Lu, Y., Zhu, S., Zhang, L., 2013. Imputing Trip Purpose Based on GPS Travel Survey Data and Machine Learning Methods, Transportation Research Board 92nd Annual Meeting, Washington D.C., Paper Number: 13-3177.
Lu, Y., and Zhang, L., 2015. Imputing trip purposes for long-distance travel. Transportation, 42(4), 581-595.
Lundqvist, L., & Mattsson, L. G. (Eds.), 2001. National Transport Models: Recent Developments and Prospects. Springer.
Mannering, F.L., 1983. An econometric analysis of vehicle use in multivehicle households. Transportation Research Part A, Vol. 17, No. 3, pp. 183-189.
Nerella, S. and Bhat, C., 2004. Numerical analysis of effect of sampling of alternatives in discrete choice models. Transportation Research Record: Journal of the Transportation Research Board, (1894), pp. 11-19.
Outwater, M., Bradley, M., Ferdous, N., Trevino, S., & Lin, H., 2015. Foundational Knowledge to Support a Long-Distance Passenger Travel Demand Modeling Framework: Implementation Report.
Parsons Brinckerhoff, HBA Specto Incorporated, and EcoNorthwest, 2010. Oregon Statewide Integrated Model (SWIM2) Model Description. http://www.oregon.gov/ODOT/TD/TP/docs/statewide/swim2.pdf
Peter Davidson Consultancy, 2000. MYSTIC Toward Origin-Destination Matrices for Europe. London.
Rohr, C., Fox, J., Daly, A., Patruni, B., Patil, S. and F. Tsang, 2013. Modelling long-distance travel in the UK, Transportation Research Record 2344, pp. 145-152.
Ryan, J., Maoh, H. and Kanaroglou, P., 2009. Population synthesis: Comparing the major techniques using a small, complete population of firms, Geographical Analysis, 41 (2) 181–203.
Salvini, P. A. and E. J. Miller, 2005. ILUTE: An operational prototype of a comprehensive microsimulation model of urban systems, Networks and Spatial Economics, 5 (2) 217–234.
Schönfelder, S., K. Axhausen, N. Antille, M. Bierlaire, and E. Lausanne, 2002. Exploring the potentials of automatically collected GPS data for travel behavior analysis - a Swedish data source. GI-Technologien für Verkehr und Logistik 13, 155-179.
Sharma, S., Lyford, R., & Rossi, T., 1999. The New Hampshire Statewide Travel Model System (No. E-C011). http://onlinepubs.trb.org/onlinepubs/circulars/ec011/sharma.pdf
Srinivasan, S. and L. Ma, 2009. Synthetic population generation: A heuristic data-fitting approach and validations, paper presented at the 12th International Conference on Travel Behaviour Research (IATBR), Jaipur, December 2009.
Srinivasan, S., L. Ma and K. Yathindra, 2008. Procedure for forecasting household characteristics for input to travel-demand models, Final Report, TRC-FDOT-64011-2008, Transportation Research Center, University of Florida. http://www.fsutmsonline.net/images/uploads/reports/FDOT_BD545_79_rpt.pdf
Souleyrette, R.R., Hans, Z.N., & Pathak, S., 1996. Statewide transportation planning model and methodology development program. Ames: Iowa State University.
Spall, J.C., 2003. Introduction to Stochastic Search and Optimization: Estimation, Simulation and Control, Wiley.
Stammer, Jr., R.E., 2002. Statewide Modeling Practices and Prototype Statewide Model Development. Final Report TNSPR-RES-1147. Prepared for Tennessee Department of Transportation.
Stopher, P., Clifford, E., Zhang, J., FitzGerald, C., 2008a. Deducing Mode and Purpose from GPS data. Working Paper of the Australian Key Centre in Transport and Logistics. University of Sydney, Sydney, Australia.
Stopher, P., FitzGerald, C., Zhang, J., 2008b. Search for a Global Positioning System device to measure personal travel. Transportation Research Part C 16(3), 350–369.
Stopher, P.R., Greaves, S., FitzGerald, C., 2005. Developing and deploying a new wearable GPS device for transport applications. Paper presented to the 28th Australasian Transport Research Forum, Sydney, 28–30 September.
Tipping, A., Schmahl, A., Duiven, F., 2015. The Impact of Reduced Oil Prices on the Transportation Sector. http://www.strategy-business.com/article/00312?gko=ae404
The National Center for Smart Growth Research and Education, University of Maryland and Parsons Brinckerhoff, 2011. MSTM User Guide: Maryland Statewide Transportation Model.
Urban Land Use and Transportation Center, HBA Specto Incorporated, 2011. CSTDM09 - California Statewide Travel Demand Model. http://ultrans.its.ucdavis.edu/files/pecas/CSTDM09_ModelOverview_Final_0.pdf
U.S. Department of Transportation (USDOT), 2008. Impact of High Oil Prices on Freight Transportation: Modal Shift Potential in Five Corridors, Technical Report. http://www.marad.dot.gov/wp-content/uploads/pdf/Modal_Shift_Study_-_Technical_Report.pdf
Van Nostrand, C., Sivaraman, V., & Pinjari, A. R., 2013. Analysis of Long-Distance Vacation Travel Demand in the United States: A Multiple Discrete-Continuous Choice Framework. Transportation, Vol. 40, No. 1, pp. 151-171.
Wardman, M., 1988. A comparison of revealed preference and stated preference models of travel behaviour. Journal of Transport Economics and Policy, pp. 71-91.
Weiner, E., 1976. Assessing National Urban Transportation Policy Alternatives. Transportation Research, Vol. 10, No. 3, pp. 159-178.
Widlert, S., 2001. National Models: How to Make It Happen. The Case of the Swedish National Model System: SAMPERS. In Lundqvist, L., & Mattsson, L.G. (Eds.), National Transport Models: Recent Developments and Prospects, Springer, New York.
Williams, I., 2001. Designing the STREAMS model of Europe. In Lundqvist, L., & Mattsson, L.G. (Eds.), National Transport Models: Recent Developments and Prospects, Springer, New York.
Winston, C., 1985. Research on Intercity Freight and Passenger Transportation: An Economist's Perspective. Transportation Research Part A, Vol. 19, No. 5-6, pp. 491-494.
Witten, I. H., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition.
Wolf, J., Guensler, R., Bachman, W., 2001a. Elimination of the travel diary: an experiment to derive trip purpose from GPS travel data. In 80th Annual Meeting of the Transportation Research Board, Washington DC, p. 24.
Wolf, J., R. Guensler, and W. Bachman, 2001b. Elimination of the travel diary: Experiment to derive trip purpose from global positioning system travel data. Transportation Research Record: Journal of the Transportation Research Board 1768 (1), 125-134.
Worsley, T.E., & Harris, R.C.E., 2001. GB traffic forecasts - status and development. In L. Lundqvist & L.-G. Mattsson (Eds.), National transport models: Recent developments and prospects. Stockholm: Swedish Transport and Communications Research Board.
Ye, X., K. Konduri, R. M. Pendyala, B. Sana and P. Waddell, 2009. A methodology to match distributions of both household and person attributes in the generation of synthetic populations, paper presented at the 88th Annual Meeting of the Transportation Research Board, Washington, D.C., January 2009.
Zhang, L., Southworth, F., Xiong, C., & Sonnenberg, A., 2012. Methodological Options and Data Sources for the Development of Long-Distance Passenger Travel Demand Models: A Comprehensive Review. Transport Reviews, Vol. 32, No. 4, pp. 399-433.
Zhang, L., Xiong, C., & Berger, K., 2010. Multimodal Inter-Regional Origin-Destination Demand Estimation: A Review of Methodologies and Their Applicability to National-Level Travel Analysis in the US. In the World Conference on Transport Research, Lisbon, Portugal.