ABSTRACT

Title of Document: UNCERTAINTY ASSOCIATED WITH TRAVEL TIME PREDICTION: ADVANCED VOLATILITY APPROACHES AND ENSEMBLE METHODS

YANRU ZHANG, PH.D., 2015

Directed By: ALI HAGHANI, PROFESSOR, DEPARTMENT OF CIVIL AND ENVIRONMENTAL ENGINEERING

Travel time effectively measures freeway traffic conditions. Easy access to this information provides the potential to alleviate traffic congestion and to increase the reliability of road networks. Accurate travel time information delivered through Advanced Traveler Information Systems (ATIS) can guide travelers’ decisions on departure time, route, and mode choice, and reduce travelers’ stress and anxiety. In addition, travel time information can be used to present the current or future traffic state in a network and to assist transportation agencies in proactively developing Advanced Traffic Management System (ATMS) strategies.

Despite its importance, modeling and estimating travel time remains a challenging task, as traffic often fluctuates irregularly. These fluctuations result from the interactions among different vehicle-driver combinations and from exogenous factors such as traffic incidents, weather, demand, and roadway conditions. Travel time is especially sensitive to exogenous factors when a roadway operates at or near its capacity, where congestion occurs. Small changes in traffic demand or the occurrence of an incident can greatly affect the travel time. As it is impossible to account for every impact of these unpredictable exogenous factors in the modeling process, the travel time prediction problem is often associated with uncertainty.

This research uses innovative data mining approaches, such as advanced statistical and machine learning algorithms, to study the uncertainty associated with travel time prediction. The final objective of this research is to develop more accurate and reliable travel time prediction models.
UNCERTAINTY ASSOCIATED WITH TRAVEL TIME PREDICTION: ADVANCED VOLATILITY APPROACHES AND ENSEMBLE METHODS

By Yanru Zhang

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2015

Advisory Committee:
Professor Haghani, Ali, Chair
Professor Schonfeld, Paul M.
Associate Professor Cirillo, Cinzia
Assistant Professor Forman, Barton
Associate Professor Alt, Frank B.

© Copyright by Yanru Zhang 2015

Dedication

To my parents, sister and husband, for their endless love, support and encouragement.

Acknowledgements

First and foremost, I would like to express my greatest gratitude to my advisor, Dr. Ali Haghani, for his continuous support of my PhD study and research, and for his guidance, advice, patience, motivation, enthusiasm, and immense knowledge. I cannot thank him enough for all his contributions of time, ideas, and support throughout my research and the writing of this dissertation. I respect him for his attitude towards life and his generosity to his students. One simply could not wish for a better or friendlier supervisor.

Special thanks to the members of my dissertation committee: Dr. Paul M. Schonfeld, Dr. Cinzia Cirillo, Dr. Barton Forman, and Dr. Frank B. Alt, for generously giving their time, encouragement, valuable expertise and advice. It is a great honor to have them on my committee. Their insights helped me to further improve my research work.

I am grateful for the friendship and encouragement of all members of our graduate research group, especially Masoud Hamedi and Kaveh Farokhi Sadabadi, who shared their time and ideas in discussions of this topic. To all my friends, who have made my life so colorful and exciting: I cannot list all your names here, but you are always on my mind.

Finally, I would like to dedicate this dissertation to my family, especially my parents and my husband.
My parents made many sacrifices to ensure I get the best possible education and have provided me with unconditional love, support and understanding. My husband always supports me in my academic pursuits and is truly special to me. I am especially thankful for the wonderful life that we share together.

Table of Contents

Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
1.1 Problem Statement
1.2 Research Objectives
1.3 Research Contributions
1.4 Dissertation Outline
Chapter 2: Literature Review
2.1 Parametric Approaches
2.1.1 Naïve Methods
2.1.2 Autoregressive Linear Processes
2.1.3 State Space Models
2.2 Non-parametric Approaches
2.2.1 Non-parametric Regression
2.2.2 Neural Network
2.2.3 Other Artificial Intelligence Methods
2.3 Hybrid Approaches
2.3.1 Classification Based Approach
2.3.2 Kalman Filtering Based Approach
2.3.3 Decomposition Technique
2.3.4 Ensemble Trees
2.3.5 Other Combination Approaches
2.4 Prediction Interval Based Approaches
2.4.1 Ensemble Methods
2.4.2 Statistical Volatility Based Approach
2.5 Summary
Chapter 3: Statistical Volatility Models for Reliable Travel Time Prediction
3.1 Mean Prediction Models
3.1.1 Theoretical Background of ARIMA Models
3.1.2 ARIMA Model Optimization
3.1.3 Model Evaluation Criterions
3.2 Volatility Models
3.2.1 GARCH-type Models
3.2.2 Component GARCH Models
3.2.2 Stochastic Volatility Model
3.2.3 Prediction Interval Estimation
3.3 Application of Component GARCH Models in Travel Time Prediction
3.3.1 Modeling Conditional Mean
3.3.2 Testing the ARCH Effect
3.3.3 Estimating the Volatility Model
3.3.4 Construct the Mean and Prediction Intervals
3.3.5 Results and Discussion
3.3.6 Summary
3.4 Application of Stochastic Volatility Model
3.4.1 Model Fitting
3.4.2 Results and Analysis
3.4.3 Summary
Chapter 4: Ensemble Methods in Travel Time Prediction
4.1 Common Types of Ensembles
4.1.1 Bagging
4.1.2 Boosting
4.2 Ensemble Tree
4.2.1 Single Regression Tree
4.2.3 Random Forest Regression
4.2.4 Gradient Boosted Regression Tree
4.3 Application to Travel Time Prediction
4.3.1 Data Description and Preparation
4.3.2 Model Optimization
4.3.3 Model Interpretation
4.3.4 Model Comparison
4.3.5 Discussion and Conclusion
Chapter 5: A Travel Time Prediction Framework
5.1 Model Development
5.2 Data Description and Preparation
5.3 Model Comparison
5.4 Chapter Summary
Chapter 6: Conclusion and Recommendations
6.1 Summary
6.2 Conclusion
6.3 Future Recommendation

List of Tables

Table 1 Estimated MPIL and PICP values for GARCH, C-GARCH and MC-GARCH models
Table 2 Selected segments for this study
Table 3 Performance measures of the mean equation
Table 4 Selected Freeway Segments for the Study
Table 5 Basic Statistics of Travel Time Data
Table 6 Example of the Training/Testing Data File
Table 7 Relative Influence of Input Variables for GBM Models with Learning Rate of 0.001 for Multistep-ahead Prediction
Table 8 Comparison of 5 Minutes Ahead Prediction for ARIMA, RF and GBM
Table 9 Comparison of 15 Minutes Ahead Prediction for ARIMA, RF and GBM
Table 10 Comparison of 30 Minutes Ahead Prediction for ARIMA, RF and GBM
Table 11 Selected Freeway Segment Information
Table 12 Comparing the MAPE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)
Table 13 Comparing the RMSE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)
Table 14 Comparing the MPIL Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)
Table 15 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)
Table 16 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)
Table 17 Comparing the MAPE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)
Table 18 Comparing the RMSE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)
Table 19 Comparing the MPIL Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)
Table 20 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)
Table 21 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)
Table 22 Comparing the MAPE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday)
Table 23 Comparing the RMSE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday)
Table 24 Comparing the MPIL Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday)
Table 25 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday)
Table 26 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday)
Table 27 Comparing the MAPE Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-05 Wednesday)
Table 28 Comparing the RMSE Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-05 Wednesday)
Table 29 Comparing the MPIL Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-05 Wednesday)
Table 30 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-05 Wednesday)
Table 31 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-12-05 Wednesday)
Table 32 Comparing the MAPE Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-19 Wednesday)
Table 33 Comparing the RMSE Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-19 Wednesday)
Table 34 Comparing the MPIL Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-19 Wednesday)
Table 35 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-19 Wednesday)
Table 36 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-12-19 Wednesday)

List of Figures

Figure 1. Concept of prediction interval [93]
Figure 2. Box Plot of Absolute Deviation from Predicted Mean
Figure 3. Multiplicative component GARCH forecasting results: decomposition of the volatility into its various components (32-33).
Figure 4. Predicted mean and PI for multiplicative component GARCH model
Figure 5. Comparing performance of GARCH, C-GARCH and MC-GARCH models during peak hours
Figure 6. Comparing performance of GARCH, C-GARCH and MC-GARCH during non-peak hours
Figure 7. Comparison of prediction intervals constructed by GARCH, C-GARCH and MC-GARCH models
Figure 8. Bluetooth sensor location of the study
Figure 9. A scatter plot of travel times on four paths.
Figure 10. Prediction results for peak hour travel time at four segments.
Figure 11. Comparison of performance measures for six datasets by using ARIMA-GARCH and ARIMA-SV model with (a) 5 minute time interval (b) 15 minute time interval
Figure 12. The Bagging algorithm
Figure 13. The Boosting algorithm
Figure 14. Single regression tree
Figure 15. Pseudo-code for random forest
Figure 16. Pseudo-code for generic gradient boosting
Figure 17. The Relationship between MAPE and Number of Trees for Models Fitted with Seven Learning Rates and Four Levels of Interactions
Figure 18. MAPE against Learning Rate for Models Fitted with Various Numbers of Trees and Different Levels of Interactions
Figure 19. MAPE against Tree Complexity for Models Fitted with Various Numbers of Trees and Different Learning Rate
Figure 20. Three-dimensional Plots for The Joint Effects of Lag One Difference and Lag One Travel Time on Predicted Travel Time Value
Figure 21. Sample Travel Time Prediction Results of the GBM method
Figure 22. Selected Study Segment at the I95 Southbound Direction
Figure 23. Selected Study Segment at the I495 Eastbound Direction
Figure 24. Selected Study Segment at the MD295 Southbound Direction

Chapter 1: Introduction

Travel time is widely acknowledged as an effective measure of highway traffic conditions that can be easily understood by both travelers and transportation agencies [1]. Access to accurate travel time information has the potential to alleviate traffic congestion, to minimize its negative environmental and societal side effects, and to increase road network reliability [2]. Advanced Traveler Information Systems (ATIS) provide travelers with accurate and timely traffic information via dynamic message signs, radio, and the internet. Pre-trip travel time information guides travelers’ decisions on departure time, route, and mode choice, and reduces travelers’ stress and anxiety. In addition, travel time information can be used to present the current or future traffic state in a network and to assist transportation agencies in proactively developing Advanced Traffic Management System (ATMS) strategies. For example, travel time is one of the performance measures in the Freeway Performance Measurement System (PeMS) developed by the California Department of Transportation [2]. The Split Cycle Offset Optimization Technique (SCOOT) system and the Sydney Coordinated Adaptive Traffic System (SCATS) are two successful traffic operation systems that use travel time information as a module input [3]. Travel time information is therefore both a critical input to and a critical output of intelligent transportation systems.
1.1 Problem Statement

The success of ATIS and ATMS relies not only on the availability and accuracy of historical and real-time traffic information, but also on future traffic information. A wide range of travel time forecasting methodologies has been proposed to model traffic characteristics and to produce short-term forecasts. Most of these methods rely on historical travel time data collected from various detection systems, such as vehicles equipped with GPS or Bluetooth devices, electronic toll systems, and video detection. With recent technological advances in vehicle tracking in particular, direct and accurate travel time information can be easily obtained. These improvements make the development of an online travel time prediction algorithm more meaningful.

Despite the proliferation of traffic prediction methodologies in the existing literature, modeling and estimating travel time is still a challenging task [4]. Travel time fluctuates strongly across different periods and traffic conditions. These fluctuations result from the interactions among different vehicle-driver combinations and from exogenous factors such as traffic incidents, weather, demand, and roadway conditions. Travel time is especially sensitive to exogenous factors when a roadway operates at or near its capacity, where congestion occurs. Small changes in traffic demand or the occurrence of an incident can greatly affect the travel time. As it is impossible to account fully for the impact of these unpredictable exogenous factors in the modeling process, the travel time prediction problem is often associated with uncertainty.

Addressing the uncertainties associated with travel time, and therefore travel time reliability, has become a topic of interest in recent years.
FHWA defines travel time reliability as “the consistency or dependability in travel times, as measured from day-to-day and/or across different times of the day” [18] and proposes several travel time reliability measures: 90th or 95th percentile travel time, buffer index, and planning time index. For prediction purposes, another efficient measure is the prediction interval (PI) [19], which can be used to assess the reliability of travel time forecasts. A prediction interval is an estimated range that is expected to contain the travel time value with a predetermined probability [20]. In other words, PIs give a likely range for the predicted results and thereby represent the uncertainty associated with travel time. PIs allow travelers and traffic managers to quantify the level of uncertainty associated with a predicted travel time and thus to plan route and departure time strategies for both the best and the worst conditions. Wide PIs indicate higher uncertainty about future traffic conditions, and travelers should expect extreme delays, while narrow PIs mean that traffic conditions are relatively stable [21].

In brief, due to the dynamic and stochastic nature of traffic, travel time prediction is one of the most challenging tasks in ATIS and ATMS. In order to provide meaningful traffic information to travelers and traffic managers, it is critical to develop an accurate and reliable traffic prediction algorithm that not only reduces the absolute value of the prediction error but also takes into consideration the uncertainty associated with travel time prediction.

1.2 Research Objectives

The primary goal of this research is to identify the uncertainties associated with travel time prediction. Both accuracy and reliability are addressed in the context of freeway travel time prediction. Prediction accuracy concerns the difference between the predicted and the actual value, in other words, the prediction error.
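The two perspectives introduced above, prediction error and prediction interval quality, can be made concrete with a small sketch. The definitions used here are common textbook forms and are assumptions, not necessarily the exact formulas adopted later in this dissertation: MAPE and RMSE for accuracy, PICP (the share of observations that fall inside the interval) and MPIL (the mean interval length) for reliability. The function names and the sample numbers are hypothetical.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the unit of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def picp(actual, lower, upper):
    """Prediction interval coverage probability: fraction of observations inside [lower, upper]."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)

def mpil(lower, upper):
    """Mean prediction interval length, in the unit of the data."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)

# Hypothetical travel times in minutes (illustrative values only).
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 14.0, 16.0]
lower = [9.0, 10.0, 12.0, 13.0]   # lower PI bounds
upper = [13.0, 14.0, 16.0, 19.0]  # upper PI bounds

print("MAPE (%):", mape(actual, predicted))
print("RMSE (min):", rmse(actual, predicted))
print("PICP:", picp(actual, lower, upper))
print("MPIL (min):", mpil(lower, upper))
```

In this toy example the last observation (20 minutes) lies above its upper bound, so only three of four observations are covered. Widening the intervals would raise coverage at the cost of a larger mean length, which is the trade-off between the two reliability measures.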
Most existing travel time prediction methods in the literature focus on improving prediction accuracy without considering uncertainty. Reliability, on the other hand, puts more emphasis on the uncertainty associated with a prediction. Instead of providing only a point value (an average of travel time during a certain time interval), a prediction interval is provided that is expected to capture the observed value with a stated probability.

To achieve both objectives, this research introduces and implements two types of data-driven approaches to predict travel time and to assess the uncertainty associated with the predictions: statistical volatility models and ensemble methods. A statistical volatility model is promising for modeling uncertainty, as it not only provides the opportunity to develop a more accurate mean model but also produces an effective and efficient prediction interval. This research proposes and compares different types of statistical volatility-based travel time prediction models. The preliminary results indicate that a statistical volatility model can be a promising approach to account for the uncertainties associated with prediction.

In recent years, another type of prediction method, the ensemble method, has received increased interest in the prediction field. Instead of fitting a single “best” model, the ensemble method strategically combines multiple simple base models to optimize predictive performance. Drawing on insights and techniques from both statistical and machine learning methods, ensemble methods often achieve strong predictive performance. They are often less sensitive to missing data and outliers and are able to model complex relations among variables. Since traffic is a complex phenomenon, a single model may not be able to capture these relations. Combining a group of individual base models potentially improves model performance.
The second part of the dissertation focuses on developing ensemble-based travel time prediction models.

This research involves the following tasks:
(1) Investigate the characteristics of travel time data collected from Bluetooth sensors and probe vehicles, with a particular focus on the seasonality and variability of the data during different time intervals.
(2) Demonstrate the concept of prediction intervals in modeling the uncertainties associated with travel time prediction.
(3) Propose two innovative statistical volatility models that provide more reliable predictions by considering the changing behavior of travel time variation. Both the stochastic and the seasonal (cyclical) characteristics of traffic data are addressed.
(4) Propose an advanced ensemble algorithm for freeway travel time prediction to improve prediction accuracy.
(5) Propose a new travel time prediction framework that provides more accurate and reliable predictions, and comprehensively evaluate and compare the performance of different prediction algorithms in terms of prediction accuracy and reliability.

1.3 Research Contributions

This section lists the main contributions of this dissertation to the state of the art.

We have developed two innovative component volatility-based travel time prediction models to better characterize long-term and short-term volatility and cyclical patterns in travel time data. Because of daily, weekly or even monthly recurrent traffic congestion, travel time data often show strong cyclical patterns. In the traffic prediction field, existing volatility models do not consider possible cyclical patterns in the residual series, often referred to as a seasonal component. Conventional generalized autoregressive conditional heteroskedasticity (GARCH) models are often criticized for modeling data series with pronounced seasonal patterns unsatisfactorily [5].
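For reference, the textbook GARCH(1,1) specification is sketched below. It is the standard form and not necessarily the exact parameterization used later in this dissertation. Here $\varepsilon_t$ denotes the residual of the mean model at time $t$, $\sigma_t^2$ its conditional variance, and $z_t$ an independent, zero-mean, unit-variance innovation; all symbols are introduced here for illustration.

```latex
\begin{aligned}
\varepsilon_t &= \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1), \\
\sigma_t^2 &= \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,
\qquad \omega > 0,\ \alpha \ge 0,\ \beta \ge 0 .
\end{aligned}
```

The variance recursion contains only a constant, the most recent squared residual, and the most recent conditional variance. It has no term tied to the time of day or the day of the week, which is why a recurring congestion pattern in the residual variance is difficult for this form to represent.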
Decomposition techniques provide the potential to deal with trend and seasonal components in the data. Driven by the successful application of statistical volatility models in transportation analyses, component GARCH models are proposed. The component GARCH models are similar in structure to the GARCH model but include trend and seasonal elements. They allow for a more versatile structure with the potential to provide more accurate traffic volatility forecasts along freeway corridors.

We have introduced an advanced solution technique for the stochastic volatility (SV) model in travel time volatility forecasting, namely Bayesian inference using Markov Chain Monte Carlo (MCMC) based on the ancillarity-sufficiency interweaving strategy (ASIS) proposed by Kastner and Frühwirth-Schnatter [6]. The proposed method greatly improves the efficiency and robustness of the SV model. We have compared the proposed SV model with the GARCH model using freeway travel time data and have demonstrated that the advanced SV model is a competitive alternative for modeling the volatile nature of traffic.

We have proposed a tree-based ensemble method to predict travel time on a freeway stretch by considering all relevant variables derived from historical travel time data. Belonging to the machine learning category, tree-based ensemble methods often have superior prediction performance over classical statistical models. Driven by the successful applications of random forest methods in traffic parameter prediction, a gradient boosting tree (GBM)-based travel time prediction method is proposed to uncover hidden patterns in travel time data and to enhance the accuracy and interpretability of the model.
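The boosting idea behind such a model can be illustrated with a minimal sketch, assuming squared-error loss and one-split regression trees (stumps) as base learners: each new stump is fitted to the residuals left by the current ensemble, and its shrunken output is added to the prediction. Everything in this block (the function names, the single lag-one input, the toy series, the learning rate) is a hypothetical illustration, not the model configuration developed in this dissertation.

```python
def fit_stump(x, residuals):
    """Least-squares stump on one input: returns (threshold, left_mean, right_mean)."""
    values = sorted(set(x))
    if len(values) < 2:  # nothing to split on: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (values[0], mean, mean)
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def fit_boosting(x, y, n_trees, learning_rate):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = sum(y) / len(y)  # start from the mean of the targets
    current = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - ci for yi, ci in zip(y, current)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        current = [ci + learning_rate * predict_stump(stump, xi)
                   for ci, xi in zip(current, x)]
    return (base, learning_rate, stumps)

def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)

def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)

# Toy travel time series in minutes; input = previous value, target = next value.
series = [10.0, 10.5, 11.0, 14.0, 18.0, 21.0, 19.0, 15.0, 12.0, 10.5, 10.0]
x, y = series[:-1], series[1:]
mse_few = mse(fit_boosting(x, y, 5, 0.1), x, y)
mse_many = mse(fit_boosting(x, y, 200, 0.1), x, y)
print("training MSE with 5 stumps:", mse_few)
print("training MSE with 200 stumps:", mse_many)
```

The training error falls as stumps are added, since each stage removes part of the remaining residual. This sketch measures only the fit to the training data; held-out data would be needed to judge predictive accuracy and to choose the number of trees and the learning rate.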
Different from the random forest algorithm, which averages a large collection of trees built from random samples [7], the gradient boosting method sequentially generates base learners from a weighted version of the training data to strategically find the optimal combination of trees. Each step of adding another base learner is aimed at correcting the mistakes made by the previous learners. Therefore, the gradient boosting method has the potential to provide more accurate predictions.

We have comprehensively evaluated and tested the effect of different combinations of parameters on the gradient boosting method’s performance in travel time prediction. One issue regarding the prediction accuracy of the GBM model is related to parameter optimization. The performance of the GBM model is largely influenced by its parameters, including the number of trees, the learning rate and the tree complexity (variable interactions). Therefore, there is a need to search for the optimal combination of these parameters when developing the GBM model. Computational time is another issue when tree complexity or the number of trees increases. The tradeoff between computational cost and model accuracy should also be considered when building the model.

We have developed a new travel time prediction framework that takes advantage of both the GBM and SV models. Through the combination of these two models, the proposed prediction framework potentially further improves model performance. The proposed framework also provides an example of how to improve the overall performance of a travel time prediction model.

1.4 Dissertation Outline

The rest of this dissertation is organized as follows. The next chapter reviews previous works in freeway travel time prediction. The existing literature is categorized as parametric, non-parametric, hybrid and prediction interval based approaches. Chapter 3 describes statistical volatility models in travel time reliability prediction.
The model structure and the concept of prediction interval are introduced. The statistical volatility models are composed of two parts: mean and prediction interval. The mean prediction models and the volatility models (to construct prediction intervals) are discussed in Chapter 3.1 and Chapter 3.2. Chapter 3.3 performs a case study using volatility models in travel time prediction. Chapter 4 describes ensemble methods in travel time prediction. Chapter 4.1 summarizes common types of ensemble methods. Chapters 4.2 and 4.3 discuss in detail different ensemble tree methods and the application of these ensemble methods in travel time prediction. Chapter 5 proposes a new travel time prediction framework. Chapter 6 concludes the dissertation and provides recommendations for future research on this topic.

Chapter 2: Literature Review

Short-term traffic prediction, as a critical component in a real-time ITS environment, has seen an explosion of interest since the 1980s. A large number of forecasting algorithms have been proposed in the literature. Because of the complexity of the traffic prediction problem, existing traffic prediction approaches differ from one another in several aspects. Vlahogianni et al. [8] suggested that the process of developing short-term traffic prediction can be divided into three essential clusters (scope, conceptual output and modeling) that involve two important issues (design and modeling parameters). The design process mainly focuses on the objective of forecasting, which includes the type of application, where to implement it, and the desired output of the model. The modeling parameters procedure is the way to achieve the goal determined during the design process. This study belongs to the modeling parameters procedure. Since this study primarily focuses on freeway travel time prediction, we will emphasize the literature on different modeling techniques in this area.
There are several review or comparative studies of existing short-term traffic prediction methods [8-11], and interested readers can refer to these studies for further references. Vlahogianni et al. [8] classified traffic prediction models into three modeling approaches: parametric, non-parametric and hybrid methods. The parametric approach usually assumes a specific form for the relationship between the dependent and independent variables. The modeling process involves model identification, parameter estimation, model diagnostic checking and prediction. Parametric models often have more assumptions than non-parametric methods. If these assumptions are satisfied by the data, the parametric approach can produce accurate estimation. Otherwise, the parametric approach can be misleading. Non-parametric models are data-driven approaches that usually do not assume a specific structure of the data. These algorithms heavily depend on the quality of the available data. Another traffic forecasting approach is the hybrid method, which combines different models, or the same type of model with different initial values or parameters, to obtain better prediction performance. In addition, as uncertainties are often involved in traffic prediction, prediction interval-based approaches consider the future traffic parameter (such as volume, speed or travel time) as a distribution instead of a point value. Most existing prediction interval-based approaches belong to the hybrid method. Because of its importance, we discuss this approach as a separate category. The following sections summarize the major research findings of the existing literature based on the taxonomy proposed by Vlahogianni et al. [4].

2.1 Parametric Approaches

2.1.1 Naïve Methods

The naïve method can be interpreted as a simple method that is easy to implement and requires few model assumptions. Historical average and smoothing [12] techniques received extensive attention in practical applications [13, 14].
The historical average model simply averages historical traffic data for a certain time of the day, day of the week or other time period, based on the assumption that traffic shows similar patterns throughout the day, week or year; for example, traffic patterns from day to day often show remarkable similarities, and these patterns are useful for prediction. The historical average method has already been applied to urban traffic control systems (UTCS) [15] and various traveler information systems [16, 17]. However, as traffic conditions are highly dynamic, the naïve method is often a poor predictor.

2.1.2 Autoregressive Linear Processes

In the early 1990s, transportation researchers developed an alternative approach for predicting traffic: autoregressive linear processes such as the autoregressive integrated moving average (ARIMA) type models. The ARIMA-type models were first introduced by Ahmed and Cook [18] and Levin and Tsao [19] in freeway traffic flow and occupancy prediction. Their studies indicated that the ARIMA-type models provide better forecasting accuracy than historical average and smoothing techniques. Applications of the ARIMA model in traffic parameter forecasting were also discussed in later studies [20-24]. As traffic parameters show spatial-temporal correlations, both Kamarianakis and Prastacos [22] and Min and Wynter [25] adopted a multivariate spatial-temporal autoregressive moving average model to predict traffic flow. Because of their well-defined theoretical foundation and effectiveness in prediction [26], the ARIMA-type models have gradually become standard methods against which newly developed forecasting models are compared. However, the ARIMA-type models are sensitive to extreme values. This makes them less efficient when modeling data with large variations.

2.1.3 State Space Models

State space models belong to the multivariate forecasting category, which can be applied to multiple-input, multiple-output systems.
It is worth noting that the ‘state space model’ and the more widely known ‘Kalman filter model’ refer to the same model structure. The term ‘state space’ refers to the model, while the term ‘Kalman filter’ refers to the process of estimating and updating model parameters. Stathopoulos and Karlaftis [27] applied a multivariate state space model to predict flow at an urban signalized arterial. Ghosh et al. [28] applied a structural time series model to forecast traffic flow in a congested urban transportation network. The structural time series model is a special form of the state-space model that represents the observed time series as a sum of different components. Their study results indicate that the proposed model is computationally more efficient and can trace the evolution of each individual component separately. The Kalman filtering algorithm allows the parameters of the model to be updated as new data become available. Therefore, it enables dynamic traffic prediction [29-35].

2.2 Non-parametric Approaches

2.2.1 Non-parametric Regression

Nonparametric regression is a form of regression analysis that does not predetermine a specific form for the predictor. It relies on data to describe the relationship between dependent and independent variables and is based on the principles of pattern recognition and chaotic systems [36]. Smith and Demetsky [37] demonstrated the advantages of the nonparametric regression approach when compared with the neural network. Another study by Smith et al. [36] suggested that the heuristic forecast generation method improves the performance of nonparametric regression but does not necessarily perform better than the seasonal ARIMA model. Clark [38] proposed a nonparametric regression technique described as a k nearest neighbor (k-NN) model to predict traffic state variables. Davis and Nihan [39] applied the k-NN model in predicting freeway traffic. They suggested that the k-NN method is comparable to the linear time-series approach.
Robinson and Polak [40] proposed the use of the k-NN technique to model urban link travel time. They discussed in detail the selection of a distance metric, local estimation measure and value of k. Myung et al. [41] applied the k-NN to predict travel time with data from a vehicle detector system and an automatic toll collection system. Zou et al. [42] utilized both k-NN and a multi-topology neural network model in predicting freeway travel time. Their proposed model provides reliable travel time predictions in uncongested, congested, and transition traffic conditions.

2.2.2 Neural Network

Van Lint et al. [43] applied a state space neural network that is capable of dealing with the spatio-temporal relationships of traffic. Yin et al. [44] developed a fuzzy-neural model (FNM) to predict traffic flow in an urban network. Their model applied a gate network (GN) that divides input data into several clusters using a fuzzy approach, and then applied the expert network (EN) to specify the input-output relationship. Ishak et al. [45, 46] compared three different neural networks: simple recurrent networks (Jordan–Elman), partial recurrent networks (PRNs), and time-lagged feed forward networks (TLFN), with different input parameters for traffic prediction. Jiang and Adeli [47] proposed the dynamic time-delay wavelet neural network model for freeway traffic flow forecasting. The proposed model considers both the time of the day and the day of the week when predicting traffic flow. Liu et al. [48] applied the state-space neural network model to predict travel time when there is missing data. Their proposed method is insensitive to missing data and can provide accurate forecasting. Quek et al. [49] applied a specific class of fuzzy neural network models in short-term traffic flow prediction.
The proposed model, known as a pseudo outer-product fuzzy neural network using the truth-value-restriction method (POPFNN-TVR), was shown to outperform the conventional feed forward neural network using back propagation (BP) learning. Zheng et al. [50] suggested that a certain model has superior performance for a particular time period and that combining single neural network predictors may improve forecasting accuracy. They developed a Bayesian combined neural network model that combines the back propagation and the radial basis function neural networks in traffic flow forecasting. The credit of each individual model is estimated based on the theory of conditional probability and Bayes’ rule and largely depends on the cumulative prediction performance over previous time steps. Zeng and Zhang [51] applied four different neural network models and proposed one model in freeway travel time forecasting. These models include the multilayer feed forward neural network, the time-delay neural network, the state-space neural network, and the nonlinear autoregressive network with exogenous inputs. In their study, they analyzed the effect of different input variables and of temporal-spatial inputs on model performance. Their study results indicate that the temporal-spatial inputs can greatly improve model performance, and that the state-space neural network and the time-delayed state-space neural network outperform the other models.

2.2.3 Other Artificial Intelligence Methods

Sun et al. [52] considered spatial and temporal correlations of traffic flow among adjacent road links and developed a Bayesian network approach to forecast traffic flow. In their paper, the joint probability distribution between the upstream and downstream locations is described as a Gaussian mixture model (GMM) with parameters estimated via the competitive expectation maximization (CEM) algorithm.
Hong [53] proposed a traffic flow forecasting model that combines the seasonal support vector regression model with a chaotic simulated annealing algorithm (SSVRCSA). In his paper, the chaotic simulated annealing algorithm is used to determine the values of three parameters in an SVR model; seasonal adjustment factors are then applied to deal with the cyclic trend.

2.3 Hybrid Approaches

2.3.1 Classification Based Approach

The classification-based hybrid approach first classifies traffic data into different groups and then assigns different models according to the characteristics of the data in each group. Danech-Pajouh and Aron [54] proposed the ATHENA model, which groups data according to their similarities and then assigns a different linear model to each cluster. Antoniou et al. [55] developed a dynamic data-driven framework for traffic state estimation and prediction. Their model first clusters existing observations into several groups, and then predicts the future traffic state by modeling the evolution of historical traffic states as a Markov process. By estimating a flexible model for each cluster, future traffic speed can be obtained. Van Der Voort et al. [56] combined Kohonen maps with ARIMA time series. The study results were promising compared with the ATHENA model. Later, Chen et al. [57] used a self-organizing map (SOM) to initially classify traffic data into different groups and then applied the ARIMA and the multi-layer perceptron (MLP) model as two prediction methods. Their study results suggested that the SOM/ARIMA hybrid approach is more sensitive to missing data than the SOM/MLP hybrid approach.

2.3.2 Kalman Filtering Based Approach

The Kalman filtering method is a promising method to train and update model parameters and has been applied to different kinds of forecasting models to enable continuous parameter updating. Yu et al.
[58] applied the support vector machine (SVM) method to predict baseline travel time and used the Kalman filtering technique to adjust the prediction results with updated information. Stathopoulos and Dimitriou [59] proposed a forecasting approach that utilizes a fuzzy rule based system (FRBS) to nonlinearly combine traffic flow forecasting results from an online adaptive Kalman filter (KF) and an artificial neural network (ANN) model. Their study results indicate that the combined approach improves the forecasting accuracy compared with each individual model. Liu et al. [60] proposed the extended Kalman filter (EKF) method to train the parameters of state-space neural networks (SSNN). The proposed algorithm was tested in an urban network, and the study results indicate that the proposed method is 20 times faster than the SSNNLM model with slightly worse forecasting accuracy.

2.3.3 Decomposition Technique

Decomposition techniques decompose a complicated data set into small elements. In terms of prediction, decomposition techniques can be utilized to reduce noisy information in traffic data and thereby improve prediction performance [61]; they can also be used as the basis for combining different models [62]. Some popular decomposition techniques include Fourier methods, the discrete wavelet transform (DWT) and empirical mode decomposition (EMD). Hamad et al. [63] suggested that more accurate prediction of speed data can be obtained by decomposing the time series into its basic components. They utilized empirical mode decomposition to filter out unimportant elements and applied a multilayer feed forward neural network with BP to predict freeway travel speed. Later, Chen and Wu [64] applied empirical mode decomposition and gray theory [65] in predicting bus travel time. Wei and Chen [66] forecasted metro passenger flow through empirical mode decomposition and neural networks.
They applied the EMD method to decompose traffic flow into several intrinsic mode function (IMF) components and selected the important information as input for back-propagation neural networks (BPNN). Their study results indicate that treating the important and non-important IMFs as different inputs of the BPNN improves the forecasting accuracy. Jiang and Adeli [67] proposed a hybrid wavelet packet-ACF method to analyze traffic flow time series and concluded that the discrete wavelet packet transform method de-noises the signal even more effectively than the conventional wavelet transform. Xie et al. [68] applied a discrete wavelet decomposition method to remove noise in the original traffic data and applied the Kalman filter prediction model to the de-noised data to predict future traffic. Their study results indicate that removing noise in the original traffic data has the potential to improve the performance of a Kalman filter model in traffic volume forecasting. Daniel et al. [61] applied a wavelet based method to remove noise in the original traffic data and applied self-organizing neural networks as the prediction method. Wang and Shi [69] developed a hybrid traffic speed forecasting model based on support vector machine (SVM) regression theory. In their study, they constructed a new kernel function using a wavelet function to capture the non-stationary characteristics of the data and then used Phase Space Reconstruction theory to identify the input space dimension. Because collected data are often accompanied by measurement errors, they applied a wavelet de-noising method to remove the noise in the traffic speed data.

2.3.4 Ensemble Trees

Leshem and Ritov [70] proposed a traffic flow prediction algorithm that uses the Random Forests algorithm [7] as a weak learner within an Adaboost algorithm. The proposed algorithm was shown to be able to deal with missing data and to be effective in multiclass classification problems.
Hamner [71] applied a random forest method in travel time prediction, and the method is able to provide accurate travel time prediction. Wang [72] applied an ensemble bagging decision tree (ensemble BDT) to predict weather impact on airport capacity and demonstrated the superior performance of the ensemble BDT compared with a single SVM classifier. Ahmed and Abdel-Aty [73] utilized a stochastic gradient boosting method in identifying hazardous conditions based on traffic data collected from different sensors. Their study results suggested that the proposed stochastic gradient boosting method has considerable advantages over classical statistical approaches. Similarly, Chung [74] applied boosted regression trees to study crash occurrence. Both studies utilized the boosting method to study classification problems.

2.3.5 Other Combination Approaches

Zheng et al. [50] proposed a freeway traffic flow prediction method that combines a back propagation neural network and a radial basis function neural network based on a Bayesian model combination approach. The output of the proposed model is a weighted combination of the outputs of the two neural networks, and the weight is estimated based on conditional probability and Bayes’ rule. Hinsbergen et al. [75, 76] trained feed-forward neural networks and state-space neural networks using Bayesian inference theory. A simple average over all group members was used to combine neural networks into a group. Their study results indicate that the proposed framework provides more accurate forecasting of both the mean and the prediction intervals. Zhang and Liu [77] predicted the travel time index by utilizing six baseline individual predictors as basic combination components and combined them through four combined predictors: the equal weights (EW), optimal weights (OW), minimum error (ME) and minimum variance (MV) methods. Here, the travel time index is the ratio of average travel time to free flow travel time. Vlahogianni et al.
[78] proposed a modular neural network prediction model that considers both spatial and temporal correlations of traffic data. In their proposed model, the spatial representation of traffic information collected from individual locations is addressed through the system’s modularity. Each module consists of a time delayed feed-forward neural network (TDNN) that represents the time evolution of traffic for a corresponding location. The temporal optimization of the input windows in each TDNN is performed through genetic algorithms (GAs).

2.4 Prediction Interval Based Approaches

Although a wide range of approaches has been applied to the traffic prediction field and has shown promising predictive abilities, some of them have limited ability to capture the uncertainty and variability of traffic, as they only provide a point estimate to represent future traffic conditions. Traffic is a complex phenomenon, as it is often affected by the interactions among different vehicles and by exogenous factors such as incidents, weather, demand, and roadway conditions. Small changes in current traffic conditions may greatly affect future travel time. For example, an incident during peak hour may result in extreme delays in the near future. Due to the highly dynamic nature of traffic, predicting travel time is often associated with uncertainty, especially during non-recurrent congestion when an incident or bad weather occurs. A point estimate provides limited information regarding the uncertainty and unreliability of travel time. On the other hand, prediction intervals (PIs) have the potential to provide more reliable forecasting results by providing a confidence band that indicates how reliable the forecasting results are. There are few studies using prediction intervals to model uncertainties associated with travel time prediction.

2.4.1 Ensemble Methods

Khosravi et al. [79, 80] developed different neural network based approaches to provide PIs to capture uncertainties in travel time.
As there is always a mismatch between the predicted and actual values, PIs provide a range that can capture the uncertainty. Van Lint [81] proposed an ensemble of state-space neural network (SSNN) models to produce prediction intervals and mean travel time. The constructed prediction interval captures the uncertainty associated with travel time prediction. Zeng and Zhang [51] employed an ensemble method (using multiple instances of the same neural network model with different initial conditions) to derive a prediction band, but the method can be computationally intensive. Fei et al. [82] proposed a Bayesian inference-based dynamic linear model that considers freeway travel time as the sum of the median of historical travel times, time-varying random variations in travel time, and a model evolution error. The prediction result of their model is a travel time distribution from which a mean and a prediction interval representing the uncertainty associated with travel time prediction can be generated. Van Hinsbergen [76] proposed an approach that combines neural networks in a group using Bayesian inference theory to predict travel time with prediction intervals. Li and Rose [83] developed a model that treats average travel time and travel time variability separately to incorporate uncertainty in travel time prediction. All these studies indicate that travel time prediction is a complex problem often associated with uncertainty. Prediction interval based approaches provide the potential to capture the dynamic changes of traffic.

2.4.2 Statistical Volatility Based Approach

Besides the ensemble approaches, another popular method that is able to capture the uncertainty and variations of data is the statistical volatility approach [84]. This approach relaxes the constant variance assumption and models the time-changing variance as a function of its past values.
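As an illustration of a variance that depends on its own past, the commonly used GARCH(1,1) specification can be written as below. The notation is an assumption taken from standard textbook treatments rather than from the cited studies: $r_t$ is the residual, $\sigma_t^2$ its conditional variance, $\varepsilon_t$ a zero-mean, unit-variance innovation, and $\omega$, $\alpha$, $\beta$ are coefficients to be estimated.

```latex
\begin{aligned}
r_t &= \sigma_t\,\varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d. } (0, 1),\\
\sigma_t^2 &= \omega + \alpha\, r_{t-1}^2 + \beta\, \sigma_{t-1}^2,
\qquad \omega > 0,\ \alpha \ge 0,\ \beta \ge 0 .
\end{aligned}
```

Setting $\beta = 0$ reduces this to an ARCH(1) model, in which the conditional variance depends only on the previous squared residual.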
As a result, the statistical volatility modeling approach can capture the dynamic changes of travel time variations and can provide more accurate PIs. The first volatility model, the ARCH model, was proposed by Engle in 1982 for financial analysis purposes [84]. Later, different variations of the ARCH model were formulated, of which the GARCH [85] model is one of the most widely used. Driven by its successful applications in finance and other areas [86], transportation professionals began to apply the family of GARCH models to predict traffic volatilities. Kamarianakis et al. [87] suggested that traffic conditions are much more volatile during heavy traffic or congestion periods than at other times and that effective modeling of variance can produce more accurate confidence intervals. They tested the performance of the ARIMA-GARCH model by using traffic flow data in an urban network and demonstrated that traffic flow data displayed time-dependent volatilities. They also suggested that further studies should consider the asymmetric effects of positive and negative shocks. Zhang et al. [88] considered the asymmetric effects of positive and negative shocks and studied two asymmetric GARCH models, EGARCH and GJR-GARCH, in travel time forecasting. Their study results indicated that the GJR-GARCH model performs better. Tsekeris and Stathopoulos [89] incorporated fractionally integrated components in both the conditional mean and the conditional variance equations and proposed the ARFIMA-FIAPARCH model. They found that the proposed model improves the accuracy of predicted volatility. Similarly, Karlaftis and Vlahogianni [26] suggested that over-differencing leads to over-inflated MA terms and applied the ARFIMA-FIGARCH model for traffic flow prediction. Tsekeris and Stathopoulos [90] predicted urban traffic variability through a stochastic volatility modeling approach.
Their study results demonstrated that the stochastic volatility model outperforms the GARCH model, as a latent stochastic process can better represent the speed variability dynamics than a process with a predetermined structure concerning the decaying impact of shocks. Yang et al. [3] applied seasonal ARIMA, ANN and historical mean methods as the mean equation and the GARCH model to predict urban vehicle travel time. Their study results indicate that the proper selection of the mean equation can lead to excellent results. Xia et al. [91] considered the relationship between flow and speed and proposed a VAR–MGARCH model to predict traffic flow and speed for urban roads. Guo [92] proposed an online algorithm for the autoregressive moving average (ARMA)-GARCH model through Kalman filters for predicting traffic speed.

2.5 Summary

In summary, traffic prediction algorithms can be categorized into parametric, non-parametric and hybrid approaches. The parametric approaches usually have a clear model structure and well-established theoretical foundations. Compared with non-parametric approaches, they are easier to interpret. Some of them, for example the historical average, have been implemented in existing traffic control devices for providing traffic information. However, this type of method imposes a series of strict model assumptions. Applying the model to data that do not meet these assumptions will lead to inaccurate prediction. As traffic in different locations shows different characteristics, it is necessary to understand both the model and the data to select the appropriate model.

The non-parametric approach usually has fewer modeling assumptions than the parametric approach. Some popular methods include non-parametric regression, neural networks, SVM and other artificial intelligence methods. The structure of this type of model is usually developed based on the data.
In particular, the neural network methods are analogous to a ‘black box’. The users give some simple inputs to the model and get decent predictions, but they usually are unaware of the structure of the model. During the past few decades, the non-parametric methods have attracted significant attention in the traffic prediction field because of their ability to model complex data.

The hybrid methods consider traffic as a complex phenomenon that cannot be represented by a single model. By taking advantage of different models, they aim to improve prediction performance. However, because they involve several different models, the hybrid methods are often complex.

Besides prediction accuracy, it is also critical to develop a reliable prediction model that considers the uncertainty associated with travel time prediction. Therefore, the prediction interval based approach is proposed to take this uncertainty into consideration. There are generally two categories: ensemble and statistical volatility methods. The ensemble method constructs the prediction interval by developing different base models, while the statistical volatility method models the evolution of the variance of traffic data over time.

Although a large number of traffic prediction algorithms have been proposed in the literature, prediction accuracy and reliability are still two challenging issues. In terms of prediction accuracy, most existing models can be highly accurate during non-peak hours, as variations of travel time are not significant from day to day. However, prediction accuracy deteriorates during peak hours. Improving travel time prediction, especially during congested periods, is critical. To address this issue, it is important to develop an advanced prediction method that is able to model the complex relationships in traffic data. As mentioned in the previous section, the ensemble methods have shown promising prediction results.
This research studies and proposes a novel ensemble method for predicting travel time. At the same time, reliability is also an essential issue in travel time prediction. Since there is always a mismatch between the predicted and the actual value, there is a need to measure or predict this “mismatch” or uncertainty. Prediction interval based approaches are able to model this uncertainty. However, this is a relatively new concept and there is limited literature in this field. This research will further explore more advanced prediction interval based approaches to better model this uncertainty.

Chapter 3: Statistical Volatility Models for Reliable Travel Time Prediction

The observed travel time can be decomposed into a conditional mean ($u_t$) and a residual ($r_t$) component. The traditional time series based travel time prediction methods study correlations between travel time at different time lags or at different locations, assuming constant variance of the data (the variance of $r_t$ is constant across different time intervals). Therefore, they only model the time variation of the data for the first order moment and predict the mean part $u_t$. However, uncertainty often exists in the data, especially for travel time, which can be dramatically affected by unexpected external factors such as bottlenecks, traffic incidents, work zones, weather and special events. The point prediction results become less reliable because of the presence of these unexpected factors. As travelers are used to the daily congestion due to regular traffic demand, it is the unexpected delays that generate the most dissatisfaction. Modeling the uncertainties, referred to as the conditional standard deviation (of the residual $r_t$), would improve forecasting reliability. Equation (1) is the basic structure of a travel time prediction model.

$y_t = u_t + r_t$   (1)

where $y_t$ is the observed travel time at time $t$, $u_t$ represents the estimated conditional mean, and $r_t$ is the residual part.
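The decomposition in Equation (1) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the Gaussian form of the interval, and the travel time values are hypothetical and are not taken from the dissertation's models or data. Given mean forecasts and a conditional standard deviation for the residual, it computes the residuals and a symmetric interval around each mean forecast.

```python
from statistics import NormalDist


def residuals(observed, mean_pred):
    """Residual part of Equation (1): observed value minus conditional mean."""
    return [y - u for y, u in zip(observed, mean_pred)]


def prediction_intervals(mean_pred, cond_std, coverage=0.95):
    """Symmetric (lower, upper) limits around each mean forecast.

    Assumes Gaussian residuals (an illustrative assumption), so the
    half-width is z * sigma_t with z the standard normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return [(u - z * s, u + z * s) for u, s in zip(mean_pred, cond_std)]


def empirical_coverage(observed, intervals):
    """Fraction of observations that fall inside their interval."""
    hits = sum(lo <= y <= hi for y, (lo, hi) in zip(observed, intervals))
    return hits / len(observed)


# Hypothetical travel times in minutes (made-up values for illustration only).
y = [10.2, 11.0, 14.5, 18.9, 12.1]      # observed travel time
u = [10.0, 10.8, 13.0, 16.0, 13.0]      # conditional mean forecasts
sigma = [0.5, 0.5, 1.5, 2.0, 1.0]       # time-varying conditional std. dev.

r = residuals(y, u)
pis = prediction_intervals(u, sigma, coverage=0.95)
coverage = empirical_coverage(y, pis)
```

With a constant `sigma`, every interval would have the same width; allowing `sigma` to change over time is what lets the interval widen during volatile periods.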
Traditional prediction methods only focus on the estimated conditional mean ($u_t$) component and treat the residual ($r_t$) part as having a constant variance. However, in real situations, the variations in traffic and travel times can be different during different time periods. Therefore, prediction interval based approaches are proposed to model this uncertainty (the residual part $r_t$). By providing a prediction interval, we have an idea of how likely this estimated range is to capture the future travel time. In other words, a prediction interval is an estimated range that captures the future observation, with a prescribed probability, given the currently available observations.

As illustrated in Figure 1, a prediction interval is comprised of an upper and a lower prediction limit that indicate the accuracy of the model output with respect to the observed value. Traffic is a complex and uncertain system. Due to the uncertainty related to the data and the estimated model, there is often a mismatch between the model output and the observed value. Models that only provide a point value (the predicted value) have limited ability to capture the uncertainty and variability of travel time, especially during congested situations. PIs provide a range within which the travel time during the next time interval is likely to fall. Therefore, PIs have the potential to capture the fluctuations and the stochastic traffic phenomena. Usually, a wider prediction interval is associated with larger variation in travel time.

Figure 1. Concept of prediction interval [93] (legend: upper prediction interval, lower prediction interval, predicted, observed; axes: output versus sample)

The statistical volatility-based travel time prediction models include two parts: the predicted mean value (the red triangle in Figure 1) and the prediction interval (the green vertical lines). This section will first study efficient mean prediction models for short-term travel time prediction purposes.
The second part studies the application of statistical volatility models in predicting the variance part of the travel time data, and the construction of travel time prediction intervals to account for uncertainties associated with prediction.

3.1 Mean Prediction Models

The first step of the modeling stage is to estimate the mean of the data. In the literature on traffic parameter forecasting, different mean equation models have been tested. For example, Kamarianakis et al. [87] applied the ARIMA model as the mean equation of the volatility model. Karlaftis and Vlahogianni [26] proposed using the ARFIMA model to capture the long memory in the conditional mean. Yang et al. [3] adopted three different mean equations for volatility models: SARIMA, ANN, and historical average methods. Among the existing methods that have been proposed in the literature, the ARIMA-type model has become one of the most widely used due to its ease of implementation and its well-known ability in traffic parameter modeling and forecasting [26]. Therefore, for the purpose of studying the performance of different volatility models, this section applies the ARIMA model as the mean equation. However, it is worth noting that a proper mean equation model should not be restricted to the ARMA-type model. Properly choosing a mean equation with regard to the structure of the data can lead to better model performance [3].

3.1.1 Theoretical Background of ARIMA Models

This section introduces the ARIMA model used to capture the conditional mean. ARIMA models, as one of the most general classes of time series models, predict the current value based on its past values. The ARIMA$(p, d, q)$ model is comprised of three parts: autoregressive AR$(p)$, integrated I$(d)$, and moving average MA$(q)$. Define $B$ as the backshift operator with $B^k x_t = x_{t-k}$. For a given time series $\{x_1, x_2, \ldots, x_n\}$, the ARIMA model
consists of the following parts:

The autoregressive part of order $p$ is denoted as AR$(p)$ and is formulated as:

$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + z_t$    (2)

where $\phi_1, \ldots, \phi_p$ are parameters and $z_t$ is a white noise process with zero mean and variance $\sigma_z^2$. The equation can be rewritten in the following form by using the backshift operator:

$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) x_t = z_t$    (3)

Or more concisely as:

$\phi(B) x_t = z_t$    (4)

where $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$.

As many time series are non-stationary, it is necessary to transform the original data into a stationary series. There are several ways to do so, such as differencing the data, removing the trend (if the data contain a trend), or taking the logarithm or square root of the series for data with non-constant variance. In travel time prediction, differencing the original data is the most often used technique to achieve stationarity. Here differencing means taking the difference between consecutive observations, repeated $d$ times. The integrated part with order $d$, denoted as I$(d)$, means the $d$th difference of the original data:

$(1 - B)^d x_t$    (5)

The moving average part with order $q$, denoted as MA$(q)$, is the process in which the current value of $x_t$ is a linear combination of a white noise series:

$x_t = z_t + \theta_1 z_{t-1} + \theta_2 z_{t-2} + \cdots + \theta_q z_{t-q}$    (6)

where $\theta_1, \ldots, \theta_q$ are parameters and $z_t, \ldots, z_{t-q}$ are white noise terms with zero mean and variance $\sigma_z^2$. Equation (6) can be rewritten in a more concise form by using the backshift operator:

$x_t = (1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q) z_t$    (7)

$x_t = \theta(B) z_t$    (8)

where $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$.

The ARIMA$(p, d, q)$ model generalizes the autoregressive moving average (ARMA) process, with $p$ as the number of autoregressive terms, $d$ as the number of differences, and $q$ as the number of lagged forecast errors:

$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)(1 - B)^d x_t = (1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q) z_t$    (9)

This can be written in a concise form as:

$\phi(B)(1 - B)^d x_t = \theta(B) z_t, \quad \{z_t\} \sim \mathrm{WN}(0, \sigma^2)$    (10)
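The two building blocks above, the differencing operator of Equation (5) and the autoregressive recursion of Equation (2), can be illustrated with a minimal Python sketch. The function names `difference` and `ar_forecast` and the example values are hypothetical and chosen only for illustration; parameter estimation is not covered here.

```python
import numpy as np

def difference(x, d=1):
    """Apply (1 - B)^d: difference consecutive observations d times."""
    return np.diff(np.asarray(x, dtype=float), n=d)

def ar_forecast(x, phi):
    """One-step AR(p) forecast phi_1*x_{t-1} + ... + phi_p*x_{t-p}.

    The white noise term z_t is replaced by its mean, zero.
    """
    x = np.asarray(x, dtype=float)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    # x[-1] is the most recent value, so take the last p values in reverse order
    return float(np.dot(phi, x[-1:-p - 1:-1]))

# Hypothetical travel times (seconds) with a steadily growing increment
travel_times = [60, 62, 65, 69, 74]
first_diff = difference(travel_times, d=1)   # increments: 2, 3, 4, 5
second_diff = difference(travel_times, d=2)  # constant: 1, 1, 1
next_increment = ar_forecast(first_diff, phi=[0.5, 0.2])
```

In this toy series the first difference still trends upward, while the second difference is constant, which is the kind of behavior that motivates choosing $d$ before fitting the AR and MA parts.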
Readers who are interested in the theoretical foundations of the ARIMA model can refer to the book “Time series analysis and its applications: with R examples” [94] for details.

3.1.2 ARIMA Model Optimization

Order selection and parameter estimation are the two major steps in ARIMA model forecasting. Selecting a proper order of the ARIMA model is important for producing accurate forecasting results. From a prediction point of view, it is not wise to select $p$ and $q$ arbitrarily large, since large $p$ or $q$ values potentially lead to over-fitting. To avoid over-fitting, a penalty factor is introduced to discourage models with too many parameters. Some widely used criteria for model selection are the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC). The preferred model is the one with the minimum value of the chosen criterion. This study selects the order of an appropriate ARIMA model based on the AIC. The best model is the one that has the smallest value of AIC:

$\mathrm{AIC} = -2 \log(L) + 2m$    (11)

where $L$ is the likelihood of the data for the specific model and $m$ is the number of parameters of this model. This research utilizes the method proposed by Hyndman and Khandakar [95] to select the orders of the appropriate ARIMA model automatically. After determining the best orders of the ARIMA model, the parameters of the model are estimated through the maximum likelihood method. Detailed information on the theoretical background and the steps in fitting an ARIMA model can be found in [96].

3.1.3 Model Evaluation Criterions

There are several well established methods to evaluate the performance of the mean equation model. This research applies two measures of effectiveness to test the mean part: the root mean squared error (RMSE) and the mean absolute percentage error (MAPE).
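As a minimal sketch of the two measures just named, the following Python functions assume the usual definitions: RMSE as the square root of the mean squared difference between actual and forecast values, and MAPE as the mean absolute error relative to the actual value. The function names and the example numbers are hypothetical. MAPE is returned here as a fraction; multiplying by 100 expresses it in percent.

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error, in the same unit as the data."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mape(actual, forecast):
    """Mean absolute percentage error as a fraction of the actual value."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)))

# Hypothetical actual and forecast travel times in seconds
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
error_rmse = rmse(actual, forecast)   # sqrt((10^2 + 20^2) / 2)
error_mape = mape(actual, forecast)   # (10/100 + 20/200) / 2 = 0.1
```

Because MAPE divides by the actual value, it is undefined when an actual observation is zero, which is not a concern for travel times but matters for other traffic variables.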
The RMSE is a frequently used measure of the difference between the values predicted by a model and the actual observations. It is measured in the same unit as the original data. The MAPE is another commonly used measure of effectiveness. Unlike the RMSE, the MAPE is expressed in percentage terms. Therefore, it provides a general sense of the error even without knowledge of what constitutes a “big” error for the data set. The equations for the RMSE and MAPE are as follows:

$\mathrm{RMSE} = \left( \sum_{i=1}^{n} \frac{(t(i) - a(i))^2}{n} \right)^{1/2}$    (12)

$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{t(i) - a(i)}{t(i)} \right|$    (13)

where $t(i)$ is the actual value, $a(i)$ is the forecast value, and $n$ is the total number of time periods.

3.2 Volatility Models

As shown in Equation (1), the proposed model is the sum of a mean part and a variance (residual) part. Most traditional models only concentrate on the mean part and assume that the variance part simply satisfies the white noise properties. Volatility models relax this assumption and characterize the changing variance of the data through time. Volatility models aim at specifying how the conditional variance $\sigma_t^2$ evolves over time. Different ways to address the variance part $r_t$ lead to different kinds of volatility models. In general, there are two different categories: the GARCH-type model and the stochastic volatility model.

3.2.1 GARCH-type Models

The GARCH-type models aim at capturing the changes of the variance part $r_t$. The first volatility model was proposed by Engle [84] in 1982, termed the Autoregressive Conditional Heteroskedasticity (ARCH) model. This model was originally used to capture the uncertainty of financial data. This type of uncertainty refers to variances and covariances that change over time. In his research, the discrete time stochastic process $r_t$ is expressed as [97]:

$r_t = \sigma_t \epsilon_t$    (14)

$\sigma_t^2 = \mathrm{Var}(r_t \mid F_{t-1}) = \mathrm{Var}(x_t \mid F_{t-1})$    (15)

Here $\epsilon_t$ is an independent and identically distributed (i.i.d.) process with zero mean and standard deviation one, and $F_{t-1}$
denotes the information available through time $t - 1$. The above equations form the foundation of the volatility model. Its various extensions are all based on them. Different ways of modeling $\sigma_t$ lead to a wide variety of volatility models. Engle suggested in his paper that $\sigma_t^2$ can be a linear function of past squared values of the $r_t$ process:

$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{m} \alpha_i r_{t-i}^2$    (16)

Note that $\alpha_0$ is the intercept term with $\alpha_0 > 0$, $\alpha_i$ represents the unknown coefficient of $r_{t-i}^2$ and satisfies $\alpha_i \geq 0$ to ensure that the conditional variance is positive, and $m$ denotes the number of lags selected for the model. This structure clearly captures volatility clustering. The current magnitude of $r_t$ is based on past values of $r_{t-i}^2$. Therefore, a sudden large change in the data would more likely lead to another large change. In other words, the probability of obtaining a large variance is greater than that of obtaining a small variance if the past value of the variance is large. This structure also works in traffic forecasting. For example, congestion creates unexpected delays, which lead to a dramatic increase in travel time. This phenomenon usually lasts for a considerable period of time.

Although the ARCH process has been proven useful in modeling the uncertainty in data, it often requires a relatively long lag. In order to allow both a longer memory and a more flexible lag structure, a generalized version of the ARCH model, the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model, was proposed by Bollerslev [85]. It includes the past values of $\sigma_t^2$ in the model structure:

$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{m} \alpha_i r_{t-i}^2 + \sum_{j=1}^{s} \beta_j \sigma_{t-j}^2$    (17)

in which $\alpha_0 > 0$, $\alpha_i \geq 0$ for $i = 1, \ldots, m$, $\beta_j \geq 0$ for $j = 1, \ldots, s$, and $\sum_{i=1}^{\max(m,s)} (\alpha_i + \beta_i) < 1$. The appealing feature of the GARCH model is that $\sigma_t^2$ depends not only on past values of the variance part $r_{t-i}^2$, but also on its own past values $\sigma_{t-j}^2$. The GARCH model can be interpreted as an ARMA process for the squared residuals $r_t^2$.
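The recursion in Equation (17) can be traced by hand for the GARCH(1, 1) case. The following Python sketch is illustrative only: the function name `garch11_variance` and the parameter values are hypothetical, and starting the recursion at the unconditional variance $\alpha_0 / (1 - \alpha_1 - \beta_1)$ is an assumption made here rather than a choice stated in the text.

```python
import numpy as np

def garch11_variance(r, alpha0, alpha1, beta1, sigma2_init=None):
    """Conditional variances from a GARCH(1, 1) recursion.

    sigma2[t] = alpha0 + alpha1 * r[t-1]**2 + beta1 * sigma2[t-1]
    """
    if not (alpha0 > 0 and alpha1 >= 0 and beta1 >= 0 and alpha1 + beta1 < 1):
        raise ValueError("parameters violate the GARCH(1, 1) constraints")
    r = np.asarray(r, dtype=float)
    sigma2 = np.empty(len(r))
    # Assumption: initialize at the unconditional variance unless a value is given
    if sigma2_init is None:
        sigma2_init = alpha0 / (1.0 - alpha1 - beta1)
    sigma2[0] = sigma2_init
    for t in range(1, len(r)):
        sigma2[t] = alpha0 + alpha1 * r[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

# Hypothetical residuals: calm, then one large shock, then calm again
residuals = [0.0, 2.0, 0.0]
variances = garch11_variance(residuals, alpha0=0.1, alpha1=0.2, beta1=0.7)
# variances is approximately [1.0, 0.8, 1.46]
```

The shock at the second time step raises the conditional variance at the third step, which is the volatility clustering behavior described above.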
Applications of the GARCH model in many different fields demonstrate its ability to model uncertainty in data. Many variations of the volatility model inherit this structure. One successful type of variant is termed the component GARCH model.

3.2.2 Component GARCH Models

Component GARCH models aim at capturing the trend and seasonal (cyclical) components in data. In the traffic prediction field, no research has yet been conducted to study the seasonality and trend in data through the component GARCH model. One innovation of this research is introducing the component GARCH model in travel time reliability prediction. Two different component GARCH models are proposed in this research: the component GARCH (C-GARCH) and the multiplicative component GARCH (MC-GARCH) models.

As trend and seasonal components are often observed in data, it has been argued that the conventional GARCH model is unable to provide adequate performance [98] if trend or seasonal components exist. Several variations of the GARCH model were developed to take into account the trend and seasonal volatility patterns. We term these models component models, since they decompose the data into trend, seasonal, and random elements. The trend component represents long term changes in the level of the data series, while the seasonal factor represents the periodic fluctuations within the data series. Two different structures can be considered as basic component models:

The additive model, where

$x_t = s_t + t_t + e_t$    (18)

The multiplicative model, where

$x_t = s_t \times t_t \times e_t$    (19)

where $s_t$ represents the seasonal effect, $t_t$ represents the trend, and $e_t$ represents the errors. The additive model is applied to data series where the magnitude of the seasonal fluctuation does not change with the level of the data series. The multiplicative model applies to situations in which the seasonal variation increases or decreases with the level of the series.
Based on the additive model structure, the component GARCH (C-GARCH) model proposed by Engle and Lee [99] decomposes the conditional variance into a long term and a transitory component. The equations of the component GARCH model are as follows.

$\sigma_t^2 = q_t + \sum_{i=1}^{m} \alpha_i (r_{t-i}^2 - q_{t-i}) + \sum_{j=1}^{s} \beta_j (\sigma_{t-j}^2 - q_{t-j})$    (20)

$q_t = \alpha_0 + \rho q_{t-1} + \varphi (r_{t-1}^2 - \sigma_{t-1}^2)$    (21)

where $\alpha_i$, $\beta_j$, $\rho$, $\varphi$ are unknown parameters, as is $\alpha_0$. In this model, the intercept term $q_t$ is regarded as a time-varying process, which represents the long term component of the conditional variance. The difference between the conditional variance and the long term component, $\sigma_{t-j}^2 - q_{t-j}$, is the transitory component that models the short-term volatilities.

The multiplicative component GARCH model [5] assumes that the variations increase with the level of the data and decomposes the variance part into three multiplicative components: the daily component $d_t$, the deterministic diurnal pattern $s_i$, and the stochastic intraday component $q_{t,i}$.

$x_{t,i} = u_{t,i} + r_{t,i}$    (22)

$r_{t,i} = \sqrt{d_t s_i q_{t,i}} \, \varepsilon_{t,i}$    (23)

where the travel time at time index $i$ in day $t$ consists of the conditional mean $u_{t,i}$ and the variance part $r_{t,i}$. $\varepsilon_{t,i}$ is the i.i.d. $(0, 1)$ standardized innovation, which can follow a normal distribution, a Student-t distribution, etc. The daily part $d_t$ models the variance of the data across different days. It can be derived from a multifactor risk model, a daily GARCH model or a multiple indicators model [5]. The deterministic diurnal component for each time index is estimated as:

$s_i = \frac{1}{T} \sum_{t=1}^{T} \frac{r_{t,i}^2}{d_t}$    (24)

where $T$ is the total number of days, $t$ denotes the day, and $i$ denotes the regularly spaced time intervals. As indicated in the above equation, the diurnal component at time index $i$ is the average over days of the squared residual at time index $i$ scaled by the corresponding daily variance. Therefore, the diurnal component represents the regular intraday variations.
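Equation (24) amounts to a column-wise average of daily-scaled squared residuals. The short Python sketch below assumes the residuals are arranged as a $T \times N$ array (days by intraday time index) and that the daily variances $d_t$ have already been estimated; the function name `diurnal_component` and the toy numbers are hypothetical.

```python
import numpy as np

def diurnal_component(r, d):
    """Estimate s_i = (1/T) * sum_t r[t, i]**2 / d[t] for every time index i.

    r: residuals, shape (T, N) = (days, intraday intervals)
    d: daily variance component, shape (T,)
    """
    r = np.asarray(r, dtype=float)
    d = np.asarray(d, dtype=float)
    # Broadcast d over the intraday axis, then average across days
    return np.mean(r ** 2 / d[:, None], axis=0)

# Toy example: two days, two intraday intervals
r = np.array([[1.0, 2.0],
              [3.0, 4.0]])
d = np.array([1.0, 2.0])
s = diurnal_component(r, d)   # array([2.75, 6.0])

# Residuals scaled by sqrt(d_t * s_i) have unit average square per interval
z = r / np.sqrt(d[:, None] * s[None, :])
```

By construction, the squared scaled residuals average to one across days at every time index, so whatever variation remains in them is left to the stochastic intraday component.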
After estimating the daily and deterministic intraday components, the remaining component in the variance part is regarded as stochastic, and can be modeled as a GARCH$(p, q)$ process. The normalized residual is:

$z_{t,i} = r_{t,i} / \sqrt{d_t s_i} = \sqrt{q_{t,i}} \, \varepsilon_{t,i}$    (25)

where the stochastic intraday component $q_{t,i}$ is assumed to follow the GARCH process:

$q_{t,i} = \alpha_0 + \sum_{j=1}^{p} \alpha_j z_{t,i-j}^2 + \sum_{k=1}^{q} \beta_k q_{t,i-k}$    (26)

From the perspective of travel time prediction, travel time exhibits both regular cyclical patterns (the seasonal component) and stochastic patterns. Daily cyclical patterns distinguish peak hour from non-peak hour travel time. Stochastic patterns are the results of unexpected influential events, like bad weather conditions and traffic incidents. Capturing the time-varying features of traffic behavior is critical for travel time forecasting. In addition, decomposing data into cyclical and stochastic patterns provides a better understanding of the underlying structure of the data. The component GARCH model, a generalization of the traditional GARCH model, considers the seasonal and trend components in the variance part through decomposition.

3.2.2 Stochastic Volatility Model

Most of the existing studies focus on the GARCH-type model. There are limited studies applying a stochastic volatility type model in traffic prediction. Part of the reason is that the estimation process for the stochastic volatility model is much more complex than that for the GARCH-type model. In this research, a new parameter estimation method is introduced to improve the estimation performance of the stochastic volatility model.

The conditional volatility of the GARCH model in Equation (14) is a deterministic function of past quantities. Provided that all relevant information is available, the model can be specified at the present time period.
The stochastic volatility (SV) model is a competitive alternative to the GARCH-type model. It models volatilities non-deterministically, treating the volatility as a random process that evolves stochastically over time. Since traffic involves interactions among different factors such as demand, incidents, and weather, future traffic conditions are often uncertain. Modeling the conditional variance as an unobserved stochastic process allows for a more flexible application of the SV model and can account for the uncertainty inherent in traffic phenomena [90]. Based on the canonical model [100] of the stochastic volatility class, the volatility part $r_t$ of Equation (1) can be expressed as follows:

$r_t = \exp(h_t / 2) \, \varepsilon_t$    (27)

$h_t = \mu + \phi (h_{t-1} - \mu) + \sigma v_t$    (28)

$h_0 \sim N\left(\mu, \frac{\sigma^2}{1 - \phi^2}\right)$    (29)

where $r_t$ represents the volatility of travel time during time interval $t$, $\varepsilon_t$ is a Gaussian white noise sequence with mean 0 and variance 1, and $v_t$ is also a Gaussian white noise sequence with mean 0 and variance 1 that is independent of $\varepsilon_t$. The unobserved process $h_t$ is interpreted as a stochastic volatility process with parameters $\mu$, $\phi$ and $\sigma$ to be estimated.

To set up the model, a prior distribution for the parameters $\mu$, $\phi$ and $\sigma$ should be specified. According to [6], the level $\mu$ follows a normal distribution with mean $m_\mu$ and variance $M_\mu$, or $\mu \sim N(m_\mu, M_\mu)$. To guarantee that the persistence parameter $\phi \in (-1, 1)$, $(\phi + 1)/2$ follows a Beta distribution, $(\phi + 1)/2 \sim B(a_0, b_0)$, where $a_0$ and $b_0$ are positive parameters. The volatility of log variance follows $\sigma^2 \sim B_\sigma \cdot \chi_1^2 = \mathrm{Gamma}(1/2, 1/(2 B_\sigma))$, where $B_\sigma$ is a single positive value that stands for the scaling of the transformed parameter $\sigma^2$, and $\chi_1^2$ denotes a chi-squared distribution with one degree of freedom. The posterior distributions of the desired variables are estimated through Bayesian inference via the Markov Chain Monte Carlo (MCMC) method.

It should be noticed that by taking logarithms of the squared $r_t$
in Equation (27), the SV model is transformed into a linear equation:

$\log(r_t^2) = h_t + \log(\varepsilon_t^2), \quad \varepsilon_t \sim N(0, 1)$    (30)

If $\log(\varepsilon_t^2)$ is approximated by a mixture of normal distributions, with $m_{r_t}$ and $s_{r_t}^2$ being the mean and the variance of the $r_t$th mixture component respectively [6], the above equation reduces to the form of a conditionally Gaussian state space model:

$\tilde{r}_t = m_{r_t} + h_t + \epsilon_t$    (31)

where $\tilde{r}_t = \log(r_t^2)$ and $\epsilon_t \sim N(0, s_{r_t}^2)$. This linearization makes efficient MCMC sampling possible.

Model reparameterization can potentially improve simulation efficiency in the volatility model. Denoting Equations (27) to (29) as the centered (C) parameterization, another version of the SV model is the non-centered (NC) parameterization, where the parameter $\mu$ is shifted from the state Equation (28) to the observation Equation (27) by setting $\tilde{h}_t = (h_t - \mu)/\sigma$. The non-centered parameterization is given as:

$r_t \sim N(0, \exp(\mu + \sigma \tilde{h}_t))$    (32)

$\tilde{h}_t = \phi \tilde{h}_{t-1} + v_t$    (33)

where $\tilde{h}_t = (h_t - \mu)/\sigma$ and $v_t$ is a Gaussian white noise sequence with mean 0 and variance 1.

The choice between the centered (C) and the non-centered (NC) parameterization can dramatically affect the sampling efficiency. Both parameterizations have their advantages and disadvantages, which depend heavily on the true parameter values of the data generating process. There exists no ‘ultimate’ parameterization. Based on the ancillarity-sufficiency interweaving strategy (ASIS) introduced by Yu and Meng [101], Kastner and Fruhwirth-Schnatter [6] proposed a strategy that interweaves C and NC to overcome this deficiency. Their results show that interweaving C and NC leads to a robustly efficient sampler that outperforms either parameterization (C or NC) used alone with respect to sampling efficiency, at little additional cost in terms of design and computation. Their intuitive and efficient algorithm can be briefly summarized in six steps. Choose appropriate starting values and repeat the following steps:

(1) Draw h from parameterization C.
(2) Draw $\mu$, $\phi$, $\sigma$ from parameterization C.
(3) Move to the non-centered parameterization NC by the simple deterministic transformation $\tilde{h}_t = (h_t - \mu)/\sigma$.
(4) Redraw $\mu$, $\phi$, $\sigma$ from parameterization NC.
(5) Move back to C by calculating $h_t = \mu + \sigma \tilde{h}_t$ for all $t$.
(6) Draw the indicators from parameterization C.

The detailed sampling steps in the ASIS involve an extensive Bayesian approach with MCMC. In the following, a brief summary of the key ideas on how to estimate the parameters of the SV model is given. Readers are referred to [102] for further details.

In Bayesian analysis, given a set of data $x$, the posterior distribution $P(\theta \mid x)$ is desired in order to estimate the parameter $\theta$. Bayesian analysis seeks to estimate $\theta$ by combining the prior knowledge about the parameter with the data. Denote $P(\theta)$ as the specified prior distribution of $\theta$. The posterior distribution $P(\theta \mid x)$ is calculated through the conditional probability:

$P(\theta \mid x) = \frac{P(\theta, x)}{P(x)} = \frac{P(x \mid \theta) P(\theta)}{P(x)}$    (34)

where $P(x \mid \theta)$ denotes the likelihood function of the data for a given model and $P(x)$ denotes the marginal distribution of $x$. According to Bayes’s rule:

$P(\theta \mid x) \propto P(x \mid \theta) P(\theta)$    (35)

The goal is to estimate $\theta$ through the posterior distribution $P(\theta \mid x)$. The Markov Chain Monte Carlo (MCMC) approach draws a sample from the posterior distribution and then calculates the estimator of $\theta$. A Markov process is a stochastic process $\{x_t\}$ in which the value of $x_h$ does not depend on the value of $x_s$ if the value of $x_t$ is given and $s < t < h$. $\{x_t\}$ is a Markov process if its conditional distribution function satisfies the following criterion:

$P(x_h \mid x_s, s \leq t) = P(x_h \mid x_t), \quad h > t$    (36)

The basic idea of the MCMC method is to simulate a Markov chain whose stationary distribution is the desired distribution $P(\theta \mid x)$. The ASIS algorithm estimates the model parameters in steps two and four through Bayesian inference using MCMC.
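To make the idea behind Equations (35) and (36) concrete, the following Python sketch uses a random-walk Metropolis sampler. This is a deliberately simpler MCMC algorithm than the ASIS sampler of [6] and is not the method used in this research. The toy problem is an assumption chosen for illustration: data drawn from $N(\theta, 1)$ with a $N(0, 10^2)$ prior on $\theta$, so that only the unnormalized posterior of Equation (35) is needed. All names and values are hypothetical.

```python
import numpy as np

def log_posterior(theta, x, prior_sd=10.0):
    """Unnormalized log posterior: log P(x | theta) + log P(theta)."""
    log_lik = -0.5 * np.sum((x - theta) ** 2)   # N(theta, 1) likelihood, constants dropped
    log_prior = -0.5 * (theta / prior_sd) ** 2  # N(0, prior_sd^2) prior, constants dropped
    return log_lik + log_prior

def metropolis(x, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis: each draw depends only on the previous one."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    current = log_posterior(theta, x)
    draws = np.empty(n_iter)
    for k in range(n_iter):
        proposal = theta + step * rng.normal()
        candidate = log_posterior(proposal, x)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[k] = theta
    return draws

# Hypothetical observations
x = np.array([1.8, 2.1, 2.4, 1.9, 2.3])
draws = metropolis(x)
posterior_mean = draws[1000:].mean()   # discard the first 1000 draws as burn-in
```

For this conjugate toy problem the exact posterior mean is $n \bar{x} / (n + 1/100) \approx 2.096$, so the sample mean of the retained draws can be checked against it. The chain is Markov in the sense of Equation (36) because each proposal is generated from the current value only.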
3.2.3 Prediction Interval Estimation

Prediction intervals for the volatility model are estimated based on the idea of prediction intervals for regression models. To construct a prediction interval with $100(1 - \alpha)\%$ confidence, we assume that the error follows a Gaussian distribution with zero mean and variance $\sigma_t^2$. The prediction interval can be calculated as:

$(u_t - z_{\alpha/2} \sigma_t, \; u_t + z_{\alpha/2} \sigma_t)$    (37)

where $u_t$ is the predicted mean, $z_{\alpha/2}$ denotes the standard normal score with upper-tail probability $\alpha/2$, and $\sigma_t$ is the predicted standard deviation from a volatility model.

As the concept of uncertainty or reliability is relatively new in traffic forecasting, there are few studies that provide criteria for PI assessment. One study by Khosravi et al. [79] suggested that two important aspects of PI assessment should be considered: coverage probability and length. Coverage probability measures the percentage of the targets that lie within the predicted PIs. It measures how effective the constructed prediction intervals are. The mathematical representation of the PI coverage probability (PICP) is as follows:

$\mathrm{PICP} = \frac{1}{n} \sum_{i=1}^{n} c_i$    (38)

where $c_i = 1$ if $y_i \in [L(x_i), U(x_i)]$; otherwise, $c_i = 0$. $L(x_i)$ and $U(x_i)$ represent the lower and upper bounds of the prediction interval of $x_i$, and $n$ is the total number of constructed PIs.

On the other hand, another criterion called the mean PI length (MPIL) measures the average length of the PIs. It measures how efficient the constructed prediction intervals are. If two models provide PIs with the same coverage probability, the one that gives a narrower prediction band is more efficient. The following equation gives the definition of the MPIL:

$\mathrm{MPIL} = \frac{1}{n} \sum_{i=1}^{n} (U(x_i) - L(x_i))$    (39)

Therefore, both criteria should be considered when evaluating the volatility models.
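Equations (37) to (39) can be sketched in a few lines of Python. The function names and the sample numbers below are hypothetical; the sketch assumes Gaussian errors, as stated above, and takes the predicted mean and predicted standard deviation as given inputs rather than fitting any model.

```python
import numpy as np
from statistics import NormalDist

def prediction_interval(mean, sigma, alpha=0.05):
    """Lower and upper limits mean -/+ z_{alpha/2} * sigma (Equation 37)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    mean = np.asarray(mean, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return mean - z * sigma, mean + z * sigma

def picp(observed, lower, upper):
    """Share of observations inside their interval (Equation 38)."""
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((observed >= lower) & (observed <= upper)))

def mpil(lower, upper):
    """Average interval length (Equation 39)."""
    return float(np.mean(np.asarray(upper) - np.asarray(lower)))

# Hypothetical predicted means, predicted standard deviations and observations (seconds)
mean = [100.0, 120.0, 150.0, 130.0]
sigma = [5.0, 10.0, 20.0, 10.0]
observed = [104.0, 145.0, 160.0, 128.0]

lower, upper = prediction_interval(mean, sigma, alpha=0.05)
coverage = picp(observed, lower, upper)   # 0.75: the second observation falls outside
length = mpil(lower, upper)               # about 44.1
```

In this toy case one of the four observations falls above its upper limit, so the coverage is 75 percent even though the nominal level is 95 percent. With so few intervals such a gap is not meaningful, which is why the two criteria are evaluated over many intervals.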
3.3 Application of Component GARCH Models in Travel Time Prediction

The performance of the GARCH, the C-GARCH, and the MC-GARCH models is investigated here by using data collected from Automatic Vehicle Identification (AVI) stations located along U.S. Highway 290 (or U.S. 290) in Houston, Texas. The entire study corridor is about five miles long and covers the Northwest Freeway in the westbound direction between I-610 and the junction of Farm to Market Road 1960 (FM1960). The IDs of the selected AVI stations are 29, 30, 31, 32, 33, and 34. The travel times between each pair of consecutive detectors were collected and aggregated into five-minute time intervals. Individual segment travel time at free-flow conditions is less than four minutes. The total period of the sample was the entire year of 2008, with missing data replaced by annual medians of the missing intervals. Since travel time patterns during weekdays and weekends are quite different (congestion is more likely to occur during weekdays), weekend data were excluded from the sample. As a result, 262 weekdays of travel time data that contain 75,456 five-minute observations are selected for this study.

3.3.1 Modeling Conditional Mean

Traffic data often show periodic patterns. Travel time increases and varies significantly during peak hours compared with travel time during non-peak hours. It is difficult to precisely predict traffic when congestion occurs. Point prediction methods are often unable to capture traffic variation in congested situations, and therefore provide less reliable or accurate predictions. As the performance of the point prediction methods often decreases when congestion occurs, it is expected that the residual series (after removing the mean predicted by the ARIMA model) shows higher variations during peak hours. Figure 2 provides a boxplot of the absolute deviation from the predicted mean for each 20-minute time interval (outliers have been removed from this plot).
Each box statistically represents the interval-to-interval and day-to-day variations of the residual series. The green line indicates the mean of the residual, and the lower and upper boundaries of each box are the 25th and 75th percentiles of the data for the corresponding time interval. As observed from this plot, the statistics of each interval differ from each other. Both the mean and the percentiles of the data are different at different time intervals. This indicates that the residual series varies over time, and that the constant variance assumption of the traditional time series models is violated. This further supports the need for a volatility model, which relaxes the constant variance assumption. In addition, the variation increases markedly starting at 15:00 and decreases again at 19:00. Comparatively, variations during other time periods (non-peak hours) are less significant.

Figure 2. Box Plot of Absolute Deviation from Predicted Mean

The other four studied segments also depict similar diurnal patterns. It is clear that seasonal components exist in the residual time series. In addition, the means of the absolute deviations during non-peak hours are close to zero, which means that the ARIMA model provides adequate prediction during non-peak hours. On the other hand, the prediction performance of the ARIMA model decreases during peak hours, as both the mean and the 75th percentile increase.

3.3.2 Testing the ARCH Effect

The basic assumption of the GARCH-type model is that the squared values of the residuals are correlated. Therefore, before applying different GARCH-type models to the data, there is a need to test whether the data meet this assumption. Two tests are available: the Ljung–Box statistic and the Lagrange multiplier test [96]. This study chose the Ljung–Box statistic to test whether the first $m$ lags of the squared residuals are uncorrelated. The Ljung–Box test is as follows:

$H_0: \rho_1 = \rho_2 = \cdots = \rho_m$
$= 0$, with the test statistic

$Q(m) = N(N + 2) \sum_{h=1}^{m} \frac{\hat{\rho}_h^2}{N - h}$    (40)

where $N$ is the number of data points under study, $\hat{\rho}_h$ is the sample autocorrelation at lag $h$, and $m$ is the number of lags being tested. The critical region for rejecting the null hypothesis at significance level $\alpha$ is:

$Q > \chi_{1-\alpha, m}^2$

In terms of p values, the null hypothesis is rejected if the p value is less than $\alpha$. In our study, the Ljung–Box test is applied to the residual data of all five segments. The p values of the test for all studied segments are well below 0.01. Therefore, the null hypothesis is rejected at the significance level of 0.01. That is to say, correlations exist between the squared values of the residuals, and the GARCH-type models are therefore appropriate.

3.3.3 Estimating the Volatility Model

Similar to the ARIMA-type model, estimating the volatility model also involves order selection and parameter estimation. Several studies indicate that GARCH family models with order (1, 1) are adequate in representing the volatility dynamics [26, 86, 89]. Therefore, GARCH, C-GARCH and MC-GARCH models with order (1, 1) were adopted for ease of implementation and comparison, using the R package ‘rugarch’ [103].

Since the multiplicative component GARCH model decomposes the data into a daily component, a deterministic diurnal pattern and a stochastic intraday component, the first step is to model the daily component. This study terms the average volatility for each day the daily component. The daily component is estimated through the standard GARCH (1, 1) process. There are 262 daily data points in total; the first 242 data points are used as the training data. After removing the daily component, the deterministic diurnal part is estimated as the annual average of the residual data at each time interval. The normalized residuals of Equation (25) are then used to produce the stochastic intraday component. The volatility components estimated by the MC-GARCH model are displayed in five panels as shown in Figure 3.
The top panel shows the observed values of the residual data series. The second panel gives the estimated conditional variance, which is the product of the following three components: the deterministic intraday (panel three), daily (panel four), and stochastic intraday (panel five) components. As indicated in this figure, the MC-GARCH model is able to model the trend, seasonal, and stochastic components of the data. This feature provides a better understanding of the basic structure of the data and is easy to interpret. For example, the intraday [Deterministic] components indicate the regular cyclical patterns of travel time volatility, while the intraday [Stochastic] components capture the variations due to demand variation, incidents or other abnormal traffic phenomena.

3.3.4 Construct the Mean and Prediction Intervals

The final output of the proposed volatility models includes two measures: the predicted mean and the predicted PIs. The predicted mean gives the expected value of travel time in the future, whereas the PIs tell how likely the observed value is to lie within a certain range. In other words, wider PIs often indicate unreliable travel time and prediction. Thus, based on the combined information of the predicted mean and PIs, travelers and operators would have a better sense of future traffic conditions. In this study, the ARIMA model provides the mean values, and the prediction intervals are constructed according to Equation (37).

Figure 4 plots some sample prediction results of the MC-GARCH model. The blue dots stand for the observed travel-time data obtained at the corresponding time intervals, and the red triangles stand for the predicted mean. The green lines represent the PIs constructed by the MC-GARCH model. It is obvious in this figure that there is always a mismatch between the predicted mean and the observed value. This partly results from the dynamic nature of traffic: travel time varies from time to time.
Prediction intervals, on the other hand, are able to adequately capture this variation by covering most of the observed values. Therefore, this model provides an effective and efficient way to measure the uncertainty associated with future travel time.

Figure 3. Multiplicative component GARCH forecasting results: decomposition of the volatility into its various components (32-33).

Figure 4. Predicted mean and PI for multiplicative component GARCH model

3.3.5 Results and Discussion

PIs with different confidence levels are constructed through the GARCH, the C-GARCH and the MC-GARCH models for the five studied segments. The effectiveness and efficiency of the constructed PIs are evaluated based on the criteria of coverage probability and PI length. For each of the five studied segments, thirty days of five-minute travel-time data with 8,640 observations are used as the training data set and ten days of travel-time data with 2,880 observations are used as the comparison (testing) data set. We estimate individual models for each of the five segments. PIs are constructed with 95 percent, 90 percent, and 85 percent confidence levels, respectively.

Table 1 provides average MPIL and PICP values of the three models with different confidence levels during peak hours, nonpeak hours, and all day. During peak hours, the prediction interval coverage rates of the MC-GARCH model are the highest of the three models. For the 95 percent confidence level, the coverage rate of the MC-GARCH model (90.86 percent) is 2.86 percentage points higher than that of the second highest model (GARCH, 88.00 percent). For the 90 percent confidence level, the coverage rate of the MC-GARCH model (87.35 percent) is 4.33 percentage points higher than that of the second highest model (GARCH, 83.02 percent). For the 85 percent confidence level, the coverage rate of the MC-GARCH model (83.96 percent) is 5.18 percentage points higher than that of the second highest model (C-GARCH, 78.78 percent).
As the confidence level decreases, the advantage of the MC-GARCH model becomes more pronounced, suggesting that the MC-GARCH model is able to capture the volatility of traffic data during peak hours. In terms of mean prediction interval length, the C-GARCH model gives the smallest values. However, on average, the C-GARCH model only reduced the length by 0.75 compared with the GARCH model.

During nonpeak hours, the MC-GARCH model also provides the highest coverage, but its advantage over the GARCH and the C-GARCH models is not obvious in terms of either MPIL or PICP. This is expected, as travel time is relatively stable with small variations and the trend and seasonal patterns are not obvious during this period. Therefore, the performance of these three models should be similar during nonpeak hours.

Investigating the all-day performance of these three models indicates that the MC-GARCH model provides the highest PICP value, whereas both the C-GARCH and the GARCH models give lower MPIL values compared with the MC-GARCH model.

Table 1. Estimated MPIL and PICP values for GARCH, C-GARCH and MC-GARCH models

Confidence  Model      Peak Hours         Nonpeak Hours      All Day
Level                  MPIL     PICP      MPIL     PICP      MPIL     PICP
95%         GARCH      115.98   88.00%    44.53    95.72%    56.69    94.41%
            C-GARCH    115.1    87.80%    45.13    95.10%    57.04    93.86%
            MC-GARCH   139.74   90.86%    45.09    95.87%    61.2     95.01%
90%         GARCH      97.04    83.02%    37.26    92.30%    47.43    90.72%
            C-GARCH    96.31    82.57%    37.77    91.42%    47.73    89.92%
            MC-GARCH   116.93   87.35%    37.73    93.08%    51.21    92.10%
85%         GARCH      85.21    78.61%    32.72    89.07%    41.65    87.29%
            C-GARCH    84.56    78.78%    33.16    87.79%    41.91    86.26%
            MC-GARCH   102.67   83.96%    33.13    90.47%    44.96    89.36%

In general, based on the estimation results, we can conclude that the MC-GARCH model tends to cover more targets compared with the C-GARCH and the GARCH models, especially during peak hours. The C-GARCH and GARCH models give narrower prediction bands than the MC-GARCH model, at the cost of a lower coverage rate.
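The two criteria reported in Table 1 can be illustrated with a minimal sketch, assuming the usual definitions: PICP is the share of observed values that fall inside their prediction intervals, and MPIL is the average interval width. The function names and the sample values below are hypothetical and are not taken from the study's data.

```python
def picp(observed, lower, upper):
    """Prediction interval coverage probability.

    Share of observations that lie inside [lower, upper] (bounds inclusive).
    """
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return hits / len(observed)


def mpil(lower, upper):
    """Mean prediction interval length: average of (upper - lower)."""
    if not lower or len(lower) != len(upper):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical travel times (seconds) and interval bounds for four time steps.
observed = [100.0, 120.0, 150.0, 90.0]
lower = [90.0, 100.0, 155.0, 80.0]
upper = [110.0, 130.0, 175.0, 100.0]

coverage = picp(observed, lower, upper)  # third observation is not covered
width = mpil(lower, upper)
```

A model is preferable when its PICP is close to (or above) the nominal confidence level while its MPIL stays small, which is the trade-off discussed for the three GARCH variants.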
Since the coverage rates of the C-GARCH and the GARCH models are much lower than the corresponding confidence levels during peak hours (for example, PIs of both the GARCH and the C-GARCH models cover around 78 percent of the targets at the 85 percent confidence level), the MC-GARCH model generates more effective PIs, although they are somewhat wider than the others.

To check the consistency of each model’s performance, Figure 5 and Figure 6 compare PICP values of the GARCH, the C-GARCH, and the MC-GARCH models at different confidence levels for individual segments. In these figures, the orange, blue and green columns represent the GARCH, the C-GARCH and the MC-GARCH models; columns with different patterns represent different confidence levels. During peak hours (Figure 5), the MC-GARCH model generates PIs with the highest coverage for all segments. The advantage of the MC-GARCH model is substantial, with the largest difference of 6.94 percentage points relative to the PIs (at the 85 percent confidence level) provided by the C-GARCH model for segment 33-34. On the other hand, the PICP values of the GARCH and the C-GARCH models are similar. The largest difference in PICP values between the GARCH and the C-GARCH models is 1.63 percentage points. During non-peak hours (Figure 6), all three models provide high coverage rates at the corresponding confidence levels. Differences among individual models during non-peak hours are not as large as during peak hours.

Comparing all three models’ performance between peak and non-peak hours suggests that peak hour coverage is relatively low, as traffic variations increase during peak hours. In addition, PI lengths during peak hours are longer than PI lengths during non-peak hours. This is because the uncertainties during peak hours are more evident than during non-peak hours.

Figure 5. Comparing performance of GARCH, C-GARCH and MC-GARCH models during peak hours

Figure 6.
Comparing performance of GARCH, C-GARCH and MC-GARCH during non-peak hours

[Figures 5 and 6: bar charts of PICP (%) by segment (29-30, 30-31, 31-32, 32-33, 33-34) for the GARCH, C-GARCH and MC-GARCH models at the 95%, 90% and 85% confidence levels.]

Figure 7 provides an intuitive comparison of one-day peak hour PIs at the 95 percent confidence level constructed by the MC-GARCH, the C-GARCH and the GARCH models. The green dashed lines stand for PIs of the MC-GARCH model, the yellow dashed lines stand for PIs of the C-GARCH model, and the pink dashed lines stand for PIs of the GARCH model. As shown in this figure, the PIs constructed by the C-GARCH and the GARCH models almost overlap. Table 1 likewise shows no significant difference between the MPIL and PICP values of the C-GARCH and the GARCH models. It seems the effect of the long-term component in the C-GARCH model is limited in this case. On the other hand, PIs of the MC-GARCH model differ from those of both the C-GARCH and the GARCH models. The MC-GARCH model tends to cover more targets by increasing the width of the prediction intervals at certain time intervals (points identified by blue arrows). Overall, the C-GARCH and the GARCH models create PIs similar to each other, whereas the MC-GARCH model tends to cover more targets by increasing the length of its PIs during certain time intervals.

Figure 7.
Comparison of prediction intervals constructed by GARCH, C-GARCH and MC-GARCH models

3.3.6 Summary

As uncertainty associated with travel time prediction becomes an important topic for implementing an intelligent transportation system, statistical volatility models provide a promising way to generate more accurate PIs that account for variability in travel time prediction. The traditional GARCH model is argued to be inadequate when modeling data that show pronounced seasonal patterns. This study developed the C-GARCH and the MC-GARCH models for travel time prediction. To empirically evaluate the performance of the proposed models, this study tested the GARCH, the C-GARCH and the MC-GARCH models by using freeway travel time data collected from Automatic Vehicle Identification (AVI) stations located along U.S. Highway 290 (or U.S. 290) in Houston, Texas. The forecasting results of the proposed models are attractive, especially during peak hours. The findings of this study include:

The proposed MC-GARCH model outperforms the GARCH and the C-GARCH models during peak hour prediction. A case study of the five selected segments highlighted the strength of the MC-GARCH model in providing more effective PIs in terms of coverage rate.

The idea of decomposing travel time volatility is promising when data show cyclic patterns. By decomposing travel time volatility into daily, diurnal and stochastic components, the MC-GARCH model is able to capture the uniqueness of each component and the seasonal effect of the data.

The C-GARCH model decomposes travel time volatility into a long-term component and a transitory component. It works best if there is a trend. Based on the case study, the PIs constructed by the C-GARCH model and the GARCH model perform similarly. The effect of the long-term volatility component in the C-GARCH model is not significant in this case.

During non-peak hours, none of the three models shows an obvious advantage in terms of MPIL and PICP.
This is partly due to the fact that travel time during nonpeak hours is relatively stable, with small variations around the mean. The trend and seasonal patterns are not obvious during this period.

Component GARCH models decompose travel time data into long-term, short-term and cyclical components. If there are cyclical components in the data, the MC-GARCH model has the potential to better capture the uncertainties associated with travel time. In addition, the MC-GARCH model decomposes traffic volatility into several different components that can be easily interpreted and estimated. In this study, the daily component and the normalized residuals are each modeled with a simple GARCH model. Besides the GARCH model, the daily component can also be estimated through a multifactor risk model as suggested by Engle and Sokalska [5]. It is also worth trying different variations of GARCH models to estimate the normalized residuals. In addition, this study treats the intraday component as an average term. Further study could explore different ways of defining the intraday component.

3.4 Application of Stochastic Volatility Model

The stochastic volatility based method is investigated here by using travel time data collected with Bluetooth sensors along an 18-mile long corridor in Connecticut. The Bluetooth sensors (Figure 8) were temporarily installed by the University of Maryland team on Interstate 95 (I-95) to collect travel time information between October 19, 2012 and October 28, 2012. Bluetooth technology enables digital devices to interconnect with each other using short-range wireless communications. Many mobile phones, car radios and other personal devices come equipped with Bluetooth wireless capability to communicate with other Bluetooth-enabled devices anywhere from 1 m to about 100 m (300 ft).
In the context of travel time collection, the Bluetooth detector captures the electronic identifier, or tag, called the Media Access Control (MAC) address, of each Bluetooth-enabled device and records a timestamp when the vehicle enters the detection range of the sensor. As the same vehicle passes subsequent detectors, the detected MAC address can be matched, allowing the calculation of travel time between the two locations. At least two detectors are therefore required to obtain travel time information. Bluetooth detection has the advantage of providing more accurate traffic data with relatively low-cost installation [104].

Figure 8. Bluetooth sensor location of the study

Raw data from Bluetooth sensors are the MAC IDs of the detected Bluetooth devices along with their detection times, stored on a removable memory card. Sample travel time for a particular freeway stretch is obtained by matching the MAC IDs between two Bluetooth sensors located at the endpoints of the freeway stretch. Figure 8 depicts the locations of the selected Bluetooth sensors, named “T, N, S, L, F, G, H, I, A”. This study selects six segments as indicated in Table 2. Each path comprises a head and a rear sensor. Individual vehicle travel time for each path is the time between the detections of the same vehicle at the head and rear sensors.

Raw data are filtered through a four-step offline filtering algorithm proposed by Haghani et al. [104] to extract ground truth travel time for every five-minute interval. Because traffic patterns during weekdays are significantly different from those on weekends, this study focuses only on weekday travel time patterns. We separate the travel time data for each pair of detectors under study into a training data set and a testing data set: the training data set is the five-minute travel time from October 22, 2012 to October 25, 2012, with 1,152 observations; the testing data set is the data obtained on October 26, 2012, with 288 observations.
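The MAC matching step described above can be sketched as follows. This is a simplified illustration only: it keeps the first detection of each device at each sensor, matches devices seen at both sensors, and averages the matched travel times in five-minute bins. It does not reproduce the four-step filtering algorithm of Haghani et al. [104]; the function names, the input layout and the sample detections are hypothetical.

```python
from collections import defaultdict


def match_travel_times(head_detections, rear_detections):
    """Return {mac: (head_time, travel_time)} for devices seen at both sensors.

    Each detection is a (mac_id, timestamp_in_seconds) pair. Only the first
    detection of a device at each sensor is used (a simplifying assumption).
    """
    first_head = {}
    for mac, t in head_detections:
        first_head[mac] = min(t, first_head.get(mac, t))
    first_rear = {}
    for mac, t in rear_detections:
        first_rear[mac] = min(t, first_rear.get(mac, t))

    matched = {}
    for mac, t_head in first_head.items():
        t_rear = first_rear.get(mac)
        # Keep only devices that reach the rear sensor after the head sensor.
        if t_rear is not None and t_rear > t_head:
            matched[mac] = (t_head, t_rear - t_head)
    return matched


def aggregate_by_interval(matched, interval=300):
    """Average matched travel times per interval (300 s = five minutes).

    Vehicles are assigned to the interval in which they passed the head sensor.
    """
    bins = defaultdict(list)
    for t_head, travel_time in matched.values():
        bins[int(t_head // interval)].append(travel_time)
    return {k: sum(v) / len(v) for k, v in sorted(bins.items())}


# Hypothetical detections: (MAC ID, seconds since midnight).
head = [("aa", 0.0), ("bb", 60.0), ("cc", 400.0)]
rear = [("aa", 150.0), ("bb", 250.0), ("dd", 500.0)]

matched = match_travel_times(head, rear)
averages = aggregate_by_interval(matched)
```

Devices "cc" and "dd" are each seen at only one sensor and therefore contribute no travel time sample. In practice, outlier filtering is needed before aggregation, because a device may stop between the two sensors or belong to a non-vehicle source.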
Figure 9 plots one week (excluding the weekend) of aggregated travel time over every five-minute time interval on four segments. The scatter plots (paths one, two, five and six) illustrate considerable variations in travel time at each time interval over different days, especially during peak hours. Rush-hour traffic normally occurs between 3 pm and 8 pm on segments one and two, and between 6 am and 10 am on segments five and six. Travel time variations during non-peak and peak hours show different patterns. Variations of travel time during non-peak hours are much smaller than those during peak hours. The considerable variations of travel time across different times of the day can be attributed to several factors: demand variations over different times and days, variations of driver behavior under different kinds of weather conditions, and incidents that disrupt normal traffic. These exogenous factors are often unpredictable, which makes travel time prediction a complex problem. Therefore, it is critical to treat traffic phenomena as a stochastic process.

Table 2. Selected segments for this study

Path  Head    Rear    Distance (mile)      Starting at               Ending at
ID    Sensor  Sensor  Standard  Measured
1     T       N       2.7       2.69       Fairfield Ave/Exit 14     CT-33/CT-136/Exit 17
2     N       S       5.8       5.94       CT-33/CT-136/Exit 17      Bronson Rd/Exit 20
3     A       I       1.5       1.51       Broad St/Exit 32          Surf Ave/Exit 30
4     I       H       2.4       2.36       Surf Ave/Exit 30          CT-25/CT-8/Exit 27
5     H       G       1.9       1.97       CT-25/CT-8/Exit 27        Fairfield Ave/State St/Exit 25
6     F       L       1.7       1.82       US-1/Exit 23              Bronson Rd/Exit 20

Figure 9. A scatter plot of travel times on four paths.

3.4.1 Model Fitting

Both the GARCH and the SV models are tested to fit the second part of Equation 1. As the process to estimate the GARCH model has already been described in our previous study [88], the following focuses on the estimation of the SV model.
To perform Bayesian inference in the SV model, the parameters of the prior distributions of $\mu$, $\phi$ and $\sigma$ need to be specified. As already mentioned, $\mu$ follows a Gaussian distribution, $\phi$ follows a Beta distribution (after the transformation $(\phi + 1)/2$) and $\sigma^2$ follows $B_\sigma \cdot \chi^2_1$, so five parameters are required: the mean $b_\mu$ and standard deviation $B_\mu$ of the normal distribution for $\mu$; $a_0$ and $b_0$ of the Beta distribution for $(\phi + 1)/2$; and $B_\sigma$ for the scaling of the transformed parameter $\sigma^2$. After specifying the parameters of the prior distributions, the SV model is ready to be estimated by applying the MCMC method. The estimation is performed by using 5000 MCMC draws after a burn-in of 100 for each data set (burn-in means discarding some iterations at the beginning of an MCMC run). The Bayesian inference using MCMC based on the ancillarity-sufficiency interweaving strategy (ASIS) is implemented through the R package developed by Kastner and Frühwirth-Schnatter [6].

To obtain the estimated prediction intervals, we draw 5000 random values from a normal distribution with mean zero and variance one for the innovation term in Equation 4. According to Equation 4, 5000 corresponding samples of the modeled variable can then be obtained. PIs with a confidence level of $(1 - \alpha)100\%$ are derived by taking the $\alpha/2$ and $(1 - \alpha/2)$ percentiles of these samples.

Figure 10 shows travel time prediction results of the ARIMA-SV model for four segments during peak hours. Both the predicted mean and PIs with a confidence level of 95% are provided. The ARIMA predicted value is marked as a red triangle lying at the center of each prediction interval (green vertical line segment). This plot indicates that there is always a mismatch between the observed and the predicted value, which is denoted as the prediction error. On the other hand, PIs constructed by the ARIMA-SV model cover most of the observed values. As indicated in this figure, most of the blue dots (the observed values) lie within the green vertical line segments (the PIs).
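The percentile-based construction of the PIs can be sketched as below. Equation 4 is not reproduced here, so the sketch assumes the standard stochastic volatility form in which the residual equals exp(h/2) times a standard normal innovation, where h is a posterior draw of the log-variance. The function names, the use of a fixed random seed, and the constant log-variance draws in the example are all hypothetical.

```python
import math
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending list, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def sv_prediction_interval(predicted_mean, log_variance_draws, alpha=0.05, seed=0):
    """Simulate one travel time per log-variance draw and take the percentiles.

    Assumed model (not taken from the text): y = mean + exp(h / 2) * eps,
    with eps drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    samples = sorted(
        predicted_mean + math.exp(h / 2.0) * rng.gauss(0.0, 1.0)
        for h in log_variance_draws
    )
    return percentile(samples, alpha / 2), percentile(samples, 1 - alpha / 2)


# Hypothetical input: 5000 posterior draws of the log-variance, all equal to 0,
# so the simulated residual is a standard normal variable.
draws = [0.0] * 5000
lower, upper = sv_prediction_interval(300.0, draws)
```

With real MCMC output the log-variance draws differ from one another, so the resulting interval reflects both the innovation noise and the posterior uncertainty about the volatility level.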
Depending on the level of uncertainty in the data, the length of the PI for each time interval varies. A wider PI indicates higher uncertainty about the predicted travel time, while a narrower PI indicates lower uncertainty. Providing PIs to travelers helps them schedule their trips with more confidence.

3.4.2 Results and Analysis

To assess the effectiveness of the ARIMA-GARCH and the ARIMA-SV models, we compare these models for the six studied segments at two different aggregation time intervals: five minutes and fifteen minutes. Performance measures in terms of MAPE, RMSE, PICP and MPIL are summarized.

Table 3 provides the performance measures of the mean equation for the six studied segments. Since both models use the same ARIMA model to predict the mean value of travel time for each segment, each segment has only one MAPE and one RMSE value at each aggregation level. The tabulated results show that the ARIMA model provides adequate prediction of future travel time in terms of MAPE, which ranges from 3.45% to 4.72% at five-minute time intervals and from 1.97% to 6.68% at fifteen-minute time intervals. On the other hand, the RMSE value ranges from 3.79 up to 22.94 at five-minute time intervals, indicating that the absolute error grows with the variation of the data.

Figure 10. Prediction results for peak hour travel time at four segments.

In Table 3, RMSE values for path six (8.41 at the five-minute time interval and 11.93 at the fifteen-minute time interval) are the smallest among the four plotted segments (paths one, two, five and six), while RMSE values for path two (22.94 at the five-minute time interval and 34.42 at the fifteen-minute time interval) are among the largest. The ARIMA estimations for paths three and four are the best in terms of both MAPE and RMSE values.
This is because travel times on these two segments remain at a certain level with slight variations during the studied time period, which suggests that the ARIMA model performs better when the data are relatively stable. In contrast, the forecasting accuracy of the ARIMA model decreases as the variation of the data increases.

Table 3. Performance measures of the mean equation

Measure  Time       Path ID                                              Average
         Interval   1       2       3       4       5       6           Value
MAPE     5 min      4.47%   4.10%   3.45%   3.77%   4.72%   4.54%       4.18%
         15 min     6.68%   4.55%   1.97%   2.75%   4.79%   4.11%       4.14%
RMSE     5 min      19.67   22.94   3.79    6.82    11.82   8.41        12.24
         15 min     41.34   34.42   2.14    5.01    18.45   11.93       18.88

PICP and MPIL measure the prediction accuracy of the variance part. PICP measures the coverage probability of the prediction intervals. In our study, the prediction intervals for both the ARIMA-GARCH and the ARIMA-SV models are set at the 95% confidence level. Figure 11 compares the PICP and MPIL values of both models at two different aggregation levels: five minutes and fifteen minutes. The ARIMA-SV model provides higher coverage in most cases. At both the five-minute and fifteen-minute aggregation levels, only in one out of five cases is the PICP value of the ARIMA-SV model less than that of the ARIMA-GARCH model. The PICP value of the ARIMA-SV model ranges from 93.75% to 98.96% at the five-minute aggregation time interval, and from 95.83% to 97.92% at the fifteen-minute aggregation time interval. The ARIMA-SV model is thus capable of constructing accurate PIs at the predefined confidence level (95% in our study) and, in most cases, outperforms the ARIMA-GARCH model in terms of the PICP measure.

Regarding the MPIL value, which measures the width of the prediction interval, both models give similar performance. Comparing the MPIL values of both models for different segments indicates that the MPIL values for segments 3 and 4 are the smallest. This result demonstrates that the width of the prediction interval depends on the variation of traffic.
As travel times for paths three and four are relatively stable with minor variations, the MPIL for both models is low. Taking the five-minute aggregation time interval as an example, the MPIL values for path three are 14.62 (ARIMA-GARCH) and 9.85 (ARIMA-SV), and those for path four are 29.75 (ARIMA-GARCH) and 30.76 (ARIMA-SV). The MPIL value for segment two is the highest, at 87.55 for the ARIMA-SV model. This is because the travel time varies the most on this path. It can be concluded that the length of the constructed PI is sensitive to the variation of travel time on the studied segments, with higher variation leading to a wider prediction interval, and vice versa.

Since the widths of the prediction intervals provided by the ARIMA-SV and the ARIMA-GARCH models are similar and the ARIMA-SV model provides relatively higher coverage, the ARIMA-SV model outperforms the ARIMA-GARCH model in general.

Figure 11. Comparison of performance measures for six datasets by using the ARIMA-GARCH and ARIMA-SV models with (a) 5 minute time interval (b) 15 minute time interval

3.4.3 Summary

This study introduced an advanced stochastic volatility model to construct a prediction interval for each prediction point in order to capture the uncertainty associated with travel time prediction. The proposed method was tested by using travel time data collected from Bluetooth sensors located along a freeway corridor in Connecticut. The proposed ARIMA-SV model was compared with the more widely used ARIMA-GARCH model. Different from the GARCH-type models, which assume a deterministic nature of traffic volatility, the SV model treats this volatility as a non-deterministic process by specifying that the variance follows a latent stochastic process. An advanced Markov chain Monte Carlo (MCMC) estimation method for the stochastic volatility model was applied to travel time reliability forecasting.
The empirical experiment showed that the ARIMA-SV model outperforms the ARIMA-GARCH model in terms of coverage probability and the length of PIs, as both models construct PIs that cover most of the observed values and the ARIMA-SV model tends to provide narrower PIs. In addition, comparing the constructed PIs across different segments and at different aggregation time intervals reveals that the length of the constructed PI is sensitive to the variation of travel time. Higher variation leads to a wider prediction interval, and lower variation is associated with a narrower prediction interval. The width of the prediction interval indicates the variation of the prediction results and therefore provides a measure of the reliability of the predicted travel time.

In summary, the proposed ARIMA-SV method shows its advantage in constructing more accurate prediction intervals. It covers most of the observed values of travel time more accurately and effectively than the ARIMA-GARCH model. From a practical point of view, the proposed ARIMA-SV model provides not only a mean but also an upper and a lower bound of future travel time, therefore capturing the uncertainty associated with prediction. As the mean value alone is unable to provide information regarding the variability of future traffic, the prediction interval provides upper and lower bounds that capture future travel time at a predetermined confidence level. Therefore, the proposed model can be used in real time to provide more reliable and informative future traffic information for travelers. The ARIMA-SV model can be regarded as a promising algorithm for disseminating traffic information to travelers through traveler information systems, providing guidance for pre-trip planning as well as en-route navigation.
Further research includes studying trend and seasonal patterns in the residual series and comprehensively evaluating different volatility model applications in the travel time prediction field.

Chapter 4: Ensemble Methods in Travel Time Prediction

Chapter 3 discussed different volatility models for capturing the uncertainty associated with travel time prediction. As discussed in Equation (1), travel time prediction can be regarded as a mean model plus a volatility model. In Chapter 3, the ARIMA model is used to predict the mean of travel time. To improve the prediction accuracy of the mean model, this chapter proposes a new ensemble-based travel time prediction algorithm.

In recent years, ensemble-based algorithms have become prominent in solving prediction and classification problems. They have been applied to different fields and have achieved great success [105]. The $1 Million Netflix Prize competition is a famous example: the winning team combined different algorithms in an ensemble to predict user ratings for films, which produced the best accuracy among all participants [106]. Among the different ensemble methods, the tree-based ensemble method is a popular one. Instead of fitting a single “best” model, the tree-based ensemble method strategically combines multiple simple tree models to optimize predictive performance. Drawing on insights and techniques from both statistical and machine learning methods [107], the tree-based ensemble method not only achieves strong predictive performance, but also identifies and interprets relevant variables and interactions. The interpretability of the tree-based ensemble model enables transportation decision makers to better understand the output of the model and is critical in analyzing relations between traffic and its influential factors. In addition, the tree-based ensemble method can handle different types of predictor variables, requires little data preprocessing, and can fit complex nonlinear relationships [107].
These properties make the tree-based ensemble methods good candidates for solving transportation problems, such as traffic prediction and incident classification. However, there are limited studies on the application of tree-based ensemble methods in the transportation field. To the best of our knowledge, research on gradient boosting trees in travel time prediction has not been fully documented to date.

This chapter first introduces the theoretical background of ensemble methods and then develops different ensemble models for predicting travel time. Among these methods, a tree-based ensemble method is proposed to predict travel time on a freeway stretch by considering all relevant variables derived from historical travel time data. Belonging to the machine learning category, the tree-based ensemble methods often have superior prediction performance over traditional prediction methods. Driven by the successful application of random forest in traffic parameter prediction, a gradient boosting tree-based travel time prediction method is proposed to uncover hidden patterns in travel time data and to enhance the accuracy and interpretability of the model. Different from the random forest algorithm, which averages a large collection of trees grown from random samples [7], the gradient boosting method sequentially generates base learners from a weighted version of the training data to strategically find the optimal combination of trees. Each step of adding another base learner is aimed at correcting the mistakes made by the previous learners. Therefore, the gradient boosting method has the potential to provide more accurate predictions.

The following sections first discuss two common ensemble techniques: bagging and boosting. They then introduce the single regression tree model that is used as a base learner for the ensemble tree, and two different ensemble tree models: random forest regression and gradient boosted regression tree.
The last section applies the random forest and gradient boosted regression tree to travel time prediction.

4.1 Common Types of Ensembles

Ensemble-based algorithms consist of multiple base models (such as decision trees or neural networks). Each base model provides an alternative solution to the problem, and their predictions are combined in some way (typically by weighted or unweighted voting or averaging) to produce the final model output. Combining the predictions of a group of individual base models often generates more stable and accurate predictions than any of the individual base models included in the ensemble. The essential idea behind ensemble methods is often used in our daily lives. We usually seek others’ opinions when making decisions. Through a weighted combination of these opinions, we can make more informed decisions.

The success of ensemble methods largely depends on the diversity of the base models that compose the ensemble. Combining results from several base models is useful only if individual models provide different outputs, or in other words, they disagree with each other on some inputs. Ensemble methods reduce total error by correcting mistakes made by individual models. There is no advantage in combining models that make similar mistakes. Strategically combining individual base models that make different errors (or mistakes) can reduce the total error of the model. Diversity of individual models can be achieved through using different training data sets or using different training parameters for individual models. Different ways to train and combine a number of base learners lead to diverse ensemble algorithms. Bagging and boosting are two popular ensemble techniques that utilize different re-sampling methods to create diverse training data for obtaining different base models.

4.1.1 Bagging

Bagging, or Bootstrap aggregating, was proposed by Leo Breiman in 1994 [108] to improve prediction accuracy.
It is one of the earliest and most intuitive ensemble algorithms with good performance. It belongs to the parallel ensemble methods. One advantage of the bagging method is that its base models are trained independently of each other, so its training can be accelerated through parallel computing. The basic idea of the parallel ensemble methods is to reduce the error by combining prediction results from independent base models. Though it is practically difficult to generate independent base models, as they are trained from the same training data, dependency can be reduced by introducing randomness during the model training process.

Another vital element for the success of the bagging method is the instability of the individual prediction method. If a diverse set of base models is generated from perturbed training sets, then bagging can improve prediction accuracy. To obtain diverse base models from similar training data sets, the base learner should be weak. For example, the decision tree is one of the most popular weak learners in ensemble methods.

There are two key steps in bagging: bootstrap sampling and aggregation. Bootstrap techniques were originally developed to estimate the sampling distribution of an estimator from limited data by sampling with replacement from the original data. In the recent development of ensemble techniques, bootstrap techniques have also been used to generate diverse subsets of data for training base models. For a given training data set with sample size $N$, bagging generates $M$ new training sets, each with sample size $N$, by sampling from the original training data set uniformly and with replacement. Because the sampling is with replacement, some observations appear more than once in a bootstrap sample, while other observations may not appear at all. The $M$ base models are trained using the $M$ newly generated training sets and combined through averaging (regression problem) or majority voting (classification problem).
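The two steps of bagging, bootstrap sampling and aggregation, can be sketched as follows for a regression problem. This is a minimal illustration and not the implementation used in this study: the base learner is a one-split regression tree (a stump) on a single input variable, and the function names, the number of base models and the step-shaped data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs by minimising squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, mean_l, mean_r)
    if best is None:  # all x values identical: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, mean_l, mean_r = best
    return lambda x: mean_l if x <= threshold else mean_r


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the same size as the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate by averaging the base model predictions (regression case)."""
    return sum(m(x) for m in models) / len(models)


# Hypothetical data: travel time jumps from 10 to 20 once x reaches 5.
xs = [float(i) for i in range(10)]
ys = [10.0 if x < 5 else 20.0 for x in xs]
models = bagging_fit(xs, ys)
low = bagging_predict(models, 1.0)
high = bagging_predict(models, 8.0)
```

Because each base model is trained independently of the others, the loop in `bagging_fit` could be distributed across processors, which is the parallel-computing advantage mentioned above. For classification, the averaging step would be replaced by majority voting.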
Figure 12 illustrates the basic steps of the bagging process.

Given a data set with a total of $N$ data samples, each a pair of input and output variables.
Set the total number of base models to $K$:
For $k = 1$ to $K$ do
    Draw a random sample $D^*$ of size $N$ with replacement from the training data.
    Grow a base model $f_k(x)$ using the training sample $D^*$.
    Output the constructed base model $f_k(x)$.
End;
Output the prediction of the ensemble for a given new input $x$: $\frac{1}{K}\sum_{k=1}^{K} f_k(x)$;

Figure 12 The Bagging algorithm

4.1.2 Boosting

Unlike bagging, the boosting method generates base learners sequentially, so it belongs to the sequential ensemble methods. The basic idea of sequential methods is to exploit the dependencies between learners. Prediction accuracy is improved by developing multiple models in sequence, putting emphasis on the training cases that are difficult to estimate. During boosting, examples that are difficult for the previous learners to estimate appear more often in the training data set than examples that are correctly estimated. The boosting method thus attempts to develop base learners that correct the mistakes made by previous learners.

Boosting originated from the answer [109] to Kearns' question [110]: is a set of weak learners equivalent to a single strong learner? A weak learner is an algorithm that performs only slightly better than random guessing; a strong learner is a more accurate prediction or classification algorithm that is arbitrarily well correlated with the true values. The answer to this question is important because it is often easier to construct a weak learner than a strong one. Schapire [110] proves that the answer is positive by applying boosting algorithms (Figure 13) to combine many weak learners into a single, highly accurate learner.
The major difference between bagging and boosting is that boosting strategically resamples the training data to provide the most informative data for each consecutive model. The adjusted distribution used at each step to create a new base model is based on the error produced by the previous models. Unlike bagging, in which each sample is selected uniformly to produce the training dataset, boosting does not select individual examples with equal probability. Samples that are misclassified or incorrectly estimated receive a higher weight and are therefore more likely to be selected. As a result, each newly created model focuses more on the samples that have been misclassified by the previous models.

Given a data sample distribution $D$ and $N$ pairs of input and output variables.
Set the total number of base models to $K$:
Define the initial training sample distribution as $D_1 = D$
For $k = 1$ to $K$ do
    Train a base model $f_k(x)$ from the training sample distribution $D_k$.
    Compute the error of the learner.
    Adjust the distribution $D_k$ to $D_{k+1}$ to make the mistakes of the learner more evident.
    Output the constructed base model $f_k(x)$.
End;
Output the prediction of the ensemble for a given new input $x$: $\frac{1}{K}\sum_{k=1}^{K} f_k(x)$;

Figure 13. The Boosting algorithm

4.2 Ensemble Tree

The success of ensemble methods essentially depends on the diversity of the base models that compose the ensemble. Combining the results of several base models is useful only if the individual models provide different outputs, or in other words, disagree with each other on some inputs. There is no benefit in combining models that make similar mistakes. Strategically combining base models that produce different errors can reduce the total error of the model. An ensemble is most effective when the outputs of its models are independent or negatively correlated.
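A two-model case makes the role of correlation concrete. The symbols here are introduced only for this illustration: suppose two base models have prediction errors $e_1$ and $e_2$, each with variance $\sigma^2$, and the correlation between them is $\rho$. The error of their average then has variance

$$\mathrm{Var}\!\left(\frac{e_1 + e_2}{2}\right) = \frac{1}{4}\left(\sigma^2 + \sigma^2 + 2\rho\sigma^2\right) = \frac{(1+\rho)\,\sigma^2}{2}.$$

With $\rho = 1$ (identical mistakes) averaging gives no reduction, with $\rho = 0$ the variance is halved, and with $\rho < 0$ it falls below $\sigma^2/2$.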
Diversity of individual models can be achieved by using different training datasets or different training parameters for the individual models. The bagging and boosting methods discussed above use different re-sampling techniques to create diverse training data and therefore obtain different base models. To produce diverse base models despite similar training datasets, the base models are often forced to be weak, so that perturbing the training data generates different model outputs. Trees are commonly used as base learners for ensembles because they are sensitive to small perturbations in the training data: a slight change can lead to a different regression tree. This property makes them good candidates for ensembles. In addition, trees are fast and simple to fit, which reduces computation time and complexity.

Both bagging-based and boosting-based ensemble methods can use trees as base learners. Tree-based ensemble methods build a large number of de-correlated trees and then average the results from the individual trees. The benefit of an ensemble of trees is that averaging reduces the variance. Details are given in the later sections. In general, there are three successful tree-based ensemble methods: bagged tree, random forest, and boosted tree. The following paragraphs briefly explain the theoretical background of a single regression tree and then illustrate how to construct the different ensemble trees.

4.2.1 Single Regression Tree

A single tree model partitions the feature space into a set of regions and fits a simple model in each region, for example a constant. For simplicity, consider a regression problem with response variable $Y$ and two independent variables $X_1$ and $X_2$. We first split the space into two regions and model the response $Y$ (by the mean of $Y$) individually in each region.
Then we continue to split each region into two more regions and repeat the process until some stopping rule is met. The upper panel of Figure 14 partitions the feature space into five regions $\{R_1, R_2, R_3, R_4, R_5\}$ according to the two variables $X_1$ and $X_2$ using four split-points $t_1, t_2, t_3, t_4$. At each partition step, the best fit is achieved through the selection of a variable and a split-point. The lower panel of the figure is a binary tree representation of the same model.

We now consider a generalized version of the above example: a regression problem with $p$ inputs and one response variable. Suppose we have $N$ observations, each consisting of $(y_i, x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{ip})$ for $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, p$. The feature space is partitioned into $M$ regions $R_1, R_2, \ldots, R_M$. The regression tree needs to automatically select the splitting variables and split-points and calculate the response for each region. Often the response in each region is treated as a constant $c_m$. If the optimization criterion is to minimize the sum of squares, the best $c_m$ is just the average of $y_i$ in region $R_m$. To decide the best splitting variable and split-point, a greedy algorithm is used. For each splitting variable, the best split-point can be determined by scanning all possible values, which can be done quickly. By scanning through all input variables, finding the best pair of splitting variable and split-point is feasible. A single regression tree is the base learner for the bagged regression tree, random forest, and gradient boosting regression tree methods.

Figure 14. Single regression tree: (a) partition of the feature space; (b) binary tree representation

4.2.3 Random Forest Regression

Random forest was developed by Leo Breiman [7] in 2001. It combines two powerful machine-learning techniques: Breiman's "bagging" idea [108] and the random feature selection introduced by Ho [111, 112] and Amit and Geman [113].
In bagging, or bootstrap aggregation, each individual base model is trained on a bootstrap sample from the training data. Bootstrap techniques were originally developed to estimate the sampling distribution of an estimator from limited data by sampling with replacement from the original data. In recent developments of ensemble techniques, the bootstrap has been used to generate diverse subsets of data for training base models. For a given training data set with sample size $N$, bagging generates $K$ new training sets, each with sample size $N$, by sampling from the original training data set uniformly and with replacement. Because sampling is done with replacement, some observations appear more than once in a bootstrap sample, while other observations are left out of the sample. Then, $K$ base models are trained using the $K$ newly generated training sets and combined through averaging (regression problems) or majority voting (classification problems).

Using a tree as the base learner, we obtain perhaps the simplest ensemble tree, the bagged tree. Each tree in the ensemble is grown on data samples drawn randomly with replacement from the original data. The success of the bagged regression tree depends on whether diverse trees are generated from the different bootstrapped training datasets. However, with lots of data, we usually learn nearly the same regression tree from every bootstrap sample, and averaging the output of these trees does not improve prediction accuracy.

Given a data set with a total of $N$ data samples and $p$ input variables.
Initially determine the total number ($K$) of trees to be generated and the number $m < p$ of variables used for each individual tree:
For $k = 1$ to $K$ do
    Draw a random sample $D^*$ of size $N$ with replacement from the training data (this is also referred to as a bootstrap sample; it is the training data used to grow the tree);
    Grow a random forest tree $T_k$ using the training sample $D^*$ through the following loop:
    Do until (the minimum node size $n_{\min}$ is reached)
        For each terminal node of the tree;
        Randomly select $m$ variables out of the $p$ variables;
        Select the best pair of split variable/point among the $m$ variables;
        Split the node into two daughter nodes;
    End;
    Output the constructed tree $T_k(x)$;
End;
Output the prediction of the ensemble trees for a given new input $x$: $\frac{1}{K}\sum_{k=1}^{K} T_k(x)$;

Figure 15. Pseudo-code for random forest

Random forest is a further development of the bagged regression tree. It still uses bootstrap sampling to grow individual trees, but instead of considering all features, it allows only a random subset of features at each splitting node of the tree. It therefore enforces diversity among the base learners. Figure 15 illustrates the basic steps of random forest.

Random forest improves forecasting accuracy through variance reduction by averaging many noisy but approximately unbiased trees. According to [114], the variance of a random forest with a total of $K$ trees is:

$$\rho\sigma^2 + \frac{1-\rho}{K}\sigma^2 \qquad (41)$$

where $\sigma^2$ is the variance of an individual tree, $\rho$ denotes the correlation between the trees, and $K$ is the total number of trees in the ensemble. As the total number of trees $K$ increases, the second term tends to zero. Therefore, the variance of a random forest depends on three things:

(1) The correlation $\rho$ between any pair of trees: decreasing the correlation decreases the total variance. This can be achieved by randomly selecting $m$ out of the $p$ variables to split on at each splitting node when growing a tree on a bootstrapped dataset. Reducing $m$ reduces both the correlation between trees and the strength of each individual tree, and vice versa. Therefore, the optimal value of $m$ needs to be found for a given dataset.

(2) The variance $\sigma^2$ of each individual tree, or in other words, the strength of each individual tree: strengthening the performance of each individual tree can decrease the total variance of the model.
(3) The total number of trees $K$: the second term of the equation can be reduced by increasing $K$. Therefore, we should train an adequate number of trees to make sure the second term of the equation goes to zero.

In general, random forest is based on the idea of bagging but enforces diversity among the individual trees through random feature selection. Because the trees are grown independently, its training can be accelerated through parallel computing. The prediction performance of the random forest is influenced mainly by three factors: the correlation between individual trees, the performance of each tree, and the total number of trees.

4.2.4 Gradient Boosted Regression Tree

Typically, a boosting method fits multiple base models to minimize a certain loss function averaged over the training data, such as squared error or absolute error. The loss function measures how much the predicted value deviates from the actual value. One approximate solution to this problem is the forward stagewise modeling approach, which sequentially adds new base models without changing the parameters and coefficients of models that have already been added.

The gradient boosted regression tree takes advantage of both tree-based methods and boosting. It can handle different types of input variables, is insensitive to outliers, can model complex nonlinear relationships, and can automatically handle interaction effects between variables. In addition, fitting multiple trees improves prediction performance compared with a single tree. The boosted regression tree often performs better than traditional prediction methods.

For regression problems, the boosting method is a form of "functional gradient descent". It is an optimization technique that minimizes a certain loss function by adding, at each step, the base model that best reduces the loss function.
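As an illustrative special case (chosen here for concreteness; it is an assumption of this example rather than a choice stated in the text), take the squared-error loss. Writing $f(x)$ for the current model and $y_i$ for the observed response at input $x_i$,

$$L\big(y_i, f(x_i)\big) = \tfrac{1}{2}\big(y_i - f(x_i)\big)^2
\quad\Longrightarrow\quad
-\frac{\partial L\big(y_i, f(x_i)\big)}{\partial f(x_i)} = y_i - f(x_i).$$

Under this loss the negative gradient is simply the current residual, so each functional gradient descent step amounts to fitting the next base model to the residuals of the model built so far.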
The first step of the gradient boosted regression tree method is to fit an optimal constant model, that is, a tree with a single terminal node. The pseudo-code for the generic gradient boosting method is shown in Figure 16 [114, 115]. Friedman proposes a modification to the gradient boosting method that uses a regression tree of fixed size as the base learner. The modified version improves the quality of fit of each base learner [116]. Assume that the number of leaves for each tree is $T$. The tree partitions the input space into $T$ disjoint regions $R_{1m}, R_{2m}, \ldots, R_{Tm}$ and predicts a constant value $b_{jm}$ in each region $R_{jm}$. The regression tree can be formally expressed as:

$$h_m(x) = \sum_{j=1}^{T} b_{jm}\, I(x \in R_{jm}) \qquad (42)$$

where

$$I(x \in R_{jm}) = \begin{cases} 1, & \text{if } x \in R_{jm} \\ 0, & \text{otherwise} \end{cases}$$

Using the regression tree to replace $h_m(x)$ in the generic gradient boosting method, the model updating equation and gradient descent step size:

$$f_m(x) = f_{m-1}(x) + \rho_m h_m(x) \qquad (43)$$

$$\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + \rho\, h_m(x_i)\big) \qquad (44)$$

become

$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{T} \rho_m b_{jm}\, I(x \in R_{jm}) \qquad (45)$$

$$\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L\Big(y_i, f_{m-1}(x_i) + \sum_{j=1}^{T} \rho\, b_{jm}\, I(x_i \in R_{jm})\Big) \qquad (46)$$

Initialize $f_0(x)$ to be a constant, $f_0(x) = \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, \rho)$.
For $m = 1$ to $M$ do
    For $i = 1$ to $N$ do
        Compute the negative gradient $r_{im} = -\left[\dfrac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}}$
    End;
    Fit a regression tree $h_m(x)$ to predict the targets $r_{im}$ from covariates $x_i$ for all training data.
    Compute a gradient descent step size as $\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + \rho\, h_m(x_i)\big)$
    Update the model as $f_m(x) = f_{m-1}(x) + \rho_m h_m(x)$
End;
Output the model $f_M(x)$

Figure 16. Pseudo-code for generic gradient boosting

By using a separate optimal value $\rho_{jm}$ for each of the tree's regions $R_{jm}$, the coefficients $b_{jm}$ can be discarded. The model updating rule becomes:

$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{T} \rho_{jm}\, I(x \in R_{jm}) \qquad (47)$$

$$\rho_{jm} = \arg\min_{\rho} \sum_{x_i \in R_{jm}} L\big(y_i, f_{m-1}(x_i) + \rho\big) \qquad (48)$$

The gradient boosting regression tree builds the model in a stage-wise fashion and updates the model by minimizing the expected value of a certain loss function.
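The stage-wise procedure above can be sketched for one concrete setting. The following is a minimal illustration under stated assumptions, not the model fitted in this study: the loss is squared error (so the negative gradient is the residual and the optimal constant in each region is the mean residual there), each tree is a two-leaf stump on a single input found by a greedy scan over split-points, and the names `fit_stump` and `gradient_boost` are hypothetical.

```python
from statistics import mean


def fit_stump(xs, rs):
    """Two-leaf regression tree: greedy scan over split-points of one input."""
    best = None
    for t in sorted(set(xs))[1:]:  # every candidate keeps both leaves non-empty
        left = [r for x, r in zip(xs, rs) if x < t]
        right = [r for x, r in zip(xs, rs) if x >= t]
        cl, cr = mean(left), mean(right)  # least-squares constant per region
        sse = sum((r - cl) ** 2 for r in left) + sum((r - cr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, cl, cr)
    if best is None:  # all inputs identical: single-node tree
        c = mean(rs)
        return lambda x: c
    _, t, cl, cr = best
    return lambda x: cl if x < t else cr


def gradient_boost(xs, ys, n_trees):
    """Forward stagewise boosting with stumps under squared-error loss."""
    f0 = mean(ys)  # optimal constant model (single terminal node)
    trees = []

    def predict(x):
        return f0 + sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        # Negative gradient of squared-error loss = residual of the current model.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        # The leaf means of the residuals are the region-wise optimal constants.
        trees.append(fit_stump(xs, residuals))
    return predict
```

Earlier trees are never refitted: each iteration only appends one more tree, which is the forward stagewise property. Other loss functions would change both the residual line and the leaf values.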
With many trees added to the model, the fitted model may achieve an arbitrarily small training error. However, fitting the model too closely to the training data can lead to poor generalization ability. As the number of iterations increases, the model becomes more complex and minor fluctuations in the data are exaggerated. This leads to poor prediction performance on unseen data (testing data). It is therefore necessary to determine the optimal number of iterations (or number of trees) $M$ to minimize the future risk associated with prediction. Over-fitting can be prevented by controlling the number of gradient boosting iterations or, more effectively, by scaling the contribution of each tree by a factor $J \in (0, 1]$. This implies changing the model updating equation (47) to

$$f_m(x) = f_{m-1}(x) + J \cdot \sum_{j=1}^{T} \rho_{jm}\, I(x \in R_{jm}) \qquad (49)$$

The parameter $J$, referred to as the learning rate, controls the contribution of each base model by shrinking it with a factor $0 < J \le 1$. There is a tradeoff between the number of iterations and the learning rate. With the same number of iterations, a smaller learning rate tends to lead to a higher training risk, so a smaller value of $J$ requires a larger $M$ to obtain the same training risk. In general, a small $J$ ($J < 0.1$) with a large $M$ is preferable.

Another parameter, tree complexity, also influences the performance of the algorithm. Tree complexity refers to the number of nodes in a tree. The optimal size of each tree could be estimated separately when building the ensemble, but doing so treats each tree as if it were the last one in the model and usually produces large trees, especially during the early iterations. This is a poor assumption that potentially degrades model performance and increases computational complexity. One simple solution is to restrict all trees to the same size $C$. For the entire process, we then only need to determine one value of $C$ that best fits the data. Gains from increased $C$ are greater with larger data sets.
As large data sets provide more detailed information about the problem, increasing the value of $C$ allows the model to capture complex variable interactions in the data. The tree size $C$ constrains the interaction level of each model: $C - 1$ is the maximum level of interaction effects for a tree of size $C$. Therefore, the size of the trees reflects the maximum depth of variable interactions.

The gradient boosting method (GBM) model strategically adds each base model to minimize a certain loss function. It uses a stage-wise sampling strategy, which puts more emphasis on samples that are difficult to estimate. This distinguishes it from random forest, which trains each base model on a random sample drawn with replacement and with equal probability. The performance of the GBM model is influenced by the number of trees, the learning rate, and the level of variable interactions. Optimal performance can be achieved by carefully selecting the best combination of these parameters.

4.3 Application to Travel Time Prediction

Accurate travel time prediction relies on how much information we can extract from the available data. As traffic is a complex phenomenon with non-linear and chaotic characteristics, it is often difficult to represent it with an exact equation. Data-driven approaches have therefore become a promising area in modeling and predicting traffic. In recent years, tree-based ensemble methods have shown promising results in prediction. Applying tree-based ensemble methods to travel time prediction can potentially improve prediction accuracy. This section discusses in detail how to apply the GBM model to travel time prediction.

4.3.1 Data Description and Preparation

Real-world travel time data provided by a private-sector company, INRIX, are used for this study. INRIX derives travel times from its smart driver network, which aggregates traffic data from probe vehicles and traditional sensor sources.
Probe vehicles include taxis, airport shuttles, service delivery vans, long-haul trucks, consumer vehicles, and GPS-enabled consumer smartphones. Traffic sensors range from inductive-loop detectors and radar sensors to toll tag readers. The data fusion methods are proprietary, and travel times are reported on TMC segments. This study used travel time data from five TMC segments located along I-95 southbound in Maryland. Table 4 shows location information for the five selected TMCs, including the corresponding TMC code, start and end location, and length of each segment. Travel time data observed in 2012 were downloaded from the Regional Integrated Transportation Information System (RITIS) website [117] and aggregated into five-minute intervals. The quality of the data is excellent, with a missing rate of less than 1%; 561 out of 105408 observations are missing for most segments. Given the small number of missing values, this study simply replaced each missing value with the mean of its closest surrounding values.

Table 4 Selected Freeway Segments for the Study

Section | TMC | Start Latitude | Start Longitude | End Latitude | End Longitude | Miles
I | 110-04421 | 39.218482 | -76.726905 | 39.200843 | -76.760999 | 2.2
II | 110N04421 | 39.200843 | -76.760999 | 39.192756 | -76.771665 | 0.8
III | 110-04420 | 39.192756 | -76.771665 | 39.182237 | -76.78368 | 0.97
IV | 110N04420 | 39.182237 | -76.78368 | 39.175368 | -76.794578 | 0.75
V | 110N04419 | 39.160086 | -76.823761 | 39.156238 | -76.834836 | 0.66

Table 5 summarizes the basic statistics of the travel time data collected in 2012, including the mean, standard deviation (SD), the 25th, 50th, 75th and 95th percentiles of travel time, and the minimum (min) and maximum (max) observations. To prepare the input data for the model, we considered all possible variables that are relevant to future travel time.
This led to ten input variables for the prediction model: the three most recent travel time observations, the three most recent trends of travel time (the change in travel time over two consecutive time steps), and the time, day, week and month of the observation. Table 6 is an example of the data set (training and testing set) used for our study. The first ten columns are the input variables and the last column is the corresponding output of the model. The output of the model is the travel time at time step $t$, denoted $TT_t$. The ten variables used as input to predict travel time at time step $t$ are as follows: $TT_{t-1}$, $TT_{t-2}$, $TT_{t-3}$ are the three most recent travel time observations at time steps $t-1$, $t-2$, and $t-3$; $\Delta TT_{t-1} = TT_{t-1} - TT_{t-2}$ is the change in travel time over the two consecutive time steps $t-1$ and $t-2$ (and similarly for $\Delta TT_{t-2}$ and $\Delta TT_{t-3}$); time of day is represented by the five-minute time step, indexed from 1 to 288; week is indexed from 0 to 6 to represent Sunday to Saturday; day is the day of the month on which the vehicle is detected (from 1 to 31); and month is the month of the observation (from 1 to 12).

Table 5 Basic Statistics of Travel Time Data

Section | Mean | SD | 25th | 50th | 75th | 95th | Min | Max
I | 2.01 | 0.52 | 1.92 | 1.96 | 2.01 | 2.17 | 1.77 | 26.41
II | 0.73 | 0.19 | 0.69 | 0.71 | 0.73 | 0.78 | 0.65 | 9.66
III | 0.89 | 0.21 | 0.85 | 0.87 | 0.89 | 1.00 | 0.78 | 22.42
IV | 0.70 | 0.24 | 0.66 | 0.67 | 0.69 | 0.74 | 0.61 | 9.06
V | 0.60 | 0.15 | 0.58 | 0.59 | 0.61 | 0.63 | 0.53 | 7.90

Table 6 Example of the Training/Testing Data File

$TT_{t-1}$ | $TT_{t-2}$ | $TT_{t-3}$ | $\Delta TT_{t-1}$ | $\Delta TT_{t-2}$ | $\Delta TT_{t-3}$ | Time of day | Day | Week | Month | $TT_t$
1.88 | 1.86 | 1.94 | 0.02 | -0.08 | -0.07 | 283 | 1 | 0 | 1 | 1.88
1.88 | 1.88 | 1.86 | 0 | 0.02 | -0.08 | 284 | 1 | 0 | 1 | 1.87
1.87 | 1.88 | 1.88 | -0.01 | 0 | 0.02 | 285 | 1 | 0 | 1 | 1.89
1.89 | 1.87 | 1.88 | 0.02 | -0.01 | 0 | 286 | 1 | 0 | 1 | 1.9
1.9 | 1.89 | 1.87 | 0.01 | 0.02 | -0.01 | 287 | 1 | 0 | 1 | 1.93
1.93 | 1.9 | 1.89 | 0.03 | 0.01 | 0.02 | 288 | 1 | 0 | 1 | 2.01
2.01 | 1.93 | 1.9 | 0.08 | 0.03 | 0.01 | 1 | 2 | 1 | 1 | 2.01
2.01 | 2.01 | 1.93 | 0 | 0.08 | 0.03 | 2 | 2 | 1 | 1 | 2.01
2.01 | 2.01 | 2.01 | 0 | 0 | 0.08 | 3 | 2 | 1 | 1 | 1.81
1.81 | 2.01 | 2.01 | -0.2 | 0 | 0 | 4 | 2 | 1 | 1 | 1.77
1.77 | 1.81 | 2.01 | -0.04 | -0.2 | 0 | 5 | 2 | 1 | 1 | 1.99
… | … | … | … | … | … | … | … | … | … | …

4.3.2 Model Optimization

To optimize the model, it is critical to know the effect of different combinations of parameters on the model's performance. Based on this information, we can then select the parameters that achieve a lower prediction error. This section demonstrates how performance varies with different choices of parameters (number of trees $M$, learning rate $J$ and interaction $C$), using two months of traffic data as training data and the following seven days of data as testing data. Using traffic data from freeway segment one, we fitted GBM models with various numbers of trees (1 to 8000), learning rates (0.5 to 0.0005) and variable interactions (1 to 4). Figure 17, Figure 18, and Figure 19 show the influence of the different parameters ($M$, $J$, and $C$) on the prediction errors. In these plots, we use the mean absolute percentage error (MAPE) to represent prediction error.

To study the effect of the parameter $M$ on prediction accuracy, Figure 17 plots the relationship between MAPE and $M$ (for different values of $J$ and $C$). The parameter $M$ indicates how many base models are included in the ensemble. In terms of estimation, arbitrary accuracy on the training data can be achieved by increasing $M$. But with too many trees, over-fitting may occur, which degrades prediction performance on "unseen data" (samples not included in the training data set). In Figure 17 (a), with $C = 1$, the MAPE value decreases as $M$ increases. The slopes of the lines differ for different values of $J$. The line with $J$
= 0.0005 has a gentle slope, as the contribution of each additional tree is limited when the learning rate is small. On the other hand, a higher learning rate, such as $J = 0.5$, can be too fast: it reaches its minimum error at $M = 600$, and continuing to increase $M$ increases the prediction error because over-fitting occurs. This effect becomes even more obvious if we allow more variable interactions, which makes the individual trees more complex. As shown in Figure 17 (b), (c), and (d), higher learning rates (such as $J = 0.5$, $J = 0.01$, $J = 0.05$) reach their best prediction performance with relatively few trees ($M = 200$) and can easily over-fit the model if more trees are included. In general, we should include enough trees to model the complexity of the data while preventing the over-fitting caused by too many trees.

Figure 18 plots the effect of the learning rate on the MAPE value. The learning rate adjusts the contribution of each additional tree by a factor $J$, $0 < J \le 1$. A smaller value of $J$ limits the contribution of each tree in the model and often requires more trees to be added. Depending on the complexity and the number of trees in the ensemble, the optimal value of $J$ can differ. Taking Figure 18 (a) as an example, with $M = 1000$, the prediction error (MAPE) increases if we continue to decrease $J$ beyond a certain level ($J = 0.05$). This is because the contributions of individual trees become limited if $J$ is below a certain level, and the current value of $M$ becomes insufficient. To obtain better prediction results, $M$ should be increased as $J$ is decreased. On the other hand, with a higher value of $J$ fewer trees are needed to reach good performance, but a higher $J$ usually cannot achieve the minimum error. In Figure 18 (b), the model with $J = 0.5$ is fitted with relatively few trees but does not achieve an error as small as the one with $J = 0.1$. With a higher value of $J$, increasing $M$ leads to poorer prediction performance (fitting $M$
= 8000 with $J = 0.5$), because over-fitting occurs. Therefore, a smaller learning rate with a larger number of trees is preferable, but computational time also increases when many trees are fitted. There needs to be a balance between computational time and prediction accuracy.

Figure 17. The Relationship between MAPE and Number of Trees for Models Fitted with Seven Learning Rates and Four Levels of Interactions. [Panels: (a) Interaction = 1, (b) Interaction = 2, (c) Interaction = 3, (d) Interaction = 4; x-axis: number of trees (1 to 8000); y-axis: MAPE; one line per learning rate J = 0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005.]

Figure 18. MAPE against Learning Rate for Models Fitted with Various Numbers of Trees and Different Levels of Interactions. [Panels: (a) Interaction = 1, (b) Interaction = 4; x-axis: learning rate (0.5 to 0.0005); y-axis: MAPE; one line per number of trees M = 1, 200, 1000, 2000, 6000, 8000.]

Tree complexity, or variable interaction ($C$), also influences model performance, as shown in Figure 19. With $J = 0.5$ (Figure 19 (a)) and many trees fitted, the MAPE value increases as $C$ increases. The rate at which MAPE increases (the slope of the lines) becomes more pronounced with a higher $M$ value. This is because a higher $J$ value gives each individual base model a larger contribution to the model output; with many trees fitted and more complex individual trees, over-fitting easily occurs. When $J$ is reduced to 0.0005 (Figure 19 (b)), the MAPE value decreases as $C$ increases. This is partly because detailed information in the data can be modelled with a higher level of variable interaction, while the smaller learning rate restricts each additional tree's contribution to the model and prevents over-fitting.

Figure 19. MAPE against Tree Complexity for Models Fitted with Various Numbers of Trees and Different Learning Rate. [Panels: (a) Learning Rate = 0.5, (b) Learning Rate = 0.0005; x-axis: variable interaction (1 to 4); y-axis: MAPE; one line per number of trees M = 1000, 2000, 8000.]

In general, a slower learning rate with a larger number of trees is preferable to a faster learning rate with a smaller number of trees. A slower learning rate shrinks the contribution of each tree more, which allows a smoother approach to the optimal performance and provides more reliable prediction results. However, when a large number of trees is fitted, model complexity also increases and more computational time is required. The tradeoff between prediction accuracy and computational time needs to be considered. In addition, the level of variable interaction also affects the optimal choice of learning rate and number of trees. A higher level of variable interaction leads to a more complex model and requires fewer trees to be fitted for a given learning rate.
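The parameter search described in this section can be outlined as follows. This is a sketch under stated assumptions rather than the code used for the experiments: MAPE is expressed as a fraction, matching the 0.02 to 0.05 scale of the plots, and `evaluate` stands for a user-supplied, hypothetical function that fits a GBM with the given number of trees, learning rate and interaction level and returns its MAPE on the testing data.

```python
from itertools import product


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.02 means 2%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def select_parameters(evaluate, n_trees_grid, rate_grid, interaction_grid):
    """Exhaustively evaluate every (M, J, C) combination.

    `evaluate(M, J, C)` is a hypothetical callable returning the test MAPE
    of a model fitted with those settings. Returns (best_mape, M, J, C).
    """
    return min(
        (evaluate(m, j, c), m, j, c)
        for m, j, c in product(n_trees_grid, rate_grid, interaction_grid)
    )
```

An exhaustive grid is the simplest choice and mirrors the plots above; its cost grows with the product of the grid sizes, which is where the tradeoff between accuracy and computational time shows up in practice.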
4.3.3 Model Interpretation

The inputs of the model, or predictor variables, usually have different influences on the output (response variable). By exploring each input variable's influence on the response variable, we can gain better insight into the data. Breiman et al. (1984) [118] proposed a method to measure the relative influence of each predictor variable on the model output for a single decision tree. The relative influence of a predictor variable is measured by the number of times the variable is selected to split a currently terminal region (or node) into two sub-regions, weighted by the least-squares improvement to the model resulting from each split. Friedman (2001) [115] generalized this criterion to additive tree expansions by simply averaging it over all trees. The relative influence of each variable is scaled so that the values for all input variables sum to 100. A higher value indicates a stronger influence of the input variable on the model.

Table 7 gives the relative influence of each input variable on the model output for different prediction horizons. Each input variable clearly contributes differently to the response (output of the model). In all three cases, the immediately preceding travel time $TT_{t-1}$ contributes the most to the predicted travel time. This is expected, as the immediately preceding traffic condition influences traffic in the near future and is therefore closely related to future travel time. The change in travel time $\Delta TT_{t-1}$ also has a relatively high influence on the model output. $\Delta TT_{t-1}$ is the difference in travel time between two consecutive time steps, and it indicates how traffic is changing. For example, a positive value of $\Delta TT_{t-1}$ indicates an increasing trend of travel time, and higher positive values are highly correlated with congestion.
Another interesting result indicated in Table 7 is that the influence of time of day becomes more significant as the prediction horizon increases. For a prediction 30 minutes ahead, the time of day variable contributes 17.43% to the model output. The time of day variable is associated with the periodic feature of travel time: travel time usually increases during peak hours and remains at a stable level during non-peak hours. With an increased prediction horizon, immediate past traffic information is not available and the impact of the most recent available travel time becomes less significant. More information can then be obtained from the typical level of travel time during a given time interval. Therefore, the contribution of the time of day variable increases with the prediction horizon.

Table 7 Relative Influence of Input Variables for GBM Models with Learning Rate of 0.001 for Multistep-ahead Prediction

| Variable | 5-Min (1-Step) Ahead: Rank | Relative Importance | 15-Min (3-Step) Ahead: Rank | Relative Importance | 30-Min (6-Step) Ahead: Rank | Relative Importance |
|---|---|---|---|---|---|---|
| $TT_{t-1}$ | 1 | 96.34% | 1 | 81.47% | 1 | 66.33% |
| $TT_{t-2}$ | 3 | 0.26% | 7 | 0.48% | 8 | 0.89% |
| $TT_{t-3}$ | 7 | 0.04% | 9 | 0.27% | 7 | 1.63% |
| $\Delta TT_{t-1}$ | 2 | 2.95% | 3 | 6.29% | 3 | 7.61% |
| $\Delta TT_{t-2}$ | 9 | 0.02% | 6 | 0.51% | 9 | 0.1% |
| $\Delta TT_{t-3}$ | 6 | 0.09% | 8 | 0.27% | 10 | 0.08% |
| Time of day | 4 | 0.16% | 2 | 7.35% | 2 | 17.43% |
| Day | 10 | 0% | 10 | 0.26% | 6 | 1.65% |
| Week | 5 | 0.1% | 5 | 1.05% | 5 | 2.04% |
| Month | 8 | 0.04% | 4 | 2.06% | 4 | 2.24% |

Since $TT_{t-1}$ and $\Delta TT_{t-1}$ are the two most important variables for one-step-ahead prediction, we plot their joint influence on the model output (Figure 20). The x-axis is the lag-one difference $\Delta TT_{t-1}$, the y-axis is the lag-one travel time $TT_{t-1}$, and the z-axis represents the predicted travel time $TT_t$. As indicated in Figure 20, there is a positive correlation between the lag-one travel time $TT_{t-1}$ and the predicted travel time $TT_t$. Traffic conditions are often consistent within a short period. If congestion occurs, it will last for several minutes or even hours.
Therefore, a high value of travel time at time step $t-1$ is more likely to be followed by another high value of travel time. The influence of the travel time difference $\Delta TT_{t-1}$ on $TT_t$ is more evident when $\Delta TT_{t-1}$ is positive. In other words, if the previous travel time has been increasing, the future travel time will also tend to increase. But if the previous travel time decreases or remains at a stable level, the travel time difference $\Delta TT_{t-1}$ has less impact on future travel time. In general, higher values of both $TT_{t-1}$ and $\Delta TT_{t-1}$ often indicate that congestion is occurring. Therefore, they are more likely to be followed by a higher travel time value.

Figure 20. Three-dimensional Plots for the Joint Effects of Lag One Difference and Lag One Travel Time on Predicted Travel Time Value

4.3.4 Model Comparison

To test the effectiveness of the GBM model, this section comprehensively evaluates the prediction performance of the ARIMA, RF, and GBM models. The ARIMA model is one of the widely recognized benchmark models for forecasting traffic parameters. Its predictions are based on a regression on the current and past values of the series. Optimization of the ARIMA model involves order selection and parameter estimation. Detailed information on the theoretical background and the steps in fitting an ARIMA model can be found in [96].

We use two months of training data and seven days of testing data to compare these three models. The prediction accuracy of the three models is compared on the basis of 5-minute (1-step-ahead), 15-minute (3-step-ahead), and 30-minute (6-step-ahead) predictions. We test different combinations of variables for both the GBM and the RF models during the training process and select the best model according to the MAPE values. The best orders of the ARIMA model are selected using the method proposed by Hyndman and Khandakar [95], and the parameters of the ARIMA model are estimated by the maximum likelihood method.
Table 8 through Table 10 compare the prediction performance of these three models based on MAPE value. Both the RF and the GBM models generally outperform the ARIMA model in the one-step-ahead prediction, as shown in Table 8. The GBM and the RF perform similarly, with the RF having a slightly lower MAPE value in three of the five cases. As the prediction horizon increases, the performance of all three models drops. Comparatively, the GBM method is less sensitive to the prediction horizon and maintains a good prediction performance. As indicated in Table 9 and Table 10, the GBM model outperforms both the ARIMA and the RF models in 9 out of 10 cases. In general, the RF and the GBM methods perform better than the ARIMA model in the one-step-ahead prediction. As the prediction horizon increases, the difference among these three models becomes obvious, with the GBM model being the most accurate compared with the RF and the ARIMA models. Therefore, we conclude that both the RF and the GBM models are promising algorithms for travel time prediction, as they are more accurate than the ARIMA model. The advantage of the GBM model becomes even more obvious in multi-step-ahead predictions.

Table 8 Comparison of 5 Minutes Ahead Prediction for ARIMA, RF and GBM

| Model | I | II | III | IV | V |
|---|---|---|---|---|---|
| ARIMA | 2.27% | 2.47% | 2.35% | 2.59% | 2.03% |
| RF | 2.10% | 2.40% | 2.28% | 2.46% | 2.04% |
| GBM | 2.14% | 2.42% | 2.29% | 2.46% | 2.01% |

Table 9 Comparison of 15 Minutes Ahead Prediction for ARIMA, RF and GBM

| Model | I | II | III | IV | V |
|---|---|---|---|---|---|
| ARIMA | 3.86% | 4.44% | 3.84% | 4.20% | 2.90% |
| RF | 3.46% | 3.77% | 3.46% | 3.75% | 2.78% |
| GBM | 3.33% | 3.63% | 3.48% | 3.59% | 2.77% |

Table 10 Comparison of 30 Minutes Ahead Prediction for ARIMA, RF and GBM

| Model | I | II | III | IV | V |
|---|---|---|---|---|---|
| ARIMA | 4.75% | 4.88% | 4.20% | 4.57% | 3.01% |
| RF | 3.93% | 4.39% | 3.88% | 4.42% | 2.85% |
| GBM | 3.80% | 3.79% | 3.74% | 4.19% | 2.82% |

Figure 21 shows three days of forecasting results provided by the GBM model. The blue line stands for the travel time predicted by the GBM model, and the red crosses stand for the observed (original) values of travel time. We can see that the GBM model performs well not only in normal traffic conditions (lower panel of Figure 21) but also during traffic transition periods. On 3/1/2012 (upper panel of Figure 21), there are three recorded incidents between 6:31 and 7:16, which are possible reasons for the traffic state changing from uncongested to congested. The GBM model is able to capture this sudden change. On 3/2/2012 (middle panel of Figure 21), rain begins falling shortly after 16:00 and ends at 19:00. Congestion occurs during this period and travel time increases. The GBM model also adequately captures this congestion. Theoretically, the GBM model is able to handle complex interactions among input variables and can fit complex nonlinear relationships. Therefore, the GBM model is able to model the non-linear characteristics of dynamic traffic systems, which leads to superior prediction performance.

Figure 21. Sample Travel Time Prediction Results of the GBM method

4.3.5 Discussion and Conclusion

The GBM model has a unique feature that distinguishes it from other popular ensemble methods, such as bagged trees and random forests. Both bagged trees and random forests are able to reduce variance more than single trees through averaging. Random forests enhance diversity by randomly selecting a subset of variables at each splitting node. The training sample is produced by bootstrap sampling, with a distribution similar to that of the original training set. The bias of the model cannot be reduced through averaging.
On the other hand, the GBM model grows trees sequentially by adjusting the weight of the training data distribution to minimize a certain loss function. It reduces model bias through forward stage-wise modeling and reduces variance through averaging [107].

The proposed GBM-based travel time prediction method has considerable advantages over classical statistical approaches and other ensemble methods. In particular, it has superior performance in terms of prediction accuracy. Only a limited number of studies have discussed the application of the RF model in traffic prediction [70, 71]. To the best of the author's knowledge, no studies have applied the GBM model to freeway travel time data, and none have compared or discussed the performance of the RF and the GBM models in traffic prediction.

The GBM model can handle sharp discontinuities, an important feature when modeling abnormal traffic conditions where traffic states change from uncongested to congested, and vice versa. Based on the data used in this study, the GBM model is able to capture sudden changes in traffic (for example, during incidents or rainy conditions). In addition, the GBM model can automatically select relevant variables, fit accurate models, and identify and model parameter interactions. More importantly, unlike other machine learning algorithms that act as a 'black box', the GBM model also provides the relative importance or contribution of the input variables. This is critical for gaining insight into the structure of the data. In addition, comparisons of the ARIMA, the RF, and the GBM models indicate that both the RF and the GBM models outperform the conventional statistical model, the ARIMA model; the advantage of the GBM becomes more evident for multi-step-ahead prediction.

One issue regarding the application of the GBM model in travel time prediction is related to parameter optimization.
As addressed in the model optimization section, the performance of the GBM model is largely influenced by its parameters, including the number of trees, the learning rate, and the tree complexity (variable interactions). Therefore, the combination of these parameters needs to be tested when developing the GBM model. Computational time is another issue when tree complexity or the number of trees increases. The tradeoff between computational cost and model accuracy should also be considered when building the model.

In short, the GBM model has considerable advantages in freeway travel time prediction. In particular, as traffic data becomes more readily available, more information can be accessed to study traffic phenomena. The capability of the GBM model to handle different types of input variables and to model complex nonlinear relationships makes it a promising algorithm for travel time prediction.

Chapter 5: A Travel Time Prediction Framework

The previous two chapters discussed the application of different volatility models and ensemble methods in travel time prediction. The volatility model is aimed at improving reliability forecasting (the variance part), while the ensemble methods are aimed at increasing the prediction accuracy of the mean part. From the discussion in Chapter 3, we know that the stochastic volatility model outperforms the traditional GARCH model in terms of constructing more efficient and effective PIs. Chapter 4 demonstrated the advantage of the GBM model in improving travel time prediction accuracy. In this chapter, we aim at developing a travel time prediction framework that improves both prediction accuracy and reliability.

5.1 Model Development

In Chapter 3, we mentioned that the observed travel time can be decomposed into a conditional mean ($u_t$) and a residual ($r_t$) component:

$$y_t = u_t + r_t \qquad (50)$$

where $y_t$ is the observed travel time at time $t$, $u_t$ represents the estimated conditional mean, and $r_t$ is the residual part.
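The decomposition into a mean part and a residual part can be sketched in code. The example below is only an illustration under stated assumptions: the conditional-mean values are made up rather than produced by any fitted mean model, an exponentially weighted moving average of squared residuals is swapped in as a simple stand-in for a volatility model (it is neither the stochastic volatility nor the GARCH model), and the interval uses a Gaussian 95% multiplier of 1.96. All function names and numbers are hypothetical.

```python
from math import sqrt


def decompose(observed, conditional_mean):
    """Return the residuals: observed value minus conditional mean."""
    return [y - u for y, u in zip(observed, conditional_mean)]


def ewma_variance(residuals, initial_variance, decay=0.9):
    """One-step-ahead variance estimates from an exponentially weighted
    moving average of squared residuals (stand-in for a volatility model)."""
    variance = initial_variance
    estimates = []
    for r in residuals:
        estimates.append(variance)  # estimate made before r is observed
        variance = decay * variance + (1 - decay) * r ** 2
    return estimates


def prediction_intervals(conditional_mean, variances, z=1.96):
    """Symmetric intervals around the mean, assuming Gaussian residuals."""
    return [(u - z * sqrt(v), u + z * sqrt(v))
            for u, v in zip(conditional_mean, variances)]


# Hypothetical observed travel times and mean-model output (seconds).
observed = [120.0, 125.0, 140.0, 160.0, 150.0]
conditional_mean = [118.0, 124.0, 133.0, 150.0, 155.0]

residuals = decompose(observed, conditional_mean)
variances = ewma_variance(residuals, initial_variance=4.0)
intervals = prediction_intervals(conditional_mean, variances)
for u, (low, high) in zip(conditional_mean, intervals):
    print(f"mean={u:.1f}, interval=({low:.1f}, {high:.1f})")
```

The two stages are kept separate on purpose: the mean model and the variance model can be replaced independently, which is the design idea behind combining a mean predictor with a volatility model.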
To develop an accurate travel time prediction framework, we need to carefully select both the mean model and the model for the residual part. The ARIMA mean model was used in Chapter 3 to estimate and predict the mean part of travel time for comparison purposes. This does not necessarily indicate that the ARIMA model is the best model. Since we demonstrated the superior performance of the GBM model (Chapter 4) in predicting the mean part of the travel time data series, we replaced the ARIMA model with the GBM model to predict the mean part $u_t$ of the time series and then applied the stochastic volatility model to the variance part of the data. We call the newly developed prediction framework the GBM-SV model. The ultimate goal is to further improve the prediction accuracy.

5.2 Data Description and Preparation

To comprehensively evaluate the proposed model's performance, this study uses travel time data from three different freeway stretches. As shown in Figure 22, Figure 23, and Figure 24, the selected freeways are I95 in the southbound direction, I495 in the eastbound direction, and MD295 in the southbound direction. Each freeway stretch includes multiple segments, which are covered by multiple TMCs. Travel time information for each individual TMC segment can be obtained from data provided by INRIX. Based on traffic information collected from these nine TMCs, path travel time for the three segments of study can be estimated. Details of calculating the path travel time for multiple TMC segments can be found in Hamedi, et al. [49]. Then, the travel time information for each segment is aggregated into five-minute intervals. Table 11 provides detailed information for the 11 selected segments, including the segment ID, the road on which the segment is located, the latitude and longitude of the start point and end point of the segment, and the length and average travel time of each segment.
For comparison purposes, we select segments of similar length (most of the segments are around 2-4 miles). The average travel time of each segment is the average of the five-minute travel times over the entire year.

Figure 22. Selected Study Segment at the I95 Southbound Direction
[Map labels: Segment 1, Segment 2, Segment 3, Segment 4]

Figure 23. Selected Study Segment at the I495 Eastbound Direction
[Map labels: Segment 5, Segment 6, Segment 7]

Figure 24. Selected Study Segment at the MD295 Southbound Direction
[Map labels: Segment 8, Segment 9, Segment 10, Segment 11]

Table 11 Selected Freeway Segment Information

| Segment | Road | Start Latitude | Start Longitude | End Latitude | End Longitude | Length (miles) | Travel Time (min) |
|---|---|---|---|---|---|---|---|
| 1 | I95 | 39.21848 | -76.7269 | 39.20084 | -76.761 | 2.2 | 2.00 |
| 2 | I95 | 39.20084 | -76.761 | 39.17537 | -76.7946 | 2.53 | 2.31 |
| 3 | I95 | 39.17537 | -76.7946 | 39.15624 | -76.8348 | 2.56 | 2.32 |
| 4 | I95 | 39.15624 | -76.8348 | 39.12223 | -76.8675 | 2.99 | 2.65 |
| 5 | I495 | 39.02016 | -76.9582 | 39.01534 | -77.0051 | 2.59 | 3.43 |
| 6 | I495 | 39.01534 | -77.0051 | 39.01356 | -77.0453 | 2.25 | 2.64 |
| 7 | I495 | 39.01356 | -77.0453 | 39.01631 | -77.0981 | 3.43 | 3.61 |
| 8 | MD295 | 39.21099 | -76.6823 | 39.16517 | -76.7361 | 4.26 | 4.06 |
| 9 | MD295 | 39.16517 | -76.7361 | 39.13714 | -76.7573 | 2.28 | 2.35 |
| 10 | MD295 | 39.13714 | -76.7573 | 39.11021 | -76.7839 | 2.37 | 2.56 |
| 11 | MD295 | 39.11021 | -76.7839 | 39.06913 | -76.8316 | 3.84 | 4.00 |

5.3 Model Comparison

To test the GBM-SV model's prediction performance, this section compares the GBM-SV model with the ARIMA-GARCH model using travel time data from 11 freeway segments. We use two months of training data to develop the models and then predict travel time for different days. The prediction accuracy of the two models is compared on the basis of 5-minute (1-step-ahead), 10-minute (2-step-ahead), 15-minute (3-step-ahead), 20-minute (4-step-ahead), 25-minute (5-step-ahead), and 30-minute (6-step-ahead) predictions. The model selection procedure (parameter selection and optimization of the GBM and the stochastic volatility model) is the same as discussed in Chapter 3 and Chapter 4.
To compare the prediction performance of the GBM-SV model, the ARIMA-GARCH model is used as the benchmark model. Five criteria are introduced to evaluate model performance. The MAPE and RMSE criteria measure the prediction accuracy of the mean part, while the MPIL, PICP, and PI ratio measure the effectiveness and efficiency of the PIs. Please refer to Sections 3.1.3 and 3.2.3 for the definitions of the MAPE, RMSE, MPIL, and PICP criteria. In this section, a new PI performance measurement is introduced, the PI ratio:

$$\text{PI Ratio} = \frac{PICP}{\left( MPIL / MTT \right)} \qquad (51)$$

where $PICP$ and $MPIL$ are the coverage probability and the average length of the PIs of the model, and $MTT$ is the mean travel time. $MPIL/MTT$ is the ratio of the average PI length to the average travel time. Since we prefer a larger PICP with a smaller $MPIL/MTT$ ratio, the higher the PI ratio, the better the constructed PIs.

Both the GBM-SV and the ARIMA-GARCH models are evaluated according to the five criteria mentioned above. In this section, we randomly select five days for which to predict travel time: 2012-11-08, 2012-11-13, 2012-11-26, 2012-12-05, and 2012-12-19. Table 12 through Table 36 provide the MAPE, RMSE, MPIL, PICP, and PI-Ratio values for the prediction results of the 11 freeway segments on these five days, with prediction horizons from 1-step-ahead up to 6-step-ahead.

Both the MAPE and the RMSE criteria measure the prediction accuracy of the model (the mean part). Lower values of both criteria indicate more accurate predictions of the mean values of the time series. By closely examining the MAPE and RMSE values for the GBM-SV and ARIMA-GARCH models under different scenarios (different freeway segments, numbers of steps ahead, and days), we can see from the following tables that the GBM-SV model produces lower MAPE and RMSE values in most cases.
There are only a few cases in which the ARIMA-GARCH model performs better than the GBM-SV model. In the following MAPE and RMSE tables, we highlighted (values in boldface) the cases in which the ARIMA-GARCH model has a lower value than the GBM-SV model. There is only a small number of highlighted cases, which is to say that the GBM-SV model performs better in most cases. Therefore, we can conclude that the GBM-SV model provides more accurate predictions of the mean part of the time series.

The next step is to measure the quality of the prediction intervals constructed by each model. Here, we use the PICP, MPIL, and PI-Ratio criteria to measure the quality of the PIs. As explained in Section 3.2.3, a prediction interval with a higher PICP value and a lower MPIL value is desirable. When comparing the PIs of two different models, we have to consider the PICP and MPIL values of each model at the same time, because a higher MPIL value (a wider PI) tends to come with a higher PICP value, and vice versa. The PI-Ratio criterion is introduced here in order to compare both measurements at the same time. According to the definition in equation (51), the PI-Ratio considers both the coverage probability and the length of the PIs. In general, the higher the PI-Ratio, the better the model is in terms of the quality of the PIs. The PI-Ratio tables for the GBM-SV and the ARIMA-GARCH models on different days show that the GBM-SV model provides a higher PI-Ratio in most cases; in other words, the GBM-SV model constructs more efficient and effective PIs in most cases. Only in a few cases does the ARIMA-GARCH model provide a higher PI-Ratio (we also highlighted the cases in which the ARIMA-GARCH model provides better PIs).
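The three interval criteria can be computed as in the following sketch. It assumes the usual definitions, with PICP as the share of observations falling inside their prediction interval and MPIL as the mean interval width, and it takes MTT as the mean of the observed travel times over the evaluation period; the PI ratio then follows equation (51). The function names and the sample values are hypothetical.

```python
def picp(observed, lower, upper):
    """Share of observations that fall inside their prediction interval."""
    covered = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return covered / len(observed)


def mpil(lower, upper):
    """Mean length of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


def pi_ratio(observed, lower, upper):
    """PICP divided by the interval length relative to mean travel time."""
    mtt = sum(observed) / len(observed)
    return picp(observed, lower, upper) / (mpil(lower, upper) / mtt)


# Hypothetical observations and interval bounds (seconds).
observed = [118.0, 121.0, 125.0, 140.0]
lower = [110.0, 112.0, 115.0, 120.0]
upper = [128.0, 130.0, 133.0, 138.0]

print(f"PICP = {picp(observed, lower, upper):.2f}")
print(f"MPIL = {mpil(lower, upper):.2f}")
print(f"PI ratio = {pi_ratio(observed, lower, upper):.2f}")
```

Dividing MPIL by the mean travel time makes the interval length comparable across segments with different travel time levels, so a model cannot raise the ratio simply by widening its intervals to gain coverage.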
In order to provide a straightforward comparison of the performance of the GBM-SV and ARIMA-GARCH models, Figure 25 through Figure 29 summarize the models' performance by averaging over the 11 segments and 5 days. In terms of mean performance, the average MAPE and RMSE values of the GBM-SV model are lower than those of the ARIMA-GARCH model. As the number of steps ahead increases, the advantage of the GBM-SV model becomes more significant. When comparing the PIs constructed by both models, the GBM-SV model also shows its advantage, as it has higher PICP and PI-Ratio values while having a lower MPIL value.

Figure 25 Average MAPE Value over the 11 Segments and 5 Days
[Chart: x-axis Step 1 to Step 6; y-axis MAPE Value; series GBM-SV and ARIMA-GARCH]

Figure 26 Average RMSE Value over the 11 Segments and 5 Days
[Chart: x-axis Step 1 to Step 6; y-axis RMSE Value; series GBM-SV and ARIMA-GARCH]

Figure 27 Average PICP Value over the 11 Segments and 5 Days
[Chart: x-axis Step 1 to Step 6; y-axis PICP Value; series GBM-SV and ARIMA-GARCH]

Figure 28 Average MPIL Value over the 11 Segments and 5 Days
[Chart: x-axis Step 1 to Step 6; y-axis MPIL Value; series GBM-SV and ARIMA-GARCH]

Figure 29 Average PI-Ratio over the 11 Segments and 5 Days
[Chart: x-axis Step 1 to Step 6; y-axis PI-Ratio; series GBM-SV and ARIMA-GARCH]

To sum up, by comparing the performance of the GBM-SV and the ARIMA-GARCH models under different scenarios (combinations of 11 freeway segments, 5 days, and 6 prediction horizons), the GBM-SV model shows superior prediction performance over the traditional ARIMA-GARCH model in terms of both the prediction accuracy (prediction of the mean part) and the reliability of the prediction (the quality of the constructed
PIs).

Table 12 Comparing the MAPE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 2.51% | 3.49% | 3.92% | 4.25% | 4.45% | 4.50% |
| | ARIMA-GARCH | 2.41% | 3.76% | 4.44% | 4.99% | 5.49% | 5.87% |
| 2 | GBM-SV | 2.70% | 4.12% | 4.96% | 5.66% | 5.85% | 5.93% |
| | ARIMA-GARCH | 2.72% | 4.21% | 5.22% | 5.92% | 6.27% | 6.57% |
| 3 | GBM-SV | 2.34% | 3.18% | 3.53% | 3.66% | 3.82% | 3.83% |
| | ARIMA-GARCH | 2.39% | 3.45% | 3.87% | 4.12% | 4.23% | 4.53% |
| 4 | GBM-SV | 2.48% | 3.77% | 4.53% | 5.19% | 5.71% | 6.00% |
| | ARIMA-GARCH | 2.48% | 3.98% | 4.89% | 5.55% | 6.17% | 6.68% |
| 5 | GBM-SV | 4.03% | 6.58% | 8.42% | 10.05% | 11.32% | 12.49% |
| | ARIMA-GARCH | 4.01% | 7.16% | 9.55% | 11.52% | 13.44% | 15.42% |
| 6 | GBM-SV | 4.43% | 7.40% | 9.53% | 10.91% | 12.28% | 14.00% |
| | ARIMA-GARCH | 4.33% | 7.84% | 11.12% | 13.78% | 16.43% | 19.20% |
| 7 | GBM-SV | 3.57% | 5.49% | 6.97% | 8.35% | 9.66% | 11.28% |
| | ARIMA-GARCH | 3.83% | 6.21% | 8.40% | 10.77% | 13.02% | 15.72% |
| 8 | GBM-SV | 1.95% | 2.93% | 3.25% | 3.37% | 3.42% | 3.40% |
| | ARIMA-GARCH | 1.79% | 2.64% | 2.87% | 2.95% | 2.94% | 2.91% |
| 9 | GBM-SV | 4.07% | 5.77% | 6.87% | 7.63% | 8.72% | 9.70% |
| | ARIMA-GARCH | 4.32% | 6.68% | 8.46% | 9.28% | 10.00% | 10.97% |
| 10 | GBM-SV | 4.30% | 6.97% | 8.95% | 10.78% | 11.87% | 13.03% |
| | ARIMA-GARCH | 4.64% | 8.13% | 10.73% | 13.16% | 15.51% | 17.61% |
| 11 | GBM-SV | 4.93% | 8.56% | 11.49% | 13.75% | 15.71% | 17.27% |
| | ARIMA-GARCH | 5.45% | 9.53% | 13.30% | 16.19% | 18.85% | 21.24% |

Table 13 Comparing the RMSE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 6.41 | 9.71 | 11.64 | 13.60 | 14.56 | 14.97 |
| | ARIMA-GARCH | 5.85 | 9.98 | 12.67 | 14.97 | 16.73 | 17.82 |
| 2 | GBM-SV | 7.92 | 13.09 | 16.36 | 19.06 | 19.02 | 19.14 |
| | ARIMA-GARCH | 7.61 | 12.96 | 16.75 | 19.36 | 21.37 | 22.79 |
| 3 | GBM-SV | 5.48 | 8.41 | 10.42 | 11.35 | 11.46 | 11.62 |
| | ARIMA-GARCH | 5.84 | 8.48 | 9.91 | 10.87 | 11.24 | 11.83 |
| 4 | GBM-SV | 12.06 | 18.80 | 23.32 | 27.27 | 31.16 | 34.16 |
| | ARIMA-GARCH | 11.47 | 20.99 | 28.58 | 34.06 | 38.52 | 42.47 |
| 5 | GBM-SV | 16.45 | 28.24 | 38.12 | 45.56 | 50.69 | 56.23 |
| | ARIMA-GARCH | 17.27 | 30.50 | 40.39 | 47.59 | 53.42 | 59.65 |
| 6 | GBM-SV | 26.46 | 47.22 | 62.31 | 69.94 | 77.24 | 85.63 |
| | ARIMA-GARCH | 24.06 | 43.40 | 57.84 | 67.98 | 78.04 | 87.00 |
| 7 | GBM-SV | 46.58 | 73.68 | 102.83 | 131.74 | 152.06 | 171.46 |
| | ARIMA-GARCH | 45.48 | 73.01 | 101.04 | 133.53 | 157.92 | 187.33 |
| 8 | GBM-SV | 8.56 | 14.99 | 17.76 | 18.77 | 19.17 | 18.36 |
| | ARIMA-GARCH | 7.68 | 12.01 | 13.53 | 13.76 | 13.69 | 13.52 |
| 9 | GBM-SV | 16.36 | 24.08 | 28.80 | 32.24 | 35.04 | 37.85 |
| | ARIMA-GARCH | 16.59 | 25.20 | 31.01 | 34.40 | 37.38 | 41.12 |
| 10 | GBM-SV | 31.90 | 50.08 | 63.87 | 73.66 | 80.72 | 85.59 |
| | ARIMA-GARCH | 33.02 | 54.54 | 71.97 | 84.35 | 94.45 | 100.08 |
| 11 | GBM-SV | 54.14 | 89.54 | 116.77 | 135.89 | 150.58 | 164.29 |
| | ARIMA-GARCH | 56.93 | 92.92 | 123.24 | 146.46 | 163.55 | 178.74 |

Table 14 Comparing the MPIL Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 17.35 | 17.34 | 17.32 | 17.29 | 17.25 | 17.20 |
| | ARIMA-GARCH | 17.42 | 17.81 | 18.06 | 18.21 | 18.29 | 18.33 |
| 2 | GBM-SV | 19.35 | 18.92 | 18.56 | 18.24 | 18.05 | 17.87 |
| | ARIMA-GARCH | 22.50 | 22.89 | 23.16 | 23.36 | 23.51 | 23.65 |
| 3 | GBM-SV | 16.78 | 16.65 | 16.52 | 16.40 | 16.29 | 16.18 |
| | ARIMA-GARCH | 17.63 | 17.48 | 17.32 | 17.16 | 17.02 | 16.88 |
| 4 | GBM-SV | 20.45 | 19.72 | 19.13 | 18.73 | 18.49 | 18.27 |
| | ARIMA-GARCH | 22.21 | 21.71 | 21.13 | 20.56 | 20.06 | 19.63 |
| 5 | GBM-SV | 56.70 | 59.66 | 62.17 | 64.26 | 66.16 | 67.95 |
| | ARIMA-GARCH | 53.39 | 57.60 | 61.86 | 66.21 | 70.69 | 75.34 |
| 6 | GBM-SV | 58.82 | 58.52 | 58.04 | 57.63 | 57.33 | 56.72 |
| | ARIMA-GARCH | 64.38 | 72.16 | 80.64 | 89.93 | 100.16 | 111.52 |
| 7 | GBM-SV | 75.73 | 73.58 | 71.45 | 69.67 | 67.99 | 66.24 |
| | ARIMA-GARCH | 88.45 | 92.98 | 97.56 | 102.17 | 107.00 | 112.13 |
| 8 | GBM-SV | 29.88 | 31.19 | 32.18 | 32.88 | 33.50 | 33.93 |
| | ARIMA-GARCH | 26.19 | 27.20 | 27.83 | 28.23 | 28.49 | 28.65 |
| 9 | GBM-SV | 37.96 | 37.65 | 37.24 | 36.79 | 36.38 | 35.83 |
| | ARIMA-GARCH | 45.88 | 47.32 | 48.75 | 50.19 | 51.62 | 53.05 |
| 10 | GBM-SV | 71.03 | 72.94 | 74.08 | 74.31 | 74.78 | 74.76 |
| | ARIMA-GARCH | 76.81 | 83.12 | 89.87 | 97.06 | 104.73 | 112.91 |
| 11 | GBM-SV | 123.20 | 124.64 | 124.86 | 123.62 | 122.29 | 119.78 |
| | ARIMA-GARCH | 155.06 | 186.17 | 222.57 | 265.60 | 316.42 | 376.61 |

Table 15 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 95.83% | 90.63% | 88.54% | 88.19% | 87.15% | 86.11% |
| | ARIMA-GARCH | 94.10% | 88.89% | 87.15% | 86.11% | 82.99% | 83.68% |
| 2 | GBM-SV | 92.36% | 82.99% | 81.25% | 78.13% | 79.51% | 79.17% |
| | ARIMA-GARCH | 94.79% | 86.46% | 84.03% | 83.33% | 81.25% | 80.56% |
| 3 | GBM-SV | 91.67% | 84.38% | 84.03% | 83.68% | 83.68% | 80.90% |
| | ARIMA-GARCH | 92.71% | 86.11% | 81.25% | 80.21% | 79.86% | 77.78% |
| 4 | GBM-SV | 93.75% | 85.76% | 84.38% | 80.21% | 80.21% | 79.86% |
| | ARIMA-GARCH | 92.71% | 85.76% | 81.94% | 82.99% | 79.86% | 80.90% |
| 5 | GBM-SV | 94.44% | 89.24% | 84.72% | 81.94% | 81.94% | 79.17% |
| | ARIMA-GARCH | 94.44% | 88.19% | 80.90% | 76.39% | 73.26% | 72.22% |
| 6 | GBM-SV | 95.14% | 86.11% | 82.99% | 80.90% | 81.60% | 79.17% |
| | ARIMA-GARCH | 93.06% | 85.42% | 79.17% | 78.13% | 76.39% | 74.65% |
| 7 | GBM-SV | 94.44% | 87.15% | 85.07% | 83.68% | 82.64% | 80.90% |
| | ARIMA-GARCH | 93.75% | 86.11% | 85.07% | 82.29% | 84.03% | 83.68% |
| 8 | GBM-SV | 95.83% | 90.28% | 90.63% | 90.28% | 91.67% | 90.28% |
| | ARIMA-GARCH | 94.44% | 89.24% | 88.89% | 89.93% | 90.28% | 91.67% |
| 9 | GBM-SV | 93.75% | 89.24% | 87.15% | 86.81% | 84.03% | 81.60% |
| | ARIMA-GARCH | 94.10% | 86.46% | 84.03% | 80.56% | 79.17% | 80.21% |
| 10 | GBM-SV | 95.49% | 88.89% | 84.38% | 82.29% | 81.60% | 80.56% |
| | ARIMA-GARCH | 94.79% | 89.24% | 85.07% | 83.68% | 83.33% | 83.68% |
| 11 | GBM-SV | 94.79% | 84.72% | 80.56% | 77.08% | 74.65% | 73.61% |
| | ARIMA-GARCH | 95.83% | 89.93% | 87.85% | 87.15% | 86.81% | 86.81% |

Table 16 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-11-08 Thursday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 6.79 | 6.40 | 6.25 | 6.23 | 6.16 | 6.09 |
| | ARIMA-GARCH | 6.67 | 6.16 | 5.96 | 5.84 | 5.60 | 5.64 |
| 2 | GBM-SV | 6.81 | 6.25 | 6.23 | 6.09 | 6.26 | 6.28 |
| | ARIMA-GARCH | 6.00 | 5.36 | 5.12 | 5.02 | 4.85 | 4.77 |
| 3 | GBM-SV | 7.69 | 7.12 | 7.15 | 7.17 | 7.20 | 7.01 |
| | ARIMA-GARCH | 7.43 | 6.96 | 6.62 | 6.60 | 6.63 | 6.51 |
| 4 | GBM-SV | 7.71 | 7.28 | 7.35 | 7.11 | 7.20 | 7.21 |
| | ARIMA-GARCH | 7.01 | 6.60 | 6.44 | 6.68 | 6.57 | 6.78 |
| 5 | GBM-SV | 3.23 | 2.93 | 2.69 | 2.54 | 2.48 | 2.35 |
| | ARIMA-GARCH | 3.40 | 2.95 | 2.53 | 2.25 | 2.03 | 1.89 |
| 6 | GBM-SV | 3.06 | 2.77 | 2.66 | 2.59 | 2.60 | 2.52 |
| | ARIMA-GARCH | 2.74 | 2.23 | 1.83 | 1.61 | 1.40 | 1.22 |
| 7 | GBM-SV | 3.56 | 3.28 | 3.21 | 3.16 | 3.13 | 3.09 |
| | ARIMA-GARCH | 3.07 | 2.67 | 2.50 | 2.30 | 2.24 | 2.12 |
| 8 | GBM-SV | 7.73 | 6.98 | 6.79 | 6.63 | 6.61 | 6.43 |
| | ARIMA-GARCH | 8.69 | 7.89 | 7.68 | 7.65 | 7.61 | 7.67 |
| 9 | GBM-SV | 3.63 | 3.49 | 3.45 | 3.47 | 3.41 | 3.36 |
| | ARIMA-GARCH | 2.99 | 2.65 | 2.48 | 2.30 | 2.18 | 2.14 |
| 10 | GBM-SV | 2.43 | 2.18 | 2.01 | 1.94 | 1.89 | 1.86 |
| | ARIMA-GARCH | 2.26 | 1.96 | 1.73 | 1.57 | 1.45 | 1.36 |
| 11 | GBM-SV | 2.39 | 2.08 | 1.95 | 1.85 | 1.78 | 1.76 |
| | ARIMA-GARCH | 1.94 | 1.50 | 1.22 | 1.01 | 0.84 | 0.70 |

Table 17 Comparing the MAPE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 2.61% | 4.13% | 4.55% | 5.01% | 5.19% | 5.28% |
| | ARIMA-GARCH | 2.54% | 4.08% | 4.92% | 5.59% | 6.18% | 6.60% |
| 2 | GBM-SV | 2.59% | 3.95% | 4.67% | 5.24% | 5.57% | 5.71% |
| | ARIMA-GARCH | 2.52% | 4.11% | 4.97% | 5.39% | 5.80% | 6.26% |
| 3 | GBM-SV | 2.73% | 4.10% | 5.22% | 5.56% | 5.67% | 5.81% |
| | ARIMA-GARCH | 2.53% | 4.00% | 4.99% | 5.71% | 6.39% | 6.92% |
| 4 | GBM-SV | 3.37% | 5.13% | 6.41% | 7.37% | 8.32% | 8.88% |
| | ARIMA-GARCH | 3.48% | 5.24% | 6.99% | 8.44% | 9.67% | 11.12% |
| 5 | GBM-SV | 3.25% | 5.16% | 6.39% | 7.34% | 8.50% | 9.57% |
| | ARIMA-GARCH | 3.49% | 6.21% | 8.59% | 10.86% | 13.25% | 15.56% |
| 6 | GBM-SV | 3.22% | 5.31% | 6.80% | 8.01% | 8.91% | 9.53% |
| | ARIMA-GARCH | 3.39% | 6.01% | 8.49% | 10.94% | 12.71% | 14.42% |
| 7 | GBM-SV | 2.54% | 4.02% | 4.70% | 4.95% | 5.49% | 5.97% |
| | ARIMA-GARCH | 2.43% | 3.99% | 4.98% | 5.50% | 6.10% | 6.90% |
| 8 | GBM-SV | 2.96% | 4.58% | 5.47% | 6.35% | 7.07% | 7.70% |
| | ARIMA-GARCH | 2.90% | 4.90% | 6.19% | 7.30% | 8.29% | 9.50% |
| 9 | GBM-SV | 4.10% | 6.25% | 7.45% | 8.16% | 8.93% | 9.62% |
| | ARIMA-GARCH | 4.29% | 6.94% | 8.61% | 10.05% | 11.33% | 12.66% |
| 10 | GBM-SV | 4.01% | 6.25% | 7.43% | 8.08% | 8.76% | 9.24% |
| | ARIMA-GARCH | 3.91% | 7.00% | 9.05% | 10.44% | 11.29% | 12.38% |
| 11 | GBM-SV | 4.40% | 6.99% | 8.78% | 10.39% | 11.50% | 12.19% |
| | ARIMA-GARCH | 4.30% | 7.20% | 9.65% | 11.41% | 13.09% | 14.48% |

Table 18 Comparing the RMSE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 5.60 | 12.76 | 13.15 | 13.86 | 13.39 | 12.95 |
| | ARIMA-GARCH | 5.85 | 10.34 | 12.40 | 14.01 | 15.42 | 16.62 |
| 2 | GBM-SV | 7.14 | 11.06 | 13.48 | 14.64 | 15.52 | 15.92 |
| | ARIMA-GARCH | 7.27 | 11.54 | 14.09 | 14.90 | 15.88 | 17.14 |
| 3 | GBM-SV | 8.05 | 12.74 | 17.71 | 20.14 | 21.85 | 22.67 |
| | ARIMA-GARCH | 8.42 | 14.62 | 18.96 | 22.12 | 24.63 | 27.23 |
| 4 | GBM-SV | 27.04 | 37.97 | 47.39 | 60.99 | 68.63 | 73.84 |
| | ARIMA-GARCH | 27.52 | 41.99 | 58.04 | 74.25 | 88.42 | 106.17 |
| 5 | GBM-SV | 35.39 | 65.65 | 93.17 | 117.31 | 140.45 | 157.42 |
| | ARIMA-GARCH | 39.97 | 74.48 | 106.55 | 137.39 | 166.00 | 189.85 |
| 6 | GBM-SV | 33.56 | 49.80 | 73.43 | 95.10 | 110.21 | 122.21 |
| | ARIMA-GARCH | 39.05 | 63.27 | 92.53 | 115.28 | 127.30 | 141.91 |
| 7 | GBM-SV | 17.94 | 32.02 | 40.08 | 41.45 | 43.16 | 46.06 |
| | ARIMA-GARCH | 17.69 | 31.55 | 38.88 | 42.36 | 45.08 | 49.44 |
| 8 | GBM-SV | 27.50 | 43.65 | 55.95 | 66.08 | 75.35 | 83.55 |
| | ARIMA-GARCH | 22.81 | 41.06 | 55.44 | 69.13 | 85.04 | 103.55 |
| 9 | GBM-SV | 23.46 | 33.05 | 39.49 | 43.93 | 48.62 | 53.77 |
| | ARIMA-GARCH | 24.80 | 35.23 | 43.03 | 49.76 | 56.54 | 63.00 |
| 10 | GBM-SV | 16.98 | 27.42 | 34.26 | 37.72 | 40.29 | 42.70 |
| | ARIMA-GARCH | 16.93 | 30.32 | 40.65 | 47.19 | 52.55 | 57.38 |
| 11 | GBM-SV | 34.57 | 55.46 | 66.83 | 76.40 | 82.30 | 86.60 |
| | ARIMA-GARCH | 32.34 | 52.70 | 67.08 | 78.63 | 86.64 | 93.29 |

Table 19 Comparing the MPIL Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 20.91 | 21.36 | 21.44 | 21.43 | 21.43 | 21.43 |
| | ARIMA-GARCH | 24.56 | 25.56 | 26.06 | 26.50 | 26.89 | 27.23 |
| 2 | GBM-SV | 21.39 | 21.66 | 21.84 | 21.94 | 22.00 | 22.08 |
| | ARIMA-GARCH | 20.96 | 21.30 | 21.57 | 21.74 | 21.87 | 21.98 |
| 3 | GBM-SV | 20.93 | 20.78 | 20.59 | 20.41 | 20.24 | 20.10 |
| | ARIMA-GARCH | 21.70 | 21.76 | 21.76 | 21.73 | 21.68 | 21.62 |
| 4 | GBM-SV | 34.69 | 32.16 | 30.19 | 28.66 | 27.48 | 26.63 |
| | ARIMA-GARCH | 54.29 | 61.15 | 68.30 | 76.02 | 84.47 | 93.55 |
| 5 | GBM-SV | 79.63 | 81.62 | 82.70 | 83.63 | 84.39 | 85.10 |
| | ARIMA-GARCH | 89.18 | 94.43 | 99.88 | 105.56 | 111.48 | 117.66 |
| 6 | GBM-SV | 54.62 | 54.98 | 54.73 | 54.74 | 54.68 | 54.55 |
| | ARIMA-GARCH | 69.94 | 78.18 | 86.90 | 96.28 | 106.41 | 117.37 |
| 7 | GBM-SV | 39.22 | 40.36 | 41.48 | 42.31 | 43.11 | 43.88 |
| | ARIMA-GARCH | 43.08 | 45.70 | 47.71 | 49.29 | 50.56 | 51.58 |
| 8 | GBM-SV | 50.67 | 48.65 | 46.90 | 45.45 | 44.28 | 43.40 |
| | ARIMA-GARCH | 50.08 | 49.43 | 48.73 | 48.00 | 47.26 | 46.53 |
| 9 | GBM-SV | 54.80 | 55.19 | 55.47 | 55.27 | 54.96 | 54.66 |
| | ARIMA-GARCH | 70.39 | 76.18 | 82.40 | 89.13 | 96.39 | 104.27 |
| 10 | GBM-SV | 54.29 | 56.90 | 58.54 | 59.70 | 60.52 | 61.18 |
| | ARIMA-GARCH | 54.34 | 57.83 | 61.23 | 64.75 | 68.41 | 72.23 |
| 11 | GBM-SV | 94.44 | 96.08 | 96.21 | 95.68 | 95.22 | 94.31 |
| | ARIMA-GARCH | 100.92 | 107.24 | 113.88 | 120.87 | 128.20 | 135.93 |

Table 20 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 96.53% | 90.28% | 86.81% | 87.50% | 86.11% | 85.76% |
| | ARIMA-GARCH | 97.92% | 90.63% | 86.46% | 86.11% | 84.72% | 82.99% |
| 2 | GBM-SV | 95.14% | 86.81% | 82.29% | 79.86% | 77.78% | 78.47% |
| | ARIMA-GARCH | 93.06% | 85.07% | 79.51% | 78.13% | 75.00% | 72.92% |
| 3 | GBM-SV | 94.10% | 84.38% | 79.86% | 77.78% | 79.51% | 78.47% |
| | ARIMA-GARCH | 92.01% | 84.38% | 80.90% | 76.04% | 75.35% | 75.00% |
| 4 | GBM-SV | 92.36% | 85.07% | 80.21% | 80.56% | 78.82% | 79.86% |
| | ARIMA-GARCH | 95.14% | 90.28% | 90.28% | 89.24% | 88.54% | 88.89% |
| 5 | GBM-SV | 95.49% | 89.58% | 85.42% | 85.42% | 83.33% | 81.94% |
| | ARIMA-GARCH | 96.18% | 83.68% | 74.31% | 67.36% | 60.07% | 51.39% |
| 6 | GBM-SV | 96.53% | 90.28% | 87.15% | 83.33% | 82.64% | 81.94% |
| | ARIMA-GARCH | 96.88% | 89.24% | 86.81% | 82.99% | 82.29% | 84.03% |
| 7 | GBM-SV | 96.88% | 91.32% | 90.63% | 90.28% | 88.19% | 88.54% |
| | ARIMA-GARCH | 98.26% | 95.49% | 93.40% | 94.44% | 92.01% | 89.58% |
| 8 | GBM-SV | 92.71% | 84.38% | 80.90% | 81.60% | 79.17% | 78.13% |
| | ARIMA-GARCH | 92.36% | 80.90% | 78.13% | 77.43% | 76.39% | 74.65% |
| 9 | GBM-SV | 94.44% | 87.15% | 84.03% | 83.33% | 83.68% | 81.94% |
| | ARIMA-GARCH | 95.49% | 92.36% | 88.54% | 88.54% | 88.19% | 87.85% |
| 10 | GBM-SV | 93.75% | 90.63% | 88.19% | 89.93% | 88.19% | 87.15% |
| | ARIMA-GARCH | 94.79% | 86.81% | 83.68% | 82.99% | 83.68% | 85.07% |
| 11 | GBM-SV | 93.40% | 87.85% | 82.99% | 78.13% | 76.74% | 74.65% |
| | ARIMA-GARCH | 95.83% | 87.50% | 82.29% | 77.78% | 78.13% | 76.04% |

Table 21 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-11-13 Tuesday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 5.71 | 5.22 | 4.98 | 5.02 | 4.92 | 4.89 |
| | ARIMA-GARCH | 4.97 | 4.43 | 4.15 | 4.06 | 3.94 | 3.81 |
| 2 | GBM-SV | 6.34 | 5.70 | 5.35 | 5.16 | 5.02 | 5.04 |
| | ARIMA-GARCH | 6.37 | 5.73 | 5.29 | 5.15 | 4.92 | 4.75 |
| 3 | GBM-SV | 6.61 | 5.94 | 5.67 | 5.54 | 5.70 | 5.65 |
| | ARIMA-GARCH | 6.25 | 5.71 | 5.47 | 5.14 | 5.11 | 5.10 |
| 4 | GBM-SV | 4.99 | 4.90 | 4.88 | 5.10 | 5.15 | 5.33 |
| | ARIMA-GARCH | 3.31 | 2.79 | 2.51 | 2.24 | 2.01 | 1.84 |
| 5 | GBM-SV | 3.69 | 3.31 | 3.05 | 2.97 | 2.82 | 2.69 |
| | ARIMA-GARCH | 3.35 | 2.71 | 2.25 | 1.90 | 1.58 | 1.27 |
| 6 | GBM-SV | 3.68 | 3.36 | 3.20 | 3.00 | 2.93 | 2.87 |
| | ARIMA-GARCH | 2.92 | 2.38 | 2.06 | 1.75 | 1.55 | 1.42 |
| 7 | GBM-SV | 5.55 | 5.07 | 4.90 | 4.77 | 4.58 | 4.51 |
| | ARIMA-GARCH | 5.16 | 4.74 | 4.45 | 4.37 | 4.16 | 3.97 |
| 8 | GBM-SV | 5.24 | 4.87 | 4.77 | 4.90 | 4.82 | 4.79 |
| | ARIMA-GARCH | 5.42 | 4.83 | 4.75 | 4.81 | 4.85 | 4.86 |
| 9 | GBM-SV | 3.01 | 2.71 | 2.57 | 2.53 | 2.52 | 2.46 |
| | ARIMA-GARCH | 2.40 | 2.13 | 1.88 | 1.73 | 1.58 | 1.45 |
| 10 | GBM-SV | 2.98 | 2.73 | 2.57 | 2.56 | 2.46 | 2.39 |
| | ARIMA-GARCH | 3.05 | 2.62 | 2.39 | 2.24 | 2.13 | 2.05 |
| 11 | GBM-SV | 2.93 | 2.68 | 2.50 | 2.34 | 2.28 | 2.22 |
| | ARIMA-GARCH | 2.86 | 2.46 | 2.18 | 1.94 | 1.84 | 1.69 |

Table 22 Comparing the MAPE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 2.29% | 3.14% | 3.47% | 3.56% | 3.62% | 3.74% |
| | ARIMA-GARCH | 2.71% | 4.76% | 6.09% | 6.73% | 7.18% | 7.59% |
| 2 | GBM-SV | 2.15% | 3.16% | 3.75% | 4.08% | 4.29% | 4.57% |
| | ARIMA-GARCH | 2.40% | 3.60% | 4.16% | 4.50% | 4.67% | 4.87% |
| 3 | GBM-SV | 1.69% | 2.31% | 2.50% | 2.59% | 2.69% | 2.76% |
| | ARIMA-GARCH | 1.77% | 2.66% | 3.00% | 3.19% | 3.31% | 3.42% |
| 4 | GBM-SV | 1.62% | 2.22% | 2.34% | 2.45% | 2.60% | 2.69% |
| | ARIMA-GARCH | 1.75% | 2.30% | 2.24% | 2.22% | 2.21% | 2.19% |
| 5 | GBM-SV | 3.58% | 5.70% | 6.89% | 8.00% | 9.20% | 10.18% |
| | ARIMA-GARCH | 4.25% | 7.29% | 9.21% | 10.32% | 11.40% | 12.82% |
| 6 | GBM-SV | 3.45% | 5.11% | 5.82% | 6.36% | 6.68% | 6.90% |
| | ARIMA-GARCH | 4.13% | 6.81% | 8.65% | 9.81% | 10.65% | 11.37% |
| 7 | GBM-SV | 2.76% | 4.35% | 5.27% | 6.01% | 6.60% | 7.25% |
| | ARIMA-GARCH | 3.25% | 5.70% | 6.62% | 7.23% | 7.64% | 8.01% |
| 8 | GBM-SV | 1.78% | 2.53% | 2.79% | 2.88% | 2.91% | 2.98% |
| | ARIMA-GARCH | 1.72% | 2.65% | 3.01% | 3.17% | 3.15% | 3.06% |
| 9 | GBM-SV | 4.69% | 7.42% | 8.74% | 9.87% | 10.49% | 10.76% |
| | ARIMA-GARCH | 4.73% | 7.64% | 8.95% | 9.91% | 10.47% | 10.94% |
| 10 | GBM-SV | 3.57% | 5.26% | 6.32% | 7.39% | 8.31% | 9.11% |
| | ARIMA-GARCH | 3.69% | 6.11% | 7.99% | 9.93% | 11.72% | 13.35% |
| 11 | GBM-SV | 3.35% | 4.95% | 5.65% | 6.15% | 6.70% | 6.96% |
| | ARIMA-GARCH | 3.22% | 5.39% | 6.57% | 7.22% | 7.98% | 8.47% |

Table 23 Comparing the RMSE Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 5.36 | 7.56 | 8.72 | 9.00 | 9.38 | 9.76 |
| | ARIMA-GARCH | 5.85 | 10.01 | 13.14 | 14.57 | 15.46 | 15.98 |
| 2 | GBM-SV | 5.22 | 8.45 | 10.88 | 12.60 | 13.93 | 15.17 |
| | ARIMA-GARCH | 6.18 | 10.09 | 12.41 | 13.91 | 15.12 | 15.88 |
| 3 | GBM-SV | 3.22 | 4.46 | 4.90 | 5.22 | 5.52 | 5.70 |
| | ARIMA-GARCH | 3.35 | 5.09 | 6.00 | 6.55 | 6.88 | 7.16 |
| 4 | GBM-SV | 7.95 | 10.47 | 10.36 | 12.12 | 14.96 | 17.25 |
| | ARIMA-GARCH | 7.61 | 9.45 | 8.14 | 7.54 | 7.44 | 7.33 |
| 5 | GBM-SV | 41.08 | 65.08 | 69.68 | 73.84 | 90.49 | 101.81 |
| | ARIMA-GARCH | 42.32 | 77.40 | 85.79 | 93.04 | 116.71 | 135.73 |
| 6 | GBM-SV | 21.42 | 42.74 | 54.35 | 62.43 | 64.48 | 65.56 |
| | ARIMA-GARCH | 25.07 | 49.05 | 67.69 | 80.17 | 86.82 | 90.03 |
| 7 | GBM-SV | 20.96 | 36.69 | 52.01 | 65.65 | 73.00 | 77.64 |
| | ARIMA-GARCH | 25.31 | 44.72 | 56.90 | 67.39 | 74.19 | 78.98 |
| 8 | GBM-SV | 5.65 | 7.96 | 8.67 | 8.89 | 8.95 | 9.06 |
| | ARIMA-GARCH | 5.40 | 8.05 | 9.21 | 9.62 | 9.56 | 9.23 |
| 9 | GBM-SV | 16.39 | 26.82 | 31.52 | 35.00 | 37.29 | 38.89 |
| | ARIMA-GARCH | 16.23 | 27.00 | 32.03 | 35.92 | 39.11 | 41.06 |
| 10 | GBM-SV | 21.00 | 32.48 | 40.63 | 49.17 | 56.78 | 62.48 |
| | ARIMA-GARCH | 21.10 | 34.22 | 44.45 | 57.12 | 69.30 | 78.69 |
| 11 | GBM-SV | 23.39 | 36.34 | 40.29 | 43.06 | 47.02 | 49.60 |
| | ARIMA-GARCH | 22.45 | 37.44 | 44.94 | 49.10 | 55.27 | 59.87 |

Table 24 Comparing the MPIL Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday)

| Data Set | Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|---|
| 1 | GBM-SV | 18.54 | 19.39 | 20.23 | 20.98 | 21.70 | 22.35 |
| | ARIMA-GARCH | 24.16 | 24.65 | 25.26 | 25.77 | 26.25 | 26.76 |
| 2 | GBM-SV | 18.42 | 19.09 | 19.64 | 20.08 | 20.46 | 20.82 |
| | ARIMA-GARCH | 20.81 | 22.40 | 23.85 | 25.24 | 26.62 | 27.95 |
| 3 | GBM-SV | 15.32 | 15.96 | 16.49 | 16.93 | 17.23 | 17.52 |
| | ARIMA-GARCH | 14.77 | 16.03 | 16.94 | 17.64 | 18.19 | 18.62 |
| 4 | GBM-SV | 16.51 | 17.41 | 18.13 | 18.69 | 19.15 | 19.55 |
| | ARIMA-GARCH | 19.88 | 22.22 | 23.23 | 23.70 | 23.91 | 24.02 |
| 5 | GBM-SV | 55.54 | 55.03 | 54.50 | 54.03 | 53.73 | 53.50 |
| | ARIMA-GARCH | 65.35 | 67.49 | 69.20 | 70.62 | 71.85 | 72.97 |
| 6 | GBM-SV | 39.81 | 40.21 | 40.41 | 40.57 | 40.67 | 40.96 |
| | ARIMA-GARCH | 50.53 | 53.63 | 56.66 | 59.67 | 62.69 | 65.71 |
| 7 | GBM-SV | 37.38 | 37.74 | 38.08 | 38.29 | 38.51 | 38.71 |
| | ARIMA-GARCH | 46.01 | 47.41 | 48.44 | 49.19 | 49.72 | 50.08 |
| 8 | GBM-SV | 28.03 | 29.76 | 31.15 | 32.28 | 33.09 | 33.82 |
| | ARIMA-GARCH | 28.12 | 28.98 | 29.66 | 30.21 | 30.64 | 30.98 |
| 9 | GBM-SV | 40.40 | 39.64 | 38.95 | 38.26 | 37.66 | 37.06 |
| | ARIMA-GARCH | 42.75 | 43.27 | 43.76 | 44.16 | 44.47 | 44.72 |
| 10 | GBM-SV | 46.97 | 47.46 | 47.63 | 47.85 | 47.80 | 47.85 |
| | ARIMA-GARCH | 55.41 | 58.18 | 61.14 | 64.24 | 67.45 | 70.80 |
| 11 | GBM-SV | 54.24 | 54.88 | 55.05 | 55.35 | 55.36 | 55.41 |
| | ARIMA-GARCH | 55.39 | 58.25 | 60.82 | 63.14 | 65.29 | 67.27 |

Table 25 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 96.88% 92.71% 90.63% 92.01% 91.32% 90.97% ARIMA-GARCH 94.44% 87.50% 79.51% 74.65% 75.00% 74.31% 2 GBM-SV 95.14% 89.93% 86.46% 87.15% 86.46% 86.11% ARIMA-GARCH 94.79% 90.28% 88.89% 88.89% 90.97% 92.71% 3 GBM-SV 98.61% 92.36% 90.63% 91.32% 90.63% 89.58% ARIMA-GARCH 96.88% 90.28% 88.19% 86.46% 85.76% 86.46% 4 GBM-SV 98.26% 95.14% 94.10% 95.14% 96.18%
95.83% ARIMA-GARCH 98.96% 96.88% 97.57% 96.53% 97.22% 97.92% 5 GBM-SV 95.49% 87.85% 85.42% 82.99% 82.64% 81.94% ARIMA-GARCH 95.14% 85.07% 82.29% 84.03% 84.03% 85.76% 6 GBM-SV 95.49% 90.97% 90.63% 90.63% 88.89% 87.85% ARIMA-GARCH 95.14% 85.07% 81.94% 83.68% 82.29% 82.99% 7 GBM-SV 95.14% 92.01% 89.24% 88.89% 86.46% 86.46% ARIMA-GARCH 97.22% 86.11% 85.76% 84.72% 85.42% 86.11% 8 GBM-SV 97.22% 92.36% 92.71% 92.36% 92.01% 93.40% ARIMA-GARCH 97.92% 93.06% 88.89% 87.15% 88.89% 92.71% 9 GBM-SV 93.06% 85.42% 82.29% 79.17% 79.17% 77.78% ARIMA-GARCH 93.40% 83.33% 83.33% 81.94% 81.94% 80.90% 10 GBM-SV 95.49% 89.58% 86.46% 84.72% 85.07% 85.42% ARIMA-GARCH 95.49% 88.19% 84.03% 80.56% 80.90% 80.90% 11 GBM-SV 95.49% 90.28% 86.11% 84.03% 84.38% 83.68% ARIMA-GARCH 95.14% 90.28% 86.81% 82.99% 84.72% 85.07% 136 Table 26 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-11-26 Monday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 6.33 5.79 5.42 5.30 5.08 4.91 ARIMA-GARCH 4.75 4.31 3.82 3.52 3.47 3.37 2 GBM-SV 7.16 6.54 6.12 6.04 5.88 5.77 ARIMA-GARCH 6.29 5.56 5.13 4.84 4.69 4.55 3 GBM-SV 8.88 8.00 7.60 7.47 7.28 7.09 ARIMA-GARCH 9.03 7.75 7.16 6.74 6.49 6.39 4 GBM-SV 9.40 8.64 8.21 8.07 7.97 7.79 ARIMA-GARCH 7.85 6.88 6.62 6.42 6.41 6.43 5 GBM-SV 3.67 3.42 3.36 3.29 3.29 3.27 ARIMA-GARCH 3.08 2.65 2.48 2.45 2.39 2.39 6 GBM-SV 4.07 3.83 3.77 3.74 3.63 3.55 ARIMA-GARCH 3.17 2.66 2.40 2.31 2.14 2.04 7 GBM-SV 5.54 5.32 5.13 5.09 4.92 4.89 ARIMA-GARCH 4.58 3.94 3.84 3.74 3.73 3.73 8 GBM-SV 8.25 7.41 7.13 6.86 6.67 6.63 ARIMA-GARCH 8.25 7.61 7.11 6.85 6.90 7.12 9 GBM-SV 3.38 3.17 3.11 3.04 3.08 3.07 ARIMA-GARCH 3.16 2.76 2.72 2.63 2.60 2.54 10 GBM-SV 3.62 3.35 3.20 3.10 3.10 3.09 ARIMA-GARCH 3.10 2.73 2.48 2.26 2.16 2.06 11 GBM-SV 4.31 4.04 3.84 3.72 3.72 3.68 ARIMA-GARCH 4.18 3.77 3.47 3.20 3.15 3.07 137 Table 27 Comparing the MAPE Values of the GBM-SV and the ARIMA- GARCH Models (2012-12-05 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 
Step 6 1 GBM-SV 2.19% 2.96% 3.25% 3.40% 3.40% 3.43% ARIMA-GARCH 2.36% 3.74% 4.59% 4.76% 4.99% 5.08% 2 GBM-SV 2.04% 2.89% 3.34% 3.68% 3.92% 4.03% ARIMA-GARCH 2.21% 3.43% 4.11% 4.37% 4.47% 4.70% 3 GBM-SV 1.67% 2.23% 2.41% 2.55% 2.57% 2.52% ARIMA-GARCH 1.70% 2.43% 2.74% 2.88% 3.01% 3.07% 4 GBM-SV 1.62% 2.17% 2.40% 2.46% 2.55% 2.58% ARIMA-GARCH 1.66% 2.20% 2.42% 2.52% 2.56% 2.62% 5 GBM-SV 3.09% 4.67% 6.04% 7.23% 8.31% 9.40% ARIMA-GARCH 3.28% 5.97% 8.05% 9.61% 11.20% 12.87% 6 GBM-SV 2.98% 4.64% 5.54% 6.08% 6.41% 6.69% ARIMA-GARCH 3.11% 5.34% 6.89% 8.16% 9.20% 10.00% 7 GBM-SV 2.47% 3.53% 3.97% 4.31% 4.71% 4.98% ARIMA-GARCH 2.49% 3.80% 4.38% 4.80% 5.05% 5.25% 8 GBM-SV 1.79% 2.33% 2.48% 2.50% 2.48% 2.57% ARIMA-GARCH 1.92% 2.73% 3.05% 3.03% 2.98% 2.96% 9 GBM-SV 3.04% 4.90% 5.90% 6.58% 7.11% 7.62% ARIMA-GARCH 3.17% 5.21% 6.55% 7.27% 7.64% 8.10% 10 GBM-SV 3.21% 5.03% 6.06% 6.80% 7.35% 8.00% ARIMA-GARCH 3.39% 5.86% 7.72% 9.24% 10.38% 11.60% 11 GBM-SV 3.09% 4.67% 5.52% 6.20% 6.99% 7.50% ARIMA-GARCH 3.30% 5.79% 7.13% 8.02% 8.70% 9.49% 138 Table 28 Comparing the RMSE Values of the GBM-SV and the ARIMA- GARCH Models (2012-12-05 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 4.69 7.83 9.68 10.63 10.78 10.72 ARIMA-GARCH 5.31 9.12 11.57 12.64 12.92 13.18 2 GBM-SV 4.69 8.12 10.00 11.36 11.84 12.28 ARIMA-GARCH 4.90 8.18 10.09 10.99 11.61 12.19 3 GBM-SV 3.38 4.98 5.76 6.18 6.25 6.14 ARIMA-GARCH 3.42 5.14 6.02 6.46 6.64 6.77 4 GBM-SV 4.78 6.47 7.49 7.98 8.75 8.56 ARIMA-GARCH 4.83 6.57 7.52 8.03 8.36 8.53 5 GBM-SV 14.86 23.51 30.18 36.76 42.89 48.83 ARIMA-GARCH 15.47 28.47 40.08 51.79 64.36 75.95 6 GBM-SV 10.44 18.45 22.76 24.38 25.54 25.74 ARIMA-GARCH 10.80 19.56 26.07 30.76 34.64 37.62 7 GBM-SV 8.85 15.09 17.77 19.31 20.73 22.42 ARIMA-GARCH 9.21 15.86 20.42 23.57 25.98 28.14 8 GBM-SV 6.58 8.32 8.65 8.62 8.47 8.82 ARIMA-GARCH 7.03 9.38 9.94 9.83 9.77 9.84 9 GBM-SV 10.52 18.63 22.77 24.97 26.98 28.97 ARIMA-GARCH 10.32 18.60 23.56 25.94 27.55 28.90 10 GBM-SV 
22.26 33.15 40.41 46.60 51.98 56.52 ARIMA-GARCH 23.68 37.37 47.79 57.30 66.44 74.13 11 GBM-SV 22.74 32.30 37.81 42.95 47.27 48.80 ARIMA-GARCH 25.12 37.99 46.65 52.74 59.70 62.11 139 Table 29 Comparing the MPIL Values of the GBM-SV and the ARIMA- GARCH Models (2012-12-05 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 15.15 15.46 15.71 15.93 16.07 16.19 ARIMA-GARCH 16.15 16.57 16.90 17.18 17.42 17.62 2 GBM-SV 17.03 17.57 18.04 18.38 18.72 18.96 ARIMA-GARCH 18.77 20.58 22.22 23.67 25.05 26.33 3 GBM-SV 13.33 13.57 13.75 13.93 14.05 14.14 ARIMA-GARCH 13.03 13.16 13.27 13.36 13.44 13.51 4 GBM-SV 15.74 16.16 16.43 16.55 16.68 16.76 ARIMA-GARCH 18.10 18.87 19.25 19.43 19.52 19.57 5 GBM-SV 40.82 41.33 41.76 42.06 42.22 42.34 ARIMA-GARCH 43.79 46.46 48.76 50.79 52.64 54.33 6 GBM-SV 28.49 29.11 29.65 30.03 30.41 30.69 ARIMA-GARCH 31.33 32.63 33.84 34.98 36.05 37.06 7 GBM-SV 28.46 28.53 28.48 28.46 28.35 28.24 ARIMA-GARCH 28.77 28.90 28.95 28.95 28.92 28.89 8 GBM-SV 27.47 28.59 29.38 29.88 30.33 30.67 ARIMA-GARCH 25.51 26.09 26.54 26.91 27.23 27.51 9 GBM-SV 29.51 30.59 31.28 31.89 32.32 32.76 ARIMA-GARCH 30.96 34.95 39.09 43.44 48.10 53.09 10 GBM-SV 49.02 50.01 50.44 50.39 50.43 50.39 ARIMA-GARCH 55.59 63.74 72.69 82.62 93.70 106.09 11 GBM-SV 59.63 62.64 65.25 67.36 68.79 70.22 ARIMA-GARCH 63.80 72.41 81.26 90.69 100.77 111.69 140 Table 30 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-05 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 96.53% 90.97% 88.89% 89.58% 89.24% 88.89% ARIMA-GARCH 94.10% 85.76% 81.25% 79.86% 80.90% 80.21% 2 GBM-SV 96.88% 89.24% 90.28% 89.58% 88.89% 89.24% ARIMA-GARCH 96.53% 89.24% 90.28% 90.28% 89.93% 90.63% 3 GBM-SV 95.83% 90.28% 90.63% 89.58% 89.58% 89.93% ARIMA-GARCH 96.18% 88.19% 87.50% 83.68% 84.72% 84.03% 4 GBM-SV 95.49% 94.10% 93.40% 93.75% 92.71% 91.67% ARIMA-GARCH 97.92% 95.83% 96.18% 94.44% 95.49% 95.14% 5 GBM-SV 94.10% 88.89% 83.68% 81.25% 77.43% 76.74% 
ARIMA-GARCH 94.79% 85.42% 78.82% 76.04% 77.08% 76.04% 6 GBM-SV 95.83% 86.46% 83.68% 84.72% 83.68% 80.21% ARIMA-GARCH 94.79% 86.81% 82.64% 81.60% 79.51% 77.78% 7 GBM-SV 96.53% 88.89% 84.38% 83.68% 81.60% 78.13% ARIMA-GARCH 95.14% 85.07% 80.90% 79.17% 79.17% 79.86% 8 GBM-SV 94.79% 89.24% 89.24% 91.32% 92.01% 90.28% ARIMA-GARCH 92.01% 84.38% 84.38% 85.42% 85.07% 84.72% 9 GBM-SV 93.40% 89.58% 86.81% 85.42% 85.07% 83.33% ARIMA-GARCH 95.83% 90.63% 88.19% 88.54% 90.63% 92.01% 10 GBM-SV 95.14% 87.85% 86.11% 85.42% 85.42% 87.15% ARIMA-GARCH 95.14% 88.19% 89.24% 89.58% 88.54% 90.28% 11 GBM-SV 94.44% 91.32% 89.58% 87.50% 85.76% 86.11% ARIMA-GARCH 96.18% 89.93% 88.89% 90.97% 90.28% 89.24% 141 Table 31 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-12-05 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 7.69 7.11 6.83 6.79 6.70 6.63 ARIMA-GARCH 7.04 6.26 5.82 5.63 5.63 5.52 2 GBM-SV 7.83 7.00 6.91 6.74 6.58 6.52 ARIMA-GARCH 7.09 5.98 5.60 5.26 4.96 4.76 3 GBM-SV 9.96 9.20 9.11 8.89 8.82 8.81 ARIMA-GARCH 10.26 9.32 9.17 8.71 8.77 8.65 4 GBM-SV 9.65 9.25 9.03 9.00 8.83 8.68 ARIMA-GARCH 8.61 8.07 7.94 7.71 7.76 7.71 5 GBM-SV 5.01 4.68 4.38 4.24 4.03 3.99 ARIMA-GARCH 4.69 3.97 3.49 3.22 3.14 3.00 6 GBM-SV 5.37 4.75 4.53 4.54 4.44 4.23 ARIMA-GARCH 4.83 4.24 3.89 3.71 3.50 3.32 7 GBM-SV 7.27 6.69 6.37 6.33 6.21 5.98 ARIMA-GARCH 7.07 6.27 5.94 5.79 5.78 5.82 8 GBM-SV 8.28 7.49 7.28 7.34 7.29 7.08 ARIMA-GARCH 8.66 7.75 7.62 7.61 7.49 7.39 9 GBM-SV 4.36 4.05 3.85 3.72 3.66 3.55 ARIMA-GARCH 4.27 3.58 3.11 2.81 2.60 2.39 10 GBM-SV 3.44 3.09 2.98 2.94 2.93 2.98 ARIMA-GARCH 3.06 2.47 2.19 1.94 1.69 1.52 11 GBM-SV 3.95 3.63 3.41 3.22 3.08 3.03 ARIMA-GARCH 3.76 3.10 2.73 2.50 2.23 1.99 142 Table 32 Comparing the MAPE Values of the GBM-SV and the ARIMA- GARCH Models (2012-12-19 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 2.10% 2.68% 2.84% 2.78% 2.90% 2.92% ARIMA-GARCH 2.41% 3.62% 4.23% 4.24% 4.42% 4.68% 2 GBM-SV 
1.81% 2.34% 2.51% 2.56% 2.64% 2.65% ARIMA-GARCH 2.01% 2.85% 3.08% 3.18% 3.25% 3.32% 3 GBM-SV 1.61% 2.21% 2.37% 2.41% 2.34% 2.29% ARIMA-GARCH 1.57% 2.27% 2.47% 2.56% 2.58% 2.62% 4 GBM-SV 1.39% 1.92% 2.10% 2.12% 2.13% 2.12% ARIMA-GARCH 1.41% 1.91% 2.06% 2.13% 2.17% 2.20% 5 GBM-SV 3.46% 5.68% 7.37% 8.29% 9.06% 9.50% ARIMA-GARCH 3.68% 6.42% 8.65% 10.02% 11.25% 12.35% 6 GBM-SV 3.56% 5.56% 6.75% 7.85% 8.75% 9.43% ARIMA-GARCH 4.04% 7.23% 9.59% 11.95% 14.27% 16.43% 7 GBM-SV 2.53% 3.88% 4.56% 5.03% 5.32% 5.64% ARIMA-GARCH 2.57% 4.39% 5.42% 6.31% 7.21% 7.99% 8 GBM-SV 1.72% 2.25% 2.42% 2.46% 2.46% 2.49% ARIMA-GARCH 1.70% 2.40% 2.59% 2.71% 2.76% 2.77% 9 GBM-SV 2.52% 3.73% 4.20% 4.59% 5.03% 5.32% ARIMA-GARCH 2.66% 4.27% 5.24% 5.99% 6.83% 7.50% 10 GBM-SV 2.85% 3.87% 4.26% 4.55% 5.00% 5.44% ARIMA-GARCH 2.71% 4.17% 4.84% 5.17% 5.56% 6.08% 11 GBM-SV 2.98% 4.42% 4.94% 5.00% 5.42% 5.59% ARIMA-GARCH 2.82% 4.41% 4.94% 5.20% 5.67% 6.05% 143 Table 33 Comparing the RMSE Values of the GBM-SV and the ARIMA- GARCH Models (2012-12-19 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 3.88 5.00 5.46 5.16 5.34 5.32 ARIMA-GARCH 4.36 6.16 7.09 7.20 7.79 8.22 2 GBM-SV 3.40 4.48 4.98 5.23 5.31 5.29 ARIMA-GARCH 3.72 5.09 5.58 5.73 5.78 5.85 3 GBM-SV 2.98 4.04 4.39 4.56 4.42 4.28 ARIMA-GARCH 2.90 4.09 4.54 4.70 4.75 4.74 4 GBM-SV 3.14 4.40 4.81 4.91 4.88 4.87 ARIMA-GARCH 3.07 4.31 4.68 4.80 4.85 4.87 5 GBM-SV 19.17 26.25 31.77 32.85 34.29 36.46 ARIMA-GARCH 17.87 26.85 33.78 36.15 36.61 38.45 6 GBM-SV 13.16 20.15 22.38 26.19 28.93 30.44 ARIMA-GARCH 13.60 21.01 24.98 29.29 32.53 35.13 7 GBM-SV 11.03 17.53 21.08 23.24 24.20 25.72 ARIMA-GARCH 10.92 18.44 23.14 27.26 30.50 33.43 8 GBM-SV 5.67 7.28 7.77 7.89 7.90 7.99 ARIMA-GARCH 5.75 7.63 8.11 8.39 8.54 8.61 9 GBM-SV 8.59 12.53 13.61 15.19 17.17 17.96 ARIMA-GARCH 8.68 13.44 15.89 18.59 21.35 23.48 10 GBM-SV 7.90 10.01 10.27 11.49 13.08 14.53 ARIMA-GARCH 7.62 11.13 12.92 15.14 16.86 18.64 11 GBM-SV 17.49 26.69 29.42 27.79 29.79 
30.12 ARIMA-GARCH 16.35 25.44 28.40 28.12 29.71 31.43 144 Table 34 Comparing the MPIL Values of the GBM-SV and the ARIMA- GARCH Models (2012-12-19 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 16.25 17.04 17.66 18.17 18.61 18.99 ARIMA-GARCH 17.59 18.98 20.19 21.20 22.13 22.96 2 GBM-SV 16.66 17.58 18.43 19.08 19.61 20.14 ARIMA-GARCH 17.79 19.98 22.15 24.28 26.49 28.77 3 GBM-SV 14.08 14.64 15.09 15.45 15.77 16.04 ARIMA-GARCH 13.58 14.22 14.79 15.30 15.78 16.22 4 GBM-SV 13.22 13.48 13.69 13.89 14.01 14.13 ARIMA-GARCH 12.95 13.42 13.73 13.95 14.11 14.22 5 GBM-SV 41.06 42.78 44.34 45.56 46.62 47.83 ARIMA-GARCH 51.06 54.19 57.38 60.73 64.15 67.70 6 GBM-SV 38.31 39.92 41.29 42.41 43.27 44.12 ARIMA-GARCH 42.00 44.25 46.43 48.58 50.69 52.78 7 GBM-SV 37.76 39.62 41.26 42.61 43.77 44.96 ARIMA-GARCH 40.05 49.10 59.10 70.25 82.83 97.38 8 GBM-SV 30.03 32.63 34.77 36.37 37.53 38.59 ARIMA-GARCH 24.81 25.78 26.64 27.43 28.15 28.82 9 GBM-SV 27.91 30.87 33.35 35.72 37.59 39.53 ARIMA-GARCH 28.56 34.73 41.90 50.31 60.22 71.95 10 GBM-SV 32.63 36.37 39.48 42.26 44.70 46.89 ARIMA-GARCH 25.64 28.08 30.53 32.98 35.53 38.13 11 GBM-SV 54.56 58.14 61.36 63.81 65.85 67.78 ARIMA-GARCH 46.86 49.78 52.62 55.40 58.13 60.83 145 Table 35 Comparing the PICP Values of the GBM-SV and the ARIMA-GARCH Models (2012-12-19 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 95.83% 89.58% 89.93% 91.32% 91.32% 92.01% ARIMA-GARCH 94.10% 88.89% 87.85% 88.54% 89.93% 90.63% 2 GBM-SV 97.92% 93.40% 94.10% 93.75% 93.40% 94.44% ARIMA-GARCH 97.22% 92.71% 93.75% 96.18% 96.88% 98.96% 3 GBM-SV 97.92% 92.36% 90.97% 89.93% 91.32% 92.71% ARIMA-GARCH 97.22% 90.97% 89.58% 88.89% 89.93% 90.97% 4 GBM-SV 95.49% 91.67% 88.89% 88.89% 89.24% 90.97% ARIMA-GARCH 95.14% 92.36% 90.28% 90.63% 90.97% 92.01% 5 GBM-SV 95.14% 89.93% 83.68% 84.38% 81.94% 83.33% ARIMA-GARCH 95.49% 88.89% 80.56% 78.82% 78.82% 74.31% 6 GBM-SV 94.44% 89.24% 84.72% 84.03% 81.60% 79.86% ARIMA-GARCH 96.53% 
87.50% 78.47% 70.49% 60.42% 53.82% 7 GBM-SV 95.14% 88.89% 86.11% 84.72% 83.68% 84.38% ARIMA-GARCH 95.49% 92.01% 90.63% 91.32% 91.32% 93.40% 8 GBM-SV 97.22% 94.10% 94.10% 95.83% 96.53% 96.53% ARIMA-GARCH 94.79% 89.58% 88.89% 89.93% 91.32% 92.01% 9 GBM-SV 97.22% 94.44% 95.83% 95.49% 95.49% 94.44% ARIMA-GARCH 96.88% 93.75% 94.10% 94.44% 95.83% 96.88% 10 GBM-SV 96.18% 92.71% 94.10% 93.40% 92.71% 90.97% ARIMA-GARCH 94.44% 89.93% 87.50% 88.89% 91.67% 93.40% 11 GBM-SV 96.53% 94.44% 94.10% 92.71% 93.06% 93.75% ARIMA-GARCH 96.53% 91.67% 89.58% 89.58% 88.54% 90.97% 146 Table 36 Comparing the PI-Ratio of the GBM-SV and the ARIMA-GARCH Models (2012-12-19 Wednesday) Data Set Model Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 1 GBM-SV 7.04 6.28 6.09 6.01 5.88 5.81 ARIMA-GARCH 6.40 5.61 5.21 5.01 4.88 4.75 2 GBM-SV 7.98 7.22 6.95 6.70 6.50 6.41 ARIMA-GARCH 7.44 6.33 5.79 5.43 5.02 4.73 3 GBM-SV 9.59 8.69 8.30 8.02 7.98 7.98 ARIMA-GARCH 9.90 8.85 8.38 8.04 7.89 7.77 4 GBM-SV 11.43 10.76 10.27 10.13 10.09 10.20 ARIMA-GARCH 11.65 10.91 10.42 10.29 10.22 10.25 5 GBM-SV 3.92 3.61 3.29 3.25 3.11 3.10 ARIMA-GARCH 3.14 2.77 2.39 2.23 2.14 1.93 6 GBM-SV 3.77 3.45 3.20 3.11 2.97 2.87 ARIMA-GARCH 3.51 3.04 2.63 2.28 1.89 1.63 7 GBM-SV 5.49 4.89 4.56 4.35 4.19 4.12 ARIMA-GARCH 5.22 4.12 3.38 2.88 2.45 2.14 8 GBM-SV 7.75 6.91 6.49 6.32 6.17 6.01 ARIMA-GARCH 9.15 8.33 8.01 7.87 7.79 7.67 9 GBM-SV 4.73 4.18 3.93 3.67 3.50 3.30 ARIMA-GARCH 4.62 3.69 3.09 2.59 2.20 1.87 10 GBM-SV 4.07 3.54 3.33 3.10 2.92 2.74 ARIMA-GARCH 5.07 4.41 3.94 3.71 3.55 3.37 11 GBM-SV 4.08 3.76 3.57 3.39 3.31 3.25 ARIMA-GARCH 4.74 4.24 3.92 3.72 3.51 3.45 147 5.4 Chapter Summary This chapter presented a new travel time prediction framework that takes advantage of both the GBM and SV models. Travel time data from 11 freeway segments and the prediction horizons from 1-step-ahead up to 6-step-ahead prediction were used for examining the performance of the models. 
To test the performance of the proposed GBM-SV model, the ARIMA-GARCH model was used as the benchmark. Comparing the two modeling approaches under different scenarios shows that the GBM-SV model has a considerable advantage over the ARIMA-GARCH model in terms of prediction accuracy and reliability. This conclusion is consistent with the conclusions of Chapters 3 and 4. Chapter 3 demonstrated that the SV model provides more efficient and effective PIs than the GARCH model, while the experiment in Chapter 4 indicated that the GBM model provides more accurate predictions than the ARIMA model. Combining these two models therefore has the potential to further improve model performance. The travel time prediction framework proposed in this chapter is one example of how to improve the overall performance of travel time prediction models. Future research could explore more accurate models for estimating the mean part and more efficient and effective models for the residual part (that is, for constructing high-quality PIs).

Chapter 6: Conclusion and Recommendations

6.1 Summary

Travel time prediction is a critical topic in the development of ITS. With the rapid development of Advanced Traveler Information Systems and Advanced Traffic Management Systems in particular, more accurate and reliable travel time information is needed for these systems to succeed. Despite their importance, travel time estimation and prediction remain complex and challenging tasks. Because of the interactions among different vehicle-driver combinations and exogenous factors such as weather, demand, and roadway conditions, travel time often fluctuates strongly across different periods and traffic conditions. These rapid fluctuations are complex and difficult to predict. Fully understanding them and developing accurate travel time prediction algorithms is critical.
Motivated by the need for travel time prediction, a wide range of methodologies has been proposed in the literature. As discussed in Chapter 2, existing travel time prediction algorithms can be divided into four major categories: parametric, non-parametric, hybrid, and prediction interval based approaches. Parametric methods usually have a well-established theoretical foundation but rely on many strict model assumptions. By comparison, non-parametric methods require fewer model assumptions, but some of them may be difficult to interpret. Hybrid methods take advantage of different prediction models, but some may be too complex for making predictions. The travel time interval based algorithms belong to the category of hybrid methods and provide not only the mean but also a prediction bound, capturing both the accuracy and the reliability of the prediction. As this is a relatively new area in travel time prediction, there are limited studies in the literature.

In this research, both prediction accuracy and reliability have been addressed in freeway travel time prediction. Although most existing travel time prediction models are able to provide accurate predictions during non-peak hours, peak hour travel time prediction is still challenging. Investigating travel time patterns during both non-peak and peak hours and developing a more accurate travel time prediction algorithm is therefore critical. In addition, because travel time is difficult to predict, especially during peak hours, the reliability of the model also needs to be considered. The model should account for situations in which traffic is highly volatile and a point prediction becomes less accurate. In such cases, the prediction interval based approach provides a prediction bound that indicates how likely it is to capture the observed travel time value, and therefore how reliable the prediction is.
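
The role of the prediction bound can be made concrete with two of the interval measures reported in the Chapter 5 tables. The sketch below assumes the usual definitions, which are not restated in this chapter: PICP is taken as the share of observed travel times that fall inside their prediction interval, and MPIL as the average interval width. The function names and the sample numbers are hypothetical and are not taken from the dissertation's data.

```python
from typing import Sequence


def picp(observed: Sequence[float],
         lower: Sequence[float],
         upper: Sequence[float]) -> float:
    """Share of observations covered by their interval (assumed PICP definition)."""
    if len(observed) == 0 or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return hits / len(observed)


def mpil(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Average interval width (assumed MPIL definition)."""
    if len(lower) == 0 or len(lower) != len(upper):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical travel times and interval bounds, in seconds.
observed = [100.0, 104.0, 130.0, 98.0]
lower = [95.0, 96.0, 100.0, 90.0]
upper = [110.0, 112.0, 120.0, 105.0]

coverage = picp(observed, lower, upper)  # three of four observations are covered
width = mpil(lower, upper)               # mean of the widths 15, 16, 20, 15
print(f"PICP = {coverage:.2%}, MPIL = {width:.2f}")
```

Under these definitions the two measures pull in opposite directions: wider intervals raise coverage but are less informative, so a model is preferable when it reaches similar coverage with narrower intervals.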
To capture the uncertainty and variation of travel time data, this study proposed two different statistical volatility models: the component GARCH model and the stochastic volatility model. In general, a statistical volatility model predicts future traffic volatility based on its previous volatility values. In a transportation system, travelers respond differently to unexpected changes in travel time. The presence of this volatility in traffic may lead to changes in driving behavior to compensate for the resulting changes in expected arrival time. These changes increase traffic volatility, at a rate that decreases over time as the system restores its past stability. The volatility models capture these changing traffic patterns over time and use them for further prediction.

The component GARCH models consider situations in which seasonal (cyclical) patterns or trends exist in the data. On road segments where commuters account for a large percentage of the total traffic volume, travel time may show strong cyclical patterns. In this case, the seasonal component should be considered when modeling the data. Through decomposition, the component GARCH models can potentially improve prediction accuracy. The other type of volatility model, the stochastic volatility model, treats the conditional variance of travel time data as an unobserved stochastic process; it therefore allows for more flexible application and can account for the uncertainties inherent in traffic phenomena.

In terms of prediction accuracy, Chapter 4 proposed the application of tree based ensemble methods to travel time prediction. The gradient boosted regression tree method was developed to model travel time and predict it more accurately. The basic idea of the gradient boosting method is to sequentially generate base learners from a weighted version of the training data in order to strategically find the optimal combination of trees.
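
The sequential idea can be illustrated with a deliberately small sketch. It is not the model used in Chapter 4: it assumes squared-error loss, for which each new base learner is simply fit to the current residuals, and it uses one-split regression stumps on a single lagged travel time as the base learner. All names (`fit_stump`, `fit_gbm`, `predict_gbm`), the settings, and the data are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]       # (threshold, left value, right value)
Model = Tuple[float, float, List[Stump]]  # (initial value, learning rate, stumps)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares one-split regression stump on a single feature."""
    best, best_sse = None, float("inf")
    for t in sorted(set(x))[:-1]:  # the largest value cannot split the data
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best_sse:
            best, best_sse = (t, lm, rm), sse
    if best is None:  # all feature values identical: fall back to a constant
        mean_r = sum(r) / len(r)
        return (x[0], mean_r, mean_r)
    return best


def predict_stump(stump: Stump, xi: float) -> float:
    t, lm, rm = stump
    return lm if xi <= t else rm


def fit_gbm(x: List[float], y: List[float],
            n_trees: int = 50, learning_rate: float = 0.1) -> Model:
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each learner's contribution by the learning rate.
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return f0, learning_rate, stumps


def predict_gbm(model: Model, xi: float) -> float:
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, xi) for s in stumps)


# Hypothetical data: previous-interval travel time -> current travel time (seconds).
x = [60.0, 62.0, 65.0, 70.0, 90.0, 120.0, 150.0, 140.0, 100.0, 75.0]
y = [62.0, 65.0, 70.0, 90.0, 120.0, 150.0, 140.0, 100.0, 75.0, 64.0]

model = fit_gbm(x, y)
mean_y = sum(y) / len(y)
mse_baseline = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_gbm = sum((yi - predict_gbm(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
print(f"training MSE: baseline {mse_baseline:.1f}, boosted {mse_gbm:.1f}")
```

A small learning rate with many shallow learners is the usual trade-off in this family of methods: each step corrects only part of the remaining error, which tends to make the ensemble less sensitive to any single tree.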
In contrast to other machine learning methods that are treated as black boxes, tree based ensemble methods provide interpretable results, require little data preprocessing, can handle different types of predictor variables, and can fit complex nonlinear relationships. These properties make tree based ensemble methods good candidates for solving travel time prediction problems.

To address both prediction accuracy and reliability, Chapter 5 proposed a new travel time prediction framework that combines the gradient boosting tree and the statistical volatility model. The proposed method takes advantage of both models and provides better performance.

6.2 Conclusion

The following list provides the conclusions and findings of this research:

• Due to the complex nature of the travel time prediction problem, the traditional point based prediction approach cannot fully account for uncertainties in traffic. There is often a mismatch between the predicted mean and the observed value. A prediction interval based approach, as an alternative way to represent the uncertainty associated with travel time prediction, has the potential to provide more reliable prediction information.
• Volatility-based travel time prediction models relax the constant variance assumption. These methods treat the current volatility as a function of its past values and can be used to construct more accurate PIs that capture travel time uncertainty.
• The component GARCH models are able to capture seasonal patterns in travel time volatility. When seasonal (cyclical) patterns exist, the component GARCH model could be a better choice than the traditional GARCH models.
• The stochastic volatility models consider part of the change in travel time volatility to be due to random shocks, while the GARCH type models treat the volatility as time-varying but not as a stochastic process.
By fitting the stochastic volatility model with an advanced Markov chain Monte Carlo (MCMC) estimation method, the model is able to provide more accurate PIs.
• The GBM model has considerable advantages in freeway travel time prediction. Its capability to handle different types of input variables and to model complex nonlinear relationships makes it a promising algorithm for travel time prediction.
• The newly proposed travel time prediction framework, the GBM-SV model, improves both model accuracy and reliability. The GBM-SV model also provides a framework for the future development of travel time prediction models.
• As traffic on different freeway segments may show different patterns or characteristics, it is necessary to study the traffic patterns in order to select the appropriate model for predicting travel time.

6.3 Future Recommendation

Although this research contributes to the existing literature on freeway travel time prediction, other research avenues remain to be pursued. Future directions of the research are provided below:

• This research uses only travel time information. With advances in technology, however, more and more data are becoming available, such as incident, weather, and work zone data. Since these events may have a significant influence on travel time, utilizing this information could help improve prediction accuracy. For example, traffic congestion is more likely to occur in inclement weather, because freeway capacity drops while demand does not. If weather conditions are included as explanatory variables, the model could potentially capture the impact of weather on travel time. As the GBM model is capable of handling different types of input information, it offers the advantage of utilizing such information to further improve prediction accuracy.
When weather information is used to predict travel time, weather forecasts will be the input; how to utilize weather forecast information while accounting for its reliability could therefore be another research topic. In summary, future research should use information on external impact factors when predicting travel time.
• The uncertainty associated with travel time prediction is a relatively new area, and few studies focus on predicting travel time uncertainty. As indicated in the literature review, there are generally two types of PI based approaches: ensemble methods and statistical volatility based approaches. This research mainly focuses on using statistical volatility methods to model the uncertainty associated with travel time prediction, and two different types of volatility models have been proposed. The results show that the PI based approach is promising for indicating the uncertainty associated with prediction. In the future, ensemble based algorithms could also be used to construct PIs when predicting travel time. It would also be beneficial to compare the PIs constructed by volatility based and ensemble based methods and to discuss the advantages and disadvantages of each in addressing the uncertainty associated with travel time prediction.
• This study predicts travel time only for a single segment. How to utilize segment travel time information to derive dynamic path travel times can also be studied in the future. For the purpose of providing travel time information through Advanced Traveler Information Systems, dynamic path information would help travelers find the minimum travel time path. The method proposed in this study yields individual segment travel time information, from which dynamic path travel time information can then be derived.

References

[1] J. Yeon, L.
Elefteriadou, and S. Lawphongpanich, "Travel time estimation on a freeway using Discrete Time Markov Chains," Transportation Research Part B: Methodological, vol. 42, pp. 325-338, 2008.
[2] T. Choe, A. Skabardonis, and P. Varaiya, "Freeway performance measurement system: operational analysis tool," Transportation Research Record: Journal of the Transportation Research Board, vol. 1811, pp. 67-75, 2002.
[3] M. Yang, Y. Liu, and Z. You, "The reliability of travel time forecasting," IEEE Transactions on Intelligent Transportation Systems, vol. 11, pp. 162-171, 2010.
[4] M. Yildirimoglu and N. Geroliminis, "Experienced travel time prediction for congested freeways," Transportation Research Part B: Methodological, vol. 53, pp. 45-63, 2013.
[5] R. F. Engle and M. E. Sokalska, "Forecasting intraday volatility in the US equity market. Multiplicative component GARCH," Journal of Financial Econometrics, vol. 10, pp. 54-83, 2012.
[6] G. Kastner and S. Frühwirth-Schnatter, "Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models," Computational Statistics & Data Analysis, 2013.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5-32, 2001.
[8] E. I. Vlahogianni, J. C. Golias, and M. G. Karlaftis, "Short-term traffic forecasting: Overview of objectives and methods," Transport Reviews, vol. 24, pp. 533-557, 2004.
[9] B. Van Arem, H. R. Kirby, M. J. Van Der Vlist, and J. C. Whittaker, "Recent advances and applications in the field of short-term traffic forecasting," International Journal of Forecasting, vol. 13, pp. 1-12, 1997.
[10] S. Ishak and H. Al-Deek, "Performance evaluation of short-term time-series traffic prediction model," Journal of Transportation Engineering, vol. 128, pp. 490-498, 2002.
[11] J. W. C. Van Lint and C. P. I. J. Van Hinsbergen, "Short-Term Traffic and Travel Time Prediction Models," Artificial Intelligence Applications to Critical Transportation Issues, p. 22, 2012.
[12] K. Farokhi Sadabadi, M. Hamedi, and A. Haghani, "Evaluating moving average techniques in short-term travel time prediction using an AVI data set," in Transportation Research Board 89th Annual Meeting, 2010.
[13] B. Smith and M. Demetsky, "Traffic Flow Forecasting: Comparison of Modeling Approaches," Journal of Transportation Engineering, vol. 123, pp. 261-266, 1997.
[14] B. Williams, P. Durvasula, and D. Brown, "Urban Freeway Traffic Flow Prediction: Application of Seasonal Autoregressive Integrated Moving Average and Exponential Smoothing Models," Transportation Research Record: Journal of the Transportation Research Board, vol. 1644, pp. 132-141, 1998.
[15] V. Stephanedes, P. G. Michalopoulos, and R. A. Plum, "Improved estimation of traffic flow for Real-Time control (Discussion and closure)," Transportation Research Record, 1981.
[16] D. Jeffery, K. Russam, and D. Robertson, "Electronic route guidance by AUTOGUIDE: the research background," Traffic Engineering & Control, vol. 28, pp. 525-529, 1987.
[17] I. Kaysi, M. Ben-Akiva, and H. Koutsopoulos, Integrated approach to vehicle routing and congestion prediction for real-time driver guidance, 1993.
[18] M. S. Ahmed and A. R. Cook, "Analysis of freeway traffic time-series data by using Box-Jenkins techniques," Transportation Research Record, 1979.
[19] M. Levin and Y.-D. Tsao, "On Forecasting Freeway Occupancies and Volumes (Abridgment)," Transportation Research Record, 1980.
[20] G. A. Davis, N. L. Nihan, M. M. Hamed, and L. N. Jacobson, "Adaptive forecasting of freeway traffic congestion," Transportation Research Record, 1990.
[21] M. Hamed, H. Al-Masaeid, and Z. Said, "Short-Term Prediction of Traffic Volume in Urban Arterials," Journal of Transportation Engineering, vol. 121, pp. 249-254, 1995.
[22] Y. Kamarianakis and P. Prastacos, "Space–time modeling of traffic flow," Computers & Geosciences, vol. 31, pp. 119-133, 2005.
[23] B. M. Williams, P. K. Durvasula, and D. E. Brown, "Urban freeway traffic flow prediction: application of seasonal autoregressive integrated moving average and exponential smoothing models," Transportation Research Record: Journal of the Transportation Research Board, vol. 1644, pp. 132-141, 1998.
[24] M. Cetin and G. Comert, "Short-term traffic flow prediction with regime switching models," Transportation Research Record: Journal of the Transportation Research Board, vol. 1965, pp. 23-31, 2006.
[25] W. Min and L. Wynter, "Real-time road traffic prediction with spatio-temporal correlations," Transportation Research Part C: Emerging Technologies, vol. 19, pp. 606-616, 2011.
[26] M. G. Karlaftis and E. I. Vlahogianni, "Memory properties and fractional integration in transportation time-series," Transportation Research Part C: Emerging Technologies, vol. 17, pp. 444-453, 2009.
[27] A. Stathopoulos and M. G. Karlaftis, "A multivariate state space approach for urban traffic flow modeling and prediction," Transportation Research Part C: Emerging Technologies, vol. 11, pp. 121-135, 2003.
[28] B. Ghosh, B. Basu, and M. O'Mahony, "Multivariate Short-Term Traffic Flow Forecasting Using Time-Series Analysis," IEEE Transactions on Intelligent Transportation Systems, vol. 10, pp. 246-254, 2009.
[29] J. Whittaker, S. Garside, and K. Lindveld, "Tracking and predicting a network traffic process," International Journal of Forecasting, vol. 13, pp. 51-61, 1997.
[30] I. Okutani and Y. J. Stephanedes, "Dynamic prediction of traffic volume through Kalman filtering theory," Transportation Research Part B: Methodological, vol. 18, pp. 1-11, 1984.
[31] S. Chien and C. Kuchipudi, "Dynamic Travel Time Prediction with Real-Time and Historic Data," Journal of Transportation Engineering, vol. 129, pp. 608-616, 2003.
[32] C. Nanthawichit, T. Nakatsuji, and H. Suzuki, "Application of probe-vehicle data for real-time traffic-state estimation and short-term travel-time prediction on a freeway," Transportation Research Record: Journal of the Transportation Research Board, vol. 1855, pp. 49-59, 2003.
[33] J. W. C. van Lint, "Online learning solutions for freeway travel time prediction," IEEE Transactions on Intelligent Transportation Systems, vol. 9, pp. 38-47, Mar. 2008.
[34] Y. Wang, M. Papageorgiou, and A. Messmer, "Real-time freeway traffic state estimation based on extended Kalman filter: Adaptive capabilities and real data testing," Transportation Research Part A: Policy and Practice, vol. 42, pp. 1340-1358, 2008.
[35] F. Yang, Z. Yin, H. X. Liu, and B. Ran, "Online recursive algorithm for short-term traffic prediction," Transportation Research Record: Journal of the Transportation Research Board, vol. 1879, pp. 1-8, 2004.
[36] B. L. Smith, B. M. Williams, and R. Keith Oswald, "Comparison of parametric and nonparametric models for traffic flow forecasting," Transportation Research Part C: Emerging Technologies, vol. 10, pp. 303-321, 2002.
[37] B. Smith and M. Demetsky, "Multiple-Interval Freeway Traffic Flow Forecasting," Transportation Research Record: Journal of the Transportation Research Board, vol. 1554, pp. 136-141, 1996.
[38] S. Clark, "Traffic Prediction Using Multivariate Nonparametric Regression," Journal of Transportation Engineering, vol. 129, pp. 161-168, 2003.
[39] G. Davis and N. Nihan, "Nonparametric Regression and Short-Term Freeway Traffic Forecasting," Journal of Transportation Engineering, vol. 117, pp. 178-188, 1991.
[40] S. Robinson and J. Polak, "Modeling Urban Link Travel Time with Inductive Loop Detector Data by Using the k-NN Method," Transportation Research Record: Journal of the Transportation Research Board, vol. 1935, pp. 47-56, 2005.
[41] J. Myung, D. K. Kim, S. Y. Kho, and C. H.
Park, "Travel Time Prediction Using k Nearest Neighbor Method with Combined Data from Vehicle Detector System and Automatic Toll Collection System," Transportation Research Record, pp. 51-59, 2011. [42] N. Zou, J. Wang, G.-L. Chang, and J. Paracha, "Application of Advanced Traffic Information Systems," Transportation Research Record: Journal of the Transportation Research Board, vol. 2129, pp. 62-72, 2009. [43] J. W. van Lint, S. Hoogendoorn, and H. J. van Zuylen, "Freeway travel time prediction with state-space neural networks: Modeling state-space dynamics with recurrent neural networks," Transportation Research Record: Journal of the Transportation Research Board, vol. 1811, pp. 30-39, 2002. 161 [44] H. Yin, S. Wong, J. Xu, and C. Wong, "Urban traffic flow prediction using a fuzzy-neural approach," Transportation Research Part C: Emerging Technologies, vol. 10, pp. 85-98, 2002. [45] S. Ishak, P. Kotha, and C. Alecsandru, "Optimization of dynamic neural network performance for short-term traffic prediction," Transportation Research Record: Journal of the Transportation Research Board, vol. 1836, pp. 45-56, 2003. [46] S. Ishak and C. Alecsandru, "Optimizing traffic prediction performance of neural networks under various topological, input, and traffic condition settings," Journal of Transportation Engineering, vol. 130, pp. 452-465, 2004. [47] X. Jiang and H. Adeli, "Dynamic wavelet neural network model for traffic flow forecasting," Journal of Transportation Engineering, vol. 131, pp. 771- 779, 2005. [48] J. Van Lint, S. Hoogendoorn, and H. J. van Zuylen, "Accurate freeway travel time prediction with state-space neural networks under missing data," Transportation Research Part C: Emerging Technologies, vol. 13, pp. 347- 369, 2005. [49] C. Quek, M. Pasquier, and B. B. S. Lim, "POP-TRAFFIC: A novel fuzzy neural approach to road traffic analysis and prediction," Intelligent Transportation Systems, IEEE Transactions on, vol. 7, pp. 133-146, 2006. [50] W. Zheng, D.-H. 
Lee, and Q. Shi, "Short-term freeway traffic flow prediction: Bayesian combined neural network approach," Journal of Transportation Engineering, vol. 132, pp. 114-121, 2006. 162 [51] X. Zeng and Y. Zhang, "Development of Recurrent Neural Network Considering Temporal‐Spatial Input Dynamics for Freeway Travel Time Modeling," Computer‐Aided Civil and Infrastructure Engineering, 2013. [52] S. Sun, C. Zhang, and G. Yu, "A Bayesian network approach to traffic flow forecasting," Intelligent Transportation Systems, IEEE Transactions on, vol. 7, pp. 124-132, 2006. [53] W.-C. Hong, "Traffic flow forecasting by seasonal SVR with chaotic simulated annealing algorithm," Neurocomputing, vol. 74, pp. 2096-2107, 2011. [54] M. Danech-Pajouh and M. Aron, "ATHENA: a method for short-term inter- urban motorway traffic forecasting," Recherche Transports Sécurité, 1991. [55] C. Antoniou, H. N. Koutsopoulos, and G. Yannis, "Dynamic data-driven local traffic state estimation and prediction," Transportation Research Part C: Emerging Technologies, vol. 34, pp. 89-107, 2013. [56] M. Van Der Voort, M. Dougherty, and S. Watson, "Combining Kohonen maps with ARIMA time series models to forecast traffic flow," Transportation Research Part C: Emerging Technologies, vol. 4, pp. 307-318, 1996. [57] H. Chen, S. Grant-Muller, L. Mussone, and F. Montgomery, "A Study of Hybrid Neural Network Approaches and the Effects of Missing Data on Traffic Forecasting," Neural computing & applications, vol. 10, pp. 277-286, 2001/12/01 2001. 163 [58] B. Yu, Z. Z. Yang, and K. Chen, "Hybrid model for prediction of bus arrival times at next station," Journal of Advanced Transportation, vol. 44, pp. 193- 204, Jul 2010. [59] A. Stathopoulos, L. Dimitriou, and T. Tsekeris, "Fuzzy modeling approach for combined forecasting of urban traffic flow," Computer‐Aided Civil and Infrastructure Engineering, vol. 23, pp. 521-535, 2008. [60] H. Liu, H. van Zuylen, H. van Lint, and M. 
Salomons, "Predicting urban arterial travel time with state-space neural networks and Kalman filters," Transportation Research Record: Journal of the Transportation Research Board, vol. 1968, pp. 99-108, 2006. [61] D. Boto-Giralda, F. J. Díaz-Pernas, D. González-Ortega, J. F. Díez-Higuera, M. Antón-Rodríguez, M. Martínez-Zarzuela, and I. Torre-Díez, "Wavelet- Based Denoising for Traffic Volume Time Series Forecasting with Self- Organizing Neural Networks," Computer-Aided Civil and Infrastructure Engineering, vol. 25, pp. 530-545, 2010. [62] Y. Peng, M. Lei, J.-B. Li, and X.-Y. Peng, "A novel hybridization of echo state networks and multiplicative seasonal ARIMA model for mobile communication traffic series forecasting," Neural Computing and Applications, pp. 1-8, 2012/12/01 2012. [63] K. Hamad, M. T. Shourijeh, E. Lee, and A. Faghri, "Near-Term Travel Speed Prediction Utilizing Hilbert-Huang Transform," Computer-Aided Civil and Infrastructure Engineering, vol. 24, pp. 551-576, 2009. 164 [64] H. K. Chen and C. J. Wu, "Travel Time Prediction Using Empirical Mode Decomposition and Gray Theory Example of National Central University Bus in Taiwan," Transportation Research Record, pp. 11-19, 2012. [65] J.-L. Deng, "Introduction to grey system theory," The Journal of grey system, vol. 1, pp. 1-24, 1989. [66] Y. Wei and M.-C. Chen, "Forecasting the short-term metro passenger flow with empirical mode decomposition and neural networks," Transportation Research Part C: Emerging Technologies, vol. 21, pp. 148-162, 2012. [67] X. Jiang and H. Adeli, "Wavelet Packet-Autocorrelation Function Method for Traffic Flow Pattern Analysis," Computer-Aided Civil and Infrastructure Engineering, vol. 19, pp. 324-337, 2004. [68] Y. Xie, Y. Zhang, and Z. Ye, "Short‐Term Traffic Volume Forecasting Using Kalman Filter with Discrete Wavelet Decomposition," Computer‐ Aided Civil and Infrastructure Engineering, vol. 22, pp. 326-334, 2007. [69] J. Wang and Q. 
Shi, "Short-term traffic speed forecasting hybrid model based on Chaos–Wavelet Analysis-Support Vector Machine theory," Transportation Research Part C: Emerging Technologies, 2012. [70] G. Leshem and Y. a. Ritov, "Traffic Flow Prediction using Adaboost Algorithm with Random Forests as a Weak Learner," International Journal of Intelligent Technology, vol. 2, 2007. [71] B. Hamner, "Predicting travel times with context-dependent random forests by modeling local and aggregate traffic flow," in Data Mining Workshops (ICDMW), 2010 IEEE International Conference on, 2010, pp. 1357-1359. 165 [72] Y. Wang, "Prediction of weather impacted airport capacity using ensemble learning," in Digital Avionics Systems Conference (DASC), 2011 IEEE/AIAA 30th, 2011, pp. 2D6-1-2D6-11. [73] M. M. Ahmed and M. Abdel-Aty, "Application of Stochastic Gradient Boosting Technique to Enhance Reliability of Real-Time Risk Assessment," Transportation Research Record: Journal of the Transportation Research Board, vol. 2386, pp. 26-34, 2013. [74] Y.-S. Chung, "Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees," Accident Analysis & Prevention, vol. 61, pp. 107-118, 2013. [75] C. P. I. van Hinsbergen, J. W. van Lint, and H. Van Zuylen, "Bayesian training and committees of state-space neural networks for online travel time prediction," Transportation Research Record: Journal of the Transportation Research Board, vol. 2105, pp. 118-126, 2009. [76] C. Van Hinsbergen, J. Van Lint, and H. Van Zuylen, "Bayesian committee of neural networks to predict travel times with confidence intervals," Transportation Research Part C: Emerging Technologies, vol. 17, pp. 498- 509, 2009. [77] Y. Zhang and Y. C. Liu, "Analysis of peak and non-peak traffic forecasts using combined models," Journal of Advanced Transportation, vol. 45, pp. 21-37, Jan 2011. [78] E. I. Vlahogianni, M. G. Karlaftis, and J. C. 
Golias, "Spatio‐Temporal Short‐Term Urban Traffic Volume Forecasting Using Genetically Optimized 166 Modular Networks," Computer‐Aided Civil and Infrastructure Engineering, vol. 22, pp. 317-325, 2007. [79] A. Khosravi, E. Mazloumi, S. Nahavandi, D. Creighton, and J. Van Lint, "Prediction intervals to account for uncertainties in travel time prediction," Intelligent Transportation Systems, IEEE Transactions on, vol. 12, pp. 537- 547, 2011. [80] A. Khosravi, E. Mazloumi, S. Nahavandi, D. Creighton, and J. Van Lint, "A genetic algorithm-based method for improving quality of travel time prediction intervals," Transportation Research Part C: Emerging Technologies, vol. 19, pp. 1364-1376, 2011. [81] J. Van Lint, "Reliable real-time framework for short-term freeway travel time prediction," Journal of Transportation Engineering, vol. 132, pp. 921-932, 2006. [82] X. Fei, C.-C. Lu, and K. Liu, "A bayesian dynamic linear model approach for real-time short-term freeway travel time prediction," Transportation Research Part C: Emerging Technologies, vol. 19, pp. 1306-1318, 2011. [83] R. Li and G. Rose, "Incorporating uncertainty into short-term travel time predictions," Transportation Research Part C: Emerging Technologies, vol. 19, pp. 1006-1018, 2011. [84] R. F. Engle, "Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation," Econometrica: Journal of the Econometric Society, pp. 987-1007, 1982. 167 [85] T. Bollerslev, "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, vol. 31, pp. 307-327, 1986. [86] C. Chen, J. Hu, Q. Meng, and Y. Zhang, "Short-time traffic flow prediction with ARIMA-GARCH model," in Intelligent Vehicles Symposium (IV), 2011 IEEE, 2011, pp. 607-612. [87] Y. Kamarianakis, A. Kanas, and P. Prastacos, "Modeling traffic volatility dynamics in an urban network," Transportation Research Record: Journal of the Transportation Research Board, vol. 1923, pp. 18-27, 2005. [88] Y. 
Zhang, R. Sun, A. Haghani, and X. Zeng, "Univariate Volatility-Based Models for Improving Quality of Travel Time Reliability Forecasting," in Transportation Research Board 92nd Annual Meeting, 2013. [89] T. Tsekeris and A. Stathopoulos, "Real-time traffic volatility forecasting in urban arterial networks," Transportation Research Record: Journal of the Transportation Research Board, vol. 1964, pp. 146-156, 2006. [90] T. Tsekeris and A. Stathopoulos, "Short-term prediction of urban traffic variability: Stochastic volatility modeling approach," Journal of Transportation Engineering, vol. 136, pp. 606-613, 2009. [91] J. Xia, Q. Nie, W. Huang, and Z. Qian, "Reliable Short-Term Traffic Flow Forecasting for Urban Roads Using a Multivariate GARCH Model," in Transportation Research Board 92nd Annual Meeting, 2013. [92] J. Guo and B. M. Williams, "Real-Time Short-Term Traffic Speed Level Forecasting and Uncertainty Quantification Using Layered Kalman Filters," 168 Transportation Research Record: Journal of the Transportation Research Board, vol. 2175, pp. 28-37, 2010. [93] D. L. Shrestha and D. P. Solomatine, "Machine learning approaches for estimation of prediction interval for the model output," Neural Networks, vol. 19, pp. 225-235, 2006. [94] R. H. Shumway and D. S. Stoffer, Time series analysis and its applications : with R examples, 2nd [updated] ed. New York: Springer, 2006. [95] R. J. Hyndman and Y. Khandakar, "Automatic time series for forecasting: the forecast package for R," 2007. [96] R. S. Tsay, Analysis of financial time series, 3rd ed. Cambridge, Mass.: Wiley, 2010. [97] T. Bollerslev, R. Y. Chou, and K. F. Kroner, "ARCH modeling in finance: a review of the theory and empirical evidence," Journal of Econometrics, vol. 52, pp. 5-59, 1992. [98] T. G. Andersen and T. Bollerslev, "Intraday periodicity and volatility persistence in financial markets," Journal of empirical finance, vol. 4, pp. 115-158, 1997. [99] G. Lee and R. 
Engle, "A Permanent and Transitory Component Model of Stock Return Volatility," Available at SSRN 5848, 1993. [100] S. Kim, N. Shephard, and S. Chib, "Stochastic volatility: likelihood inference and comparison with ARCH models," The Review of Economic Studies, vol. 65, pp. 361-393, 1998. 169 [101] Y. Yu and X.-L. Meng, "To center or not to center: That is not the question— an Ancillarity–Sufficiency Interweaving Strategy (ASIS) for boosting MCMC efficiency," Journal of Computational and Graphical Statistics, vol. 20, pp. 531-570, 2011. [102] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian data analysis: CRC press, 2003. [103] A. Ghalanos, "rugarch: Univariate GARCH models," in R package version 1.2-7 ed, 2013. [104] A. Haghani, M. Hamedi, K. F. Sadabadi, S. Young, and P. Tarnoff, "Data collection of freeway travel time ground truth with bluetooth sensors," Transportation Research Record: Journal of the Transportation Research Board, vol. 2160, pp. 60-68, 2010. [105] Z.-H. Zhou, Ensemble methods : foundations and algorithms. Boca Raton, FL: Taylor & Francis, 2012. [106] Y. Koren, "The bellkor solution to the netflix grand prize," Netflix prize documentation, 2009. [107] J. Elith, J. R. Leathwick, and T. Hastie, "A working guide to boosted regression trees," Journal of Animal Ecology, vol. 77, pp. 802-813, 2008. [108] L. Breiman, "Bagging predictors," Machine learning, vol. 24, pp. 123-140, 1996. [109] R. E. Schapire, "The strength of weak learnability," Machine learning, vol. 5, pp. 197-227, 1990. 170 [110] M. Kearns, "Thoughts on hypothesis boosting," Unpublished manuscript, December, 1988. [111] T. K. Ho, "The random subspace method for constructing decision forests," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 20, pp. 832-844, 1998. [112] T. K. Ho, "Random decision forests," in Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on, 1995, pp. 278- 282. [113] Y. Amit and D. 
Geman, "Shape quantization and recognition with randomized trees," Neural computation, vol. 9, pp. 1545-1588, 1997. [114] T. Hastie, R. Tibshirani, J. Friedman, T. Hastie, J. Friedman, and R. Tibshirani, The elements of statistical learning vol. 2: Springer, 2009. [115] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, pp. 1189-1232, 2001. [116] J. H. Friedman, "Stochastic gradient boosting," Computational Statistics & Data Analysis, vol. 38, pp. 367-378, 2002. [117] T. C. f. A. T. T. Laboratory. Available: http://www.cattlab.umd.edu/?portfolio=ritis [118] L. Breiman, Classification and regression trees. Belmont, Calif.: Wadsworth International Group, 1984.